Pig >> mail # user >> getWrappedSplit() is incorrectly returning the first split


Re: getWrappedSplit() is incorrectly returning the first split
Thanks so much for finding this out.

I was using

    @Override
    public void prepareToRead(@SuppressWarnings("rawtypes") RecordReader reader, PigSplit split)
            throws IOException {
        this.in = reader;
        partValues = ((DataovenSplit) split.getWrappedSplit()).getPartitionInfo().getPartitionValues();
    }

in my loader, which behaves like HCatalog for delimited text in Hive. It
returns the same partition values for every split. I hacked around it with
something else, but I think I must have hit this case. I will confirm. Thanks
again for reporting this.
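
For anyone following along, here is a minimal, self-contained sketch of the behavior described in the quoted report below. It is a toy model, not Pig's code: CombinedSplitModel, getWrappedSplitBuggy(), getWrappedSplitFixed() and advance() are hypothetical names made up for illustration, and plain strings stand in for the underlying InputSplit objects. It only shows why returning element 0 gives every split the same answer, and why the index also has to be advanced by whoever reads the splits.

```java
import java.util.Arrays;
import java.util.List;

// Toy stand-in for a combined split. This is NOT org.apache.pig's PigSplit;
// all names here are hypothetical and exist only for illustration.
final class CombinedSplitModel {
    private final List<String> wrappedSplits; // one path per underlying split
    private int splitIndex = 0;               // position of the split being read

    CombinedSplitModel(String... paths) {
        this.wrappedSplits = Arrays.asList(paths);
    }

    // Behavior described in the report: always the first underlying split.
    String getWrappedSplitBuggy() {
        return wrappedSplits.get(0);
    }

    // Behavior proposed in the report: the split at the current index.
    String getWrappedSplitFixed() {
        return wrappedSplits.get(splitIndex);
    }

    int getSplitIndex() {
        return splitIndex;
    }

    // The reader has to call this when it moves to the next underlying split;
    // if it never does, even the "fixed" accessor keeps returning the first one.
    boolean advance() {
        if (splitIndex + 1 >= wrappedSplits.size()) {
            return false;
        }
        splitIndex++;
        return true;
    }
}

final class SplitIndexDemo {
    public static void main(String[] args) {
        CombinedSplitModel split =
                new CombinedSplitModel("part-00000", "part-00001", "part-00002");
        do {
            System.out.println("index=" + split.getSplitIndex()
                    + " buggy=" + split.getWrappedSplitBuggy()
                    + " fixed=" + split.getWrappedSplitFixed());
        } while (split.advance());
    }
}
```

In this model the "buggy" accessor reports part-00000 on every iteration, while the "fixed" one follows the index. Whether the real fix should look exactly like this is for the Jira discussion to settle.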

Thanks,

Aniket

On Mon, Jan 9, 2012 at 11:06 AM, Daniel Dai <[EMAIL PROTECTED]> wrote:

> Yes, please. Thanks!
>
> On Mon, Jan 9, 2012 at 10:48 AM, Alex Rovner <[EMAIL PROTECTED]> wrote:
>
> > Jira opened.
> >
> > I can attempt to submit a patch as this seems like a fairly
> > straightforward fix.
> >
> > https://issues.apache.org/jira/browse/PIG-2462
> >
> >
> > Thanks
> > Alex R
> >
> > On Sat, Jan 7, 2012 at 6:14 PM, Daniel Dai <[EMAIL PROTECTED]> wrote:
> >
> > > Sounds like a bug. I guess no one has ever relied on specific split info
> > > before.
> > > Please open a Jira.
> > >
> > > Daniel
> > >
> > > On Fri, Jan 6, 2012 at 10:21 PM, Alex Rovner <[EMAIL PROTECTED]> wrote:
> > >
> > > > Additionally, it looks like PigRecordReader is not incrementing the index
> > > > in the PigSplit when dealing with CombinedInputFormat, thus the index will
> > > > be incorrect in either case.
> > > >
> > > > On Fri, Jan 6, 2012 at 4:50 PM, Alex Rovner <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Ran into this today. Using trunk (0.11).
> > > > >
> > > > > If you are using a custom loader and are trying to get input split
> > > > > information in prepareToRead(), getWrappedSplit() returns the first
> > > > > split instead of the current one.
> > > > >
> > > > > Checking the code confirms the suspicion:
> > > > >
> > > > > PigSplit.java:
> > > > >
> > > > >     public InputSplit getWrappedSplit() {
> > > > >         return wrappedSplits[0];
> > > > >     }
> > > > >
> > > > > Should be:
> > > > >     public InputSplit getWrappedSplit() {
> > > > >         return wrappedSplits[splitIndex];
> > > > >     }
> > > > >
> > > > >
> > > > > The side effect is that if you are trying to retrieve the current split
> > > > > when Pig is using CombinedInputFormat, it always returns the first file
> > > > > in the list instead of the one it is currently reading. I have also
> > > > > confirmed this by adding a log statement in prepareToRead():
> > > > >
> > > > >     @Override
> > > > >     public void prepareToRead(@SuppressWarnings("rawtypes") RecordReader reader, PigSplit split)
> > > > >             throws IOException {
> > > > >         String path = ((FileSplit) split.getWrappedSplit(split.getSplitIndex())).getPath().toString();
> > > > >         partitions = getPartitions(table, path);
> > > > >         log.info("Preparing to read: " + path);
> > > > >         this.reader = reader;
> > > > >     }
> > > > >
> > > > > 2012-01-06 16:27:24,165 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00005:0+6187085
> > > > > 2012-01-06 16:27:24,180 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
> > > > > 2012-01-06 16:27:24,183 INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 2dd49ec41018ba4141b20edf28dbb43c0c07f373]
> > > > > 2012-01-06 16:27:24,189 INFO com.proclivitysystems.etl.pig.udf.loaders.HiveLoader: Preparing to read: hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00005
> > > > > 2012-01-06 16:27:28,053 INFO

"...:::Aniket:::... Quetzalco@tl"