getWrappedSplit() is incorrectly returning the first split
Alex Rovner 2012-01-06, 21:50

Ran into this today. Using trunk (0.11).

If you are using a custom loader and are trying to get input split information in prepareToRead(), getWrappedSplit() returns the first split instead of the current one.

Checking the code confirms the suspicion:

PigSplit.java:

    public InputSplit getWrappedSplit() {
        return wrappedSplits[0];
    }

Should be:

    public InputSplit getWrappedSplit() {
        return wrappedSplits[splitIndex];
    }

The side effect is that if you try to retrieve the current split when Pig is using CombinedInputFormat, it always returns the first file in the list instead of the one it is currently reading. I have also confirmed this by adding a log statement in prepareToRead():

    @Override
    public void prepareToRead(@SuppressWarnings("rawtypes") RecordReader reader, PigSplit split)
            throws IOException {
        String path = ((FileSplit)split.getWrappedSplit(split.getSplitIndex())).getPath().toString();
        partitions = getPartitions(table, path);
        log.info("Preparing to read: " + path);
        this.reader = reader;
    }

2012-01-06 16:27:24,165 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00005:0+6187085
2012-01-06 16:27:24,180 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
2012-01-06 16:27:24,183 INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 2dd49ec41018ba4141b20edf28dbb43c0c07f373]
2012-01-06 16:27:24,189 INFO com.proclivitysystems.etl.pig.udf.loaders.HiveLoader: Preparing to read: hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00005
2012-01-06 16:27:28,053 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00006:0+6181475
2012-01-06 16:27:28,056 INFO com.proclivitysystems.etl.pig.udf.loaders.HiveLoader: Preparing to read: hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00005

Notice how Pig correctly reports the split, but my "info" statement always reports the first input split instead of the current one.

Bug? Jira? Patch?

Thanks
Alex R
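The difference between the two accessors in this report can be reproduced without Pig at all. The following is only a minimal sketch: `CombinedSplitSketch` and `WrappedSplitDemo` are hypothetical stand-ins, not Pig classes, and plain strings take the place of `InputSplit` objects. It assumes something else keeps the index up to date, which the follow-up message questions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for PigSplit: several wrapped splits plus a current index. */
class CombinedSplitSketch {
    private final String[] wrappedSplits;
    private int splitIndex = 0;

    CombinedSplitSketch(String... wrappedSplits) {
        this.wrappedSplits = wrappedSplits.clone();
    }

    /** Behaviour reported in the thread: always the first wrapped split. */
    String getFirstWrappedSplit() {
        return wrappedSplits[0];
    }

    /** Behaviour proposed in the thread: the wrapped split at the current index. */
    String getCurrentWrappedSplit() {
        return wrappedSplits[splitIndex];
    }

    int getNumSplits() {
        return wrappedSplits.length;
    }

    void setSplitIndex(int index) {
        splitIndex = index;
    }
}

class WrappedSplitDemo {
    /** Steps through every wrapped split and records what the chosen accessor reports. */
    static List<String> trace(CombinedSplitSketch split, boolean useCurrent) {
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < split.getNumSplits(); i++) {
            split.setSplitIndex(i);
            seen.add(useCurrent ? split.getCurrentWrappedSplit() : split.getFirstWrappedSplit());
        }
        return seen;
    }

    public static void main(String[] args) {
        CombinedSplitSketch split = new CombinedSplitSketch("150-r-00005", "150-r-00006");
        System.out.println("index 0 only:  " + trace(split, false));
        System.out.println("current index: " + trace(split, true));
    }
}
```

With the index-0 accessor every step reports the same file, which matches the repeated "Preparing to read" lines in the log above.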
-
Re: getWrappedSplit() is incorrectly returning the first split
Alex Rovner 2012-01-07, 06:21

Additionally, it looks like PigRecordReader is not incrementing the index in the PigSplit when dealing with CombinedInputFormat, so the index will be incorrect in either case.
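To illustrate the point about the index, here is a rough sketch of a reader that updates the split's index each time it moves on to the next wrapped split. `IndexedSplit` and `CombinedReaderSketch` are hypothetical names and do not reflect how PigRecordReader is actually written; the `preparedWith` list just records what a loader's prepareToRead() would observe at each switch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a combined split that knows which wrapped split is current. */
class IndexedSplit {
    final List<List<String>> chunks; // the records of each wrapped split
    int splitIndex = 0;

    IndexedSplit(List<List<String>> chunks) {
        this.chunks = chunks;
    }
}

/** Hypothetical reader that walks the wrapped splits in order. Assumes at least one chunk. */
class CombinedReaderSketch {
    private final IndexedSplit split;
    private Iterator<String> current;
    /** Index visible to the loader each time a new wrapped split is opened. */
    final List<Integer> preparedWith = new ArrayList<>();

    CombinedReaderSketch(IndexedSplit split) {
        this.split = split;
        open(0);
    }

    private void open(int index) {
        split.splitIndex = index;            // the update the thread says is missing
        preparedWith.add(split.splitIndex);  // what prepareToRead() would see
        current = split.chunks.get(index).iterator();
    }

    /** Returns the next record, or null once every wrapped split is exhausted. */
    String next() {
        while (!current.hasNext()) {
            int nextIndex = split.splitIndex + 1;
            if (nextIndex >= split.chunks.size()) {
                return null;
            }
            open(nextIndex);
        }
        return current.next();
    }
}
```

If `open` skipped the assignment to `splitIndex`, a loader asking for the split at the current index would still get the first one, which is the "incorrect in either case" situation described above.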
-
Re: getWrappedSplit() is incorrectly returning the first split
Daniel Dai 2012-01-07, 23:14

Sounds like a bug. I guess no one has ever relied on specific split info before. Please open a Jira.

Daniel
-
Re: getWrappedSplit() is incorrectly returning the first split
Alex Rovner 2012-01-09, 18:48

Jira opened.

I can attempt to submit a patch, as this seems like a fairly straightforward fix.

https://issues.apache.org/jira/browse/PIG-2462

Thanks
Alex R
-
Re: getWrappedSplit() is incorrectly returning the first split
Daniel Dai 2012-01-09, 19:06

Yes, please. Thanks!
-
Re: getWrappedSplit() is incorrectly returning the first split
Aniket Mokashi 2012-01-10, 00:44

Thanks so much for finding this out.

I was using

    @Override
    public void prepareToRead(@SuppressWarnings("rawtypes") RecordReader reader, PigSplit split)
            throws IOException {
        this.in = reader;
        partValues = ((DataovenSplit)split.getWrappedSplit()).getPartitionInfo().getPartitionValues();

in my loader, which behaves like HCatalog for delimited text in Hive. That returns the same partValues for all the values. I hacked around it with something else, but I think I must have hit this case. I will confirm. Thanks again for reporting this.

Thanks,
Aniket

"...:::Aniket:::... Quetzalco@tl"
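Both loaders in this thread need partition values that match the file currently being read. The thread does not show how getPartitions() or getPartitionValues() work, so the following is purely an assumed approach: reading Hive-style `key=value` directory names out of a path like the ones in the logs above. `PartitionPathSketch` and `parsePartitions` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PartitionPathSketch {
    /** Collects key=value directory segments from a path, in order of appearance. */
    static Map<String, String> parsePartitions(String path) {
        Map<String, String> partitions = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            // Only accept segments with a non-empty key and a non-empty value.
            if (eq > 0 && eq < segment.length() - 1) {
                partitions.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        String path = "hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag"
                + "/client_tid=3/cag_tid=150/150-r-00005";
        System.out.println(parsePartitions(path));
    }
}
```

Whatever the lookup is, it only gives the right answer if the path comes from the split that is actually being read, which is why the index-0 behaviour yields the same partition values for every file.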
-
Re: getWrappedSplit() is incorrectly returning the first split
Prashant Kommireddi 2012-01-10, 00:58

Is this critical enough to make it back into 0.9.1?

-Prashant
-
Re: getWrappedSplit() is incorrectly returning the first split
Jonathan Coveney 2012-01-10, 01:07

If it is affecting production jobs, I see no reason why we can't put the fix into 0.9.2, though I sense that a vote will be coming soon for a 0.9.2 release, so a fix would have to come soon. The issues with running the tests that were brought up in Bill's thread will have to be fixed before we can, though. I have a patch that's completely stopped because I can't develop any new tests, and so on.
-
Re: getWrappedSplit() is incorrectly returning the first split
Alex Rovner 2012-01-10, 05:10

I have already created the patch and tested it with some of my jobs. I ran into unit test failures as well, though. I can attach the patch to the Jira tomorrow anyway, to be applied once things are straightened out.

Alex R
-
Re: getWrappedSplit() is incorrectly returning the first split
Aniket Mokashi 2012-01-10, 06:54
The change was added as part of PIG-1518. Its release notes say:

"This change will not cause any backward compatibility issue except if a loader implementation makes use of the PigSplit object passed through the prepareToRead method where a rebuild of the loader might be necessary as PigSplit's definition has been modified. However, currently we know of no external use of the object.

This change also requires the loader to be stateless across the invocations to the prepareToRead method. That is, the method should reset any internal states that are not affected by the RecordReader argument. Otherwise, this feature should be disabled."

It looks like returning the 0th split was done deliberately. Comments?

Thanks,
Aniket

"...:::Aniket:::... Quetzalco@tl"
-
Re: getWrappedSplit() is incorrectly returning the first split
Daniel Dai 2012-01-10, 09:22
Thanks, Aniket, for pointing out PIG-1518.

I don't totally get the meaning of the release notes, but after PIG-1518, LoadFunc.prepareToRead will be invoked several times, each time on a different split. A loader that assumes prepareToRead is called only once per LoadFunc instance might therefore behave incorrectly. PIG-1518 also changes the definition of PigSplit, so a LoadFunc.prepareToRead that makes use of PigSplit might also behave incorrectly.

Daniel
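To make the point about repeated prepareToRead calls concrete, here is a minimal sketch in plain Java. It does not use Pig's or Hadoop's classes: FakeSplit, PartitionAwareLoader, tag() and parentDirectory() are hypothetical names invented for illustration, and the paths are made up. It only assumes what is said in this thread, namely that prepareToRead can be invoked once per wrapped split, so any state derived from the split has to be reset on every call.

```java
import java.util.ArrayList;
import java.util.List;

// Drives the hypothetical loader across two splits, the way a combined
// split is described as doing in this thread.
final class PrepareToReadDemo {
    static List<String> run() {
        PartitionAwareLoader loader = new PartitionAwareLoader();
        List<String> out = new ArrayList<>();

        loader.prepareToRead(new FakeSplit("/t/cag_tid=150/part-0"));
        out.add(loader.tag("a"));
        out.add(loader.tag("b"));

        // Second invocation on the same loader instance, different split.
        loader.prepareToRead(new FakeSplit("/t/cag_tid=151/part-0"));
        out.add(loader.tag("c"));
        return out;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}

// Hypothetical stand-in for one wrapped input split; not a Pig or Hadoop type.
final class FakeSplit {
    final String path;

    FakeSplit(String path) {
        this.path = path;
    }
}

// Hypothetical loader that keeps state derived from the current split.
final class PartitionAwareLoader {
    private String partition = "";
    private int recordsSeen = 0;

    // Resets all per-split state, because it may be called more than once.
    void prepareToRead(FakeSplit current) {
        partition = parentDirectory(current.path);
        recordsSeen = 0;
    }

    // Labels a record with the partition and a per-split record counter.
    String tag(String record) {
        recordsSeen++;
        return partition + ":" + recordsSeen + ":" + record;
    }

    // Returns the name of the directory that directly contains the file.
    static String parentDirectory(String path) {
        int end = path.lastIndexOf('/');
        if (end <= 0) {
            return "";
        }
        int start = path.lastIndexOf('/', end - 1);
        return path.substring(start + 1, end);
    }
}
```

If prepareToRead skipped the reset, the counter and the partition would carry over from the previous split, which is the kind of wrong result described above for loaders that assume a single call.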
-
Re: getWrappedSplit() is incorrectly returning the first split
Alex Rovner 2012-01-10, 13:56
Aniket,

Thanks for pointing out the related Jira item. I do not see any harm in returning the split at the correct splitIndex to the user instead of [0]. When combined input is disabled, the index will always be 0. When it is enabled, it should return the correct index (which it currently does not, but will once my patch is applied).

I have attached the patch to the Jira. I would appreciate it if someone could apply it and run the full suite of tests on it. I have run as many tests as I can on my end.

Thanks
Alex
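The difference between the two accessors can be sketched with a small stand-alone model. This is not Pig's PigSplit and not the attached patch: CombinedSplit, setCurrentIndex(), firstWrappedSplit() and currentWrappedSplit() are hypothetical names, and wrapped splits are represented as plain strings. The sketch assumes only what the thread states: the existing accessor returns element [0], the proposed one returns the element at splitIndex, and the reader has to advance that index for the second form to be useful.

```java
import java.util.ArrayList;
import java.util.List;

// Walks every wrapped split and records what each accessor reports.
final class SplitIndexDemo {
    static List<String> readAll(CombinedSplit split) {
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < split.size(); i++) {
            // The reader must advance the index; otherwise both accessors agree.
            split.setCurrentIndex(i);
            seen.add(split.firstWrappedSplit() + " vs " + split.currentWrappedSplit());
        }
        return seen;
    }

    public static void main(String[] args) {
        readAll(new CombinedSplit("part-0", "part-1", "part-2"))
                .forEach(System.out::println);
    }
}

// Hypothetical model of a split that wraps one or more underlying splits.
final class CombinedSplit {
    private final String[] wrappedSplits;
    private int splitIndex = 0;

    CombinedSplit(String... wrappedSplits) {
        if (wrappedSplits.length == 0) {
            throw new IllegalArgumentException("at least one wrapped split is required");
        }
        this.wrappedSplits = wrappedSplits.clone();
    }

    int size() {
        return wrappedSplits.length;
    }

    void setCurrentIndex(int index) {
        if (index < 0 || index >= wrappedSplits.length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        splitIndex = index;
    }

    // Behaviour reported in this thread: always the first wrapped split.
    String firstWrappedSplit() {
        return wrappedSplits[0];
    }

    // Behaviour proposed in this thread: the split currently being read.
    String currentWrappedSplit() {
        return wrappedSplits[splitIndex];
    }
}
```

With a single wrapped split the two accessors are indistinguishable, which matches the remark that the index is always 0 when combined input is disabled. With several wrapped splits only the index-based accessor follows the split being read.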
|