Changing the schema before Storing
yaboulna@... 2012-12-11, 03:31
Hello,
I'm using HBaseStorage and I want to change the layout of the schema before storage. Specifically I want to group some values into a tuple (thus reducing the number of repetitions of the row and column keys).

Even though the JavaDoc gives an example that uses AS schema Grunt complains that it is not parsable. Here's what I am trying:

    STORE dataToStore INTO 'hbase://tableName' USING HBaseStorage('cf:tuple, cf:date') AS TOTUPLE(val1, val2, val3), date;

Is this possible? Or do I have to do the transformation in a separate step:

    dataTransformed = FOREACH dataToStore GENERATE TOTUPLE(val1, val2, val3), date;

In case of the latter, can Pig be told to merge this step with the next one? I tried a nested FOREACH where I can have an assignment operation, but I quickly found out that STORE is not supported within the FOREACH.. what was I thinking :).

Thanks!

-- Younos
Re: Changing the schema before Storing
Bill Graham 2012-12-11, 07:27
The STORE command doesn't take the AS clause; that is used to define the schema at LOAD time. When storing, just prepare your relation with the desired schema and then STORE it without the AS.

You can do all the transformations you need before the STORE, and Pig will combine them into as few logical processing steps as possible, so there is no need to worry about specifying many transformation statements.

--
*Note that I'm no longer using my Yahoo! email address. Please email me at [EMAIL PROTECTED] going forward.*
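A minimal sketch of the approach described in this reply, reusing the relation and column names from the original question. The `rowKey` field is an assumption for illustration: HBaseStorage treats the first field of the stored relation as the HBase row key, and the question does not show the relation's full layout. The column list is written space-delimited here, which is the form used in the HBaseStorage JavaDoc example.

```pig
-- Step 1: shape the relation into the layout that should be stored.
-- rowKey is a hypothetical field standing in for whatever identifies the row.
dataTransformed = FOREACH dataToStore GENERATE
    rowKey,
    TOTUPLE(val1, val2, val3),
    date;

-- Step 2: STORE the prepared relation. Note there is no AS clause here.
STORE dataTransformed INTO 'hbase://tableName'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:tuple cf:date');
```

Writing the FOREACH as its own statement should not add a separate job; as noted above, Pig combines consecutive transformations where it can.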
Re: Changing the schema before Storing
yaboulna@... 2012-12-11, 17:37
Hi Bill,
Thanks for your reply. Since this is the case, the JavaDoc of the class needs to be fixed (see http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html).

Also, I faced a bug that I worked around by explicit casting. For some reason all the objects passed to putNext are of type DataByteArray, while the schema reports their correct types (tuple(string, int, int), long). This causes a lot of ClassCastExceptions because DataByteArray cannot be cast to any other type. I worked around this by passing everything to the STORE as a DataByteArray.

Cheers!
Younos

Best regards,
Younos Aboulnaga

Masters candidate
David Cheriton school of computer science
University of Waterloo
http://cs.uwaterloo.ca

E-Mail: [EMAIL PROTECTED]
Mobile: +1 (519) 497-5669
Re: Changing the schema before Storing
Bill Graham 2012-12-12, 06:37
Thanks, Younos, for catching that, and sorry that you got bitten by it. That is in fact a javadoc bug. I've just opened a JIRA for it:

https://issues.apache.org/jira/browse/PIG-3092
http://pig.apache.org/docs/r0.10.0/basic.html#store

Regarding the casting, what does DESCRIBE show for the relation you pass to the STORE statement, and what do your class cast exceptions look like? Which caster are you using?

The relation you pass to STORE should be a flat relation of values, unless you want to store the toString of a tuple as a single column in HBase.
Re: Changing the schema before Storing
yaboulna@... 2012-12-12, 16:04
Hello Bill,
The bug didn't block me or waste any time. Regarding the cast, I can't reproduce the bug right now because I'm running a script, but I can answer your questions:

1) DESCRIBE of the relation passed to STORE returns the generated schema name for the tuple, as described in: http://bb10.com/java-hadoop-pig-devel/2011-07/msg00237.html

2) I want to store all the values as a tuple under one key because I want to minimize the repetitions of the row and column keys. I didn't specify the caster, so I'm using the default, whatever it is (I hope it is the binary one, not the UTF8 one).

3) The class cast exception says that DataByteArray cannot be cast to Tuple.

Regards!

-- Younos
Re: Changing the schema before Storing
Bill Graham 2012-12-13, 07:07
On Wed, Dec 12, 2012 at 8:04 AM, <[EMAIL PROTECTED]> wrote:
> 1) describe of the relation passed to store returns the generated schema
> name for the tuple, as described in:
> http://bb10.com/java-hadoop-pig-devel/2011-07/msg00237.html

When you do TOTUPLE, try being explicit about the schema with an AS clause.

> 2) I want to store all the values as a tuple under one key because I want
> to minimize the repetitions of the row and column keys. I didn't specify
> the caster, so I'm using the default whatever it is (I hope it is the
> binary one not the UTF8 one)

The default caster is UTF8, which is what you want.

> 3) The class cast exception says that DataByteArray cannot be cast to Tuple

This is a result of something in your relations before the STORE, not HBaseStorage. It takes what's given to it, so if it's seeing DataByteArrays, something is producing them, possibly a UDF.
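A sketch of the suggestion in point 1, under stated assumptions: the field names `rowKey`, `vals`, `v1`, `v2` and `v3` are hypothetical, and the types (chararray, int, int) are taken from the schema Younos reported earlier in the thread, with Pig's `chararray` standing in for "string". Whether Pig accepts the AS clause depends on those types matching what the upstream relation actually produces.

```pig
-- Name the tuple and its fields explicitly instead of relying on the
-- generated schema name that DESCRIBE was reporting.
dataTransformed = FOREACH dataToStore GENERATE
    rowKey,
    TOTUPLE(val1, val2, val3) AS vals:tuple(v1:chararray, v2:int, v3:int),
    date;

-- Check the schema Pig has inferred before handing the relation to STORE.
DESCRIBE dataTransformed;
```

If DESCRIBE still reports bytearray for any field, that points at an earlier step (for example a LOAD without a schema, or a UDF) as the source of the DataByteArray values, in line with the reply to point 3.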