Re: Changing the schema before Storing
Bill Graham 2012-12-13, 07:07
On Wed, Dec 12, 2012 at 8:04 AM, <[EMAIL PROTECTED]> wrote:
> Hello Bill,
>
> The bug didn't block me or waste any time. Regarding the cast, I can't
> regenerate the bug right now because I'm running a script, but I can answer
> your questions:
>
> 1) describe of the relation passed to store returns the generated schema
> name for the tuple, as described in:
> http://bb10.com/java-hadoop-pig-devel/2011-07/msg00237.html

When you do TO_TUPLE, try being explicit with the schema with an AS statement.

> 2) I want to store all the values as a tuple under one key because I want
> to minimize the repetitions of the row and column keys. I didn't specify
> the caster, so I'm using the default whatever it is (I hope it is the
> binary one not the UTF8 one)

Default caster is UTF8, which is what you want.

> 3) The class cast exception says that DataByteArray cannot be cast to Tuple

This is a result of something in your relations before the STORE, not
HBaseStorage. It takes what's given to it, so if it's seeing DataByteArrays,
something is producing them, possibly a UDF.

> Regards!
>
> -- Younos
>
> Quoting Bill Graham <[EMAIL PROTECTED]>:
>
>> Thanks Younos for catching that and sorry that you got bit by it. That is
>> in fact a javadoc bug. I've just opened a JIRA for it:
>>
>> https://issues.apache.org/jira/browse/PIG-3092
>> http://pig.apache.org/docs/r0.10.0/basic.html#store
>>
>> Regarding the casting, what does describe look like of the relation you
>> pass to the STORE statement and what do your class cast exceptions look
>> like? Which caster are you using?
>>
>> The relation you pass to STORE should be a flat relation of values, unless
>> you want to store the toString of a tuple as a single column in HBase.
>>
>>
>> On Tue, Dec 11, 2012 at 9:37 AM, <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Bill,
>>>
>>> Thanks for your reply. Since this is the case then the JavaDocs of the
>>> class need to be fixed (see
>>> http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
>>> ).
>>>
>>> Also, I faced a bug that I worked around by explicit casting. For some
>>> reason all the objects passed to putNext are of type DataByteArray, while
>>> the schema reports their correct types (tuple(string, int, int), long).
>>> This causes a lot of ClassCastExceptions because DataByteArray cannot be
>>> cast to any other type. I worked around this by passing everything to the
>>> STORE as a DataByteArray.
>>>
>>> Cheers!
>>> Younos
>>>
>>> Quoting Bill Graham <[EMAIL PROTECTED]>:
>>>
>>>> The STORE command doesn't take the AS clause, that's to define the
>>>> schema at LOAD time. When storing, just prepare your relation with the
>>>> desired schema and then STORE it without the AS.
>>>>
>>>> You can do all the transformations you need to before the STORE and Pig
>>>> will combine them all into as few logical processing steps as possible,
>>>> so no need to worry about specifying many transformation statements.
>>>>
>>>>
>>>> On Mon, Dec 10, 2012 at 7:31 PM, <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm using HBaseStorage and I want to change the layout of the schema
>>>>> before storage. Specifically I want to group some values into a tuple
>>>>> (thus reducing the number of repetitions of the row and column keys).
>>>>>
>>>>> Even though the JavaDoc gives an example that uses AS schema Grunt
>>>>> complains that it is not parsable. Here's what I am trying:
>>>>>
>>>>> STORE dataToStore INTO 'hbase://tableName' USING

*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*
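
For illustration only, the two suggestions in this message (name the tuple's schema explicitly with AS inside the FOREACH, then STORE the prepared relation without an AS clause) might look roughly like the Pig Latin sketch below. It is not the script from the thread: the input path, the field names, the column family "cf" and the column list passed to HBaseStorage are all assumptions made up for the example, and the original STORE statement above remains truncated as posted. Note that Pig's built-in is spelled TOTUPLE.

```pig
-- Hypothetical input: the path, aliases and field names are assumptions.
raw = LOAD 'input/data.tsv' USING PigStorage('\t')
      AS (rowkey:chararray, name:chararray, x:int, y:int, total:long);

-- Build the tuple and give it an explicit schema with AS in the FOREACH,
-- rather than relying on the generated schema name.
dataToStore = FOREACH raw GENERATE
    rowkey,
    TOTUPLE(name, x, y) AS info:tuple(name:chararray, x:int, y:int),
    total;

-- Check the schema before storing.
DESCRIBE dataToStore;

-- STORE takes no AS clause. The first field is used as the HBase row key;
-- the remaining fields map, in order, to the listed columns
-- ('cf:info' and 'cf:total' are made-up names).
STORE dataToStore INTO 'hbase://tableName'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:info cf:total');
```

With the default UTF8 caster, the tuple field would be written as its string form in a single column, which matches the goal of keeping several values under one row and column key.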