Pig, mail # user - Changing the schema before Storing


yaboulna@... 2012-12-11, 03:31
Bill Graham 2012-12-11, 07:27
yaboulna@... 2012-12-11, 17:37
Bill Graham 2012-12-12, 06:37
yaboulna@... 2012-12-12, 16:04
Re: Changing the schema before Storing
Bill Graham 2012-12-13, 07:07
On Wed, Dec 12, 2012 at 8:04 AM, <[EMAIL PROTECTED]> wrote:

> Hello Bill,
>
> The bug didn't block me or waste any time. Regarding the cast, I can't
> regenerate the bug right now because I'm running a script, but I can answer
> your questions:
>
> 1) describe of the relation passed to store returns the generated schema
> name for the tuple, as described in:
> http://bb10.com/java-hadoop-pig-devel/2011-07/msg00237.html

When you call TOTUPLE, try being explicit about the schema with an AS
clause.
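
For illustration, here is a minimal sketch of what an explicit tuple schema
could look like. The relation and field names (raw, rowkey, word, cnt, pos,
total) are made up for this example and are not taken from your script, and it
assumes the fields already have the types named in the AS clause:

```pig
-- Hypothetical input: raw has the fields rowkey, word, cnt, pos, total.
packed = FOREACH raw GENERATE
    rowkey,
    TOTUPLE(word, cnt, pos) AS vals:tuple(word:chararray, cnt:int, pos:int),
    total;

-- DESCRIBE should now report the tuple under the name given in the AS clause
-- instead of a generated one.
DESCRIBE packed;
```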
>
>
> 2) I want to store all the values as a tuple under one key because I want
> to minimize the repetitions of the row and column keys. I didn't specify
> the caster, so I'm using the default whatever it is (I hope it is the
> binary one not the UTF8 one)
>

The default caster is Utf8StorageConverter, which is what you want.
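
If you would rather not depend on the default, the caster can be named in the
options string. This is only a sketch: the relation name (flat) and the column
names (cf:word, cf:cnt) are placeholders, and the first field of the relation
is used as the row key.

```pig
-- Hypothetical flat relation: (rowkey, word, cnt).
-- The first field becomes the HBase row key; the remaining fields map,
-- in order, to the columns listed in the first argument.
STORE flat INTO 'hbase://tableName'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'cf:word cf:cnt',
        '-caster Utf8StorageConverter');
```

Swapping in HBaseBinaryConverter would select the binary caster instead.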
>
> 3) The class cast exception says that DataByteArray cannot be cast to Tuple
>

This is a result of something in your relations before the STORE, not
HBaseStorage. It takes what's given to it, so if it's seeing
DataByteArrays, something is producing them, possibly a UDF.
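
One way to check for this, sketched with made-up names (raw, rowkey, word,
cnt): DESCRIBE the relation right before the STORE and cast any field that
shows up as bytearray. If a UDF declares a typed schema but actually returns
DataByteArray values, the casts below will not help and the fix belongs in
the UDF.

```pig
-- Fields reported as bytearray are carried as DataByteArray at runtime.
DESCRIBE raw;

-- Cast each field explicitly so the STORE receives typed values.
typed = FOREACH raw GENERATE
    (chararray) rowkey AS rowkey,
    (chararray) word   AS word,
    (int)       cnt    AS cnt;

DESCRIBE typed;
```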
>
> Regards!
>
> -- Younos
>
> Quoting Bill Graham <[EMAIL PROTECTED]>:
>
>> Thanks Younos for catching that and sorry that you got bit by it. That is
>> in fact a javadoc bug. I've just opened a JIRA for it:
>>
>> https://issues.apache.org/jira/browse/PIG-3092
>> http://pig.apache.org/docs/r0.10.0/basic.html#store
>>
>> Regarding the casting, what does DESCRIBE show for the relation you
>> pass to the STORE statement, and what do your class cast exceptions look
>> like? Which caster are you using?
>>
>> The relation you pass to STORE should be a flat relation of values, unless
>> you want to store the toString of a tuple as a single column in HBase.
>>
>>
>> On Tue, Dec 11, 2012 at 9:37 AM, <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Bill,
>>>
>>> Thanks for your reply. Since this is the case, the JavaDoc of the class
>>> needs to be fixed (see
>>> http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
>>> ).
>>>
>>> Also, I faced a bug that I worked around by explicit casting. For some
>>> reason all the objects passed to putNext are of type DataByteArray, while
>>> the schema reports their correct types (tuple(string, int, int), long).
>>> This causes a lot of ClassCastExceptions because DataByteArray cannot be
>>> cast to any other type. I worked around this by passing everything to the
>>> STORE as a DataByteArray.
>>>
>>> Cheers!
>>> Younos
>>>
>>> Quoting Bill Graham <[EMAIL PROTECTED]>:
>>>
>>>> The STORE command doesn't take the AS clause; that's for defining the
>>>> schema at LOAD time. When storing, just prepare your relation with the
>>>> desired schema and then STORE it without the AS.
>>>>
>>>> You can do all the transformations you need before the STORE, and Pig
>>>> will combine them into as few logical processing steps as possible, so
>>>> there is no need to worry about specifying many transformation
>>>> statements.
>>>>
>>>>
>>>> On Mon, Dec 10, 2012 at 7:31 PM, <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm using HBaseStorage and I want to change the layout of the schema
>>>>> before storage. Specifically, I want to group some values into a tuple
>>>>> (thus reducing the number of repetitions of the row and column keys).
>>>>>
>>>>> Even though the JavaDoc gives an example that uses an AS schema, Grunt
>>>>> complains that it is not parsable. Here's what I am trying:
>>>>>
>>>>> STORE dataToStore INTO 'hbase://tableName' USING
>

*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*