Re: Changing the schema before Storing
Bill Graham 2012-12-13, 07:07
On Wed, Dec 12, 2012 at 8:04 AM, <[EMAIL PROTECTED]> wrote:
> Hello Bill,
> The bug didn't block me or waste any time. Regarding the cast, I can't
> regenerate the bug right now because I'm running a script, but I can answer
> your questions:
> 1) describe of the relation passed to store returns the generated schema
> name for the tuple, as described in: http://bb10.com/java-hadoop-**
When you do TO_TUPLE, try being explicit about the schema with an AS clause.
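A minimal sketch of that suggestion, assuming TO_TUPLE refers to Pig's built-in TOTUPLE. The input path, relation names, and field names here are hypothetical and not taken from the original script:

```pig
-- Hypothetical input and field names; only TOTUPLE plus the AS clause is the point.
raw = LOAD 'input.tsv' AS (rowkey:chararray, name:chararray, x:int, y:int, ts:long);

-- Name the tuple and its inner fields explicitly instead of relying on a generated schema name.
packed = FOREACH raw GENERATE
    rowkey,
    TOTUPLE(name, x, y) AS vals:tuple(name:chararray, x:int, y:int),
    ts;

-- Check that the schema now carries the explicit names and types.
DESCRIBE packed;
```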
> 2) I want to store all the values as a tuple under one key because I want
> to minimize the repetitions of the row and column keys. I didn't specify
> the caster, so I'm using the default, whatever it is (I hope it is the
> binary one, not the UTF8 one)
Default caster is UTF8, which is what you want.
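For reference, the caster can also be named explicitly through the options string of HBaseStorage. The sketch below assumes a flat relation, a hypothetical column family 'cf', and hypothetical field names:

```pig
-- Hypothetical input, table, and column family ('cf'); adjust to your data.
flat = LOAD 'input.tsv' AS (rowkey:chararray, name:chararray, hits:int, ts:long);

-- The first field is used as the HBase row key; the remaining fields map to the listed columns in order.
-- Utf8StorageConverter is the default caster, so the second argument could be omitted.
STORE flat INTO 'hbase://tableName' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'cf:name cf:hits cf:ts',
        '-caster Utf8StorageConverter');
```

Passing '-caster HBaseBinaryConverter' instead would select the binary converter.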
> 3) The class cast exception says that DataByteArray cannot be cast to Tuple
This is a result of something in your relations before the STORE, not
HBaseStorage. It takes what's given to it, so if it's seeing
DataByteArrays, something is producing them, possibly a UDF.
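One common way to end up with bytearray (DataByteArray) values is a field with no declared type, and this is not necessarily what happened in this case. A sketch of checking with DESCRIBE and casting explicitly, with hypothetical relation and field names:

```pig
-- Hypothetical input; fields declared without types default to bytearray.
untyped = LOAD 'input.tsv' AS (rowkey, hits, ts);
DESCRIBE untyped;

-- Cast each field to the type the downstream steps expect.
typed = FOREACH untyped GENERATE
    (chararray)rowkey AS rowkey,
    (int)hits AS hits,
    (long)ts AS ts;
DESCRIBE typed;
```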
> -- Younos
> Quoting Bill Graham <[EMAIL PROTECTED]>:
> Thanks Younos for catching that and sorry that you got bit by it. That is
>> in fact a javadoc bug. I've just opened a JIRA for it:
>> Regarding the casting, what does DESCRIBE show for the relation you
>> pass to the STORE statement, and what do your class cast exceptions look
>> like? Which caster are you using?
>> The relation you pass to STORE should be a flat relation of values, unless
>> you want to store the toString of a tuple as a single column in HBase.
>> On Tue, Dec 11, 2012 at 9:37 AM, <[EMAIL PROTECTED]> wrote:
>> Hi Bill,
>>> Thanks for your reply. Since this is the case, the JavaDocs of the class
>>> need to be fixed (see http://pig.apache.org/docs/r0.****<http://pig.apache.org/docs/r0.**>
>>> Also, I faced a bug that I worked around by explicit casting. For some
>>> reason all the objects passed to putNext are of type DataByteArray, while
>>> the schema reports their correct types (tuple(string, int, int), long).
>>> This causes a lot of ClassCastExceptions because DataByteArray cannot be
>>> cast to any other type. I worked around this by passing everything to the
>>> STORE as a DataByteArray.
>>> Quoting Bill Graham <[EMAIL PROTECTED]>:
>>> The STORE command doesn't take the AS clause; that's to define the schema
>>>> at LOAD time. When storing, just prepare your relation with the
>>>> schema and then STORE it without the AS.
>>>> You can do all the transformations you need to before the STORE and Pig
>>>> will combine them all into as few logical processing steps as possible,
>>>> so there is no need to worry about specifying many transformation statements.
>>>> On Mon, Dec 10, 2012 at 7:31 PM, <[EMAIL PROTECTED]> wrote:
>>>>> I'm using HBaseStorage and I want to change the layout of the schema
>>>>> before storage. Specifically I want to group some values into a tuple
>>>>> (reducing the number of repetitions of the row and column keys).
>>>>> Even though the JavaDoc gives an example that uses AS schema, Grunt
>>>>> complains that it is not parsable. Here's what I am trying:
>>>>> STORE dataToStore INTO 'hbase://tableName' USING