Pig, mail # user - Loading LZOs With Some JSON


Re: Loading LZOs With Some JSON
Eli Finkelshteyn 2011-09-13, 16:33
Haha, yeah; that. I literally just got it to work when you emailed.
Thanks for all the help, Dmitriy!
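
For anyone landing on this thread later, the pieces discussed below can be combined into one script. This is a minimal sketch, not a tested job: the jar paths in the REGISTER lines are hypothetical placeholders (the thread only names the jars, not their versions or locations), and the alias `mapped` replaces `map` from the quoted messages because `map` is listed as a reserved keyword in the Pig Latin documentation.

```pig
-- Hypothetical jar locations; point these at wherever the jars actually live.
REGISTER 'lib/hadoop-lzo.jar';
REGISTER 'lib/elephant-bird.jar';
REGISTER 'lib/guava.jar';
REGISTER 'lib/piggybank.jar';
REGISTER 'lib/json-simple-1.1.jar';  -- avoids the NoClassDefFoundError on ParseException

-- Declare json_data as chararray so the UDF receives a String, not a DataByteArray.
initial = LOAD 'some_file.lzo'
    USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, col3, json_data:chararray);

-- Parse the JSON string into a Pig map.
mapped = FOREACH initial GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS mapped_json_data;

-- Map values come back as bytearrays, so cast each one when pulling it out.
extracted = FOREACH mapped GENERATE (chararray) mapped_json_data#'type' AS type;

DUMP extracted;
```

Swap 'type' for whichever key your JSON actually contains; a key that is missing from a record should simply come back empty for that record.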

Eli

On 9/13/11 12:30 PM, Dmitriy Ryaboy wrote:
> initial = LOAD 'some_file.lzo' USING
> com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
> AS (col1, col2, col3, json_data:chararray);
>
> or
>
> map = FOREACH initial GENERATE
> com.twitter.elephantbird.pig.piggybank.JsonStringToMap((chararray) json_data) AS mapped_json_data;
>
>
> extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS
> type;
>
>
> On Tue, Sep 13, 2011 at 8:51 AM, Eli Finkelshteyn<[EMAIL PROTECTED]>wrote:
>
>> Correction: I forgot to run the JsonStringToMap function when writing my
>> last email. When I run that, I get the same error as before
>> (org.apache.pig.data.DataByteArray cannot be cast to
>> java.lang.String).
>>
>> My full workflow is as follows:
>>
>>
>> initial = LOAD 'some_file.lzo' USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>> AS (col1, col2, col3, json_data);
>> map = FOREACH initial GENERATE
>> com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS mapped_json_data;
>> extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS
>> type;
>> dump extracted;
>>
>> Any ideas?
>>
>> Eli
>>
>>
>> On 9/13/11 11:20 AM, Eli Finkelshteyn wrote:
>>
>>> Well, it's not throwing me errors anymore. Now it's just discarding the
>>> field. When I run it on two records where I've verified a field exists in
>>> the json, I get:
>>>
>>> Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).
>>>
>>> More specifically, my json is of the following form:
>>>
>>> {"foo":0,"bar":"hi"}
>>>
>>> On that, I'm running:
>>>
>>> initial = LOAD 'some_file.lzo' USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>>> AS (col1, col2, col3, json_data);
>>> extracted = FOREACH initial GENERATE (chararray) json_data#'type' AS type;
>>> dump extracted;
>>>
>>> Which gives me the above warning along with:
>>>
>>> ()
>>> ()
>>>
>>> I also tried it without the cast to chararray, but received the same
>>> results. Should I be casting json_data as some other data type when I load
>>> it initially? Seems by default it's cast to a bytearray when I describe
>>> initial. Would that be a problem?
>>>
>>> Thanks for all the help so far!
>>>
>>> Eli
>>>
>>>
>>>
>>> On 9/12/11 9:26 PM, Dmitriy Ryaboy wrote:
>>>
>>>> Ah yeah that's my favorite thing about Pig maps (prior to pig 0.9,
>>>> theoretically).
>>>> The values are bytearrays. You are probably trying to treat them as
>>>> strings.
>>>>   You have to do stuff like this:
>>>>
>>>> x = foreach myrelation generate
>>>>    (chararray) mymap#'foo' as foo,
>>>>    (chararray) mymap#'bar' as bar;
>>>>
>>>>
>>>> On Mon, Sep 12, 2011 at 11:54 AM, Eli Finkelshteyn<[EMAIL PROTECTED]>
>>>>   wrote:
>>>>
>>>>> Hmmm, now it gets past my mention of the function, but when I run a dump
>>>>> on generated information, I get:
>>>>>
>>>>> 2011-09-12 14:48:12,814 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>>>> ERROR 2997: Unable to recreate exception from backed error:
>>>>> java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot
>>>>> be cast to java.lang.String
>>>>>
>>>>> Thanks for all the help so far!
>>>>>
>>>>> Eli
>>>>>
>>>>>
>>>>> On 9/12/11 2:42 PM, Dmitriy Ryaboy wrote:
>>>>>
>>>>>> You also want json-simple-1.1.jar
>>>>>>
>>>>>> On Mon, Sep 12, 2011 at 10:46 AM, Eli Finkelshteyn<[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Hmm, I'm loading up hadoop-lzo.*.jar, elephant-bird.*.jar, guava-*.jar,
>>>>>>> and piggybank.jar, and then trying to use that UDF, but getting the
>>>>>>> following error:
>>>>>>>
>>>>>>> ERROR 2998: Unhandled internal error. org/json/simple/parser/ParseException
>>>>>>>
>>>>>>> java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
>>>>>>>         at java.lang.Class.forName0(Native Method)