Pig, mail # user - Loading LZOs With Some JSON


Re: Loading LZOs With Some JSON
Eli Finkelshteyn 2011-09-13, 16:31
Sweet! Just got this working! For anyone with the same problem in the
future: apparently JsonStringToMap() *does not* like bytearrays. If you
simply cast your json as a chararray when you're loading, the error
disappears!
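
For reference, here is a minimal sketch of the workflow quoted below with that fix applied. It assumes the hadoop-lzo, elephant-bird, guava, piggybank, and json-simple jars mentioned in this thread are already registered. Declaring the type in the LOAD schema is one way to do the cast; the alias is renamed from `map` to `mapped` only because `map` is also a Pig type name, and that rename is my assumption rather than part of the original script.

```pig
-- Sketch only: the jars discussed in this thread must already be REGISTERed.
-- Declaring json_data as chararray stops it from defaulting to bytearray.
initial = LOAD 'some_file.lzo'
    USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, col3, json_data:chararray);

-- JsonStringToMap now receives a chararray instead of a DataByteArray.
mapped = FOREACH initial GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data)
    AS mapped_json_data;

-- Map values still need a cast before being used as strings.
extracted = FOREACH mapped GENERATE
    (chararray) mapped_json_data#'type' AS type;

DUMP extracted;
```

Leaving the schema untyped and passing `(chararray) json_data` to the UDF should work equally well.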

Eli

On 9/13/11 11:51 AM, Eli Finkelshteyn wrote:
> Correction: I forgot to run the JsonStringToMap function when writing
> my last email. When I run that, I get the same error as before
> (*org.apache.pig.data.DataByteArray cannot be cast to java.lang.String*).
>
> My full workflow is as follows:
>
> initial = LOAD 'some_file.lzo' USING
> com.twitter.elephantbird.pig.store.LzoPigStorage('\\t') AS (col1,
> col2, col3, json_data);
> map = FOREACH initial GENERATE
> com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS
> mapped_json_data;
> extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type'
> AS type;
> dump extracted;
>
> Any ideas?
>
> Eli
>
> On 9/13/11 11:20 AM, Eli Finkelshteyn wrote:
>> Well, it's not throwing me errors anymore. Now it's just discarding
>> the field. When I run it on two records where I've verified a field
>> exists in the json, I get:
>>
>> Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).
>>
>> More specifically, my json is of the following form:
>>
>> {"foo":0,"bar":"hi"}
>>
>> On that, I'm running:
>>
>> initial = LOAD 'some_file.lzo' USING
>> com.twitter.elephantbird.pig.store.LzoPigStorage('\\t') AS (col1,
>> col2, col3, json_data);
>> extracted = FOREACH initial GENERATE (chararray) json_data#'type' AS
>> type;
>> dump extracted;
>>
>> Which gives me the above warning along with:
>>
>> ()
>> ()
>>
>> I also tried it without the cast to chararray, but received the same
>> results. Should I be casting json_data as some other data type when I
>> load it initially? Seems by default it's cast to a bytearray when I
>> describe initial. Would that be a problem?
>>
>> Thanks for all the help so far!
>>
>> Eli
>>
>>
>>
>> On 9/12/11 9:26 PM, Dmitriy Ryaboy wrote:
>>> Ah yeah that's my favorite thing about Pig maps (prior to pig 0.9,
>>> theoretically).
>>> The values are bytearrays. You are probably trying to treat them as
>>> strings.
>>>   You have to do stuff like this:
>>>
>>> x = foreach myrelation generate
>>>    (chararray) mymap#'foo' as foo,
>>>    (chararray) mymap#'bar' as bar;
>>>
>>>
>>> On Mon, Sep 12, 2011 at 11:54 AM, Eli Finkelshteyn<[EMAIL PROTECTED]>  
>>> wrote:
>>>
>>>> Hmmm, now it gets past my mention of the function, but when I run a
>>>> dump on generated information, I get:
>>>>
>>>> 2011-09-12 14:48:12,814 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>>> ERROR 2997: Unable to recreate exception from backed error:
>>>> java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot
>>>> be cast to java.lang.String
>>>>
>>>> Thanks for all the help so far!
>>>>
>>>> Eli
>>>>
>>>>
>>>> On 9/12/11 2:42 PM, Dmitriy Ryaboy wrote:
>>>>
>>>>> You also want json-simple-1.1.jar
>>>>>
>>>>>
>>>>> On Mon, Sep 12, 2011 at 10:46 AM, Eli Finkelshteyn<[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>
>>>>>> Hmm, I'm loading up hadoop-lzo.*.jar, elephant-bird.*.jar,
>>>>>> guava-*.jar, and piggybank.jar, and then trying to use that UDF,
>>>>>> but getting the following error:
>>>>>>
>>>>>> ERROR 2998: Unhandled internal error. org/json/simple/parser/ParseException
>>>>>>
>>>>>> java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
>>>>>>         at java.lang.Class.forName0(Native Method)
>>>>>>         at java.lang.Class.forName(Class.java:247)
>>>>>>         at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
>>>>>>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
>>>>>>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:508)
>>>>>>         at org.apache.pig.impl.PigContext.