Pig, mail # user - Loading LZOs With Some JSON


Re: Loading LZOs With Some JSON
Eli Finkelshteyn 2011-09-13, 15:51
Correction: I forgot to run the JsonStringToMap function when writing my
last email. When I run that, I get the same error as before
(*org.apache.pig.data.DataByteArray cannot be cast to java.lang.String*).

My full workflow is as follows:

initial = LOAD 'some_file.lzo' USING
com.twitter.elephantbird.pig.store.LzoPigStorage('\\t') AS (col1, col2,
col3, json_data);
map = FOREACH initial GENERATE
com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS
mapped_json_data;
extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS
type;
dump extracted;

Any ideas?

Eli
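
A possible fix for the workflow above, offered as an untested sketch. `describe initial` reports json_data as a bytearray, and the ClassCastException suggests that JsonStringToMap casts its argument to java.lang.String. Assuming that is the cause, declaring json_data as chararray in the LOAD schema should make Pig convert the field before the UDF sees it. The alias `mapped` is a hypothetical rename of `map`, which is also the name of a Pig data type; everything else is copied from the message above.

```pig
-- Sketch only: same workflow as above, with json_data typed at load time.
-- Assumption: the hadoop-lzo, elephant-bird, guava, piggybank and json-simple
-- jars are already registered, as discussed in this thread.
initial = LOAD 'some_file.lzo' USING
    com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, col3, json_data:chararray);

-- 'mapped' replaces the alias 'map' as a precaution (hypothetical rename).
mapped = FOREACH initial GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data)
    AS mapped_json_data;

-- Map values still need an explicit cast before being used as strings.
extracted = FOREACH mapped GENERATE (chararray) mapped_json_data#'type' AS type;
DUMP extracted;
```

If the cast error goes away but the lookup still comes back empty, it may be worth checking that the key being looked up ('type' here) is actually present in each record; the sample JSON quoted below contains only foo and bar.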

On 9/13/11 11:20 AM, Eli Finkelshteyn wrote:
> Well, it's not throwing me errors anymore. Now it's just discarding
> the field. When I run it on two records where I've verified a field
> exists in the json, I get:
>
> Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).
>
> More specifically, my json is of the following form:
>
> {"foo":0,"bar":"hi"}
>
> On that, I'm running:
>
> initial = LOAD 'some_file.lzo' USING
> com.twitter.elephantbird.pig.store.LzoPigStorage('\\t') AS (col1,
> col2, col3, json_data);
> extracted = FOREACH initial GENERATE (chararray) json_data#'type' AS
> type;
> dump extracted;
>
> Which gives me the above warning along with:
>
> ()
> ()
>
> I also tried it without the cast to chararray, but received the same
> results. Should I be casting json_data as some other data type when I
> load it initially? Seems by default it's cast to a bytearray when I
> describe initial. Would that be a problem?
>
> Thanks for all the help so far!
>
> Eli
>
>
>
> On 9/12/11 9:26 PM, Dmitriy Ryaboy wrote:
>> Ah yeah that's my favorite thing about Pig maps (prior to pig 0.9,
>> theoretically).
>> The values are bytearrays. You are probably trying to treat them as
>> strings.
>>   You have to do stuff like this:
>>
>> x = foreach myrelation generate
>>    (chararray) mymap#'foo' as foo,
>>    (chararray) mymap#'bar' as bar;
>>
>>
>> On Mon, Sep 12, 2011 at 11:54 AM, Eli Finkelshteyn<[EMAIL PROTECTED]>  
>> wrote:
>>
>>> Hmmm, now it gets past my mention of the function, but when I run a
>>> dump on generated information, I get:
>>>
>>> 2011-09-12 14:48:12,814 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>> ERROR 2997: Unable to recreate exception from backed error:
>>> java.lang.ClassCastException: *org.apache.pig.data.DataByteArray cannot
>>> be cast to java.lang.String*
>>>
>>> Thanks for all the help so far!
>>>
>>> Eli
>>>
>>>
>>> On 9/12/11 2:42 PM, Dmitriy Ryaboy wrote:
>>>
>>>> You also want json-simple-1.1.jar
>>>>
>>>>
>>>> On Mon, Sep 12, 2011 at 10:46 AM, Eli Finkelshteyn <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>> Hmm, I'm loading up hadoop-lzo.*.jar, elephant-bird.*.jar, guava-*.jar,
>>>>> and piggybank.jar, and then trying to use that UDF, but getting the
>>>>> following error:
>>>>>
>>>>> ERROR 2998: Unhandled internal error. org/json/simple/parser/ParseException
>>>>>
>>>>> java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
>>>>>         at java.lang.Class.forName0(Native Method)
>>>>>         at java.lang.Class.forName(Class.java:247)
>>>>>         at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
>>>>>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
>>>>>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:508)
>>>>>         at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:531)
>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncSpec(QueryParser.java:5462)
>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:5291)
>>>>>         at org.apache.pig.impl.****logicalLayer.parser.**