Pig >> mail # user >> Loading LZOs With Some JSON


Re: Loading LZOs With Some JSON
Correction: I forgot to run the JsonStringToMap function when writing my
last email. When I do run it, I get the same error as before
(*org.apache.pig.data.DataByteArray cannot be cast to java.lang.String*).

My full workflow is as follows:

initial = LOAD 'some_file.lzo'
    USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, col3, json_data);
map = FOREACH initial GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data)
    AS mapped_json_data;
extracted = FOREACH map GENERATE
    (chararray) mapped_json_data#'type' AS type;
dump extracted;

Any ideas?

Eli
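
One possible fix for the workflow above, offered as a sketch rather than a confirmed answer: the error suggests that JsonStringToMap receives json_data as a bytearray (the default type for a field loaded without one, as noted in the quoted message below) and tries to treat it as a String. Declaring the field as chararray in the LOAD schema should hand the UDF a string instead. The key names 'foo' and 'bar' come from the sample record quoted below. The alias "mapped" is my own rename, on the assumption that "map" can clash with Pig's map type keyword. The jars listed further down the thread still have to be registered first.

```pig
-- Sketch only: assumes hadoop-lzo, elephant-bird, guava, piggybank and
-- json-simple have already been registered with REGISTER statements.

-- Declare json_data as chararray so the UDF is handed a String
-- rather than a DataByteArray (assumed cause of the ClassCastException).
initial = LOAD 'some_file.lzo'
    USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, col3, json_data:chararray);

-- "mapped" replaces the alias "map" (hypothetical rename, see note above).
mapped = FOREACH initial GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data)
    AS mapped_json_data;

-- Map values still need an explicit cast, as Dmitriy describes below.
extracted = FOREACH mapped GENERATE
    (chararray) mapped_json_data#'foo' AS foo,
    (chararray) mapped_json_data#'bar' AS bar;

DUMP extracted;
```

If changing the schema is not an option, casting at the call site, i.e. JsonStringToMap((chararray) json_data), should have the same effect, under the same assumption about the cause.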

On 9/13/11 11:20 AM, Eli Finkelshteyn wrote:
> Well, it's not throwing me errors anymore. Now it's just discarding
> the field. When I run it on two records where I've verified a field
> exists in the json, I get:
>
> Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).
>
> More specifically, my json is of the following form:
>
> {"foo":0,"bar":"hi"}
>
> On that, I'm running:
>
> initial = LOAD 'some_file.lzo' USING
> com.twitter.elephantbird.pig.store.LzoPigStorage('\\t') AS (col1,
> col2, col3, json_data);
> extracted = FOREACH initial GENERATE (chararray) json_data#'type' AS
> type;
> dump extracted;
>
> Which gives me the above warning along with:
>
> ()
> ()
>
> I also tried it without the cast to chararray, but received the same
> results. Should I be casting json_data as some other data type when I
> load it initially? Seems by default it's cast to a bytearray when I
> describe initial. Would that be a problem?
>
> Thanks for all the help so far!
>
> Eli
>
>
>
> On 9/12/11 9:26 PM, Dmitriy Ryaboy wrote:
>> Ah yeah that's my favorite thing about Pig maps (prior to pig 0.9,
>> theoretically).
>> The values are bytearrays. You are probably trying to treat them as
>> strings. You have to do stuff like this:
>>
>> x = foreach myrelation generate
>>    (chararray) mymap#'foo' as foo,
>>    (chararray) mymap#'bar' as bar;
>>
>>
>> On Mon, Sep 12, 2011 at 11:54 AM, Eli Finkelshteyn <[EMAIL PROTECTED]> wrote:
>>
>>> Hmmm, now it gets past my mention of the function, but when I run a
>>> dump on generated information, I get:
>>>
>>> 2011-09-12 14:48:12,814 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>> ERROR 2997: Unable to recreate exception from backed error:
>>> java.lang.ClassCastException: *org.apache.pig.data.DataByteArray cannot
>>> be cast to java.lang.String*
>>>
>>> Thanks for all the help so far!
>>>
>>> Eli
>>>
>>>
>>> On 9/12/11 2:42 PM, Dmitriy Ryaboy wrote:
>>>
>>>> You also want json-simple-1.1.jar
>>>>
>>>>
>>>> On Mon, Sep 12, 2011 at 10:46 AM, Eli Finkelshteyn <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hmm, I'm loading up hadoop-lzo.*.jar, elephant-bird.*.jar, guava-*.jar,
>>>>> and piggybank.jar, and then trying to use that UDF, but getting the
>>>>> following error:
>>>>>
>>>>> ERROR 2998: Unhandled internal error. org/json/simple/parser/ParseException
>>>>>
>>>>> java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
>>>>>         at java.lang.Class.forName0(Native Method)
>>>>>         at java.lang.Class.forName(Class.java:247)
>>>>>         at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
>>>>>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
>>>>>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:508)
>>>>>         at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:531)
>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncSpec(QueryParser.java:5462)
>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:5291)
>>>>>         at org.apache.pig.impl.logicalLayer.parser.