Pig >> mail # user >> Loading LZOs With Some JSON


Re: Loading LZOs With Some JSON
Haha, yeah; that. I literally just got it to work when you emailed.
Thanks for all the help, Dmitriy!

Eli

On 9/13/11 12:30 PM, Dmitriy Ryaboy wrote:
> initial = LOAD 'some_file.lzo' USING
> com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
> AS (col1, col2, col3, json_data:chararray);
>
> or
>
> map = FOREACH initial GENERATE
> com.twitter.elephantbird.pig.piggybank.JsonStringToMap((chararray) json_data) AS mapped_json_data;
>
>
> extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS
> type;
>
>
> On Tue, Sep 13, 2011 at 8:51 AM, Eli Finkelshteyn <[EMAIL PROTECTED]> wrote:
>
>> Correction: I forgot to run the JsonStringToMap function when writing my
>> last email. When I run that, I get the same error as before
>> (org.apache.pig.data.DataByteArray cannot be cast to
>> java.lang.String).
>>
>> My full workflow is as follows:
>>
>>
>> initial = LOAD 'some_file.lzo' USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>> AS (col1, col2, col3, json_data);
>> map = FOREACH initial GENERATE
>> com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS mapped_json_data;
>> extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS
>> type;
>> dump extracted;
>>
>> Any ideas?
>>
>> Eli
>>
>>
>> On 9/13/11 11:20 AM, Eli Finkelshteyn wrote:
>>
>>> Well, it's not throwing me errors anymore. Now it's just discarding the
>>> field. When I run it on two records where I've verified a field exists in
>>> the json, I get:
>>>
>>> Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).
>>>
>>> More specifically, my json is of the following form:
>>>
>>> {"foo":0,"bar":"hi"}
>>>
>>> On that, I'm running:
>>>
>>> initial = LOAD 'some_file.lzo' USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
>>> AS (col1, col2, col3, json_data);
>>> extracted = FOREACH initial GENERATE (chararray) json_data#'type' AS type;
>>> dump extracted;
>>>
>>> Which gives me the above warning along with:
>>>
>>> ()
>>> ()
>>>
>>> I also tried it without the cast to chararray, but received the same
>>> results. Should I be casting json_data as some other data type when I load
>>> it initially? Seems by default it's cast to a bytearray when I describe
>>> initial. Would that be a problem?
>>>
>>> Thanks for all the help so far!
>>>
>>> Eli
>>>
>>>
>>>
>>> On 9/12/11 9:26 PM, Dmitriy Ryaboy wrote:
>>>
>>>> Ah yeah that's my favorite thing about Pig maps (prior to pig 0.9,
>>>> theoretically).
>>>> The values are bytearrays. You are probably trying to treat them as
>>>> strings.
>>>>   You have to do stuff like this:
>>>>
>>>> x = foreach myrelation generate
>>>>    (chararray) mymap#'foo' as foo,
>>>>    (chararray) mymap#'bar' as bar;
>>>>
>>>>
>>>> On Mon, Sep 12, 2011 at 11:54 AM, Eli Finkelshteyn<[EMAIL PROTECTED]>
>>>>   wrote:
>>>>
>>>>> Hmmm, now it gets past my mention of the function, but when I run a dump on
>>>>> generated information, I get:
>>>>>
>>>>> 2011-09-12 14:48:12,814 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>>>> ERROR 2997: Unable to recreate exception from backed error:
>>>>> java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot
>>>>> be cast to java.lang.String
>>>>>
>>>>> Thanks for all the help so far!
>>>>>
>>>>> Eli
>>>>>
>>>>>
>>>>> On 9/12/11 2:42 PM, Dmitriy Ryaboy wrote:
>>>>>
>>>>>> You also want json-simple-1.1.jar
>>>>>>
>>>>>> On Mon, Sep 12, 2011 at 10:46 AM, Eli Finkelshteyn <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Hmm, I'm loading up hadoop-lzo.*.jar, elephant-bird.*.jar, guava-*.jar, and
>>>>>>> piggybank.jar, and then trying to use that UDF, but getting the following
>>>>>>> error:
>>>>>>>
>>>>>>> ERROR 2998: Unhandled internal error. org/json/simple/parser/ParseException
>>>>>>>
>>>>>>> java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
>>>>>>>         at java.lang.Class.forName0(Native Method)
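
For anyone reading the thread bottom-up, the pieces above can be put together roughly as follows. This is only a sketch assembled from the messages in this thread, not a tested script: the jar path is a placeholder, the alias `parsed` is a made-up name used in place of `map`, and it assumes the elephant-bird, hadoop-lzo, guava and piggybank jars mentioned earlier are registered the same way.

```pig
-- Placeholder path (an assumption); json-simple-1.1.jar is the jar whose
-- absence caused the NoClassDefFoundError for org/json/simple/parser/ParseException.
REGISTER /path/to/json-simple-1.1.jar;

-- Declare json_data as chararray in the schema. Without a declared type the
-- field defaults to bytearray, which the UDF cannot cast to java.lang.String.
initial = LOAD 'some_file.lzo'
    USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, col3, json_data:chararray);

-- Turn the JSON string into a Pig map.
parsed = FOREACH initial GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS mapped_json_data;

-- Cast map values on lookup, as suggested in the thread for Pig before 0.9.
extracted = FOREACH parsed GENERATE (chararray) mapped_json_data#'type' AS type;
DUMP extracted;
```

The alternative given in the thread is to leave the schema untyped and cast at the call site instead, i.e. JsonStringToMap((chararray) json_data). Either approach should hand the UDF a String rather than a DataByteArray.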