Pig, mail # user - elephantbird JsonLoader doesn't like gz?


Re: elephantbird JsonLoader doesn't like gz?
Dexin Wang 2011-05-18, 18:26
Or is it because I'm using Pig 0.6, where the gz format is not supported? I'll
be running this on AWS EMR, where only Pig 0.6 is supported. Do I have to use
a later version of Pig?

On Wed, May 18, 2011 at 11:12 AM, Dexin Wang <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Is anyone using Twitter's elephantbird library? I was using its JsonLoader and
> got this error:
>
> WARN  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode
> string
> Unexpected character () at position 0.
> at org.json.simple.parser.Yylex.yylex(Unknown Source)
> at org.json.simple.parser.JSONParser.nextToken(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> at org.json.simple.parser.JSONParser.parse(Unknown Source)
>
> But if I manually gunzip the file to a plain-text JSON file, JsonLoader
> works fine.
>
> Again this fails:
>
> raw_json = LOAD 'cc.json.gz' USING
> com.twitter.elephantbird.pig.load.JsonLoader();
>
> This works:
>
> $ gunzip cc.json.gz
> raw_json = LOAD 'cc.json' USING
> com.twitter.elephantbird.pig.load.JsonLoader();
>
> Any suggestions for this? Or is there any other JSON loader library out
> there? I can write my own, but would rather use one if it already exists.
>
> Thanks,
>
> Dexin
>
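
A parse failure at position 0 would be consistent with the JSON parser being handed raw compressed bytes instead of decompressed text, though that is only a guess here. One way to check it outside Pig is to look at the first two bytes of the file: every gzip stream starts with 0x1f 0x8b. The following is a minimal Python sketch under that assumption. The helper names and the sample file are hypothetical, and it says nothing about how Pig 0.6 or elephantbird actually handle .gz input.

```python
import gzip
import json
import os
import tempfile

# Every gzip stream begins with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def read_json_lines(path):
    """Parse one JSON object per line, decompressing first if needed."""
    opener = gzip.open if looks_gzipped(path) else open
    with opener(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical sample data standing in for cc.json.gz.
tmpdir = tempfile.mkdtemp()
gz_path = os.path.join(tmpdir, "cc.json.gz")
with gzip.open(gz_path, "wt", encoding="utf-8") as f:
    f.write('{"id": 1}\n{"id": 2}\n')

records = read_json_lines(gz_path)
print(looks_gzipped(gz_path), len(records))
```

If the file really is gzipped and the records parse once decompressed this way, the data itself is fine and the problem is more likely in how the load step treats compressed input.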