Pig >> mail # user >> elephantbird JsonLoader doesn't like gz?


Re: elephantbird JsonLoader doesn't like gz?
Or is it because I'm using Pig 0.6, where the gz format is not supported?
I'll be running this on AWS EMR, where only Pig 0.6 is supported. Do I have
to use a later version of Pig?

On Wed, May 18, 2011 at 11:12 AM, Dexin Wang <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Anyone using Twitter's elephantbird library? I was using its JsonLoader and
> got this error:
>
> WARN  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string
> Unexpected character () at position 0.
>     at org.json.simple.parser.Yylex.yylex(Unknown Source)
>     at org.json.simple.parser.JSONParser.nextToken(Unknown Source)
>     at org.json.simple.parser.JSONParser.parse(Unknown Source)
>     at org.json.simple.parser.JSONParser.parse(Unknown Source)
>
> But if I manually gunzip the file to a plain-text JSON file, JsonLoader
> works fine.
>
> Again this fails:
>
> raw_json = LOAD 'cc.json.gz' USING
> com.twitter.elephantbird.pig.load.JsonLoader();
>
> This works:
>
> $ gunzip cc.json.gz
> raw_json = LOAD 'cc.json' USING
> com.twitter.elephantbird.pig.load.JsonLoader();
>
> Any suggestions for this? Or is there another JSON loader library out
> there? I can write my own, but would rather use one if it already exists.
>
> Thanks,
>
> Dexin
>
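
The "Unexpected character () at position 0" warning quoted above is consistent with the JSON parser being handed raw gzip bytes instead of JSON text. A gzip stream always begins with the bytes 0x1f 0x8b, and neither can start a JSON value. The following is a minimal Python sketch of that difference, run outside Pig. The record and the try_parse helper are made up for illustration and stand in for one line of cc.json; this does not show how JsonLoader or Pig 0.6 reads its input.

```python
import gzip
import json

# Hypothetical record standing in for one line of cc.json.
record = {"id": 1, "name": "example"}
plain = json.dumps(record).encode("utf-8")
compressed = gzip.compress(plain)

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
print(compressed[:2].hex())  # prints 1f8b


def try_parse(data: bytes):
    """Return the decoded object, or None if data is not valid JSON text."""
    try:
        return json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None


# Raw gzip bytes are rejected before any JSON structure is seen.
print(try_parse(compressed))  # prints None

# The same bytes parse once they are decompressed first.
print(try_parse(gzip.decompress(compressed)))  # prints {'id': 1, 'name': 'example'}
```

If this is what is happening, the loader is seeing the compressed stream as-is, which would match the observation that the gunzipped file loads fine.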