Re: elephantbird JsonLoader doesn't like gz?
It turns out this is only a problem when I run in local mode; running on the
cluster works fine. I'm using EB 1.2.5.

How did you fix it, given that it doesn't seem to be an EB problem? Or are
you gunzipping the input inside the EB load function?
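
For reference, the "Unexpected character () at position 0" warning is what one would expect if the JSON parser were handed raw gzip bytes instead of decompressed text, since a gzip stream always begins with the bytes 0x1f 0x8b. Whether that is really what happens in local mode is an unconfirmed guess. The Python sketch below only illustrates the general idea of checking for the gzip magic bytes and decompressing before decoding; it assumes one JSON object per line, and the name `iter_json_records` and the sample data are hypothetical, not part of EB or Pig.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def iter_json_records(raw: bytes):
    """Yield one decoded object per non-empty line (hypothetical helper).

    Decompresses first when the payload starts with the gzip magic bytes.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    for line in raw.decode("utf-8").splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up sample data: two JSON objects, one per line.
plain = b'{"id": 1}\n{"id": 2}\n'
packed = gzip.compress(plain)

records_plain = list(iter_json_records(plain))
records_packed = list(iter_json_records(packed))

# Feeding the compressed bytes straight to the parser fails at position 0,
# which resembles the warning quoted in this thread.
try:
    json.loads(packed.decode("latin-1"))
    failure_position = None
except json.JSONDecodeError as err:
    failure_position = err.pos
```

Both `records_plain` and `records_packed` end up holding the same two records, while parsing the compressed bytes directly fails on the very first character.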

On Wed, May 18, 2011 at 8:43 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Which version of EB are you using? I recently fixed this for someone;
> I believe the fix has been in every version since 1.2.3.
>
> D
>
> On Wed, May 18, 2011 at 11:26 AM, Dexin Wang <[EMAIL PROTECTED]> wrote:
> > Or is it because I'm using Pig 0.6, where the gz format is not supported?
> > I'll be running this on AWS EMR, which only supports Pig 0.6. Do I have to
> > use a later version of Pig?
> >
> > On Wed, May 18, 2011 at 11:12 AM, Dexin Wang <[EMAIL PROTECTED]> wrote:
> >
> >> Hi,
> >>
> >> Is anyone using Twitter's elephantbird library? I was using its
> >> JsonLoader and got this error:
> >>
> >> WARN  com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string
> >> Unexpected character () at position 0.
> >> at org.json.simple.parser.Yylex.yylex(Unknown Source)
> >> at org.json.simple.parser.JSONParser.nextToken(Unknown Source)
> >> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> >> at org.json.simple.parser.JSONParser.parse(Unknown Source)
> >>
> >> But if I manually gunzip the file to a plain-text JSON file, JsonLoader
> >> works fine.
> >>
> >> Again, this fails:
> >>
> >> raw_json = LOAD 'cc.json.gz' USING
> >> com.twitter.elephantbird.pig.load.JsonLoader();
> >>
> >> This works:
> >>
> >> $ gunzip cc.json.gz
> >> raw_json = LOAD 'cc.json' USING
> >> com.twitter.elephantbird.pig.load.JsonLoader();
> >>
> >> Any suggestions for this? Or is there another JSON loader library out
> >> there? I can write my own, but would rather use one if it already exists.
> >>
> >> Thanks,
> >>
> >> Dexin
> >>
> >
>