-
elephantbird JsonLoader doesn't like gz?
Dexin Wang 2011-05-18, 18:12
Hi,
Anyone using Twitter's elephantbird library? I was using its JsonLoader and got this error:
WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string
Unexpected character () at position 0.
    at org.json.simple.parser.Yylex.yylex(Unknown Source)
    at org.json.simple.parser.JSONParser.nextToken(Unknown Source)
    at org.json.simple.parser.JSONParser.parse(Unknown Source)
    at org.json.simple.parser.JSONParser.parse(Unknown Source)
But if I manually gunzip the file to a plain-text JSON file, JsonLoader works fine.
Again this fails:
raw_json = LOAD 'cc.json.gz' USING com.twitter.elephantbird.pig.load.JsonLoader();
this works:
$ gunzip cc.json.gz
raw_json = LOAD 'cc.json' USING com.twitter.elephantbird.pig.load.JsonLoader();
Any suggestions for this? Or is there any other JSON loader library out there? I can write my own but would rather use one if it already exists.
Thanks,
Dexin
-
Re: elephantbird JsonLoader doesn't like gz?
Dexin Wang 2011-05-18, 18:26
Or is it because I'm using Pig 0.6, where the gz format is not supported? I'll be running this on AWS EMR, which only supports Pig 0.6. Do I have to use a later version of Pig?
-
Re: elephantbird JsonLoader doesn't like gz?
Dmitriy Ryaboy 2011-05-19, 03:43
Which version of EB are you using? I recently fixed this for someone; I believe the fix has been in every version since 1.2.3.
D
-
Re: elephantbird JsonLoader doesn't like gz?
Dexin Wang 2011-05-19, 04:32
Turns out it's only a problem when I run it in local mode; running it on the cluster doesn't have this problem. I'm using EB 1.2.5.
I wonder how you fixed the problem, since it doesn't seem to be an EB problem. Or are you gunzipping it in the EB load function?
-
Re: elephantbird JsonLoader doesn't like gz?
Eric Lubow 2011-05-19, 12:46
If you are trying to read gzip files on EMR, you CAN'T use local mode. Once you switch to normal mode, everything will start to work. On EMR, Pig 0.6 (their stock version) will not read gzip or bzip files in local mode.
-e
Eric Lubow e: [EMAIL PROTECTED] w: eric.lubow.org
-
Re: elephantbird JsonLoader doesn't like gz?
Dmitriy Ryaboy 2011-05-19, 14:25
Without getting into the details -- local mode in Pig was fundamentally flawed when it comes to reading anything but the simplest of formats, which is why the whole thing was changed in 0.7.
Upgrade :).
D
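For anyone landing here with the same warning: a parse failure at position 0 with an unprintable character is consistent with the JSON parser being handed the raw gzip bytes rather than the decompressed text, although the thread does not confirm that this is exactly what Pig 0.6 local mode does. The sketch below is plain Python, not Pig or elephant-bird code. It assumes one JSON object per line, and the helper names is_gzip and decode_json_lines are made up for illustration. It checks for the gzip magic bytes and decompresses before decoding, which is the same thing the manual gunzip workaround achieves.

```python
import gzip
import json

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(data: bytes) -> bool:
    """Return True if the buffer looks like a gzip stream."""
    return data[:2] == GZIP_MAGIC


def decode_json_lines(data: bytes) -> list:
    """Decode newline-delimited JSON, gunzipping first if needed.

    Hypothetical helper: assumes UTF-8 text with one JSON value per line.
    """
    if is_gzip(data):
        data = gzip.decompress(data)
    records = []
    for line in data.decode("utf-8").split("\n"):
        line = line.strip()
        if line:  # skip blank lines, e.g. the trailing newline
            records.append(json.loads(line))
    return records


if __name__ == "__main__":
    plain = b'{"id": 1}\n{"id": 2}\n'
    packed = gzip.compress(plain)

    # Handing the compressed bytes straight to a JSON parser fails
    # on the very first character, much like the warning in this thread.
    try:
        json.loads(packed.decode("latin-1"))
    except json.JSONDecodeError as err:
        print("parse failed at position", err.pos)

    # Decompressing first gives the same records as the plain file.
    print(decode_json_lines(packed) == decode_json_lines(plain))
```

Checking the magic bytes rather than the .gz file extension means a renamed or mislabeled file is still handled, at the cost of reading the first two bytes before deciding how to decode.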