Pig >> mail # user >> UDF discussion? Here or on the dev list? / Json Loading

Alex McLintock 2011-01-29, 12:12
Re: UDF discussion? Here or on the dev list? / Json Loading

It's a hack (sort of), but here's how I always do it, since parsing JSON
in Java will put you in an insane asylum:

Write a map-only wukong script that parses the JSON the way you want it. See
the example here:


Then use the STREAM operator to stream your raw records (load them as
chararrays first) through your wukong script. It's not perfect, but it
gets the job done.
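
For anyone who wants to try the streaming route without wukong, here is a rough sketch of the same idea in Python. It is not the wukong example referred to above, just an illustration under a few assumptions: each input line holds one JSON object, and the field paths in FIELDS (including the key name "nested") are made up for the example. Any executable that reads lines on stdin and writes tab-separated rows on stdout should work with STREAM.

```python
import json
import sys

# Hypothetical field paths to emit; a dot descends into a nested object.
FIELDS = ["field1", "field2", "nested.field4", "field3"]


def lookup(record, path):
    """Follow a dotted path through nested dicts; return "" if a step is missing."""
    value = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return ""
        value = value[key]
    return "" if value is None else value


def to_row(line, fields=FIELDS):
    """Turn one JSON line into a tab-separated row, or None if it is not a JSON object."""
    try:
        record = json.loads(line)
    except ValueError:
        return None  # skip lines that are not valid JSON
    if not isinstance(record, dict):
        return None
    return "\t".join(str(lookup(record, f)) for f in fields)


def rows(lines):
    """Yield one output row per parseable input line."""
    for line in lines:
        row = to_row(line)
        if row is not None:
            yield row


def main():
    """Filter entry point: read raw records on stdin, write rows on stdout."""
    for row in rows(sys.stdin):
        print(row)

# Add a call to main() at the end when saving this as a script.
```

On the Pig side, the records would be loaded as a single chararray and passed through the script with something along the lines of STREAM raw THROUGH `parse_json.py` AS (field1:chararray, ...). The script name and schema here are placeholders. Values that themselves contain tabs or newlines would need escaping, which this sketch does not attempt.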

On Sat, 2011-01-29 at 12:12 +0000, Alex McLintock wrote:
> I wonder if discussion of the Piggybank and other User Defined Functions is
> best done here (since it is *using* Pig) or on the Development list (because
> it is enhancing Pig).
> I'm trying to load some Json into pig using the PigJsonLoader.java UDF which
> Kim Vogt posted about back in September. (It isn't in Piggybank AFAICS)
> https://gist.github.com/601331
> The class works for me - mostly....
> This works when the Json is just a single level
> {"field1": "value1", "field2": "value2", "field3": "value3"}
> But doesn't seem to work when the json is nested
> {"field1": "value1", "field2": "value2", {"field4": "value4", "field5":
> "value5", "field6": "value6"}, "field3": "value3"}
> Has anyone got this working? I can't see how the existing code deals with
> this.
> parseStringToTuple only creates a single Map. There is no recursion I can
> see.
> Any suggestions?
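
On the recursion point in the quoted message: one way to keep a single flat map while still handling nested objects is to recurse and join the keys. The sketch below shows the idea in Python only. It is not the PigJsonLoader code and has not been tried against it, and it assumes the nested object sits under a key (called "inner" in the usage line, a made-up name).

```python
import json


def flatten(value, prefix="", out=None):
    """Recursively copy nested dicts into one flat dict with dotted keys."""
    if out is None:
        out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            # Build "parent.child" keys; top-level keys keep their own name.
            flatten(child, prefix + "." + key if prefix else key, out)
    else:
        # Leaf value (string, number, list, ...): store it under the joined key.
        out[prefix] = value
    return out


record = json.loads('{"field1": "value1", "inner": {"field4": "value4"}, "field3": "value3"}')
flat = flatten(record)
```

Here flat ends up with the keys field1, inner.field4 and field3, so everything still fits in one map. Lists are treated as leaf values and left untouched.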
Alex McLintock 2011-01-30, 20:09
Jacob Perkins 2011-01-30, 21:01
Harsh J 2011-01-30, 22:23