Pig, mail # user - UDF discussion? Here or on the dev list? / Json Loading


Alex McLintock 2011-01-29, 12:12
Re: UDF discussion? Here or on the dev list? / Json Loading
Jacob Perkins 2011-01-29, 13:43
Alex,

It's a hack (sort of), but here's how I always do it, since parsing JSON
in Java will put you in an insane asylum:

Write a map-only Wukong script that parses the JSON the way you want it.
See the example here:

http://thedatachef.blogspot.com/2011/01/processing-json-records-with-hadoop-and.html

Then use the STREAM operator to stream your raw records (load them as
chararrays first) through your Wukong script. It's not perfect, but it
gets the job done.
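
To make the streaming idea concrete, here is a minimal sketch of such a script. It is written in Python rather than Wukong (a swapped-in stand-in, not the script from the blog post), and the field names and the file name parse_json.py are hypothetical. It reads one JSON record per line on stdin and writes the chosen fields as a tab-separated row, which is roughly what Pig expects back from a streamed command.

```python
#!/usr/bin/env python
"""Hypothetical streaming script: one JSON record per input line, one TSV row out.

Assumed Pig side (a sketch, adjust names and paths to your job):
    DEFINE parse_json `parse_json.py` SHIP('parse_json.py');
    raw    = LOAD 'records.json' USING TextLoader() AS (line:chararray);
    parsed = STREAM raw THROUGH parse_json
             AS (field1:chararray, field2:chararray, field3:chararray);
"""
import json
import sys

# Hypothetical field names, taken from the single-level example in this thread.
FIELDS = ["field1", "field2", "field3"]


def json_line_to_tsv(line, fields=FIELDS):
    """Return the selected fields of one JSON record as a tab-separated row.

    Returns None when the line is not a JSON object, so the caller can skip it.
    """
    try:
        record = json.loads(line)
    except ValueError:
        return None
    if not isinstance(record, dict):
        return None
    # Missing fields become empty strings so every row has the same width.
    return "\t".join(str(record.get(name, "")) for name in fields)


def main(stdin=sys.stdin, stdout=sys.stdout):
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        row = json_line_to_tsv(line)
        if row is not None:
            stdout.write(row + "\n")


if __name__ == "__main__":
    main()
```

Silently skipping malformed lines keeps the job running on dirty input; if you would rather see the bad records, write them to stderr instead.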

--jacob
@thedatachef
On Sat, 2011-01-29 at 12:12 +0000, Alex McLintock wrote:
> I wonder if discussion of the Piggybank and other User Defined Functions is
> best done here (since it is *using* Pig) or on the Development list (because
> it is enhancing Pig).
>
> I'm trying to load some JSON into Pig using the PigJsonLoader.java UDF which
> Kim Vogt posted about back in September. (It isn't in Piggybank AFAICS)
> https://gist.github.com/601331
>
>
> The class works for me - mostly....
>
>
> This works when the JSON is just a single level
>
> {"field1": "value1", "field2": "value2", "field3": "value3"}
>
> But it doesn't seem to work when the JSON is nested
>
> {"field1": "value1", "field2": "value2", {"field4": "value4", "field5":
> "value5", "field6": "value6"}, "field3": "value3"}
>
> Has anyone got this working? I can't see how the existing code deals with
> this.
> parseStringToTuple only creates a single Map. There is no recursion I can
> see.
>
>
>
> Any suggestions?
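
For the nested case, the missing piece is the recursion itself. The sketch below is in Python, not a patch to PigJsonLoader.java, and it only shows one possible approach: walk the parsed record and flatten nested objects into a single-level map with dotted keys. The key "nested" in the sample record is an assumption, since the nested example quoted above gives the inner object no key and so is not valid JSON as written.

```python
import json


def flatten(value, prefix="", sep="."):
    """Recursively flatten nested JSON objects and arrays into one flat dict.

    Nested keys are joined with `sep`, and array elements use their index.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = prefix + sep + key if prefix else key
            flat.update(flatten(child, child_prefix, sep))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            child_prefix = prefix + sep + str(index) if prefix else str(index)
            flat.update(flatten(child, child_prefix, sep))
    else:
        # Leaf value (string, number, boolean or null): store it under its path.
        flat[prefix] = value
    return flat


# Sample record; the "nested" key is hypothetical (see the note above).
record = json.loads(
    '{"field1": "value1", "field2": "value2", '
    '"nested": {"field4": "value4", "field5": "value5", "field6": "value6"}, '
    '"field3": "value3"}'
)
flat_record = flatten(record)
# flat_record now has keys such as "field1" and "nested.field4".
```

A flat map like this fits the single-Map shape the loader already produces; keeping the nesting would instead mean building a map of maps on the Pig side.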
Alex McLintock 2011-01-30, 20:09
Jacob Perkins 2011-01-30, 21:01
Harsh J 2011-01-30, 22:23