Re: UDF discussion? Here or on the dev list? / Json Loading
Harsh J 2011-01-30, 22:23
Hello,
On Sat, Jan 29, 2011 at 5:42 PM, Alex McLintock <[EMAIL PROTECTED]> wrote:
> I wonder if discussion of the Piggybank and other User Defined Fields is
> best done here (since it is *using* Pig) or on the Development list (because
> it is enhancing Pig).
>
> I'm trying to load some Json into pig using the PigJsonLoader.java UDF which
> Kim Vogt posted about back in September. (It isn't in Piggybank AFAICS)
> https://gist.github.com/601331
>
> The class works for me - mostly....
>
> This works when the Json is just a single level
>
> {"field1": "value1", "field2": "value2", "field3": "value3"}
>
> But doesn't seem to work when the json is nested
>
> {"field1": "value1", "field2": "value2", {"field4": "value4", "field5":
> "value5", "field6": "value6"}, "field3": "value3"}
>

The json-simple library for Java builds the entire JSON representation as a
JSONObject, which is _exactly_ what you need. It is a Java Map-like class
that holds your structure properly, nesting included. What remains is to
convert it to a Pig-acceptable Map structure.

What happens in Vogt's code (and in Elephant-Bird's LzoJsonLoader, from
which it was sourced) is that the Map is down-converted to a simple
Key-Value mapping instead of a Map containing another Map. This was done
because of a limitation in Pig 0.6.0, where the Map type could not hold
complex types, as noted in the latter class's javadoc [1].

I think this limitation has gone away in 0.7.0+ (the Pig Map spec now
supports <String, {Atom, Tuple, Bag, Map}>), so feel free to change or
remove the iteration inside parseStringToTuple(...) so that it no longer
'flattens' the Map.

Additionally, I think the json-simple dependency could be dropped in favor
of the Jackson Core/Mapper libraries that Hadoop itself now ships, which
would eliminate an extra JAR (Pig does not ship json-simple). But be
careful about the version of Jackson Core/Mapper present in your Hadoop.
Much more recent Jackson releases are available, with benefits.

Perhaps, if you feel like it, you can contribute your change back to
elephant-bird [2]. I think they're open to changes for newer Pig versions.

[1] - https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/load/LzoJsonLoader.java
[2] - https://github.com/kevinweil/elephant-bird

--
Harsh J
www.harshj.com
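
To illustrate the 'flatten' versus 'keep nested' difference described in the reply, here is a minimal sketch that uses only java.util. It is not the code from the gist or from elephant-bird. The class and method names (NestedMapSketch, toFlatMap, toNestedMap) and the "nested" key are made up for illustration. It assumes the parsed JSON arrives as a Map-like object and that the existing flattening amounts to calling toString() on every value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NestedMapSketch {

    // Assumed shape of the current behaviour: every value, including a
    // nested object, is reduced to its string form.
    public static Map<String, String> toFlatMap(Map<?, ?> json) {
        Map<String, String> out = new LinkedHashMap<String, String>();
        for (Map.Entry<?, ?> e : json.entrySet()) {
            Object v = e.getValue();
            out.put(String.valueOf(e.getKey()), v == null ? null : v.toString());
        }
        return out;
    }

    // Alternative: recurse into nested objects so the result is a Map whose
    // values may themselves be Maps. Scalars are still stringified here.
    public static Map<String, Object> toNestedMap(Map<?, ?> json) {
        Map<String, Object> out = new LinkedHashMap<String, Object>();
        for (Map.Entry<?, ?> e : json.entrySet()) {
            Object v = e.getValue();
            if (v instanceof Map) {
                v = toNestedMap((Map<?, ?>) v);
            } else if (v != null) {
                v = v.toString();
            }
            out.put(String.valueOf(e.getKey()), v);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for a parsed JSON object; "nested" is a
        // hypothetical key chosen for this example.
        Map<String, Object> inner = new LinkedHashMap<String, Object>();
        inner.put("field4", "value4");
        Map<String, Object> outer = new LinkedHashMap<String, Object>();
        outer.put("field1", "value1");
        outer.put("nested", inner);

        // Flattened: the inner object survives only as a String.
        System.out.println(toFlatMap(outer).get("nested") instanceof String);
        // Nested: the inner object is still a Map.
        System.out.println(toNestedMap(outer).get("nested") instanceof Map);
    }
}
```

In a real loader the nested result would still have to be handed to Pig as its Map type, and JSON arrays would need their own handling (presumably as Bags). Neither is covered by this sketch.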