NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Pig >> mail # user >> UDF discussion? Here or on the dev list? / Json Loading


Re: UDF discussion? Here or on the dev list? / Json Loading
Alex,

It's a hack (sort of), but here's how I always do it, since parsing JSON
in Java will put you in an insane asylum:

Write a map-only Wukong script that parses the JSON the way you want it.
See the example here:

http://thedatachef.blogspot.com/2011/01/processing-json-records-with-hadoop-and.html

Then use the STREAM operator to stream your raw records (load them as
chararrays first) through your Wukong script. It's not perfect, but it
gets the job done.
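A minimal sketch of the same idea, with Python swapped in for the Wukong (Ruby) script from the blog post: it reads one raw JSON record per line on stdin and writes the chosen fields to stdout, tab-separated, which is the layout Pig's STREAM operator expects from a command by default. The script name (say `json_fields.py`) and the `FIELDS` list are hypothetical; the field names are just the placeholders from Alex's message.

```python
import json
import sys

# Hypothetical field list: replace with the keys you actually want as Pig columns.
FIELDS = ["field1", "field2", "field3"]


def record_to_line(raw, fields=FIELDS):
    """Parse one JSON record and return the chosen fields as a tab-separated line.

    Missing keys and JSON nulls become empty strings; numbers, booleans and nested
    objects/arrays are written back out as JSON text. Tabs and line breaks inside
    values are replaced with spaces so they cannot break Pig's field/record layout.
    Raises ValueError for malformed JSON or a record that is not a JSON object.
    """
    record = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object per line")
    values = []
    for name in fields:
        value = record.get(name)
        if value is None:
            value = ""
        elif not isinstance(value, str):
            value = json.dumps(value)
        for ch in ("\t", "\r", "\n"):
            value = value.replace(ch, " ")
        values.append(value)
    return "\t".join(values)


def main(lines, out):
    """Map-only loop: one input record in, at most one output line out."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            print(record_to_line(raw), file=out)
        except ValueError:
            # Skip malformed records rather than failing the whole task.
            continue


if __name__ == "__main__":
    main(sys.stdin, sys.stdout)
```

On the Pig side you would load each line whole as a single chararray (TextLoader does this), DEFINE the command with SHIP so the script reaches the task nodes, and STREAM the relation through it with an AS schema naming the three fields. Nested values come through as JSON text in a single column, so this sidesteps the nesting problem rather than solving it.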

--jacob
@thedatachef
On Sat, 2011-01-29 at 12:12 +0000, Alex McLintock wrote:
> I wonder whether discussion of the Piggybank and other User Defined Functions is
> best done here (since it is *using* Pig) or on the development list (because
> it is enhancing Pig).
>
> I'm trying to load some JSON into Pig using the PigJsonLoader.java UDF that
> Kim Vogt posted about back in September. (It isn't in Piggybank, AFAICS.)
> https://gist.github.com/601331
>
> The class works for me, mostly...
>
> It works when the JSON has just a single level:
>
> {"field1": "value1", "field2": "value2", "field3": "value3"}
>
> but doesn't seem to work when the JSON is nested:
>
> {"field1": "value1", "field2": "value2", {"field4": "value4", "field5": "value5", "field6": "value6"}, "field3": "value3"}
>
> Has anyone got this working? I can't see how the existing code deals with
> this: parseStringToTuple only creates a single Map, and there is no
> recursion that I can see.
>
> Any suggestions?
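Alex's diagnosis points at the missing piece: code that fills a single Map can only represent one level, so nested objects need a recursive walk. Note also that the nested example, as typed, is not valid JSON, because an object inside another object needs a key; the sketch below assumes a hypothetical key `nested`. It is a Python illustration of the recursion, not a patch to PigJsonLoader.java, and flattening nested objects into dotted keys (`nested.field4`) is just one possible choice; building nested Pig maps would be another.

```python
import json


def flatten(obj, prefix=""):
    """Recursively flatten nested JSON objects into a single-level dict.

    Keys of nested objects are joined with '.', e.g. {"a": {"b": 1}} -> {"a.b": 1}.
    Arrays and scalars are kept as leaf values; key order follows the input.
    """
    flat = {}
    for key, value in obj.items():
        path = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects instead of stopping at the top level.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# The nested example from the question, with a hypothetical key "nested"
# added so that it is valid JSON.
EXAMPLE = (
    '{"field1": "value1", "field2": "value2", '
    '"nested": {"field4": "value4", "field5": "value5", "field6": "value6"}, '
    '"field3": "value3"}'
)

if __name__ == "__main__":
    # One tab-separated key/value pair per line.
    for key, value in flatten(json.loads(EXAMPLE)).items():
        print(key, value, sep="\t")
```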