Pig, mail # user - Mapping nested json objects to map data type


Re: Mapping nested json objects to map data type
kiran chitturi 2013-03-14, 15:08
Thank you Harsha.

I was able to run my scripts successfully by following the example scripts,
and I finally have my JSON object in a map data type.

Thanks again,
Kiran.
On Thu, Mar 14, 2013 at 2:51 AM, Harsha <[EMAIL PROTECTED]> wrote:

> Hi Kiran,
> Can you take a look at the Pig scripts here:
>
> https://github.com/mozilla-metrics/telemetry-toolbox/tree/master/src/main/pig
>
> All of them use those JSON UDFs for parsing.
> --
> Harsha
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>
>
> On Wednesday, March 13, 2013 at 11:25 PM, kiran chitturi wrote:
>
> > Hi Harsha,
> >
> > I am using the UDF from this link:
> >
> > https://github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/pig/eval/json/MapToJson.java
> >
> > I was able to run it successfully, but I ran into an issue: the output is
> > null.
> >
> > Please find my commands below
> >
> > ----------
> > fields = load 'hbase://documents' using
> >     org.apache.pig.backend.hadoop.hbase.HBaseStorage('field:*', '-loadKey true -limit 5')
> >     as (rowkey, metadata:map[]);
> > fields_split = foreach fields generate
> >     com.mozilla.pig.eval.json.MapToJson(metadata);
> > dump fields_split;
> > -----------
> >
> > The output is 51 empty records. When I ran 'illustrate fields_split',
> > it gave me the output below.
> >
> > ---------------------------------------------------------------------
> > | fields       | rowkey:bytearray | metadata:map |
> > ---------------------------------------------------------------------
> > |              | collection100hdfs://LucidN1:50001/input/reuters/reut2-021.sgm-166.txt | {fields_j={"tika.Content-Encoding":"ISO-8859-1","distanceToCentroid":0.5685425678289969,"tika.Content-Type":"text/plain; charset=ISO-8859-1","clusterId":118,"tika.parsing":"ok"}} |
> > ---------------------------------------------------------------------
> > | fields_split | :chararray |
> > ---------------------------------------------------------------------
> > |              |            |
> > ---------------------------------------------------------------------
> >
> > Am I missing something here? Could you share a simple working use case of
> > yours, if you don't mind? All of my records have something in the 'fields'
> > family, so it is quite strange to see empty results.
> >
> > Please let me know your suggestions.
> >
> > Thank you,
> >
> >
> > On Wed, Mar 13, 2013 at 11:54 PM, Harsha <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Kiran,
> > > If you are OK with using Java for UDFs, take a look at this:
> > >
> > > https://github.com/mozilla-metrics/akela/tree/master/src/main/java/com/mozilla/pig/eval/json
> > >
> > > We use MapToJson to parse complex JSON objects from HBase.
> > > -Harsha
> > >
> > >
> > > --
> > > Harsha
> > >
> > >
> > > On Wednesday, March 13, 2013 at 8:37 PM, kiran chitturi wrote:
> > >
> > > > Hi!
> > > >
> > > > I am using Pig 0.10 and I have a question about mapping nested JSON
> > > > objects from HBase.
> > > >
> > > > *For example: *
> > > >
> > > > The command below loads the field family from HBase.
> > > >
> > > > fields = load 'hbase://documents' using
> > > >     org.apache.pig.backend.hadoop.hbase.HBaseStorage('field:*', '-loadKey true -limit 5')
> > > >     as (rowkey, metadata:map[]);
> > > >
> > > > After the above command, the metadata field looks like this (I used
> > > > 'illustrate fields' to get it):
> > > >
> > > > {fields_j={"tika.Content-Encoding":"ISO-8859-1","distanceToCentroid":0.5761632290266712,"tika.Content-Type":"text/plain; charset=ISO-8859-1","clusterId":118,"tika.parsing":"ok"}}
> > > >
> > > > The map data type has worked as I wanted so far. Now I would like the
> > > > value for the 'fields_j' key to also be a map. I think it is being
> > > > assigned 'bytearray' by default.
> > > >
> > > > Is there any way to convert this into a map data type? That would help
> > > > me process it further.
> > > >
> > > > I tried to write a Python UDF, but Jython only supports Python 2.5, and I
> > > > am not sure how to convert this string into a dictionary in Python.
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>
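
For the Python side of the original question (turning the 'fields_j' string into a dictionary), here is a minimal sketch. It assumes an interpreter that ships the standard `json` module, which was added in Python 2.6, so the Jython 2.5 mentioned in the thread would not have it. The function name `json_to_map` is hypothetical and is not part of Pig or the akela library.

```python
import json


def json_to_map(raw):
    """Parse a JSON object string into a dict.

    Returns None when the input is missing, is not valid JSON,
    or is valid JSON but not an object (for example a list).
    """
    if raw is None:
        return None
    try:
        parsed = json.loads(raw)
    except (ValueError, TypeError):
        # ValueError covers malformed JSON; TypeError covers unsupported input types.
        return None
    return parsed if isinstance(parsed, dict) else None


# Sample value copied from the 'illustrate fields' output in the thread.
sample = ('{"tika.Content-Encoding":"ISO-8859-1",'
          '"distanceToCentroid":0.5761632290266712,'
          '"tika.Content-Type":"text/plain; charset=ISO-8859-1",'
          '"clusterId":118,"tika.parsing":"ok"}')

fields = json_to_map(sample)
print(fields["clusterId"])  # 118
```

If this were registered as a Pig Jython UDF, the function would presumably also need an output schema declaring a map type, such as the `@outputSchema` decorator described in the Pig UDF documentation. That part is left out here so the sketch runs as plain Python.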