Re: dereferencing bag of map
Here is what worked:

1. Download elephant-bird-pig.jar and put it in HDFS
2. Run REGISTER 'elephant-bird-pig.jar'; in the grunt shell
3. Use com.twitter.elephantbird.pig.piggybank.JsonStringToMap(attributes#'md') AS metadata

Works brilliantly.
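
Combined with the statements quoted further down, the steps above might look roughly like the following. This is only a sketch, not the exact script that was run: the alias `ids`, the DEFINE shorthand, the final casts, and the commented-out json-simple REGISTER are assumptions added for illustration, and the jar locations depend on your setup.

```pig
-- Sketch under stated assumptions; adjust jar paths to your environment.
REGISTER 'hdfs://hadoop-prod-master.vpc:8020/user/hdfs/libs/prod.jar';
REGISTER 'elephant-bird-pig.jar';
-- Assumption: the elephant-bird JSON UDFs may also need the json-simple jar.
-- REGISTER 'json-simple.jar';

-- Short alias for the UDF (hypothetical convenience, not from the thread).
DEFINE JsonStringToMap com.twitter.elephantbird.pig.piggybank.JsonStringToMap();

records_log = LOAD 'hdfs://hadoop-prod-master.vpc:8020/data/{prod}/{2013-06-20-11}/*'
              USING com.example.Loader()
              AS (date:chararray, type:chararray, attributes:[]);
http = FILTER records_log BY type == 'm' AND attributes#'st' == 'http';

-- Parse the JSON string stored under 'md' into a Pig map.
X = FOREACH http GENERATE JsonStringToMap(attributes#'md') AS metadata;

-- Dereference the map to get cId and sId as separate chararray columns.
ids = FOREACH X GENERATE (chararray) metadata#'cId' AS cId,
                         (chararray) metadata#'sId' AS sId;
```

Once the value is a real Pig map, the usual `#'key'` dereference applies, so no FLATTEN or regex is needed.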

HTH
Ss

On Tue, Jun 25, 2013 at 6:35 PM, Abhinav Neelam <[EMAIL PROTECTED]> wrote:

> Use REGEX_EXTRACT_ALL
> Something like this should work (untested, please verify)
>
> rel2 = foreach rel1 generate
>   FLATTEN(REGEX_EXTRACT_ALL(attributes#'md','\\{"cId":"(\\w+)","sId":"(\\w+)"\\}'))
>   AS (cId: chararray, sId: chararray);
>
> Tighten up the regex appropriately.
>
>
> On 24 June 2013 14:55, Suresh Saggar <[EMAIL PROTECTED]> wrote:
>
> > *Thanks a lot* for your reply, but the problem still exists. To clarify
> > further, the exact sequence of Pig statements is shown below:
> >
> > -- Our custom jar containing the Loader() code:
> > REGISTER 'hdfs://hadoop-prod-master.vpc:8020/user/hdfs/libs/prod.jar';
> > records_log = LOAD
> > 'hdfs://hadoop-prod-master.vpc:8020/data/{prod}/{2013-06-20-11}/*' USING
> > com.example.Loader() AS (date:chararray, type:chararray, attributes:[]);
> > http = FILTER records_log BY type == 'm' AND attributes#'st' == 'http';
> > X = FOREACH http GENERATE attributes#'md' AS metadata;
> > Y = FOREACH X GENERATE FLATTEN(metadata);
> >
> > grunt> describe Y
> > Y: {metadata: bytearray}
> > grunt> describe X
> > X: {metadata: bytearray}
> >
> > Dumping either X or Y produces the same output. I also tried FLATTEN
> > directly on records_log, but that did not help either, i.e.
> > Z = FOREACH records_log GENERATE FLATTEN(attributes);
> >
> > Similarly, JsonStorage() can't be used directly because my raw data (as
> > stored in HDFS) is not JSON but a custom format, as shown below:
> > 2013-06-20-11|m|{'st':'http','md':{'cId':'a','sId':'b'}}
> >
> > Here our Loader() takes the raw data above as input and returns output in
> > the format (date:chararray, type:chararray, attributes:[]). Since
> > attributes#'md' is itself JSON, I'm having problems getting the 'cId' and
> > 'sId' values. Hope this clarifies the context. I assume the FLATTEN
> > operator cannot un-nest attributes#'md' because it is represented as
> > {'cId':'a','sId':'b'} rather than as ['cId'#'a','sId'#'b'] (a map in Pig) or
> > {('cId'#'a'),('sId'#'b')} (a bag in Pig).
> >
> > TIA
> > Ss
> >
> > On Fri, Jun 21, 2013 at 6:12 PM, Pradeep Gollakota <[EMAIL PROTECTED]> wrote:
> >
> > > Suresh,
> > >
> > > Look into using JsonStorage(). This seems to be what you're looking for.
> > > http://pig.apache.org/docs/r0.10.0/func.html#jsonloadstore
> > >
> > >
> > > On Fri, Jun 21, 2013 at 8:35 AM, Shahab Yunus <[EMAIL PROTECTED]> wrote:
> > >
> > > > Have you tried flattening the bag first?
> > > >
> > > >
> > > > On Fri, Jun 21, 2013 at 5:43 AM, Suresh Saggar <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > I am facing a similar challenge. Here X contains one column named
> > > > > 'metadata' of type bytearray, but the actual content is JSON, i.e. the
> > > > > value of the metadata field is a JSON object (with keys sId and cId),
> > > > > as shown below:
> > > > >
> > > > > grunt> describe X
> > > > > X: {metadata: bytearray}
> > > > >
> > > > > grunt> dump X
> > > > > ({"sId":"003_w","cId":"k"})
> > > > > ({"sId":"001_rf","cId":"r"})
> > > > > ({"sId":"001_rf","cId":"r"})
> > > > > ({"sId":"004_rf","cId":"r"})
> > > > >
> > > > > Any idea how I can generate cId & sId as separate chararray columns?
> > > > > TIA
> > > > >
> > > > > Ss
> > > > >
> > > > > On Tue, Jun 18, 2013 at 5:52 AM, Pradeep Gollakota <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > What's the error you are seeing? What does your bag of maps look like?
> > > > > > What exactly is a userId? Is it a field or is it a key in the map?
> > > > > >
> > > > > >
> > > > > > On Mon, Jun 17, 2013 at 8:18 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > Hi Pig users,
> > > > > > >
> > > > > > > Does anyone have experience in dereferencing a bag of maps? For