Pig >> mail # user >> dereferencing bag of map


Re: dereferencing bag of map
This is what worked:

1. Download elephant-bird-pig.jar and put it in HDFS.
2. Run REGISTER 'elephant-bird-pig.jar'; in the grunt shell.
3. Use com.twitter.elephantbird.pig.piggybank.JsonStringToMap(attributes#'md') AS metadata

Works brilliantly.
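Putting the pieces of this thread together, the end-to-end script looks roughly like the sketch below. It is untested and rests on a few assumptions: the loader jar path and com.example.Loader() are taken from earlier in the thread, the value under attributes#'md' is assumed to be a double-quoted JSON string such as {"sId":"003_w","cId":"k"}, and the json-simple jar name is hypothetical. elephant-bird's JSON UDFs rely on json-simple, so depending on your setup it may need to be registered as well.

```pig
-- Custom loader jar (path from earlier in this thread)
REGISTER 'hdfs://hadoop-prod-master.vpc:8020/user/hdfs/libs/prod.jar';
-- elephant-bird provides JsonStringToMap
REGISTER 'elephant-bird-pig.jar';
-- Hypothetical jar name: json-simple, which elephant-bird uses to parse JSON
REGISTER 'json-simple-1.1.1.jar';

DEFINE JsonStringToMap com.twitter.elephantbird.pig.piggybank.JsonStringToMap();

records_log = LOAD 'hdfs://hadoop-prod-master.vpc:8020/data/{prod}/{2013-06-20-11}/*'
    USING com.example.Loader() AS (date:chararray, type:chararray, attributes:[]);
http = FILTER records_log BY type == 'm' AND attributes#'st' == 'http';

-- Parse the JSON string stored under 'md' into a Pig map
X = FOREACH http GENERATE JsonStringToMap(attributes#'md') AS metadata;

-- Dereference the map keys and cast the values to separate chararray columns
Y = FOREACH X GENERATE
    (chararray) metadata#'cId' AS cId,
    (chararray) metadata#'sId' AS sId;

DUMP Y;
```

Because of the explicit casts and aliases, DESCRIBE Y should report two chararray fields, cId and sId, which is what the original question asked for.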

HTH
Ss

On Tue, Jun 25, 2013 at 6:35 PM, Abhinav Neelam <[EMAIL PROTECTED]> wrote:

> Use REGEX_EXTRACT_ALL
> Something like this should work (untested, please verify)
>
> rel2 = foreach rel1 generate
>     FLATTEN(REGEX_EXTRACT_ALL(attributes#'md', '\\{"cId":"(\\w+)","sId":"(\\w+)"\\}'))
>     AS (cId: chararray, sId: chararray);
>
> Tighten up the regex appropriately.
>
>
> On 24 June 2013 14:55, Suresh Saggar <[EMAIL PROTECTED]> wrote:
>
> > *Thanks a lot* for your reply, but the problem still exists. To clarify
> > further, the exact sequence of Pig statements is shown below:
> >
> > -- Our custom jar containing the Loader() code.
> > REGISTER 'hdfs://hadoop-prod-master.vpc:8020/user/hdfs/libs/prod.jar';
> > records_log = LOAD
> >     'hdfs://hadoop-prod-master.vpc:8020/data/{prod}/{2013-06-20-11}/*'
> >     USING com.example.Loader() AS (date:chararray, type:chararray, attributes:[]);
> > http = FILTER records_log BY type == 'm' AND attributes#'st' == 'http';
> > X = FOREACH http GENERATE attributes#'md' AS metadata;
> > Y = FOREACH X GENERATE FLATTEN(metadata);
> >
> > grunt> describe Y
> > Y: {metadata: bytearray}
> > grunt> describe X
> > X: {metadata: bytearray}
> >
> > Dumping either X or Y gives the same result. I also tried FLATTEN
> > directly on records_log, but it didn't help, i.e.
> > Z = FOREACH records_log GENERATE FLATTEN(attributes);
> >
> > Similarly, JsonStorage() can't be used directly, as my raw data (as
> > stored in HDFS) is not JSON but a custom format, as shown below:
> > 2013-06-20-11|m|{'st':'http','md':{'cId':'a','sId':'b'}}
> >
> > Our Loader() takes the raw data above as input and returns output in
> > the format (date:chararray, type:chararray, attributes:[]). Since
> > attributes#'md' is JSON here, I'm having problems getting the 'cId' and
> > 'sId' values. Hope this clarifies the context. I assume the FLATTEN
> > operator can't un-nest attributes#'md' because it is represented as
> > {'cId':'a','sId':'b'} rather than as ['cId'#'a','sId'#'b'] (a map in Pig)
> > or {('cId'#'a'),('sId'#'b')} (a bag in Pig).
> >
> > TIA
> > Ss
> >
> > On Fri, Jun 21, 2013 at 6:12 PM, Pradeep Gollakota <[EMAIL PROTECTED]> wrote:
> >
> > > Suresh,
> > >
> > > Look into using JsonStorage(). This seems to be what you're looking
> > > for.
> > > http://pig.apache.org/docs/r0.10.0/func.html#jsonloadstore
> > >
> > >
> > > On Fri, Jun 21, 2013 at 8:35 AM, Shahab Yunus <[EMAIL PROTECTED]> wrote:
> > >
> > > > Have you tried flattening the bag first?
> > > >
> > > >
> > > > On Fri, Jun 21, 2013 at 5:43 AM, Suresh Saggar <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Facing a similar challenge. Here X contains one column named
> > > > > 'metadata' of type bytearray, but the actual content is JSON, i.e.
> > > > > the value of the metadata field is a JSON object (with keys sId and
> > > > > cId), as shown below:
> > > > >
> > > > > grunt> describe X
> > > > > X: {metadata: bytearray}
> > > > >
> > > > > grunt> dump X
> > > > > ({"sId":"003_w","cId":"k"})
> > > > > ({"sId":"001_rf","cId":"r"})
> > > > > ({"sId":"001_rf","cId":"r"})
> > > > > ({"sId":"004_rf","cId":"r"})
> > > > >
> > > > > Any idea how I can generate cId and sId as separate chararray
> > > > > columns?
> > > > > TIA
> > > > >
> > > > > Ss
> > > > >
> > > > > On Tue, Jun 18, 2013 at 5:52 AM, Pradeep Gollakota <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > What's the error you are seeing? What does your bag of maps look
> > > > > > like? What exactly is a userId? Is it a field or a key in the map?
> > > > > >
> > > > > >
> > > > > > On Mon, Jun 17, 2013 at 8:18 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > Hi Pig users,
> > > > > > >
> > > > > > > Does anyone have experience dereferencing a bag of maps? For