Compare values within a bag
Hi,

I have a similar kind of problem: I want to compare values within a bag.

Example

(1, {(2,4),(1,5)}) --> Now I want to check whether the tuples overlap: if (2 is between 1 and 5) or (4 is between 1 and 5), return true, else false.
(2, {(1,10),(5,8)})
A = GROUP collection BY x;
B = FOREACH A {
        -- Open question: how do I compare values within the bag for each group?
        -- Compareflag = ...;
        GENERATE group AS x, Compareflag;
};

Saurabh
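
The comparison asked about above can be sketched outside Pig as follows. This is a minimal plain-Python illustration, not a Pig answer from the thread: the names `overlaps` and `bag_has_overlap` are hypothetical, each tuple is assumed to be an inclusive (low, high) range with low <= high, and "compare" is assumed to mean "does any pair of tuples in the bag overlap". It uses the symmetric interval-overlap test, which also counts the case where one range fully contains the other; the rule written in the message checks only whether an endpoint of the first tuple falls inside the second. Pig can register Python functions as UDFs through Jython, so logic of this shape could be adapted into a UDF; the registration and schema declaration are not shown here.

```python
from itertools import combinations


def overlaps(a, b):
    """True when inclusive ranges a=(lo, hi) and b=(lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def bag_has_overlap(bag):
    """True when any two tuples in the bag overlap (assumed meaning of the flag)."""
    return any(overlaps(a, b) for a, b in combinations(bag, 2))


# The two groups from the example, keyed by x.
groups = {1: [(2, 4), (1, 5)], 2: [(1, 10), (5, 8)]}
flags = {x: bag_has_overlap(bag) for x, bag in groups.items()}
print(flags)
```

Both example groups come out true under this test. A bag with fewer than two tuples yields false, because there is no pair to compare.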

-----Original Message-----
From: Jacob Perkins [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 20, 2013 7:55 AM
To: [EMAIL PROTECTED]
Subject: Re: Simple word count in pig..

Jamal,

You're going to want to use a FLATTEN and another group by. Consider:

flattened   = foreach processed generate id, flatten(tokens) as token;
frequency = foreach (group flattened by (id, token)) generate
                        flatten(group)         as (id, token),
                        COUNT(flattened) as freq;

Of course, this will spawn another map-reduce job. However, since COUNT is algebraic, Pig will use combiners, drastically reducing the amount of data sent to the reducers.

--jacob
@thedatachef
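
The point about COUNT being algebraic can be illustrated with a small plain-Python sketch. It is not how Pig is implemented; it only shows why a count can be computed as a sum of partial counts, which is what lets a combiner shrink the data before it reaches the reducers. The split of rows across two map tasks and the names `combine` and `reduce_counts` are assumptions made for illustration; the rows are the flattened (id, token) pairs from the sample data in the original question.

```python
from collections import Counter

# Flattened (id, token) rows, split across two hypothetical map tasks.
map_task_1 = [("foobar", "foo"), ("foobar", "foo"), ("foobar", "foobar")]
map_task_2 = [("foobar", "bar"), ("foo", "bar"), ("foo", "bar")]


def combine(rows):
    """Combiner step: count (id, token) pairs locally on one map task."""
    return Counter(rows)


def reduce_counts(partials):
    """Reducer step: add up the partial counts from every map task."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


frequency = reduce_counts([combine(map_task_1), combine(map_task_2)])
for (doc_id, token), freq in sorted(frequency.items()):
    print(doc_id, token, freq)
```

Each map task sends one partial count per distinct (id, token) pair instead of one record per occurrence, and the summed result equals a direct count over all rows.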

On Nov 19, 2013, at 5:45 PM, jamal sasha <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have data already processed in the following form:
>
>
> ( id ,{ bag of words})
> So for example:
>
> (foobar, {(foo), (foo),(foobar),(bar)})
> (foo,{(bar),(bar)})
>
> and so on..
> describe processed gives me:
> processed: {id: chararray,tokens: {tuple_of_tokens: (token:
> chararray)}}
>
>
> Now what I want is to also count the number of times a word appears in
> this data and output it as:
> foobar,foo,2
> foobar,foobar,1
> foobar,bar,1
> foo,bar,2
>
> and so on...
>
> How do I do this in pig?