Pig user mailing list: Finding records greater than a value


jamal sasha 2012-09-28, 16:52
Re: Finding records greater than a value
Hi,

Please try this:

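-- Load the amounts (1.txt) and the per-id bounds (2.txt).
value = load '1.txt' using PigStorage(',') as (id:int, amount:float);
minAndMax = load '2.txt' using PigStorage(',') as (id:int, min:float, max:float);
-- Pair each amount with the bounds for the same id (inner join).
joined = join value by id, minAndMax by id;
-- Keep only amounts inside [min, max]; amounts equal to a bound are kept.
filtered = filter joined by (value::amount >= minAndMax::min and value::amount <= minAndMax::max);
-- Average the remaining amounts per id.
grouped = group filtered by value::id;
result = foreach grouped generate group, AVG(filtered.value::amount);
dump result;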

Given that "1.txt" and "2.txt" are as follows:

cheolsoo@localhost:~/workspace/pig-2778-matches $cat 2.txt
1234,8,150
1158,0,200
cheolsoo@localhost:~/workspace/pig-2778-matches $cat 1.txt
1234,22.7
1158,88
1234,280
1158,100

The result is:

(1158,94.0)
(1234,22.700000762939453)
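To sanity-check the script without a Pig installation, here is a minimal Python sketch (not Pig) that mirrors the same join, filter, group and AVG steps on the sample rows from 1.txt and 2.txt. The names `AMOUNTS`, `BOUNDS`, `to_float32` and `bounded_means` are hypothetical, chosen for illustration. It keeps amounts within [min, max] inclusive, as the question asks, and drops ids that have no bounds row, as an inner join does. It also shows where 22.700000762939453 comes from: Pig's `float` is a 32-bit type, 22.7 has no exact 32-bit representation, and the widened value prints with the extra digits. Declaring `amount` (and `min`/`max`) as `double` in the load schema should print 22.7 instead.

```python
import struct

# Hypothetical in-memory copies of the sample files:
# 1.txt rows are (id, amount); 2.txt rows are (id, min, max).
AMOUNTS = [(1234, 22.7), (1158, 88.0), (1234, 280.0), (1158, 100.0)]
BOUNDS = [(1234, 8.0, 150.0), (1158, 0.0, 200.0)]


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float, like Pig's float type."""
    return struct.unpack("f", struct.pack("f", x))[0]


def bounded_means(amounts, bounds, single_precision=True):
    """Mean amount per id, ignoring amounts outside that id's [min, max] range."""
    conv = to_float32 if single_precision else float
    # Index the bounds by id; the lookup below plays the role of the join.
    limits = {i: (conv(lo), conv(hi)) for i, lo, hi in bounds}
    groups = {}
    for i, amount in amounts:
        if i not in limits:  # inner join: ids without a bounds row are dropped
            continue
        amount = conv(amount)
        lo, hi = limits[i]
        if lo <= amount <= hi:  # filter step, bounds inclusive
            groups.setdefault(i, []).append(amount)  # group step
    # AVG step: arithmetic mean of each group.
    return {i: sum(vals) / len(vals) for i, vals in groups.items()}


if __name__ == "__main__":
    for key, mean in sorted(bounded_means(AMOUNTS, BOUNDS).items()):
        print((key, mean))
```

Running it prints (1158, 94.0) and (1234, 22.700000762939453), the same values as the Pig dump; with `single_precision=False` the mean for 1234 is exactly 22.7.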

Thanks,
Cheolsoo

On Fri, Sep 28, 2012 at 9:52 AM, jamal sasha <[EMAIL PROTECTED]> wrote:

> Hi
> I have two files.
> File 1 contains the following data:
> Id, amount
> 1234, 22.7
> 1158,88
> 1234,  280
>
> File 2 contains the following data:
> Id, min, max
> 1234, 8, 150
>
> Now I want to calculate the mean (avg), excluding values that are less
> than min or greater than max.
>
> So in the mean calculation here I don't want 1234, 280, as 280 > 150.
>
> Any suggestions?
>