Pig, mail # user - Finding records greater than a value


Re: Finding records greater than a value
Cheolsoo Park 2012-09-28, 17:40
Hi,

Please try this:

value = load '1.txt' using PigStorage(',') as (id:int,amount:float);
minAndMax = load '2.txt' using PigStorage(',') as (id:int,min:float,max:float);
joined = join value by id, minAndMax by id;
filtered = filter joined by (value::amount > minAndMax::min and value::amount < minAndMax::max);
grouped = group filtered by value::id;
result = foreach grouped generate group, AVG(filtered.value::amount);
dump result;

Given that "1.txt" and "2.txt" are as follows:

cheolsoo@localhost:~/workspace/pig-2778-matches $cat 2.txt
1234,8,150
1158,0,200
cheolsoo@localhost:~/workspace/pig-2778-matches $cat 1.txt
1234,22.7
1158,88
1234,280
1158,100

The result is:

(1158,94.0)
(1234,22.700000762939453)
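
The trailing digits in 22.700000762939453 are not a bug in the script. The amount column is declared as float (32-bit), 22.7 has no exact float representation, and AVG returns a double, so the rounding error becomes visible when the value is widened. A minimal sketch of the same pipeline with the numeric columns declared as double instead, assuming the same '1.txt' and '2.txt' as above; the average for id 1234 should then print without the extra digits:

```pig
-- Same pipeline as above; only the declared types change (float -> double).
value = load '1.txt' using PigStorage(',') as (id:int, amount:double);
minAndMax = load '2.txt' using PigStorage(',') as (id:int, min:double, max:double);

-- Attach each amount to the min/max row that shares its id.
joined = join value by id, minAndMax by id;

-- Keep only amounts strictly between min and max.
filtered = filter joined by (value::amount > minAndMax::min and value::amount < minAndMax::max);

-- Average the surviving amounts per id.
grouped = group filtered by value::id;
result = foreach grouped generate group, AVG(filtered.value::amount);
dump result;
```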

Thanks,
Cheolsoo

On Fri, Sep 28, 2012 at 9:52 AM, jamal sasha <[EMAIL PROTECTED]> wrote:

> Hi,
> I have two files.
> File 1 contains the following data:
> Id, amount
> 1234, 22.7
> 1158,88
> 1234,  280
>
> File 2 contains the following data:
> Id, min, max
> 1234, 8, 150
>
> Now I want to calculate the mean (avg), but without considering values
> that are less than min or greater than max.
>
> So in the mean calculation here,
> I don't want 1234, 280, since 280 > 150.
>
> Any suggestions?
>
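
The question asks to drop only values below min or above max, so an amount exactly equal to one of the bounds would arguably still count toward the mean. The script in the reply uses strict comparisons, which excludes such values. A sketch of the inclusive variant, assuming that reading of the question and the same comma-separated files '1.txt' and '2.txt' used in the reply:

```pig
-- Variant of the reply's script with inclusive bounds (>= and <=).
value = load '1.txt' using PigStorage(',') as (id:int, amount:float);
minAndMax = load '2.txt' using PigStorage(',') as (id:int, min:float, max:float);
joined = join value by id, minAndMax by id;

-- An amount equal to min or max is kept here; the reply's version drops it.
filtered = filter joined by (value::amount >= minAndMax::min and value::amount <= minAndMax::max);

grouped = group filtered by value::id;
result = foreach grouped generate group, AVG(filtered.value::amount);
dump result;
```

With the sample data in the reply, no amount equals a bound, so both variants keep the same records.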