Pig, mail # user - Can limit operator use variable?

Re: Can limit operator use variable?
Thejas Nair 2011-12-03, 03:06
Is this what you want? (It uses TOP and SIZE.)

raw_data = load ... as (id:chararray, weight:float);
group_id = group raw_data by id;

filter_spec_id = filter group_id by group == '1';
-- COMMENTED OUT: count_spec_id = foreach filter_spec_id generate COUNT(raw_data) as tot;

sample_id = foreach filter_spec_id {
   order_weight = order raw_data by weight desc;
   limit_id = TOP((int)SIZE(raw_data)/2, 1, order_weight);
   generate limit_id;
}
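
To make the arithmetic concrete, here is a small self-contained sketch of the same idea. The load path 'weights.txt' and the sample rows in the comments are invented for illustration, since the script above elides the load source. The sketch also passes raw_data to TOP directly, on the assumption that TOP selects tuples by column value and does not need pre-sorted input.

```pig
-- 'weights.txt' is a hypothetical path; substitute your own input.
raw_data = load 'weights.txt' as (id:chararray, weight:float);
group_id = group raw_data by id;
filter_spec_id = filter group_id by group == '1';

-- Suppose the bag for id '1' holds four invented rows:
--   (1,0.9) (1,0.1) (1,0.5) (1,0.3)
-- SIZE(raw_data) is 4, so (int)SIZE(raw_data)/2 is 2.
-- TOP(2, 1, raw_data) then keeps the two tuples with the largest
-- value in column 1 (weight; columns are counted from 0).
sample_id = foreach filter_spec_id generate
    TOP((int)SIZE(raw_data)/2, 1, raw_data) as limit_id;

dump sample_id;
```

The tuples come back in a bag, so their order inside limit_id should not be relied on.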

---------

Using a variable with limit will be supported in 0.10, but only for
scalar[1] variables. See
https://issues.apache.org/jira/browse/PIG-1926

[1] See 'Casting Relations to Scalars' in
http://pig.apache.org/docs/r0.9.1/basic.html
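
Assuming Pig 0.10 or later, the scalar form might look like the sketch below. The path 'weights.txt' is a placeholder and the alias names spec_id and grp_all are made up here. The count is computed with group ... all so that count_spec_id has exactly one record, which is what lets it be cast to a scalar.

```pig
-- Requires Pig 0.10+ (PIG-1926). 'weights.txt' is a hypothetical path.
raw_data = load 'weights.txt' as (id:chararray, weight:float);
spec_id = filter raw_data by id == '1';

-- group ... all yields exactly one record, so tot can be used as a scalar.
grp_all = group spec_id all;
count_spec_id = foreach grp_all generate COUNT(spec_id) as tot;

order_weight = order spec_id by weight desc;
-- The limit expression may use constants and scalars only,
-- not columns of the relation being limited.
limit_id = limit order_weight count_spec_id.tot/2;

dump limit_id;
```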

It should be possible to add support for other variables when limit is
used inside a nested foreach statement.
But the way you used it cannot be supported if count_spec_id has multiple
records: the limit variable comes from a different relation, and Pig
does not know which value from that relation to use in the limit.

-Thejas

On 12/2/11 5:45 PM, 唐亮 wrote:
> Hi,
>
> The Pig code is below:
>
> raw_data = load ... as (id:chararray, weight:float);
> group_id = group raw_data by id;
>
> filter_spec_id = filter group_id by group == '1';
> count_spec_id = foreach filter_spec_id generate COUNT(raw_data) as tot;
>
> sample_id = foreach filter_spec_id {
>    order_weight = order raw_data by weight desc;
>    limit_id = limit order_weight (int)count_spec_id.tot/2; -- *This is the problem*
>    generate limit_id;
> }
>
> The compiler complains that limit should be followed by <INTEGER>.
> So how can I limit the relation using a variable?
>