Pig >> mail # user >> Can I pass an entire relation to a Pig UDF?


Arun A K 2011-04-27, 02:07
Re: Can I pass an entire relation to a Pig UDF?
The question is, do you need the entire relation all at once to assign a
rank? If so then map-reduce may not be the answer. If not, why not just
run the UDF on each tuple of the relation, one at a time, with a
projection?

If you need some global information, such as the max and min score, then
you might look at the MAX and MIN builtins. They do require a GROUP ALL,
but they are algebraic, so Pig computes partial results in the combiners
and does not bring all the data to one machine as it otherwise would.

--jacob
@thedatachef
On Tue, 2011-04-26 at 19:07 -0700, Arun A K wrote:
> Hi
>
> I have the following input relation:
> Name Score
> Jack    25
> Jimmy   30
> Sam     20
> Hick    35
> Tampa   22
>
> My goal is to rank the tuples by score.
>
> Pig script:
>
> sample_data = LOAD 'sample.txt' USING PigStorage()   AS (name:chararray,
> score:int);
> sample_data_group = GROUP sample_data BY score;
> sample_data_count = FOREACH sample_data_group GENERATE group AS score,
> COUNT(sample_data.name) AS countVal;
> sample_data_order = ORDER sample_data_count BY score DESC;
> sample_data_group_all = GROUP sample_data_order all;
> sample_data_project = FOREACH sample_data_group_all GENERATE
> FLATTEN(myUDF.Rank(sample_data_order));
> dump sample_data_project;
>
> Can someone please point me to a UDF example where a relation is read in and
> iterated over all its tuples? I plan to iterate over the tuples and assign a
> rank to each of them based on the score value.
>
> Is there any other way to generate rank?
>
> Thanks much.
>
> Arun
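
For the UDF example requested above, here is a minimal sketch of a UDF that receives the whole bag and iterates over its tuples. It assumes a Pig version that supports Jython UDFs, where a bag arrives as a list of tuples and the `outputSchema` decorator is supplied by Pig. The file name `rank_udf.py`, the function name `rank`, and the output field names are hypothetical. Because bag order is not guaranteed after a GROUP, the sketch sorts inside the UDF instead of relying on the earlier ORDER.

```python
# rank_udf.py (hypothetical file name): sketch of a Jython UDF for Pig.

try:
    outputSchema  # assumed to be injected by Pig's Jython engine
except NameError:
    def outputSchema(schema):
        """No-op stand-in so the file can also be run outside Pig."""
        def wrap(func):
            return func
        return wrap


@outputSchema("ranked:bag{t:tuple(score:int, countVal:long, rank:int)}")
def rank(bag):
    """Take a bag of (score, countVal) tuples and append a 1-based rank.

    Tuples are sorted by score, highest first. Scores are assumed to be
    distinct already, as they are after GROUP ... BY score.
    """
    if bag is None:
        return []
    ordered = sorted(bag, key=lambda t: t[0], reverse=True)
    ranked = []
    for position, (score, count_val) in enumerate(ordered):
        ranked.append((score, count_val, position + 1))
    return ranked
```

Under those assumptions, the script would register the file with `REGISTER 'rank_udf.py' USING jython AS myUDF;` and call `FLATTEN(myUDF.rank(sample_data_order))` after the GROUP ALL. Note that this does bring the whole grouped relation to a single reducer, so it only suits inputs small enough to fit there.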