Pig >> mail # user >> Mapping a reference table


Matt Tanquary 2010-11-29, 22:51
Re: Mapping a reference table
The mapping will have to be done in a UDF. The UDF would return a bag of
tuples.

The Pig query would look like this:

mapped_tuples = foreach input generate FLATTEN(mapudf(bagcol));

In Pig 0.8 (to be released in a few days), you can also write your UDFs in
Python: http://wiki.apache.org/pig/UDFsUsingScriptingLanguages
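
As a rough illustration of what such a UDF might compute, here is a minimal plain-Python sketch. The names map_levels and LOOKUP are hypothetical, and the table is hard-coded from the question; a real UDF would load the lookup data from a file and, when registered with Pig, would also need an output schema declaration. Bags are modeled as lists and tuples as Python tuples.

```python
# Hypothetical lookup table taken from the question:
# level-1 value -> values for levels 2 through 5.
LOOKUP = {
    5: (15, 30, 8, 2),
    10: (125, 135, 13, 3),
    15: (4, 90, 10, 1),
}


def map_levels(bag, lookup=LOOKUP):
    """Return a bag (list) of one-field tuples, each wrapping one bag per level.

    The first tuple wraps the level-1 input bag; the rest wrap the bags
    for the remaining levels, in column order. An unknown level-1 value
    raises KeyError (an assumption; the question does not say what to do).
    """
    keys = [t[0] for t in bag]
    result = [(list(bag),)]  # level 1: the input bag itself
    n_levels = len(next(iter(lookup.values())))
    for level in range(n_levels):
        level_bag = [(lookup[k][level],) for k in keys]
        result.append((level_bag,))
    return result


# After FLATTEN, each tuple in the returned bag would become one record.
for (level_bag,) in map_levels([(10,), (15,)]):
    print(level_bag)
```

Because the function returns one tuple per level, applying FLATTEN to its result would yield one output record per level for each input bag, which matches the "Records OUT" layout in the question.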

Thanks,
Thejas

On 11/29/10 2:51 PM, "Matt Tanquary" <[EMAIL PROTECTED]> wrote:

> I have this problem which I solved easily with M/R, but I'm trying to solve
> it through Pig instead:
>
> Given the following bags, perform a lookup in a special table to retrieve 4
> additional variations of the data:
> {(10), (15)}
> {(5)}
> {(5), (10), (15)}
>
> Lookup table:
> 5 15 30 8 2
> 10 125 135 13 3
> 15 4 90 10 1
>
> Note the lookup table has 5 columns, 1 for each level. The bags are given as
> level 1 data, so you will find that value in the first column of the lookup.
> Now, for the fun part: Need to create new bags for each level based on the
> given level 1 data. For instance:
>
> {(10), (15)} IN would yield the additional bags:
> {(125), (4)}
> {(135), (90)}
> {(13), (10)}
> {(3), (1)}
>
> additionally:
> {(5)} IN would yield:
> {(15)}
> {(30)}
> {(8)}
> {(2)}
>
> So, this is the final big picture:
> Records IN:
> {(10), (15)}
> {(5)}
>
> Records OUT:
> {(10), (15)}
> {(125), (4)}
> {(135), (90)}
> {(13), (10)}
> {(3), (1)}
> {(5)}
> {(15)}
> {(30)}
> {(8)}
> {(2)}
>
> The cases where there is only one item in a bag are simple, but when more
> than one is introduced I am unable to determine an efficient way to tackle
> this. As a side note, I will probably only need to process up to 3 items in
> a bag in this manner.
>
> I hope this makes sense. Any assistance is much appreciated.
> Regards,
> -M@
>