Pig >> mail # user >> Mapping a reference table


Re: Mapping a reference table
The mapping will have to be done in a udf. The udf would return a bag of
tuples.

The Pig query would look like this:

mapped_tuples = foreach input generate FLATTEN(mapudf(bagcol));

In Pig 0.8 (to be released in a few days), you can also write your UDFs in
Python - http://wiki.apache.org/pig/UDFsUsingScriptingLanguages
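
As a rough illustration of the mapping logic such a UDF might carry, here is a minimal sketch in plain Python. The names LOOKUP and map_levels are hypothetical, the lookup table is hard-coded from the example in the question (a real job would presumably load it from a file), and the Pig-specific parts (registering the script and declaring an output schema) are left out so the function can be run and checked on its own.

```python
# Hypothetical lookup table taken from the example in the question:
# level-1 value -> values for levels 2 through 5.
LOOKUP = {
    5: (15, 30, 8, 2),
    10: (125, 135, 13, 3),
    15: (4, 90, 10, 1),
}


def map_levels(bag):
    """Return the input (level 1) bag followed by one bag per extra level.

    `bag` is a list of single-field tuples, e.g. [(10,), (15,)].
    A value missing from LOOKUP raises KeyError; how to handle that case
    is not specified in the question, so it is left as an assumption.
    """
    rows = [LOOKUP[t[0]] for t in bag]
    result = [list(bag)]  # keep the original level-1 bag first
    for level in range(4):  # four additional levels per lookup row
        result.append([(row[level],) for row in rows])
    return result


if __name__ == "__main__":
    for out_bag in map_levels([(10,), (15,)]):
        print(out_bag)
```

Each inner list stands for one output bag, in the same order as the "Records OUT" listing in the question; inside Pig the return value would additionally need a schema that matches whatever FLATTEN is expected to produce.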

Thanks,
Thejas

On 11/29/10 2:51 PM, "Matt Tanquary" <[EMAIL PROTECTED]> wrote:

> I have this problem which I solved easily with M/R but I'm trying to solve
> through PIG instead:
>
> Given the following bags, perform a lookup in a special table to retrieve 4
> additional variations of the data:
> {(10), (15)}
> {(5)}
> {(5), (10), (15)}
>
> Lookup table:
> 5 15 30 8 2
> 10 125 135 13 3
> 15 4 90 10 1
>
> Note the lookup table has 5 columns, 1 for each level. The bags are given as
> level 1 data, so you will find that value in the first column of the lookup.
> Now, for the fun part: Need to create new bags for each level based on the
> given level 1 data. For instance:
>
> {(10), (15)} IN would yield the additional bags:
> {(125), (4)}
> {(135), (90)}
> {(13), (10)}
> {(3), (1)}
>
> additionally:
> {(5)} IN would yield:
> {(15)}
> {(30)}
> {(8)}
> {(2)}
>
> So, this is the final big picture:
> Records IN:
> {(10), (15)}
> {(5)}
>
> Records OUT:
> {(10), (15)}
> {(125), (4)}
> {(135), (90)}
> {(13), (10)}
> {(3), (1)}
> {(5)}
> {(15)}
> {(30)}
> {(8)}
> {(2)}
>
> The cases where there is only one item in a bag are simple, but when more
> than one are introduced I am unable to determine an efficient way to tackle
> this. As a side note, I will probably only need to process up to 3 items in
> a bag in this manner.
>
> I hope this makes sense. Any assistance is much appreciated.
> Regards,
> -M@
>