Pig user mailing list: Join Question


Hi Pradeep,

This is exactly what I'm looking for.  I was going to process this data
inside a UDF anyway, so it's easy for me to pick out what I need.  Many
thanks.

--Jerrell

On Wed, 4 Sep 2013, Pradeep Gollakota wrote:

> I think there's probably some convoluted way to do this. First thing you'll
> have to do is flatten your data.
>
> data1 = A, B
> _____
> X, X1
> X, X2
> Y, Y1
> Y, Y2
> Y, Y3
>
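The flatten step can be modeled outside Pig as a quick sanity check. This is a minimal sketch in plain Python, not Pig Latin: a Python tuple stands in for each row's member list, and `flatten_rows` is a hypothetical helper. In Pig itself this is the job of the FLATTEN operator inside a FOREACH ... GENERATE. Note that FLATTEN produces one row per element only when the members are held in a bag; flattening a tuple instead spreads its fields across columns of the same row.

```python
from typing import Iterable


def flatten_rows(rows: Iterable[tuple[str, tuple[str, ...]]]) -> list[tuple[str, str]]:
    """Hypothetical model of the flatten step.

    Turns each (A, (B1, B2, ...)) row into (A, B1), (A, B2), ... rows,
    one output row per member, whatever the number of members.
    """
    return [(a, b) for a, members in rows for b in members]


# Sample rows from the original question.
data = [("X", ("X1", "X2")), ("Y", ("Y1", "Y2", "Y3"))]
data1 = flatten_rows(data)
print(data1)
# [('X', 'X1'), ('X', 'X2'), ('Y', 'Y1'), ('Y', 'Y2'), ('Y', 'Y3')]
```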
> Then do a join by "B" onto your second dataset. This should produce the
> following:
>
> joined = data1::A, data1::B, data2::A, data2::B, data2::C, data2::D (I'm
> assuming your second data set, data2, has exactly 4 columns).
> _______________
> X, X1, X1, 4, 5, 6
> X, X2, X2, 3, 7, 3
>
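The join can be sketched the same way: a small hash join in plain Python that pairs each flattened (A, B) row with every row of the second data set whose first field equals B. `inner_join` and its parameters are hypothetical names for this illustration. With only the two sample rows from the question, the Y rows find no partner and drop out, just as they would from Pig's default (inner) JOIN; a LEFT OUTER join would keep them with nulls.

```python
from collections import defaultdict


def inner_join(left, right, left_key, right_key):
    """Hypothetical inner hash join on tuple positions.

    Every left row is concatenated with each right row whose field at
    right_key equals the left row's field at left_key. Left rows with
    no match are dropped, as in an inner join.
    """
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    # .get() does not insert missing keys into the defaultdict.
    return [l + r for l in left for r in index.get(l[left_key], [])]


data1 = [("X", "X1"), ("X", "X2"), ("Y", "Y1"), ("Y", "Y2"), ("Y", "Y3")]
data2 = [("X1", 4, 5, 6), ("X2", 3, 7, 3)]  # sample rows from the question
joined = inner_join(data1, data2, left_key=1, right_key=0)
print(joined)
# [('X', 'X1', 'X1', 4, 5, 6), ('X', 'X2', 'X2', 3, 7, 3)]
```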
> Now do a group by data1::A to get
> (X, {(X, X1, X1, 4, 5, 6), (X, X2, X2, 3, 7, 3), ...})
> (Y, {(Y, Y1, Y1, ...), (Y, Y2, Y2, ...), ...})
>
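Grouping on the first field can be modeled as collecting rows into (key, bag) pairs. In this plain-Python sketch a list stands in for a Pig bag and `group_by` is a hypothetical helper; unlike this list, a Pig bag carries no guaranteed order, so the tuples inside each bag may come back in any order.

```python
from collections import defaultdict


def group_by(rows, key):
    """Hypothetical GROUP model: returns (key, bag) pairs.

    Rows sharing the same value at position `key` are collected into one
    list; keys appear in first-seen order because dicts keep insertion order.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return [(k, bag) for k, bag in groups.items()]


joined = [("X", "X1", "X1", 4, 5, 6), ("X", "X2", "X2", 3, 7, 3)]
grouped = group_by(joined, key=0)
print(grouped)
# [('X', [('X', 'X1', 'X1', 4, 5, 6), ('X', 'X2', 'X2', 3, 7, 3)])]
```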
> This is as far as I got; I'm not sure whether there's a built-in UDF to
> transform that into what you're looking for. I thought maybe BagToTuple,
> but it will return a single tuple with all elements of all tuples in the
> bag. If the above data format supports your use cases, you're done. If not,
> you can write a UDF to transform it into the required format.
>
>
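The remaining reshaping, which Pradeep suggests doing in a UDF, can be sketched as a plain function over one grouped record. `bag_to_tuple_like` imitates the BagToTuple behavior Pradeep describes (every field of every tuple in one flat tuple), and `reshape` is a hypothetical transformation that produces the format asked for in the question. Both are Python models only; in Pig the logic would still need to be wrapped as a UDF, for example a Java EvalFunc or a Jython UDF, which is not shown here.

```python
def bag_to_tuple_like(bag):
    """Model of the BagToTuple behavior described above:
    all fields of all tuples concatenated into a single flat tuple."""
    return tuple(field for row in bag for field in row)


def reshape(group_key, bag):
    """Hypothetical reshaping: the group key, then one tuple per joined row.

    Each joined row is (data1::A, data1::B, data2::A, data2::B, ...).
    Field 0 repeats the group key and field 1 repeats the join key held
    in field 2, so both are dropped and row[2:] is kept.
    """
    return (group_key, *(row[2:] for row in bag))


bag = [("X", "X1", "X1", 4, 5, 6), ("X", "X2", "X2", 3, 7, 3)]
print(bag_to_tuple_like(bag))
# ('X', 'X1', 'X1', 4, 5, 6, 'X', 'X2', 'X2', 3, 7, 3)
print(reshape("X", bag))
# ('X', ('X1', 4, 5, 6), ('X2', 3, 7, 3))
```

The flat tuple loses the boundaries between joined rows, which is why a custom transformation is needed to get the nested format.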
> On Wed, Sep 4, 2013 at 4:39 PM, F. Jerrell Schivers
> <[EMAIL PROTECTED]> wrote:
>
>> Howdy folks,
>>
>> Let's say I have a set of data that looks like this:
>>
>> X, (X1, X2)
>> Y, (Y1, Y2, Y3)
>>
>> So each row's tuple can have an unknown number of members.
>>
>> I also have a second set of data that looks like this:
>>
>> X1, 4, 5, 6
>> X2, 3, 7, 3
>>
>> I'd like to join these such that I get:
>>
>> X, (X1, 4, 5, 6), (X2, 3, 7, 3)
>> Y, (Y1, etc), (Y2, etc), (Y3, etc)
>>
>> Is this possible with Pig?
>>
>> Thanks,
>> Jerrell
>>
>