Join Multiple Relations by Different Fields
Thomas Bach 2012-12-14, 10:11
Hi,
Say I have three files, `data1`, `data2` and `assoc`:

    $ cat data1
    key1,foo
    key2,bar
    $ cat data2
    key3,braz
    key4,froz
    $ cat assoc
    key1,key3
    key2,key4

I load these files via

    $ pig -b -p debug=WARN -x local
    Warning: $HADOOP_HOME is deprecated.
    Apache Pig version 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
    Logging error messages to: /home/vince/tmp/pig_1355407390166.log
    Connecting to hadoop file system at: file:///
    grunt> data1 = load 'data1' as (key: chararray, val: chararray);
    grunt> data2 = load 'data2' as (key: chararray, val: chararray);
    grunt> assoc = load 'assoc' as (key1: chararray, key2: chararray);

What I want is a relation that looks like:

    (foo, braz)
    (bar, froz)

That is

    data1_val, data1_key <-> assoc_key1, assoc_key2 <-> data2_key, data2_val

So my first assumption was to join data1 with assoc first, and then join the resulting relation with data2. Anyway, doing

    A = join data1 by key, assoc by key1;
    dump A;

doesn't yield any results. Is this a bug or am I doing something conceptually wrong?

Regards,
Thomas Bach.
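
One likely cause, offered as a sketch and not as the answer given in this thread: a `load` without a `using` clause goes through `PigStorage` with its default tab delimiter, so each comma-separated line would land whole in the first field (`key1,foo` as `key`, with `val` null) and the join keys would never match. Assuming that is what happens here, loading with an explicit comma delimiter and chaining two joins should produce the desired pairs. The aliases `B` and `C` and the positional references `$1` and `$5` are illustrative choices, not part of the original post.

```pig
-- Assumption: the files are comma-separated, so name the delimiter explicitly.
data1 = load 'data1' using PigStorage(',') as (key: chararray, val: chararray);
data2 = load 'data2' using PigStorage(',') as (key: chararray, val: chararray);
assoc = load 'assoc' using PigStorage(',') as (key1: chararray, key2: chararray);

-- First join: data1.key <-> assoc.key1
-- A has four fields: data1::key, data1::val, assoc::key1, assoc::key2
A = join data1 by key, assoc by key1;

-- Second join: assoc.key2 (carried along in A) <-> data2.key
-- B has six fields: the four from A followed by data2::key, data2::val
B = join A by key2, data2 by key;

-- Keep only the two value columns: $1 is data1's val, $5 is data2's val
C = foreach B generate $1, $5;
dump C;
```

Positional references are used in the last step to sidestep the nested `::` prefixes that joined relations carry; named references would work as well once the disambiguated field names are confirmed with `describe B;`.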
Thomas Bach 2012-12-14, 12:17
Jonathan Coveney 2012-12-14, 18:54