Pig, mail # user - question on pig join


Re: question on pig join
Thejas M Nair 2010-08-02, 19:35
I am not sure about what you meant by "null match".

Would this work?

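F1 = load 'largefile' as (field1,..);
F2 = load 'smallfile' as (field1, field2, field3);

-- As the file is very small, use a replicated join.
J = join F1 by field1 LEFT, F2 by field1 using 'replicated';
-- After a join, fields are disambiguated as alias::field.
-- No match in F2 (F2::field1 is null): keep F1's value; otherwise take F2's.
FE = foreach J generate F1::field1,
    (F2::field1 is null ? F1::field2 : F2::field2),
    (F2::field1 is null ? F1::field3 : F2::field3)
    ;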
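
If "null match" instead means that field2 and field3 should be filled in only where the large file's own value is null, the test in the bincond would be on the large file's fields rather than on the join key. The following is only a sketch of that reading. The aliases large, small, and filled are illustrative, the three-column schema for the large file is a hypothetical reduction of the real layout (which has more columns), and it assumes a Pig version that supports left outer joins with 'replicated'.

```pig
-- Hypothetical, reduced schemas: the real large file has more columns.
large = load 'largefile' as (field2:chararray, field3:chararray, field1:chararray);
small = load 'smallfile' as (field1:chararray, field2:chararray, field3:chararray);

-- A left outer join keeps every row of the large file. With 'replicated',
-- the small relation is held in memory, so it must be listed last.
J = join large by field1 left outer, small by field1 using 'replicated';

-- Take the small file's value only where the large file's own value is null.
filled = foreach J generate
    large::field1 as field1,
    (large::field2 is null ? small::field2 : large::field2) as field2,
    (large::field3 is null ? small::field3 : large::field3) as field3;
```

Rows with no match in the small file keep their null values, since the joined small-file fields are null for them as well.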

On 8/2/10 7:13 AM, "Kochis, Allan" <[EMAIL PROTECTED]> wrote:

Hi,
I have a Pig question.
I have two HDFS files: a smaller file
that has
|field1|field2|field3|
and a larger file that has

|..|.. |...|field2|....|field3|.....|field1|...| ..|

I would like to replace field2 and field3 in my larger file when they
are null match on field1.

I am currently doing this by caching my smaller file and using a Perl
hash lookup to populate the larger records in a UDF.

Can this be done with a Pig join?
Thanks,

Allan