I am not sure what you meant by "null match".
Would this work?
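-- '..' stands for the remaining fields, assumed to include field2 and field3.
F1 = load 'largefile' as (field1, ..);
F2 = load 'smallfile' as (field1, ..);
-- As the second file is very small, use a replicated join.
J = join F1 by field1 LEFT OUTER, F2 by field1 using 'replicated';
-- Keep the large file's value unless it is null, then take the small file's.
FE = foreach J generate F1::field1 as field1,
    (F1::field2 is null ? F2::field2 : F1::field2) as field2,
    (F1::field3 is null ? F2::field3 : F1::field3) as field3;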
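To make the intent concrete, here is a minimal sketch on made-up data, assuming "null match" means "fill field2 and field3 from the small file only where the large file has them null, matching rows on field1". The file names, the three-field layout, and the sample rows are hypothetical; your real files have more columns.

```
-- Hypothetical tab-separated inputs (an empty column loads as null with PigStorage):
--   large.txt:  k1 <empty> c1  |  k2  b2  <empty>  |  k3  <empty>  <empty>
--   small.txt:  k1  B1  C1     |  k2  B2  C2
big   = load 'large.txt' as (field1:chararray, field2:chararray, field3:chararray);
small = load 'small.txt' as (field1:chararray, field2:chararray, field3:chararray);

-- Left outer join keeps every row of the large file; the small relation
-- is listed last so it can be held in memory by the replicated join.
j = join big by field1 left outer, small by field1 using 'replicated';

-- After a join, fields are referenced as relation::field; the bincond
-- must be wrapped in parentheses.
filled = foreach j generate
    big::field1 as field1,
    (big::field2 is null ? small::field2 : big::field2) as field2,
    (big::field3 is null ? small::field3 : big::field3) as field3;

dump filled;
```

Under these assumptions, rows with no partner in the small file (k3 here) keep their nulls, because the left outer join leaves the small-side fields null for them. If you meant the opposite (always prefer the small file's value when a match exists), swap the two branches of each bincond.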
On 8/2/10 7:13 AM, "Kochis, Allan" <[EMAIL PROTECTED]> wrote:
Have a pig question.
I have two HDFS files, a smaller file
and a larger file that has
|..|.. |...|field2|....|field3|.....|field1|...| ..|
I would like to replace field2 and field3 in my larger file when they
are null match on field1.
I am currently doing this by caching my smaller file and using a Perl
hash lookup to populate the larger records in a UDF.
Can this be done in a Pig join?