NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> question on pig join


Re: question on pig join
I am not sure what you meant by "null match".

Would this work?

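F1 = load 'largefile' as (field1, field2, field3, ..); -- placeholder: declare every column of the large file in its actual order
F2 = load 'smallfile' as (field1, field2, field3);

-- As the small file is very small, use a replicated join.
-- The replicated (small) relation must be listed last; LEFT OUTER keeps large-file rows with no match.
J = join F1 by field1 left outer, F2 by field1 using 'replicated';

-- After a join, refer to columns as relation::field (not relation.field).
-- Take the small file's value when the match supplies one, otherwise keep the large file's value.
FE = foreach J generate F1::field1,
    (F2::field2 is null ? F1::field2 : F2::field2) as field2,
    (F2::field3 is null ? F1::field3 : F2::field3) as field3;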
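For intuition, here is a minimal Python sketch (not Pig) of what a replicated left outer join on field1 does. It is essentially the cache-and-hash-lookup approach described in the question: the small relation is loaded into an in-memory hash table, and the large relation is streamed past it once. The function name replicated_left_join, the dict-per-record representation, and the only_fill_nulls flag are hypothetical illustrations. The sketch assumes field1 is unique in the small file (Pig would emit one output row per matching small-file row otherwise) and uses None for Pig's null. The default mode follows the reply's pattern of taking the small file's value when a match supplies one. only_fill_nulls=True shows the other possible reading of "null match", where large-file values are replaced only when they are null.

```python
from typing import Dict, List, Optional

# Hypothetical record representation: column name -> value, with None standing in for Pig's null.
Row = Dict[str, Optional[str]]


def replicated_left_join(large: List[Row], small: List[Row],
                         only_fill_nulls: bool = False) -> List[Row]:
    # Build phase: hash the small relation on field1 (assumed unique there).
    # Null keys are skipped because null never matches in a Pig join.
    lookup: Dict[str, Row] = {r["field1"]: r for r in small if r["field1"] is not None}
    result: List[Row] = []
    # Probe phase: stream the large relation once; unmatched rows are kept (left outer).
    for row in large:
        key = row["field1"]
        match = lookup.get(key) if key is not None else None
        new_row = dict(row)  # copy so the input records are not modified
        if match is not None:
            for col in ("field2", "field3"):
                if only_fill_nulls:
                    # Alternative reading: fill the large-file value only when it is null.
                    if row[col] is None:
                        new_row[col] = match[col]
                elif match[col] is not None:
                    # Reply's reading: prefer the small-file value when it is present.
                    new_row[col] = match[col]
        result.append(new_row)
    return result


if __name__ == "__main__":
    large = [
        {"field1": "a", "field2": None, "field3": "x"},
        {"field1": "b", "field2": "old", "field3": None},
        {"field1": "z", "field2": None, "field3": None},
    ]
    small = [
        {"field1": "a", "field2": "A2", "field3": "A3"},
        {"field1": "b", "field2": "B2", "field3": "B3"},
    ]
    for row in replicated_left_join(large, small, only_fill_nulls=True):
        print(row)
```

Keeping the small relation in memory is what lets a replicated join run entirely in the map phase, which is why it only suits a lookup file small enough to fit in each task's memory.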

On 8/2/10 7:13 AM, "Kochis, Allan" <[EMAIL PROTECTED]> wrote:

Hi,
Have a pig question.
I have two HDFS files, a smaller file
that has
|field1|field2|field3|
and a larger file that has

|..|.. |...|field2|....|field3|.....|field1|...| ..|

I would like to replace field2 and field3 in my larger file when they
are null match on field1.

I am currently doing this by caching my smaller file and using a Perl
hash lookup to populate the larger records in a UDF.

Can this be done in a pig join?
Thanks,

Allan