Re: simple pig logic
If I understood your question correctly, given the following input:

main_data.txt
{"id": "foo", "some_field": 12354, "score": 0}
{"id": "foobar", "some_field": 12354, "score": 0}
{"id": "baz", "some_field": 12345, "score": 0}

score_data.txt
{"id": "foo", "score": 1}
{"id": "foobar", "score": 20}

you want the following output:

{"id": "foo", "some_field": 12354, "score": 1}
{"id": "foobar", "some_field": 12354, "score": 20}
{"id": "baz", "some_field": 12345, "score": 0}

If that is correct, you can do a LEFT OUTER join on the two relations, which keeps every row of main even when it has no matching id in scores.

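-- JsonLoader (built into Pig 0.10 and later) parses one JSON object per line;
-- the default PigStorage loader would split each line on tabs instead.
main = LOAD 'main_data.txt' USING JsonLoader('id:chararray, some_field:int, score:int');
scores = LOAD 'score_data.txt' USING JsonLoader('id:chararray, score:int');
both = JOIN main by id LEFT, scores by id;
-- Use IS NULL here: in Pig, "x == null" evaluates to null rather than true.
final = FOREACH both GENERATE
    main::id as id,
    main::some_field as some_field,
    (scores::score IS NULL ? main::score : scores::score) as score;
dump final;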

After the join, check whether scores::score is null. If it is, fall back to the default main::score; otherwise use scores::score.
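To make the join-then-fallback logic concrete without a Pig installation, here is a minimal Python sketch that mimics the same semantics on the sample records. It is an illustration under assumptions, not Pig itself: the function name `left_outer_join_scores` is hypothetical, and unlike Pig (which does not guarantee output order) it keeps the rows in the order of main_data.txt.

```python
import json

# Sample records copied from main_data.txt and score_data.txt above.
MAIN_LINES = [
    '{"id": "foo", "some_field": 12354, "score": 0}',
    '{"id": "foobar", "some_field": 12354, "score": 0}',
    '{"id": "baz", "some_field": 12345, "score": 0}',
]
SCORE_LINES = [
    '{"id": "foo", "score": 1}',
    '{"id": "foobar", "score": 20}',
]


def left_outer_join_scores(main_rows, score_rows):
    """Mimic JOIN main BY id LEFT, scores BY id, then pick the override score."""
    # Index the right-hand relation by join key; a key may repeat, as in Pig.
    scores_by_id = {}
    for row in score_rows:
        scores_by_id.setdefault(row["id"], []).append(row)

    result = []
    for m in main_rows:
        # LEFT OUTER: an unmatched main row pairs with an all-null right side.
        matches = scores_by_id.get(m["id"]) or [{"id": None, "score": None}]
        for s in matches:
            # Same choice as the bincond: fall back to main's score when null.
            score = m["score"] if s["score"] is None else s["score"]
            result.append(
                {"id": m["id"], "some_field": m["some_field"], "score": score}
            )
    return result


main = [json.loads(line) for line in MAIN_LINES]
scores = [json.loads(line) for line in SCORE_LINES]
for row in left_outer_join_scores(main, scores):
    print(json.dumps(row))
```

Running it prints the three expected records, with "baz" keeping its default score of 0 because it has no match in score_data.txt.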

Hope this helps!