Join, Filter on the same line and optimization
Is it possible to write something like:
F = JOIN A BY (FILE_NAME,CREATED_DATE,FORM_ID,FORM_ID_ROOT), B BY
(FILE_NAME,CREATED_DATE,FORM_ID,FORM_ID_ROOT) AND FILTER A BY FORM_ID == 0;

Also, how far does Pig go in optimizing the job if I instead write the line
above as two separate statements, for instance:

F = JOIN A BY (FILE_NAME,CREATED_DATE,FORM_ID,FORM_ID_ROOT), B BY
(FILE_NAME,CREATED_DATE,FORM_ID,FORM_ID_ROOT);

G = FILTER F BY A::FORM_ID == 0;

Would Pig run only one reduce job, or multiple jobs, in the case above?