Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig, mail # user - Create Table + Join + Max 'String' Date


Copy link to this message
-
Re: Create Table + Join + Max 'String' Date
Jacob Perkins 2013-09-10, 11:57
Abhishek,

The cogroup operator and a filter should get you what you want:

--
-- First, filter table1
--
t1_filtered = filter table1 by reporting_dt <= '01APR2013';

--
-- Cogroup, this results in 'bags' you can check
--
grouped = cogroup
            t1_filtered by serial_num,
            table2        by serial_num,
            table3        by serial_num;

--
-- Do outer join and max in same projection
--
maxdate = foreach (filter grouped by not IsEmpty(table2) and IsEmpty(table3)) generate
            group as serial_num,
            MAX(t1_filtered.reporting_dt) as reporting_dt;
Note that I haven't actually ran this yet but it's pretty straightforward pig.

--jacob
@thedatachef

On Sep 10, 2013, at 12:07 AM, "Ambastha, Abhishek" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I want to write a pig query for the below SQL:
>
>                CREATE TABLE MAXDATE AS
>                SELECT SERIAL_NUM, MAX(REPORTING_DT) AS REPORTING_DT
>                FROM Table_1
>                WHERE REPORTING_DT <= '01APR2013' AND
>                SERIAL_NUM IN (SELECT SERIAL_NUM FROM TABLE_2) AND
>                SERIAL_NUM NOT IN (SELECT SERIAL_NUM FROM TABLE_3)
>                GROUP BY SERIAL_NUM;
>
> REPORTING_DT is a string but I want Maximum Date. Probably, I could use CustomFormatToISO for Date conversion and get the Maximum Value.
>
> Join is on Table_1, Table_2 and Table_3.
>
> Please suggest how to do this.
>
> Regards,
> Abhishek
>