Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig >> mail # user >> Create Table + Join + Max 'String' Date


+
Ambastha, Abhishek 2013-09-10, 05:07
Copy link to this message
-
Re: Create Table + Join + Max 'String' Date
Abhishek,

The cogroup operator and a filter should get you what you want:

--
-- First, filter table1
--
t1_filtered = filter table1 by reporting_dt <= '01APR2013';

--
-- Cogroup, this results in 'bags' you can check
--
grouped = cogroup
            t1_filtered by serial_num,
            table2        by serial_num,
            table3        by serial_num;

--
-- Do outer join and max in same projection
--
maxdate = foreach (filter grouped by not IsEmpty(table2) and IsEmpty(table3)) generate
            group as serial_num,
            MAX(t1_filtered.reporting_dt) as reporting_dt;
Note that I haven't actually ran this yet but it's pretty straightforward pig.

--jacob
@thedatachef

On Sep 10, 2013, at 12:07 AM, "Ambastha, Abhishek" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I want to write a pig query for the below SQL:
>
>                CREATE TABLE MAXDATE AS
>                SELECT SERIAL_NUM, MAX(REPORTING_DT) AS REPORTING_DT
>                FROM Table_1
>                WHERE REPORTING_DT <= '01APR2013' AND
>                SERIAL_NUM IN (SELECT SERIAL_NUM FROM TABLE_2) AND
>                SERIAL_NUM NOT IN (SELECT SERIAL_NUM FROM TABLE_3)
>                GROUP BY SERIAL_NUM;
>
> REPORTING_DT is a string but I want Maximum Date. Probably, I could use CustomFormatToISO for Date conversion and get the Maximum Value.
>
> Join is on Table_1, Table_2 and Table_3.
>
> Please suggest how to do this.
>
> Regards,
> Abhishek
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB