Pig >> mail # user >> updates using Pig


Srinivas Surasani 2012-08-28, 04:36
TianYi Zhu 2012-08-28, 04:55
Re: updates using Pig
Hello TianYi Zhu,

Thanks! I will get back to you.

> By the way, you can sort these two files by trade-key and then merge them
> using a small script; that's much faster than using Pig.

I'm trying out a POC on updates in Hadoop.
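
The sort-then-merge suggestion quoted above could look roughly like the sketch below. This is only an illustration in Python, not the script TianYi had in mind: the function names `parse_date` and `merge_latest` are made up for this example, records are assumed to be (trade-key, trade-add-date, trade-price) tuples with MM/DD/YYYY dates, and both inputs are assumed to be already sorted by key using the same string ordering.

```python
import heapq
from datetime import datetime
from itertools import groupby
from operator import itemgetter


def parse_date(text):
    # Assumption: dates use the MM/DD/YYYY layout shown in the sample feeds.
    return datetime.strptime(text, "%m/%d/%Y")


def merge_latest(history, daily):
    """Yield one record per trade key from two inputs sorted by key.

    When a key appears in both inputs, the record with the most recent
    trade-add-date is kept.
    """
    # heapq.merge walks both sorted inputs lazily, like the merge step of merge sort.
    merged = heapq.merge(history, daily, key=itemgetter(0))
    # Records sharing a key are now adjacent, so groupby can collect them.
    for _, records in groupby(merged, key=itemgetter(0)):
        yield max(records, key=lambda rec: parse_date(rec[1]))


# Sample data from the thread: feed1 is the history file, feed2 the daily feed.
feed1 = [
    ("k1", "05/21/2012", "2000"),
    ("k2", "04/21/2012", "3000"),
    ("k3", "03/21/2012", "4000"),
    ("k4", "05/21/2012", "5000"),
]
feed2 = sorted([
    ("k5", "06/22/2012", "1000"),
    ("k6", "06/22/2012", "2000"),
    ("k1", "06/21/2012", "3000"),
])

result = list(merge_latest(feed1, feed2))
for record in result:
    print("\t".join(record))
```

Because both inputs are consumed as iterators, the same function should also work on lines streamed from two large sorted files, provided each line is first split into a tuple.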

Thanks,
Srinivas
On Tue, Aug 28, 2012 at 12:55 AM, TianYi Zhu <[EMAIL PROTECTED]> wrote:

> Hi Srinivas,
>
> You can write a user-defined function for this:
>
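> -- Pig aliases and field names cannot contain hyphens, so underscores are
> -- used here; "output" is a reserved word, hence the alias "latest".
> feed = union feed1, feed2;
> feed_grouped = group feed by trade_key;
> latest = foreach feed_grouped generate
> flatten(your_user_defined_function(feed)) as (trade_key, trade_add_date,
> trade_price);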
>
> your_user_defined_function takes one or more records with the same
> trade-key as input, and it should output only the latest tuple of
> (trade-key, trade-add-date, trade-price).
>
>
> By the way, you can sort these two files by trade-key and then merge them
> using a small script; that's much faster than using Pig.
>
> On Tue, Aug 28, 2012 at 2:36 PM, Srinivas Surasani <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I'm trying to update records in Hadoop using Pig (I know this is not
> > ideal, but I'm trying out a POC). The data looks like this:
> >
> > feed1: the history file (the trade key is unique for each order/record)
> >
> > trade-key    trade-add-date    trade-price
> > k1           05/21/2012        2000
> > k2           04/21/2012        3000
> > k3           03/21/2012        4000
> > k4           05/21/2012        5000
> >
> > feed2: the latest/daily feed
> >
> > trade-key    trade-add-date    trade-price
> > k5           06/22/2012        1000
> > k6           06/22/2012        2000
> > k1           06/21/2012        3000
> >
> > Here the trade with key "k1" appears again, which means the order with
> > trade key "k1" has been updated.
> >
> > Now I'm looking for the output below (merging both files and, where a
> > key appears in both feeds, keeping only the latest record for that key
> > in the output file):
> >
> > k1           06/21/2012        3000
> > k2           04/21/2012        3000
> > k3           03/21/2012        4000
> > k4           05/21/2012        5000
> > k5           06/22/2012        1000
> > k6           06/22/2012        2000
> >
> > Any help is greatly appreciated!
> >
> > Regards,
> > Srinivas
> >
>
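
The user-defined function described in the quoted reply receives every record that shares a trade-key and returns only the most recent one. A minimal sketch of that selection logic in plain Python follows. The name `latest_trade` is hypothetical, the bag is modelled as a list of (trade_key, trade_add_date, trade_price) tuples, and the dates are assumed to be MM/DD/YYYY strings. Pig can run Python UDFs through Jython, but registering the function and declaring its output schema are left out here.

```python
from datetime import datetime


def latest_trade(bag):
    """Return the record with the most recent trade-add-date.

    `bag` holds all (trade_key, trade_add_date, trade_price) tuples that
    share one trade key, as produced by grouping the combined feeds.
    """
    # Parse the date before comparing: MM/DD/YYYY strings do not sort
    # chronologically across different years.
    return max(bag, key=lambda rec: datetime.strptime(rec[1], "%m/%d/%Y"))


# The two "k1" records from the sample feeds.
group_k1 = [("k1", "05/21/2012", 2000), ("k1", "06/21/2012", 3000)]
latest = latest_trade(group_k1)
print(latest)
```

A key that appears in only one feed produces a single-record bag, which the function returns unchanged.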

--
Regards,
Srinivas
[EMAIL PROTECTED]
Jonathan Coveney 2012-08-28, 06:47
Srini 2012-08-29, 09:47
pablomar 2012-08-29, 11:04