Pig, mail # user - updates using Pig


Srinivas Surasani 2012-08-28, 04:36
TianYi Zhu 2012-08-28, 04:55
Re: updates using Pig
Srini 2012-08-28, 05:37
Hello TianYi Zhu,

Thanks!! Will get back to you.

> by the way, you can sort these 2 files by trade-key then merge them using a
> small script, that's much faster than using Pig.

... I'm trying out a POC on updates in Hadoop.

Thanks,
Srinivas
On Tue, Aug 28, 2012 at 12:55 AM, TianYi Zhu <[EMAIL PROTECTED]> wrote:

> Hi Srinivas,
>
> You can write a user-defined function (UDF) for this:
>
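> -- feed1/feed2 are assumed loaded with fields (trade_key, trade_add_date,
> -- trade_price); Pig identifiers cannot contain '-', and 'output' is reserved
> feed = union feed1, feed2;
> feed_grouped = group feed by trade_key;
> -- your_user_defined_function must be REGISTERed/DEFINEd before this line
> latest = foreach feed_grouped generate
>     flatten(your_user_defined_function(feed)) as (trade_key, trade_add_date,
>     trade_price);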
>
> your_user_defined_function takes the one or more records with the same
> trade-key as input and should output only the latest tuple of
> (trade-key, trade-add-date, trade-price).
>
>
> By the way, you can sort these 2 files by trade-key and then merge them
> with a small script; that's much faster than using Pig.
>
> On Tue, Aug 28, 2012 at 2:36 PM, Srinivas Surasani <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I'm trying to update records in Hadoop using Pig (I know this is not
> > ideal, but I'm trying out a POC).
> > The data looks like this:
> >
> > feed1:
> > --> here the trade key is unique for each order/record
> > --> this is the history file
> >
> > trade-key    trade-add-date    trade-price
> > k1           05/21/2012        2000
> > k2           04/21/2012        3000
> > k3           03/21/2012        4000
> > k4           05/21/2012        5000
> >
> > feed2:  --> this is the latest/daily feed
> > trade-key    trade-add-date    trade-price
> > k5           06/22/2012        1000
> > k6           06/22/2012        2000
> > k1           06/21/2012        3000   ---> as we can see here, the trade
> > with key "k1" appears again, which means the order with trade key "k1"
> > has been updated
> >
> > Now I'm looking for the output below (merging both files, finding the
> > keys common to both feeds, and keeping only the latest record for each
> > key in the output file):
> > k1           06/21/2012        3000
> > k2           04/21/2012        3000
> > k3           06/21/2012        4000
> > k4           07/21/2012        5000
> > k5           06/22/2012        1000
> > k6           06/22/2012        2000
> >
> > Any help is greatly appreciated!!
> >
> > Regards,
> > Srinivas
> >
>

--
Regards,
Srinivas
[EMAIL PROTECTED]
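For reference, the "keep only the latest record per trade-key" step from TianYi's reply can also be sketched in plain Pig Latin without a custom UDF: union the two feeds, group by key, sort each group by date inside a nested FOREACH, and keep one row. This is a minimal sketch under assumptions not stated in the thread: the input paths (`feed1.tsv`, `feed2.tsv`) and output path (`trades_merged`) are hypothetical, the files are assumed tab-separated, the column names use underscores because Pig identifiers cannot contain hyphens, and `ToDate`/`ToMilliSeconds` require Pig 0.11 or later.

```pig
-- Hypothetical paths; assumes tab-separated files with the three columns
-- from the example (hyphens in the names replaced by underscores).
feed1 = LOAD 'feed1.tsv' USING PigStorage('\t')
        AS (trade_key:chararray, trade_add_date:chararray, trade_price:int);
feed2 = LOAD 'feed2.tsv' USING PigStorage('\t')
        AS (trade_key:chararray, trade_add_date:chararray, trade_price:int);

feed = UNION feed1, feed2;

-- MM/dd/yyyy strings do not sort correctly across years, so derive a
-- numeric sort key (ToDate/ToMilliSeconds need Pig 0.11+).
feed_dated = FOREACH feed GENERATE trade_key, trade_add_date, trade_price,
             ToMilliSeconds(ToDate(trade_add_date, 'MM/dd/yyyy')) AS add_ms;

feed_grouped = GROUP feed_dated BY trade_key;

-- For each key, sort its records newest-first and keep only the first one.
latest = FOREACH feed_grouped {
    by_date = ORDER feed_dated BY add_ms DESC;
    newest  = LIMIT by_date 1;
    GENERATE FLATTEN(newest.(trade_key, trade_add_date, trade_price));
};

STORE latest INTO 'trades_merged' USING PigStorage('\t');
```

The script can be tried locally with `pig -x local` followed by the script file name. If two records for the same key share the same date, `LIMIT` keeps an arbitrary one, so a tie-breaker (for example, preferring the daily feed) would be needed if that case matters.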
Jonathan Coveney 2012-08-28, 06:47
Srini 2012-08-29, 09:47
pablomar 2012-08-29, 11:04