Pig >> mail # user >> updates using Pig


Thread:
- Srinivas Surasani 2012-08-28, 04:36
- TianYi Zhu 2012-08-28, 04:55
- Srini 2012-08-28, 05:37
- Jonathan Coveney 2012-08-28, 06:47
- Srini 2012-08-29, 09:47
Re: updates using Pig
now I can see it :-)
very beautiful place
On Wed, Aug 29, 2012 at 5:47 AM, Srini <[EMAIL PROTECTED]> wrote:

> Thank you very much, Jonathan...
>
> On Tue, Aug 28, 2012 at 2:47 AM, Jonathan Coveney <[EMAIL PROTECTED]
> >wrote:
>
> > I would do this with a cogroup. Whether or not you need a UDF depends on
> > whether or not a key can appear more than once in a file.
> >
> > trade-key    trade-add-date       trade-price
> >
> > feed_group = cogroup feed1 by trade-key, feed2 by trade-key;
> > feed_proj = foreach feed_group generate FLATTEN( IsEmpty(feed2) ? feed1 :
> > feed2 );
> >
> > and there you go (you may need to tweak the flatten to make it work).
> >
> > It'd be slightly more complicated if you had multiple key/date pairs.
> >
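[Editor's note: for readers who want to check the "prefer feed2 when a key appears in both feeds" logic of the cogroup approach outside of Pig, here is a minimal sketch in Python. The `merge_feeds` helper and the dict representation of the feeds are assumptions for illustration, not part of the original thread.]

```python
# Simulation of the cogroup "keep feed2 when present" logic: for each
# trade-key, take the feed2 (daily) record if one exists, else feed1 (history).
# Each feed is modeled as {trade_key: (trade_add_date, trade_price)}.

def merge_feeds(feed1, feed2):
    merged = dict(feed1)   # start from the history file
    merged.update(feed2)   # the daily feed overwrites matching keys
    return merged

feed1 = {"k1": ("05/21/2012", 2000), "k2": ("04/21/2012", 3000),
         "k3": ("03/21/2012", 4000), "k4": ("05/21/2012", 5000)}
feed2 = {"k5": ("06/22/2012", 1000), "k6": ("06/22/2012", 2000),
         "k1": ("06/21/2012", 3000)}

result = merge_feeds(feed1, feed2)
```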
> > 2012/8/27 Srini <[EMAIL PROTECTED]>
> >
> > > Hello  TianYi Zhu,
> > >
> > > Thanks !! and will get back..
> > >
> > > --> by the way, you can sort these 2 files by trade-key and then merge
> > > them using a small script; that's much faster than using Pig.
> > > ... Trying out a POC on updates in Hadoop
> > >
> > > Thanks,
> > > Srinivas
> > > On Tue, Aug 28, 2012 at 12:55 AM, TianYi Zhu <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Srinivas,
> > > >
> > > > you can write a user defined function for this
> > > >
> > > > feed = union feed1, feed2;
> > > > feed_grouped = group feed by trade-key;
> > > > output = foreach feed_grouped generate
> > > > flatten(your_user_defined_function(feed)) as
> > > > (trade-key, trade-add-date, trade-price);
> > > >
> > > > your_user_defined_function takes the one or more records with the same
> > > > trade-key as input, and it should only output the latest tuple of
> > > > (trade-key, trade-add-date, trade-price)
> > > >
> > > >
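[Editor's note: the per-group "keep the latest tuple" step of such a UDF can be sketched as below. This is a hypothetical Python helper illustrating the logic, not actual Pig UDF code; it assumes MM/DD/YYYY date strings as in the sample data.]

```python
from datetime import datetime

# Stand-in for the UDF's core logic: from all records sharing a trade-key,
# return only the one with the most recent trade-add-date.
# Records are (trade_key, trade_add_date, trade_price) tuples.

def latest_record(records):
    return max(records, key=lambda r: datetime.strptime(r[1], "%m/%d/%Y"))

group_k1 = [("k1", "05/21/2012", 2000), ("k1", "06/21/2012", 3000)]
latest = latest_record(group_k1)
```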
> > > > by the way, you can sort these 2 files by trade-key and then merge them
> > > > using a small script; that's much faster than using Pig.
> > > >
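[Editor's note: the sort-and-merge alternative suggested above can be sketched as a small standalone script. This Python version is a hypothetical illustration; a shell `sort`/`join` pipeline would serve equally well.]

```python
# Sketch of "sort both files by trade-key, then merge": walk the two
# key-sorted record lists together and let feed2 win on key collisions.
# Records are (trade_key, trade_add_date, trade_price) tuples, pre-sorted by key.

def sorted_merge(feed1, feed2):
    out, i, j = [], 0, 0
    while i < len(feed1) and j < len(feed2):
        if feed1[i][0] < feed2[j][0]:
            out.append(feed1[i]); i += 1
        elif feed1[i][0] > feed2[j][0]:
            out.append(feed2[j]); j += 1
        else:                          # same key: the daily feed replaces history
            out.append(feed2[j]); i += 1; j += 1
    out.extend(feed1[i:])              # drain whichever list remains
    out.extend(feed2[j:])
    return out

feed1 = [("k1", "05/21/2012", 2000), ("k2", "04/21/2012", 3000),
         ("k3", "03/21/2012", 4000), ("k4", "05/21/2012", 5000)]
feed2 = [("k1", "06/21/2012", 3000), ("k5", "06/22/2012", 1000),
         ("k6", "06/22/2012", 2000)]

merged = sorted_merge(feed1, feed2)
```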
> > > > On Tue, Aug 28, 2012 at 2:36 PM, Srinivas Surasani <
> > > [EMAIL PROTECTED]
> > > > >wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm trying to do updates of records in Hadoop using Pig ( I know
> > > > > this is not ideal, but I'm trying out a POC )..
> > > > > the data looks like the below:
> > > > >
> > > > > *feed1:*
> > > > > --> here trade key is unique for each order/record
> > > > > --> this is history file
> > > > >
> > > > > trade-key    trade-add-date       trade-price
> > > > > *k1                 05/21/2012            2000*
> > > > > k2                  04/21/2012             3000
> > > > > k3                 03/21/2012            4000
> > > > > k4                 05/21/2012             5000
> > > > >
> > > > > *feed2:* --> this is the latest/daily feed
> > > > > trade-key    trade-add-date       trade-price
> > > > > k5                06/22/2012             1000
> > > > > k6                 06/22/2012            2000
> > > > > *k1                06/21/2012             3000*   ---> we can see
> > > > > here that the trade with key "k1" has appeared again, which means
> > > > > the order with trade key "k1" has an update
> > > > > Now I'm looking for the below output ( merging both files, looking
> > > > > for the keys common to both feeds, and keeping the latest record for
> > > > > each key in the output file ):
> > > > > *k1                06/21/2012             3000*
> > > > > k2                  04/21/2012             3000
> > > > > k3                 03/21/2012            4000
> > > > > k4                 05/21/2012             5000
> > > > > *k5                06/22/2012             1000
> > > > > k6                 06/22/2012            2000*
> > > > >
> > > > > Any help greatly appreciated !!
> > > > >
> > > > > Regards,
> > > > > Srinivas
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Srinivas
> > > [EMAIL PROTECTED]
> > >
> >
>
>
>
> --
> Regards,
> Srinivas
> [EMAIL PROTECTED]
>