Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # user >> join with sum


Copy link to this message
-
Re: join with sum
I still believe you want the COGROUP operator:
A = LOAD '$A' AS (field_1:int, field_2:int);
DUMP A;

(1,11)
(2,15)

B = LOAD '$B' AS (field_1:int, field_2:int);
DUMP B;

(1,10)
(4,11)
(5,10)

grouped = COGROUP A BY field_1, B BY field_1;
DUMP grouped;

(1,{(1,11)},{(1,10)})
(2,{(2,15)},{})
(4,{},{(4,11)})
(5,{},{(5,10)})

summed = FOREACH grouped {
           left_sum     = SUM(A.field_2);
           right_sum    = SUM(B.field_2);
           summed_fields = (left_sum IS NOT NULL ? left_sum : 0) +
(right_sum IS NOT NULL ? right_sum : 0);
           GENERATE
             FLATTEN(group) AS key,
             summed_fields
           ;
         };
DUMP summed;

(1,21L)
(2,15L)
(4,11L)
(5,10L)
It's messy because you need to account for when the cogroup misses. I
don't think it's exactly what your looking for, but then again it's not
entirely clear. This should at least get you on the right track.

--jacob
@thedatachef

On Sun, 2011-01-30 at 16:43 +0200, Cam Bazz wrote:
> Hello,
>
> I did look over to the GROUP operator, but could not find what i was
> looking for.
>
> I need to sum the values given by keys. This is because I want to do
> incremental processing.
>
> best.
>
> On Sun, Jan 30, 2011 at 4:01 PM, Jacob Perkins
> <[EMAIL PROTECTED]> wrote:
> > See the COGROUP operator
> >
> > --jacob
> > @thedatachef
> >
> > Sent from my iPhone
> >
> > On Jan 30, 2011, at 5:10 AM, Cam Bazz <[EMAIL PROTECTED]> wrote:
> >
> >> Hello,
> >>
> >> Lets say I have two tables like
> >>
> >> A:
> >> 1,11
> >> 2,15
> >>
> >> and
> >>
> >> B:
> >> 1,10
> >> 4,11
> >> 5,10
> >>
> >> and joinin them
> >>
> >> J = JOIN A by $0 FULL, B by $0
> >>
> >> I get J:
> >>
> >> 1,11,1,21
> >> 2,16,,
> >> ,,4,11
> >> ,,5,10
> >>
> >> which is a full outer join:
> >>
> >> what I need is:
> >>
> >> 1,31
> >> 2,16
> >> 4,11
> >> 5,10
> >>
> >> so I want to join by key, and then sum some fields.
> >>
> >> the regular sum operator is for group or agregate summing and can not do
> >> it.
> >>
> >> any ideas / recomendation / help greatly appreciated.
> >>
> >> best.
> >> c.b.
> >
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB