Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # user >> join with sum


Copy link to this message
-
Re: join with sum
I still believe you want the COGROUP operator:
A = LOAD '$A' AS (field_1:int, field_2:int);
DUMP A;

(1,11)
(2,15)

B = LOAD '$B' AS (field_1:int, field_2:int);
DUMP B;

(1,10)
(4,11)
(5,10)

grouped = COGROUP A BY field_1, B BY field_1;
DUMP grouped;

(1,{(1,11)},{(1,10)})
(2,{(2,15)},{})
(4,{},{(4,11)})
(5,{},{(5,10)})

summed = FOREACH grouped {
           left_sum     = SUM(A.field_2);
           right_sum    = SUM(B.field_2);
           summed_fields = (left_sum IS NOT NULL ? left_sum : 0) +
(right_sum IS NOT NULL ? right_sum : 0);
           GENERATE
             FLATTEN(group) AS key,
             summed_fields
           ;
         };
DUMP summed;

(1,21L)
(2,15L)
(4,11L)
(5,10L)
It's messy because you need to account for when the cogroup misses. I
don't think it's exactly what your looking for, but then again it's not
entirely clear. This should at least get you on the right track.

--jacob
@thedatachef

On Sun, 2011-01-30 at 16:43 +0200, Cam Bazz wrote:
> Hello,
>
> I did look over to the GROUP operator, but could not find what i was
> looking for.
>
> I need to sum the values given by keys. This is because I want to do
> incremental processing.
>
> best.
>
> On Sun, Jan 30, 2011 at 4:01 PM, Jacob Perkins
> <[EMAIL PROTECTED]> wrote:
> > See the COGROUP operator
> >
> > --jacob
> > @thedatachef
> >
> > Sent from my iPhone
> >
> > On Jan 30, 2011, at 5:10 AM, Cam Bazz <[EMAIL PROTECTED]> wrote:
> >
> >> Hello,
> >>
> >> Lets say I have two tables like
> >>
> >> A:
> >> 1,11
> >> 2,15
> >>
> >> and
> >>
> >> B:
> >> 1,10
> >> 4,11
> >> 5,10
> >>
> >> and joinin them
> >>
> >> J = JOIN A by $0 FULL, B by $0
> >>
> >> I get J:
> >>
> >> 1,11,1,21
> >> 2,16,,
> >> ,,4,11
> >> ,,5,10
> >>
> >> which is a full outer join:
> >>
> >> what I need is:
> >>
> >> 1,31
> >> 2,16
> >> 4,11
> >> 5,10
> >>
> >> so I want to join by key, and then sum some fields.
> >>
> >> the regular sum operator is for group or agregate summing and can not do
> >> it.
> >>
> >> any ideas / recomendation / help greatly appreciated.
> >>
> >> best.
> >> c.b.
> >