|
|
-
How to coalesce fields in Pig?
James Kebinger 2011-08-29, 18:15
My apologies if this is in the docs somewhere, I was unable to find anything, but I might be calling it the wrong name.
I'm doing a full outer join in Pig - as such, one or the other join keys may be null. I'd like to be able to look at 2 columns, and retrieve just the one that is not null. Is that possible?
I tried an expression in generate with is null and the ternary operator, and took a look at DECODE. That might do the trick but wasn't sure if null checking would work, and if other expressions could appear inside the decode.
In my case the fields are integers, so I abused the MAX and TOBAG operators like this MAX(TOBAG(rx_keyed::u2,cx_keyed::u2)) to get the effect I was after, but I would love to know if there's a better way.
Thanks for your time!
-James Kebinger
+
James Kebinger 2011-08-29, 18:15
-
Re: How to coalesce fields in Pig?
Dmitriy Ryaboy 2011-08-29, 18:29
Hi James, I use ternary expressions for this: foreach joined generate ( rel1.x is null ? rel2.x : rel1.x) as x;
On Mon, Aug 29, 2011 at 11:15 AM, James Kebinger <[EMAIL PROTECTED]>wrote:
> My apologies if this is in the docs somewhere, I was unable to find > anything, but I might be calling it the wrong name. > > I'm doing a full outer join in Pig - as such, one or the other join keys > may > be null. I'd like to be able to look at 2 columns, and retrieve just the > one > that is not null. Is that possible? > > I tried an expression in generate with is null and the ternary operator, > and > took a look at DECODE. That might do the trick but wasn't sure if null > checking would work, and if other expressions could appear inside the > decode. > > In my case the fields are integers, so I abused the MAX and TOBAG operators > like this MAX(TOBAG(rx_keyed::u2,cx_keyed::u2)) to get the effect I was > after, but I would love to know if there's a better way. > > Thanks for your time! > > -James Kebinger >
+
Dmitriy Ryaboy 2011-08-29, 18:29
-
Re: How to coalesce fields in Pig?
James Kebinger 2011-08-29, 19:18
Thanks, it must have been the lack of parenthesis that did me in when i tried the ternary expression, or some other typo. I'll use that in the future.
On Mon, Aug 29, 2011 at 2:29 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Hi James, > I use ternary expressions for this: foreach joined generate ( rel1.x is > null > ? rel2.x : rel1.x) as x; > > On Mon, Aug 29, 2011 at 11:15 AM, James Kebinger <[EMAIL PROTECTED] > >wrote: > > > My apologies if this is in the docs somewhere, I was unable to find > > anything, but I might be calling it the wrong name. > > > > I'm doing a full outer join in Pig - as such, one or the other join keys > > may > > be null. I'd like to be able to look at 2 columns, and retrieve just the > > one > > that is not null. Is that possible? > > > > I tried an expression in generate with is null and the ternary operator, > > and > > took a look at DECODE. That might do the trick but wasn't sure if null > > checking would work, and if other expressions could appear inside the > > decode. > > > > In my case the fields are integers, so I abused the MAX and TOBAG > operators > > like this MAX(TOBAG(rx_keyed::u2,cx_keyed::u2)) to get the effect I was > > after, but I would love to know if there's a better way. > > > > Thanks for your time! > > > > -James Kebinger > > >
+
James Kebinger 2011-08-29, 19:18
|
|