


how to combine some rows into 1 row in hive
zuohua zhang 20130622, 22:36
I have the following table: f1 f2 f3 f4 f5 a1 a2 P x1 x2 a1 a2 N x3 x4 a1 a3 N x5 x6 a4 a6 P x7 x8
i want to convert to below: f1 f2 pf4 pf5 nf4 nf5 a1 a2 x1 x2 x3 x4 a1 a3 0 0 x5 x6 a4 a6 x7 x8 0 0
basically, when f3="P", I want f4 f5 to be moved to pf4 pf5 when f3="N", i want f4 f5 to be moved to nf4 nf5 when there is no "P" or "N" rows for the record, I want to fill in zeros in the corresponding fields.
how to do it?
Thanks, Zuohua
+
zuohua zhang 20130622, 22:36

Re: how to combine some rows into 1 row in hive
Edward Capriolo 20130622, 23:48
Using hives streaming feature is a nice option for this as it is a fairly natural way to work with entire rows and return multiple columns
select transform a,b,c,d using /bin/pipeprogram as a, b ,c ,d
You an also write a UDTF user defined Table function as well because this can return more then one column.
On Sat, Jun 22, 2013 at 6:36 PM, zuohua zhang <[EMAIL PROTECTED]> wrote:
> I have the following table: > f1 f2 f3 f4 f5 > a1 a2 P x1 x2 > a1 a2 N x3 x4 > a1 a3 N x5 x6 > a4 a6 P x7 x8 > > i want to convert to below: > f1 f2 pf4 pf5 nf4 nf5 > a1 a2 x1 x2 x3 x4 > a1 a3 0 0 x5 x6 > a4 a6 x7 x8 0 0 > > basically, when f3="P", I want f4 f5 to be moved to pf4 pf5 > when f3="N", i want f4 f5 to be moved to nf4 nf5 > when there is no "P" or "N" rows for the record, I want to fill in zeros > in the corresponding fields. > > how to do it? > > Thanks, > Zuohua > >
+
Edward Capriolo 20130622, 23:48

Re: how to combine some rows into 1 row in hive
Michael Malak 20130623, 00:05
Or, the singlelanguage (HiveQL) alternative might be (i.e. I haven't tested it): select f1, f2, if(max(if(f3='P',f4,null)) is null,0,max(if(f3='P',f4,null))) pf4, if(max(if(f3='P',f5,null)) is null,0,max(if(f3='P',f5,null))) pf5, if(max(if(f3='N',f4,null)) is null,0,max(if(f3='N',f4,null))) nf4, if(max(if(f3='N',f5,null)) is null,0,max(if(f3='N',f5,null))) nf5 from mytable group by f1, f2;
________________________________ From: Edward Capriolo <[EMAIL PROTECTED]> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> Sent: Saturday, June 22, 2013 7:48 PM Subject: Re: how to combine some rows into 1 row in hive Using hives streaming feature is a nice option for this as it is a fairly natural way to work with entire rows and return multiple columns
select transform a,b,c,d using /bin/pipeprogram as a, b ,c ,d
You an also write a UDTF user defined Table function as well because this can return more then one column.
On Sat, Jun 22, 2013 at 6:36 PM, zuohua zhang <[EMAIL PROTECTED]> wrote:
I have the following table: >f1 f2 f3 f4 f5 >a1 a2 P x1 x2 >a1 a2 N x3 x4 >a1 a3 N x5 x6 >a4 a6 P x7 x8 > > >i want to convert to below: >f1 f2 pf4 pf5 nf4 nf5 >a1 a2 x1 x2 x3 x4 >a1 a3 0 0 x5 x6 >a4 a6 x7 x8 0 0 > > >basically, when f3="P", I want f4 f5 to be moved to pf4 pf5 >when f3="N", i want f4 f5 to be moved to nf4 nf5 >when there is no "P" or "N" rows for the record, I want to fill in zeros in the corresponding fields. > > >how to do it? > > >Thanks, >Zuohua > >
+
Michael Malak 20130623, 00:05

Re: how to combine some rows into 1 row in hive
zuohua zhang 20130623, 00:38
Thanks Michael! That worked without modification! On Sat, Jun 22, 2013 at 5:05 PM, Michael Malak <[EMAIL PROTECTED]>wrote:
> Or, the singlelanguage (HiveQL) alternative might be (i.e. I haven't > tested it): > > select f1, > f2, > if(max(if(f3='P',f4,null)) is null,0,max(if(f3='P',f4,null))) pf4, > if(max(if(f3='P',f5,null)) is null,0,max(if(f3='P',f5,null))) pf5, > if(max(if(f3='N',f4,null)) is null,0,max(if(f3='N',f4,null))) nf4, > if(max(if(f3='N',f5,null)) is null,0,max(if(f3='N',f5,null))) nf5 > from mytable > group by f1, f2; > > > *From:* Edward Capriolo <[EMAIL PROTECTED]> > *To:* "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> > *Sent:* Saturday, June 22, 2013 7:48 PM > *Subject:* Re: how to combine some rows into 1 row in hive > > Using hives streaming feature is a nice option for this as it is a fairly > natural way to work with entire rows and return multiple columns > > select transform a,b,c,d using /bin/pipeprogram as a, b ,c ,d > > You an also write a UDTF user defined Table function as well because this > can return more then one column. > > On Sat, Jun 22, 2013 at 6:36 PM, zuohua zhang <[EMAIL PROTECTED]> wrote: > > I have the following table: > f1 f2 f3 f4 f5 > a1 a2 P x1 x2 > a1 a2 N x3 x4 > a1 a3 N x5 x6 > a4 a6 P x7 x8 > > i want to convert to below: > f1 f2 pf4 pf5 nf4 nf5 > a1 a2 x1 x2 x3 x4 > a1 a3 0 0 x5 x6 > a4 a6 x7 x8 0 0 > > basically, when f3="P", I want f4 f5 to be moved to pf4 pf5 > when f3="N", i want f4 f5 to be moved to nf4 nf5 > when there is no "P" or "N" rows for the record, I want to fill in zeros > in the corresponding fields. > > how to do it? > > Thanks, > Zuohua > > > > >
+
zuohua zhang 20130623, 00:38

Re: how to combine some rows into 1 row in hive
Stephen Sprague 20130623, 19:20
so its not a Hive issue at all then instead its clever sql tricks. good luck maintaining that and hopefully you got some comments in your code for the next sap who comes along and has the privilege of reading it. :)
On Sat, Jun 22, 2013 at 5:38 PM, zuohua zhang <[EMAIL PROTECTED]> wrote:
> Thanks Michael! That worked without modification! > > > On Sat, Jun 22, 2013 at 5:05 PM, Michael Malak <[EMAIL PROTECTED]>wrote: > >> Or, the singlelanguage (HiveQL) alternative might be (i.e. I haven't >> tested it): >> >> select f1, >> f2, >> if(max(if(f3='P',f4,null)) is null,0,max(if(f3='P',f4,null))) pf4, >> if(max(if(f3='P',f5,null)) is null,0,max(if(f3='P',f5,null))) pf5, >> if(max(if(f3='N',f4,null)) is null,0,max(if(f3='N',f4,null))) nf4, >> if(max(if(f3='N',f5,null)) is null,0,max(if(f3='N',f5,null))) nf5 >> from mytable >> group by f1, f2; >> >> >> *From:* Edward Capriolo <[EMAIL PROTECTED]> >> *To:* "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> >> *Sent:* Saturday, June 22, 2013 7:48 PM >> *Subject:* Re: how to combine some rows into 1 row in hive >> >> Using hives streaming feature is a nice option for this as it is a fairly >> natural way to work with entire rows and return multiple columns >> >> select transform a,b,c,d using /bin/pipeprogram as a, b ,c ,d >> >> You an also write a UDTF user defined Table function as well because this >> can return more then one column. >> >> On Sat, Jun 22, 2013 at 6:36 PM, zuohua zhang <[EMAIL PROTECTED]> wrote: >> >> I have the following table: >> f1 f2 f3 f4 f5 >> a1 a2 P x1 x2 >> a1 a2 N x3 x4 >> a1 a3 N x5 x6 >> a4 a6 P x7 x8 >> >> i want to convert to below: >> f1 f2 pf4 pf5 nf4 nf5 >> a1 a2 x1 x2 x3 x4 >> a1 a3 0 0 x5 x6 >> a4 a6 x7 x8 0 0 >> >> basically, when f3="P", I want f4 f5 to be moved to pf4 pf5 >> when f3="N", i want f4 f5 to be moved to nf4 nf5 >> when there is no "P" or "N" rows for the record, I want to fill in zeros >> in the corresponding fields. >> >> how to do it? >> >> Thanks, >> Zuohua >> >> >> >> >> >
+
Stephen Sprague 20130623, 19:20

Re: how to combine some rows into 1 row in hive
Michael Malak 20130623, 21:40
It's really more of a HiveQL trick than a SQL trick. In SQL, one would express it as a subquery, which is more straightforward and readable, and has the additional advantage that you could eliminate the aggregation and assume P and N are not true simultaneously, which would nicely produce an error were that assumption ever violated. HiveQL's lack of subqueries forces this trick to use aggregation to simulate a merge.
________________________________ From: Stephen Sprague <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Sent: Sunday, June 23, 2013 1:20 PM Subject: Re: how to combine some rows into 1 row in hive so its not a Hive issue at all then instead its clever sql tricks. good luck maintaining that and hopefully you got some comments in your code for the next sap who comes along and has the privilege of reading it. :)
On Sat, Jun 22, 2013 at 5:38 PM, zuohua zhang <[EMAIL PROTECTED]> wrote:
Thanks Michael! That worked without modification! > > > >On Sat, Jun 22, 2013 at 5:05 PM, Michael Malak <[EMAIL PROTECTED]> wrote: > >Or, the singlelanguage (HiveQL) alternative might be (i.e. I haven't tested it): >> >>select f1, >> f2, >> if(max(if(f3='P',f4,null)) is null,0,max(if(f3='P',f4,null))) pf4, if(max(if(f3='P',f5,null)) is null,0,max(if(f3='P',f5,null))) pf5, >> if(max(if(f3='N',f4,null)) is null,0,max(if(f3='N',f4,null))) nf4, >> if(max(if(f3='N',f5,null)) is null,0,max(if(f3='N',f5,null))) nf5 >>from mytable >>group by f1, f2; >> >> >> >> From: Edward Capriolo <[EMAIL PROTECTED]> >>To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> >>Sent: Saturday, June 22, 2013 7:48 PM >>Subject: Re: how to combine some rows into 1 row in hive >> >> >> >>Using hives streaming feature is a nice option for this as it is a fairly natural way to work with entire rows and return multiple columns >> >>select transform a,b,c,d using /bin/pipeprogram as a, b ,c ,d >> >> >> >>You an also write a UDTF user defined Table function as well because this can return more then one column. >> >> >> >>On Sat, Jun 22, 2013 at 6:36 PM, zuohua zhang <[EMAIL PROTECTED]> wrote: >> >>I have the following table: >>>f1 f2 f3 f4 f5 >>>a1 a2 P x1 x2 >>>a1 a2 N x3 x4 >>>a1 a3 N x5 x6 >>>a4 a6 P x7 x8 >>> >>> >>>i want to convert to below: >>>f1 f2 pf4 pf5 nf4 nf5 >>>a1 a2 x1 x2 x3 x4 >>>a1 a3 0 0 x5 x6 >>>a4 a6 x7 x8 0 0 >>> >>> >>>basically, when f3="P", I want f4 f5 to be moved to pf4 pf5 >>>when f3="N", i want f4 f5 to be moved to nf4 nf5 >>>when there is no "P" or "N" rows for the record, I want to fill in zeros in the corresponding fields. >>> >>> >>>how to do it? >>> >>> >>>Thanks, >>>Zuohua
+
Michael Malak 20130623, 21:40

Re: how to combine some rows into 1 row in hive
zuohua zhang 20130622, 23:51
Thanks! Can you be specific with code examples? On Sat, Jun 22, 2013 at 4:48 PM, Edward Capriolo <[EMAIL PROTECTED]>wrote:
> Using hives streaming feature is a nice option for this as it is a fairly > natural way to work with entire rows and return multiple columns > > select transform a,b,c,d using /bin/pipeprogram as a, b ,c ,d > > You an also write a UDTF user defined Table function as well because this > can return more then one column. > > On Sat, Jun 22, 2013 at 6:36 PM, zuohua zhang <[EMAIL PROTECTED]> wrote: > >> I have the following table: >> f1 f2 f3 f4 f5 >> a1 a2 P x1 x2 >> a1 a2 N x3 x4 >> a1 a3 N x5 x6 >> a4 a6 P x7 x8 >> >> i want to convert to below: >> f1 f2 pf4 pf5 nf4 nf5 >> a1 a2 x1 x2 x3 x4 >> a1 a3 0 0 x5 x6 >> a4 a6 x7 x8 0 0 >> >> basically, when f3="P", I want f4 f5 to be moved to pf4 pf5 >> when f3="N", i want f4 f5 to be moved to nf4 nf5 >> when there is no "P" or "N" rows for the record, I want to fill in zeros >> in the corresponding fields. >> >> how to do it? >> >> Thanks, >> Zuohua >> >> >
+
zuohua zhang 20130622, 23:51

