Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Data transformation query

Copy link to this message
Data transformation query
Panshul Whisper 2013-10-10, 11:53

I have a data manipulation query.

I have my data in the following format:

*Date   PLZ   Count*
date1   plz1   count1
date1   plz1   count2
date1   plz1   count3
date1   plz2   count4
date1   plz2   count5
date1   plz3   count6
date1   plz3   count7

date2   plz1   count8
date2   plz1   count9
date2   plz3   count10
date2   plz3   count11

date3   plz1   count12
date3   plz1   count13
date3   plz2   count14
date3   plz2   count15
date3   plz2   count16

date4   plz1   count17
date4   plz2   count18
date4   plz3   count19

With Hive queries I have managed to get the data into the following grouped
and aggregated format:

select plz, TO_DATE(time), sum(totalcount) from power_pad_part where
yearfolder='2013' and monthfolder in ('01')
and eco=0 and TO_DATE(time)>='Date1' and TO_DATE(time)<'Date4'
and plz in ('plz1','plz2','plz3')
group by plz, time

*PLZ   Date   TotalCount*
plz1   date1   TC1
plz1   date2   TC2
plz1   date3   TC3
plz1   date4   TC4

plz2   date1   TC5
plz2   date3   TC6
plz2   date4   TC7

plz3   date1   TC8
plz3   date2   TC9
plz3   date4   TC10

The above data is grouped by plz and date. There is no entry of a plz if it
does not exist for a date.
I used the following query to generate the above data:

But I require the data to be in the following format:

*PLZ   Date1   Date2   Date3   Date4 ..... DateN .....*
plz1   TC1       TC2      TC3       TC4   .....  TC N
plz2   TC5       ------      TC6       TC7   .....  TC N
plz3   TC8       TC9      ------       TC10  ....  TC N
This needs to be generated as 1 column for every date of the year.

Any help to generate this format is welcome. Writing my own mapper and
reduce and calling in Hive as function  is to be my last option.

Thanking You,
Ouch Whisper