Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Data transformation query


Copy link to this message
-
Re: Data transformation query
Panshul Whisper 2013-10-10, 12:42
Thank you Selva

for the response. But the data is too huge to be handled by Excel, thus I
need to do it with hive.
On Thu, Oct 10, 2013 at 2:31 PM, Selvamohan Neethiraj <
[EMAIL PROTECTED]> wrote:

> If it is not so much data, you can use Excel's PivotTable to solve this
> specific requirement:
>
> 1. Select the date and plz columns (w/o header) and create pivotTable on a
> new WorkSheet
> 2. Drag the column name: 'Date' from the 'Pivot Table' Builder  to the
> 'Column Label' section.
> 3. Drag the column name: 'PLZ' from the 'Pivot Table' Builder  to the
> 'Row Labels'
> 4. Drag the column name: 'Date' from the 'Pivot Table' Builder  to the
> 'Values' and ensure the selected summary function is 'Count'
>
> Hope, this helps,
>
> Thanks,
> Selva-
>
>
> On Oct 10, 2013, at 7:53 AM, Panshul Whisper <[EMAIL PROTECTED]>
> wrote:
>
> Hello,
>
> I have a data manipulation query.
>
> I have my data in the following format:
>
> *Date   PLZ   Count*
> date1   plz1   count1
> date1   plz1   count2
> date1   plz1   count3
> date1   plz2   count4
> date1   plz2   count5
> date1   plz3   count6
> date1   plz3   count7
>
> date2   plz1   count8
> date2   plz1   count9
> date2   plz3   count10
> date2   plz3   count11
>
> date3   plz1   count12
> date3   plz1   count13
> date3   plz2   count14
> date3   plz2   count15
> date3   plz2   count16
>
> date4   plz1   count17
> date4   plz2   count18
> date4   plz3   count19
>
> With Hive queries I have managed to get the data into the following
> grouped and aggregated format:
>
> select plz, TO_DATE(time), sum(totalcount) from power_pad_part where
> yearfolder='2013' and monthfolder in ('01')
> and eco=0 and TO_DATE(time)>='Date1' and TO_DATE(time)<'Date4'
> and plz in ('plz1','plz2','plz3')
> group by plz, time
>
> *PLZ   Date   TotalCount*
> plz1   date1   TC1
> plz1   date2   TC2
> plz1   date3   TC3
> plz1   date4   TC4
>
> plz2   date1   TC5
> plz2   date3   TC6
> plz2   date4   TC7
>
> plz3   date1   TC8
> plz3   date2   TC9
> plz3   date4   TC10
>
> The above data is grouped by plz and date. There is no entry of a plz if
> it does not exist for a date.
> I used the following query to generate the above data:
>
>
>
> But I require the data to be in the following format:
>
> *PLZ   Date1   Date2   Date3   Date4 ..... DateN .....*
> plz1   TC1       TC2      TC3       TC4   .....  TC N
> plz2   TC5       ------      TC6       TC7   .....  TC N
> plz3   TC8       TC9      ------       TC10  ....  TC N
>
>
> This needs to be generated as 1 column for every date of the year.
>
> Any help to generate this format is welcome. Writing my own mapper and
> reduce and calling in Hive as function  is to be my last option.
>
> Thanking You,
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>
>
>
--
Regards,
Ouch Whisper
010101010101