Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Data transformation query


Copy link to this message
-
Re: Data transformation query
Thank you Selva

for the response. But the data is too huge to be handled by Excel, thus I
need to do it with hive.
On Thu, Oct 10, 2013 at 2:31 PM, Selvamohan Neethiraj <
[EMAIL PROTECTED]> wrote:

> If it is not so much data, you can use Excel's PivotTable to solve this
> specific requirement:
>
> 1. Select the date and plz columns (w/o header) and create pivotTable on a
> new WorkSheet
> 2. Drag the column name: 'Date' from the 'Pivot Table' Builder  to the
> 'Column Label' section.
> 3. Drag the column name: 'PLZ' from the 'Pivot Table' Builder  to the
> 'Row Labels'
> 4. Drag the column name: 'Date' from the 'Pivot Table' Builder  to the
> 'Values' and ensure the selected summary function is 'Count'
>
> Hope, this helps,
>
> Thanks,
> Selva-
>
>
> On Oct 10, 2013, at 7:53 AM, Panshul Whisper <[EMAIL PROTECTED]>
> wrote:
>
> Hello,
>
> I have a data manipulation query.
>
> I have my data in the following format:
>
> *Date   PLZ   Count*
> date1   plz1   count1
> date1   plz1   count2
> date1   plz1   count3
> date1   plz2   count4
> date1   plz2   count5
> date1   plz3   count6
> date1   plz3   count7
>
> date2   plz1   count8
> date2   plz1   count9
> date2   plz3   count10
> date2   plz3   count11
>
> date3   plz1   count12
> date3   plz1   count13
> date3   plz2   count14
> date3   plz2   count15
> date3   plz2   count16
>
> date4   plz1   count17
> date4   plz2   count18
> date4   plz3   count19
>
> With Hive queries I have managed to get the data into the following
> grouped and aggregated format:
>
> select plz, TO_DATE(time), sum(totalcount) from power_pad_part where
> yearfolder='2013' and monthfolder in ('01')
> and eco=0 and TO_DATE(time)>='Date1' and TO_DATE(time)<'Date4'
> and plz in ('plz1','plz2','plz3')
> group by plz, time
>
> *PLZ   Date   TotalCount*
> plz1   date1   TC1
> plz1   date2   TC2
> plz1   date3   TC3
> plz1   date4   TC4
>
> plz2   date1   TC5
> plz2   date3   TC6
> plz2   date4   TC7
>
> plz3   date1   TC8
> plz3   date2   TC9
> plz3   date4   TC10
>
> The above data is grouped by plz and date. There is no entry of a plz if
> it does not exist for a date.
> I used the following query to generate the above data:
>
>
>
> But I require the data to be in the following format:
>
> *PLZ   Date1   Date2   Date3   Date4 ..... DateN .....*
> plz1   TC1       TC2      TC3       TC4   .....  TC N
> plz2   TC5       ------      TC6       TC7   .....  TC N
> plz3   TC8       TC9      ------       TC10  ....  TC N
>
>
> This needs to be generated as 1 column for every date of the year.
>
> Any help to generate this format is welcome. Writing my own mapper and
> reduce and calling in Hive as function  is to be my last option.
>
> Thanking You,
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>
>
>
--
Regards,
Ouch Whisper
010101010101
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB