Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> Dynamic columns in Hive Table - Best Design for the problem


+
Raj Hadoop 2013-12-28, 17:56
+
Matt Tucker 2013-12-28, 19:46
+
Raj Hadoop 2013-12-29, 15:17
Copy link to this message
-
Re: Dynamic columns in Hive Table - Best Design for the problem
Basically when you have data like this, it is best to treat the all the
columns as a single string and write a tool to break the entire row apart.
You could use a UDF or a UDTF actually. Look at something like parseUrl...

select myRow(row) as id string, events List<String> ....

A UDTF allows you to produce columns and or rows.

The other way is you write a UDF that returns a struct.
On Sun, Dec 29, 2013 at 10:17 AM, Raj Hadoop <[EMAIL PROTECTED]> wrote:

> Matt,
>
> Thanks for the suggestion. Can you please provide more details on what
> type of UDAF should I develop ? I have never worked on a UDAF earlier. But
> would like to explore it. Any tips on how to proceed.
>
> Thanks,
> Raj
>
>
>   On Saturday, December 28, 2013 2:47 PM, Matt Tucker <
> [EMAIL PROTECTED]> wrote:
>  It looks like you're essentially doing a pivot function. Your best bet
> is to write a custom UDAF or look at the windowing functions available in
> recent releases.
> Matt
> On Dec 28, 2013 12:57 PM, "Raj Hadoop" <[EMAIL PROTECTED]> wrote:
>
> Dear All Hive Group Members,
>
> I have the following requirement.
>
> Input:
>
> Ticket#|Date of booking|Price
> 100|20-Oct-13|54
> 100|21-Oct-13|56
> 100|22-Oct-13|54
> 100|23-Oct-13|55
> 100|27-Oct-13|60
> 100|30-Oct-13|47
>
> 101|10-Sep-13|12
> 101|13-Sep-13|14
> 101|20-Oct-13|6
>
>
> Expected Output:
>
> Ticket#|Initial|Delta1|Delta2|Delta3|Delta4|Delta5
> 100|20-Oct-13,54|
> 21-Oct-13,2|22-Oct-13,0|23-Oct-3,1|27-Oct-13,6|30-Oct-13,-7
> 101|10-Sep-13,12|13-Sep-13,2|20-Oct-13,-6|||
>
> The number of columns in the expected output is a dynamic list depending
> on the number of price changes of a ticket.
>
> 1) What is the best design to solve the above problem in Hive?
> 2) How do we implement it?
>
> Please advise.
>
> Regards,
> Raj
>
>
>
>
>
>
>
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB