Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Simulating an auto-incrementing column


Copy link to this message
-
Simulating an auto-incrementing column
Hi,

I have a problem and I hope someone has an idea on how to solve it.

My dataset consists of just very simple key-value pairs of strings
coming from PostgreSQL using Sqoop.

1) I need to count how often a key occurs -> Easy
2) I need to count how often a key-value pair occurs -> Easy

I need to output this data to PostgreSQL again, into two tables:

a) "keys" with the columns: id, key_name, count
b) "values" with the columns: id, key_id, value_name, count

Now the ids I'm referring to don't exist yet and I'm looking into
solutions to generate them. They have to be integers/longs but they
don't have to be in any order/pattern. I'm not concerned about
performance either as this query will be run monthly at most.

Do you have any idea how I could introduce this new column into the
output of query 1)? I could easily introduce it into 2) with a join
then. I thought about using a custom reducer script but apart from the
fact that I've never done it so far it would require that there is
only one reducer so that I can simulate an auto-incrementer. My
current best idea is to write a regular MR job that processes the Hive
output but I'd love to do everything in Hive if possible.

I might very well approach this problem completely wrong so don't
hesitate to propose a better solution or bash me for my poor
understanding of Hive :)

Thanks for any input and help.

Cheers,
Lars
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB