
Bucketing external tables

We run M/R jobs to parse and process large, highly complex XML files
into Avro files. Then we build external Hive tables on top of the parsed Avro
files. The Hive tables are partitioned by day, but the partitions are still
huge and joins do not perform well. So I would like to try bucketing on the
join key. How do I create the buckets on the existing HDFS files? If at all
possible, I would prefer to avoid creating another set of (bucketed) tables
and loading data from the non-bucketed tables into them. Is it possible to
do the bucketing in Java as part of the M/R jobs while creating the Avro files?

Any help / insight would be greatly appreciated.

Thank you very much for your time and help.
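
To make the question concrete, here is a minimal sketch of the bucket assignment an M/R job would need to reproduce. It is not a confirmed recipe: the class and method names are hypothetical, and the hashing rules (an int key hashes to its own value, a string key uses a 31-based rolling hash over its UTF-8 bytes, and the bucket is the sign-cleared hash modulo the bucket count) are assumptions about Hive's default bucketing that should be checked against the Hive version in use.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: maps a join key to a bucket index in [0, numBuckets). */
final class BucketSketch {

    /** Assumed rule: clear the sign bit of the hash, then take it modulo the bucket count. */
    static int bucketFor(int keyHash, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        return (keyHash & Integer.MAX_VALUE) % numBuckets;
    }

    /** Assumption: an int join key hashes to its own value. */
    static int bucketForInt(int key, int numBuckets) {
        return bucketFor(key, numBuckets);
    }

    /** Assumption: a string join key uses a 31-based rolling hash over its UTF-8 bytes. */
    static int bucketForString(String key, int numBuckets) {
        int hash = 0;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + b;
        }
        return bucketFor(hash, numBuckets);
    }

    public static void main(String[] args) {
        int numBuckets = 8; // hypothetical bucket count
        System.out.println(bucketForInt(42, numBuckets));
        System.out.println(bucketForString("ab", numBuckets));
    }
}
```

In an actual job, the same arithmetic would presumably live in the `getPartition` method of a custom Hadoop `Partitioner`, with `Job.setNumReduceTasks` set to the bucket count so that each reducer writes exactly one file per day partition. The table definition would still have to declare the matching `CLUSTERED BY` column and bucket count, and whether Hive accepts externally written files as valid buckets is exactly the open question here.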

Dean Wampler 2013-03-29, 22:57
Sadananda Hegde 2013-03-30, 22:44
Dean Wampler 2013-03-31, 00:00
Sadananda Hegde 2013-04-04, 02:17
Mark Grover 2013-04-04, 05:36
Sadananda Hegde 2013-04-05, 22:02
Mark Grover 2013-04-06, 15:07
Sadananda Hegde 2013-04-11, 17:46
Bejoy KS 2013-04-16, 15:13