Hive >> mail # user >> Bucketing external tables

Re: Bucketing external tables
I don't know of any way to avoid creating new tables and moving the data.
In fact, that's the official way to do it: load from a temp table into the
final table, so Hive can ensure the bucketing is done correctly.


In other words, you might have one big move now, but going forward you'll
want to stage your data in a temp table, use this procedure to move it into
its final location, and then delete the temp data.
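The staging procedure above can be sketched in HiveQL roughly as follows. The table and column names (xml_events_stage, xml_events, event_id) and the bucket count are hypothetical, invented for illustration; substitute your own schema and join key:

```sql
-- Final table, bucketed on the join key (names here are made up).
CREATE TABLE xml_events (
  event_id  BIGINT,
  payload   STRING
)
PARTITIONED BY (ds STRING)
CLUSTERED BY (event_id) INTO 32 BUCKETS;

-- Tell Hive to enforce bucketing on insert, so it hashes rows
-- into the declared number of buckets for you.
SET hive.enforce.bucketing = true;

-- Move one day's data from the unbucketed staging table into the
-- bucketed final table; Hive distributes rows by hash(event_id).
INSERT OVERWRITE TABLE xml_events PARTITION (ds = '2013-03-29')
SELECT event_id, payload
FROM xml_events_stage
WHERE ds = '2013-03-29';
```

After verifying the insert, the corresponding staging partition can be dropped. The key point is that the INSERT path is what guarantees correct bucketing; files placed directly into HDFS bypass it.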


On Fri, Mar 29, 2013 at 4:58 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:

> Hello,
> We run M/R jobs to parse and process large, highly complex XML files
> into Avro files. Then we build external Hive tables on top of the parsed
> Avro files. The Hive tables are partitioned by day, but the partitions are
> still huge and joins do not perform well. So I would like to try
> creating buckets on the join key. How do I create the buckets on the
> existing HDFS files? I would prefer to avoid creating another set of
> (bucketed) tables and loading data from the non-bucketed tables into them,
> if at all possible. Is it possible to do the bucketing in Java as part of
> the M/R jobs while creating the Avro files?
> Any help / insight would be greatly appreciated.
> Thank you very much for your time and help.
> Sadu

*Dean Wampler, Ph.D.*