Hive >> mail # user >> Bucketing external tables


Sadananda Hegde 2013-03-29, 21:58
Re: Bucketing external tables
I don't know of any way to avoid creating new tables and moving the data.
In fact, that's the official way to do it, from a temp table to the final
table, so Hive can ensure the bucketing is done correctly:

 https://cwiki.apache.org/Hive/languagemanual-ddl-bucketedtables.html

In other words, you might have a big move now, but going forward, you'll
want to stage your data in a temp table, use this procedure to put it in
the final location, then delete the temp data.
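
A minimal HiveQL sketch of that procedure follows. The table names (events_staging, events_bucketed), the columns (user_id, payload, dt), and the bucket count of 32 are all hypothetical placeholders, not taken from this thread. It also assumes a Hive version that still has the hive.enforce.bucketing setting.

```sql
-- Hypothetical names throughout: events_staging, events_bucketed,
-- user_id, payload, dt. events_staging stands for the existing
-- external table over the parsed Avro files (its DDL is omitted).

-- Final table, bucketed on the join key. 32 buckets is an arbitrary choice.
CREATE TABLE events_bucketed (
  user_id BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS;

-- Ask Hive to size the reduce phase to the bucket count, so the
-- files it writes match the table definition.
SET hive.enforce.bucketing = true;

-- Move one day of data from the staging table into the bucketed table.
INSERT OVERWRITE TABLE events_bucketed PARTITION (dt = '2013-03-29')
SELECT user_id, payload
FROM events_staging
WHERE dt = '2013-03-29';
```

Once the insert has been verified, the staged files for that day can be deleted. Letting Hive write the files this way is what keeps the bucket layout consistent with the table metadata.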

dean

On Fri, Mar 29, 2013 at 4:58 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:

> Hello,
>
> We run M/R jobs to parse and process large and highly complex XML files
> into Avro files. Then we build external Hive tables on top of the parsed Avro
> files. The Hive tables are partitioned by day; but they are still huge
> partitions and joins do not perform that well. So I would like to try
> out creating buckets on the join key. How do I create the buckets on the
> existing HDFS files? I would prefer to avoid creating another set of tables
> (bucketed) and load data from non-bucketed table to bucketed tables if at
> all possible. Is it possible to do the bucketing in Java as part of the M/R
> jobs while creating the Avro files?
>
> Any help / insight would be greatly appreciated.
>
> Thank you very much for your time and help.
>
> Sadu
>

--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330

Sadananda Hegde 2013-03-30, 22:44
Dean Wampler 2013-03-31, 00:00
Sadananda Hegde 2013-04-04, 02:17
Mark Grover 2013-04-04, 05:36
Sadananda Hegde 2013-04-05, 22:02
Mark Grover 2013-04-06, 15:07
Sadananda Hegde 2013-04-11, 17:46
Bejoy KS 2013-04-16, 15:13