Sadananda Hegde 2013-03-29, 21:58
Dean Wampler 2013-03-29, 22:57
Re: Bucketing external tables
Sadananda Hegde 2013-03-30, 22:44
Does that mean bucketing is exclusively a Hive feature and not
available to others like Java, Pig, etc.?
And also, my final tables have to be managed tables, not external tables?
Thanks again for your time and help.
On Fri, Mar 29, 2013 at 5:57 PM, Dean Wampler <
[EMAIL PROTECTED]> wrote:
> I don't know of any way to avoid creating new tables and moving the data.
> In fact, that's the official way to do it, from a temp table to the final
> table, so Hive can ensure the bucketing is done correctly:
> In other words, you might have a big move now, but going forward, you'll
> want to stage your data in a temp table, use this procedure to put it in
> the final location, then delete the temp data.
> On Fri, Mar 29, 2013 at 4:58 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:
>> We run M/R jobs to parse and process large and highly complex XML files
>> into Avro files. Then we build external Hive tables on top of the parsed Avro
>> files. The Hive tables are partitioned by day, but the partitions are still
>> huge and joins do not perform that well. So I would like to try
>> creating buckets on the join key. How do I create the buckets on the
>> existing HDFS files? I would prefer to avoid creating another set of
>> (bucketed) tables and loading data from the non-bucketed tables into them, if at
>> all possible. Is it possible to do the bucketing in Java as part of the M/R
>> jobs while creating the Avro files?
>> Any help / insight would be greatly appreciated.
>> Thank you very much for your time and help.
> *Dean Wampler, Ph.D.*
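For readers wondering what "doing the bucketing in Java" would involve, here is a minimal sketch of the bucket-assignment arithmetic only. It assumes the commonly described Hive default of taking the non-negative hash of the bucketing column modulo the number of buckets, and a Java-style hash for a 64-bit key. The class and method names (`BucketSketch`, `bucketFor`, `hashOfLong`) are hypothetical and not part of any Hive API. Nothing in this thread establishes that Hive would accept files written this way, since bucketed tables also depend on how Hive lays out and records the files, which is why the reply above recommends letting Hive populate the final table from a staging table.

```java
// Hypothetical sketch: illustrates hash-modulo bucket assignment only.
// It does not write files or touch any Hive or Hadoop API.
final class BucketSketch {

    // Assumption: the bucket is the non-negative hash modulo the bucket count.
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is >= 0.
    static int bucketFor(int hash, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    // Assumption: a 64-bit key is hashed the way java.lang.Long does it,
    // by XOR-ing the upper and lower 32 bits.
    static int hashOfLong(long key) {
        return (int) (key ^ (key >>> 32));
    }

    public static void main(String[] args) {
        int numBuckets = 4; // illustrative bucket count, not from the thread
        long[] joinKeys = {7L, 42L, -1L};
        for (long key : joinKeys) {
            int bucket = bucketFor(hashOfLong(key), numBuckets);
            System.out.println(key + " -> bucket " + bucket);
        }
    }
}
```

In an M/R job the same function could, in principle, drive a custom partitioner so that each reducer receives the rows for one bucket, but matching Hive's expectations for a bucketed table exactly is a separate matter that would need to be verified against the Hive version in use.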
Dean Wampler 2013-03-31, 00:00
Sadananda Hegde 2013-04-04, 02:17
Mark Grover 2013-04-04, 05:36
Sadananda Hegde 2013-04-05, 22:02
Mark Grover 2013-04-06, 15:07
Sadananda Hegde 2013-04-11, 17:46
Bejoy KS 2013-04-16, 15:13