Bucketing is certainly helpful when, within a partitioned table, a column
other than the partition column has a finite number of distinct values.
Bucketing does mean, though, that when you load data into the table it
can't be a straightforward LOAD DATA INPATH; you will need to run it via
Hive queries (which does not seem to be a problem, at least from the look of
clustering used to be in powers of 2, like 2, 4, 8, 16 etc. Not sure if
that has changed now.
Also, when loading data into a bucketed table, it's advised that you first run
set hive.enforce.bucketing = true;
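A minimal sketch of what loading through a query could look like. The table
and column names (events_staging, events_bucketed, user_id, payload, dt) and
the bucket count of 16 are made up for illustration, not taken from your setup:

```sql
-- Hypothetical names throughout; adjust to your own schema.
SET hive.enforce.bucketing = true;

-- Partitioned by date, bucketed on a second column within each partition.
CREATE TABLE events_bucketed (
  user_id BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 16 BUCKETS;

-- LOAD DATA INPATH only moves files and would not split rows into buckets,
-- so the table is populated through a query from a staging table instead.
INSERT OVERWRITE TABLE events_bucketed PARTITION (dt = '2014-03-25')
SELECT user_id, payload
FROM events_staging
WHERE dt = '2014-03-25';
```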
I have rarely used indexing in Hive, but I do remember Hive indexes used
to provide better data access for certain queries, and the storage layout
helps in improving search and lookup of the data.
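For reference, a minimal sketch of a compact index, assuming a hypothetical
table named events with a user_id column. Whether it helps depends on the
queries, so treat it as something to measure rather than a recommendation:

```sql
-- Hypothetical table, column, and index names.
CREATE INDEX idx_events_user_id
ON TABLE events (user_id)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- With DEFERRED REBUILD the index starts empty and must be built explicitly.
ALTER INDEX idx_events_user_id ON events REBUILD;
```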
It would be really helpful if you could note down the performance you get
after fine-tuning the parameters.
On Tue, Mar 25, 2014 at 10:37 PM, Saumitra Shahapure (Vizury) <
[EMAIL PROTECTED]> wrote: