Re: Hive List Bucketing - Feature Review
I am of the opinion this feature is too specialized to be generally helpful.

-------------------------------
The cardinality of 'x' is in 1000's per partition of T. Moreover,
there is a skew for the values of 'x'. In general, there are ~10
values of 'x' which have a very large skew, and the remaining
values of 'x' have a small cardinality. Also, note that this mapping
(values of 'x' with a high cardinality can change daily).
--------------------------

In these cases you should use clustering/bucketing, which will prevent
the skew you are talking about. If you want more efficiency for certain
query types, build an index on top of the original table.
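
As a rough illustration of what clustering on 'x' does, here is a small
Python sketch. It is not Hive code: bucket_for and bucket_sizes are made-up
names, and crc32 is only a stand-in for whatever hash function Hive applies.
The point it shows is that every row with the same value of 'x' lands in the
same bucket, so in principle a lookup on one value only has to touch one
bucket.

```python
import zlib
from collections import Counter


def bucket_for(key, num_buckets):
    """Return the bucket index for a key.

    crc32 is an assumed stand-in for a stable hash; Hive uses its own.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def bucket_sizes(keys, num_buckets):
    """Count how many rows fall into each bucket, in bucket order."""
    counts = Counter(bucket_for(k, num_buckets) for k in keys)
    return [counts[b] for b in range(num_buckets)]


if __name__ == "__main__":
    # Hypothetical rows: one frequent value of 'x' plus a few rare ones.
    rows = ["a"] * 5 + ["b", "c", "d"]
    print(bucket_sizes(rows, 4))
```

The number of buckets is fixed up front, so nothing has to be re-declared
when the set of frequent values changes from day to day.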

I understand someone wanting to do this because MySQL partitioning can do
this, but this sounds like a management problem. Who is to say the
skew is the same in each partition?

-----------------------------------------
hive compiler to do input pruning. The list of skewed keys is stored
at the table level (note that, this list can be initially supplied by
the client periodically, and can be eventually updated when a new
partition is being loaded).
-----------------------------------------

Imagine you have a table partitioned by hour and two datacenters, China
and NY. At some hours the skew will be different. Skews change over
time. Since this property is table level, I do not understand how it
would be updated.

On Thu, Jun 14, 2012 at 4:14 AM, Carl Steinbach <[EMAIL PROTECTED]> wrote:
> Hi Tim,
>
> I added some comments to the wiki a couple days ago. I just wanted to make
> sure you saw them since it doesn't look like you're registered as a watcher
> for that page.
>
> Thanks.
>
> Carl
>
> On Mon, Jun 11, 2012 at 12:22 PM, Gang Liu <[EMAIL PROTECTED]> wrote:
>
>> Hi Carl, thanks Tim
>>
>> On 6/11/12 12:14 PM, "Carl Steinbach" <[EMAIL PROTECTED]> wrote:
>>
>> >+ hcatalog-dev
>> >
>> >On Mon, Jun 11, 2012 at 12:09 PM, Carl Steinbach <[EMAIL PROTECTED]>
>> >wrote:
>> >
>> >> This link may work better for some people:
>> >>
>> >> https://cwiki.apache.org/confluence/display/Hive/ListBucketing
>> >>
>> >> Thanks.
>> >>
>> >> Carl
>> >>
>> >>
>> >> On Mon, Jun 11, 2012 at 12:03 PM, Gang Liu <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> Dear all hive developers,
>> >>>
>> >>> We are making good progress of implementing the list bucketing
>> >>>feature. It
>> >>> should be available soon in weeks.
>> >>>
>> >>> We'd like to call feature review again and please provide your
>> >>>comments.
>> >>>
>> >>> Thanks
>> >>>
>> >>> Tim
>> >>>
>> >>> On 6/1/12 10:13 AM, "Gang Liu" <[EMAIL PROTECTED]> wrote:
>> >>>
>> >>> >Dear all,
>> >>> >
>> >>> >Please review the proposal and provide your comments:
>> >>> >
>> >>> >https://cwiki.apache.org/Hive/listbucketing.html
>> >>> >
>> >>> >
>> >>> >Thanks
>> >>> >
>> >>> >Tim
>> >>> >
>> >>>
>> >>>
>> >>
>>
>>