-
Hive List Bucketing - Feature Review
Gang Liu 2012-06-01, 17:13
Dear all,
Please review the proposal and provide your comments: https://cwiki.apache.org/Hive/listbucketing.html

Thanks,
Tim
-
Re: Hive List Bucketing - Feature Review
Gang Liu 2012-06-11, 19:03
Dear all Hive developers,
We are making good progress on implementing the list bucketing feature. It should be available within a few weeks. We'd like to call for a feature review again; please provide your comments.

Thanks,
Tim
-
Re: Hive List Bucketing - Feature Review
Carl Steinbach 2012-06-11, 19:09
This link may work better for some people:
https://cwiki.apache.org/confluence/display/Hive/ListBucketing

Thanks.

Carl
-
Re: Hive List Bucketing - Feature Review
Carl Steinbach 2012-06-11, 19:14
+ hcatalog-dev
-
Re: Hive List Bucketing - Feature Review
Gang Liu 2012-06-11, 19:22
Hi Carl, thanks.

Tim
-
Re: Hive List Bucketing - Feature Review
Carl Steinbach 2012-06-14, 08:14
Hi Tim,
I added some comments to the wiki a couple of days ago. I just wanted to make sure you saw them, since it doesn't look like you're registered as a watcher for that page.

Thanks.

Carl
-
Re: Hive List Bucketing - Feature Review
Edward Capriolo 2012-06-14, 16:34
I am of the opinion that this feature is too specialized to be generally helpful.

> The cardinality of 'x' is in 1000's per partition of T. Moreover, there is a skew for the values of 'x'. In general, there are ~10 values of 'x' which have a very large skew, and the remaining values of 'x' have a small cardinality. Also, note that this mapping (values of 'x' with a high cardinality can change daily).

In these cases you should use clustering/bucketing. This will prevent the skew you are talking about. If you want more efficiency in certain query types, build an index on top of the original table.

I understand someone wanting to do this because MySQL partitioning can do this, but this sounds like a management problem. Who is to say the skew is the same in each partition?

> hive compiler to do input pruning. The list of skewed keys is stored at the table level (note that, this list can be initially supplied by the client periodically, and can be eventually updated when a new partition is being loaded).

Imagine you have a table partitioned by hour and two datacenters, China and NY. At some hours the skew will be different. Skews change over time. Since this property is table level, I do not understand how this would be changed.
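
The concern about skew differing between partitions can be illustrated with a small sketch. It is not taken from the proposal: the partition contents, the `heavy_hitters` helper, and the table-level key list below are all made up for illustration. It only shows how a single table-level list of skewed keys can miss a value that is hot in one partition but not in another.

```python
from collections import Counter


def heavy_hitters(values, top_n):
    """Return the set of the top_n most frequent values in one partition."""
    return {value for value, _ in Counter(values).most_common(top_n)}


# Hypothetical values of column x in two hourly partitions (made-up data).
partition_china = ["a"] * 50 + ["b"] * 40 + ["c", "d", "e"]
partition_ny = ["d"] * 60 + ["a"] * 30 + ["b", "c", "e"]

# A single list kept at the table level (assumed contents).
table_level_skewed_keys = {"a", "b"}

for name, rows in [("china", partition_china), ("ny", partition_ny)]:
    hot = heavy_hitters(rows, top_n=2)
    # Keys that are hot in this partition but absent from the table-level list.
    missed = hot - table_level_skewed_keys
    print(name, "hot keys:", sorted(hot), "missed:", sorted(missed))
```

In this toy data the table-level list covers both hot keys of the first partition but misses `d` in the second, which is the situation the China/NY example describes.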
-
Re: Hive List Bucketing - Feature Review
Gang Liu 2012-06-14, 17:25
Hey Edward,
Thank you very much for your comments.

This feature is designed for the use cases described in the wiki. We came up with it because we do see those cases in real life.

In this first release, in order to use the feature:
1. Hive table users need to know the skewed keys in advance.
2. Hive table users need to know that the skewed keys are the same in each partition.
3. If Hive table users know that the skewed keys have changed, they can change them via an "alter" statement.
4. If #3 happens, old partitions keep the old skewed keys and new partitions get the new ones. This is expected.

We may consider the following in a future release:
1. Hive instruments skewed keys and displays them to the user.

Thanks,
Tim
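
The first-release rules in this reply can be sketched as a toy model. This is only an illustration under stated assumptions, not Hive's implementation: the `SkewedTable` class, its method names, and the partition names are hypothetical, and the sketch models only which skewed-key list applies to which partition, not how pruning is done.

```python
class SkewedTable:
    """Toy model: a table-level skewed-key list, snapshotted per partition at load time."""

    def __init__(self, skewed_keys):
        # Rule 1: the skewed keys are supplied up front by the table user.
        self.skewed_keys = set(skewed_keys)
        # Partition name -> skewed keys that were in effect when it was loaded.
        self.partitions = {}

    def alter_skewed_keys(self, skewed_keys):
        # Rule 3: the user changes the list; already loaded partitions are untouched.
        self.skewed_keys = set(skewed_keys)

    def load_partition(self, name):
        # Rule 4: a partition records the list in effect at load time.
        self.partitions[name] = frozenset(self.skewed_keys)

    def is_skewed_in(self, partition, key):
        """True if `key` was a skewed key when `partition` was loaded."""
        return key in self.partitions[partition]


table = SkewedTable({"a", "b"})
table.load_partition("ds=2012-06-13")
table.alter_skewed_keys({"a", "d"})
table.load_partition("ds=2012-06-14")

for name in sorted(table.partitions):
    print(name, sorted(table.partitions[name]))
```

After the alter, the older partition still carries the old list and only the newer one sees `d` as skewed, matching point 4 above.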
-
Re: Hive List Bucketing - Feature Review
Edward Capriolo 2012-06-14, 17:30
We have had a ticket open for quite some time for combine input format to work across partitions. Not sure if that can help with what you are seeing as well. It could help us a lot.

Edward
-
Re: Hive List Bucketing - Feature Review
Gang Liu 2012-06-14, 17:38
Would you please elaborate on it?

Thanks,
Tim
-
Re: Hive List Bucketing - Feature Review
Gang Liu 2012-06-14, 16:43
Hi Carl,
Thank you for letting me know. I had not registered as a watcher, so I didn't see them. I am sorry. I have just registered as a watcher. I will check the comments and reply accordingly.

Thanks,
Tim