customize partitioning of regionserver
Weishung Chung 2011-01-10, 18:14
Does HBase have the capability to partition a dataset by range, like MySQL partitioning, e.g. partition the datetime row key by month? Thank you.
RE: customize partitioning of regionserver
Buttler, David 2011-01-10, 18:48
Not to my knowledge. Partitions are dynamically determined. As your table grows, regions become too large and are split roughly in half. This prevents unbalanced regions. Any predetermined partitioning will ultimately fail because you don't know your data as well as you think you do.
Dave
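The splitting behaviour described above can be sketched with a toy model. This is only an illustration, not HBase code: `ToyTable`, `max_rows_per_region`, and the date-style row keys are hypothetical names, and the model splits on row count, whereas HBase decides based on region size.

```python
import bisect


class ToyTable:
    """Toy model of a sorted table whose regions split when they grow too large."""

    def __init__(self, max_rows_per_region):
        # Assumption: split on row count; a real region splits on size.
        self.max_rows = max_rows_per_region
        # Region i holds the keys in [start_keys[i], start_keys[i + 1]).
        self.start_keys = [""]
        self.regions = [[]]

    def _find_region(self, key):
        # The owning region is the last one whose start key is <= key.
        return bisect.bisect_right(self.start_keys, key) - 1

    def put(self, key):
        idx = self._find_region(key)
        region = self.regions[idx]
        pos = bisect.bisect_left(region, key)
        if pos < len(region) and region[pos] == key:
            return  # row already present
        region.insert(pos, key)
        if len(region) > self.max_rows:
            # Split roughly in half; the right half's first key becomes
            # the start key of the new region.
            mid = len(region) // 2
            left, right = region[:mid], region[mid:]
            self.regions[idx:idx + 1] = [left, right]
            self.start_keys.insert(idx + 1, right[0])


table = ToyTable(max_rows_per_region=10)
keys = [f"2011-{m:02d}-{d:02d}" for m in (1, 2, 3) for d in range(1, 29)]
for key in keys:
    table.put(key)

for start, region in zip(table.start_keys, table.regions):
    print(repr(start), len(region))
```

Note that the region boundaries end up wherever the data happens to push them, not on month boundaries.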
Re: customize partitioning of regionserver
Weishung Chung 2011-01-10, 19:02
Thanks for the reply. In my use case, I have to retrieve a range of data, usually by month, and operate on it before reinserting it, so it would be nice if I could partition by month. But I don't know how the partitioning would affect the MapReduce job.
RE: customize partitioning of regionserver
Buttler, David 2011-01-10, 19:19
By default, HBase sets up M/R jobs with one map task per partition (region). So, if all of your months are in one partition, you will have only one map task. To do something different, you would have to change the way splits are determined for your job rather than using the default. One nice thing about having random keys is that you can use the defaults and just set a filter for your date range. That way you get maximum parallelism on the map side. But then you might be constrained by your subsequent step if you can't parallelize that nicely.
Dave
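A rough sketch of the trade-off described above, in plain Python rather than the HBase API. It assumes one map task per region, fixed-size regions (`ROWS_PER_REGION` is a made-up number), and an MD5 prefix as the "random" key; `build_regions`, `rows_per_map_task`, and `hashed_key` are hypothetical helpers.

```python
import hashlib

ROWS_PER_REGION = 12  # hypothetical region size, in rows


def build_regions(keys, rows_per_region=ROWS_PER_REGION):
    """Chunk the sorted row keys into contiguous regions."""
    ordered = sorted(keys)
    return [ordered[i:i + rows_per_region]
            for i in range(0, len(ordered), rows_per_region)]


def rows_per_map_task(regions, date_of, month):
    """One map task per region: count the rows each task keeps after the date filter."""
    return [sum(1 for key in region if date_of(key).startswith(month))
            for region in regions]


def hashed_key(date):
    # Assumption: a hash prefix stands in for a "random" row key.
    return hashlib.md5(date.encode()).hexdigest()[:8] + "_" + date


dates = [f"2011-{m:02d}-{d:02d}" for m in (1, 2, 3) for d in range(1, 29)]

# Date-ordered keys: February sits in a few adjacent regions.
date_counts = rows_per_map_task(build_regions(dates), lambda k: k, "2011-02")

# Hashed keys: February rows are scattered, so the filter runs in every task.
hashed_counts = rows_per_map_task(
    build_regions([hashed_key(d) for d in dates]),
    lambda k: k.split("_", 1)[1],
    "2011-02",
)

busy_date = sum(1 for c in date_counts if c)
busy_hashed = sum(1 for c in hashed_counts if c)
print("date-ordered keys:", date_counts, "busy tasks:", busy_date)
print("hashed keys:      ", hashed_counts, "busy tasks:", busy_hashed)
```

With date-ordered keys only the regions covering the month do any work; with hashed keys the same rows are spread over more tasks, at the cost of every task scanning and filtering its whole region.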
Re: customize partitioning of regionserver
Weishung Chung 2011-01-10, 20:47
Thank you for the explanation. I have a better understanding now :)