Weishung Chung 2011-01-10, 18:14
Buttler, David 2011-01-10, 18:48
Weishung Chung 2011-01-10, 19:02
Buttler, David 2011-01-10, 19:19
Thank you for the explanation. I have a better understanding now :)
On Mon, Jan 10, 2011 at 1:19 PM, Buttler, David <[EMAIL PROTECTED]> wrote:
> The default way HBase sets up M/R jobs is to create one map task per
> partition (region). So, if all of your months are in one partition, you will
> only have one map task. To do something different, you would have to change
> the way splits are determined for your job, rather than using the default.
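As a rough illustration of the one-split-per-region behavior described above, the sketch below models a table as contiguous key ranges and counts how many regions a scan touches. Under the default setup, that count is the number of map tasks. This is a simplified model, not HBase code: the class and method names, the split points, and the `yyyyMM` row-key prefix are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model (not HBase code): one input split, hence one map task, per overlapping region. */
final class SplitModel {

    /** A region covers the half-open key range [startKey, endKey); "" means unbounded. */
    record Region(String startKey, String endKey) {}

    /** Builds contiguous regions from sorted split points, as in an HBase table layout. */
    static List<Region> regionsFromSplitPoints(List<String> splitPoints) {
        List<Region> regions = new ArrayList<>();
        String start = "";
        for (String split : splitPoints) {
            regions.add(new Region(start, split));
            start = split;
        }
        regions.add(new Region(start, "")); // the last region is open-ended
        return regions;
    }

    /** Regions overlapped by a scan over [startRow, stopRow); an empty stopRow means "to the end". */
    static List<Region> regionsForScan(List<Region> regions, String startRow, String stopRow) {
        List<Region> hit = new ArrayList<>();
        for (Region r : regions) {
            boolean endsAfterStart = r.endKey().isEmpty() || r.endKey().compareTo(startRow) > 0;
            boolean startsBeforeStop = stopRow.isEmpty() || r.startKey().compareTo(stopRow) < 0;
            if (endsAfterStart && startsBeforeStop) {
                hit.add(r);
            }
        }
        return hit;
    }

    public static void main(String[] args) {
        // Hypothetical split points for a table whose row keys start with yyyyMM.
        List<Region> regions = regionsFromSplitPoints(List.of("201006", "201012", "201104"));
        // January 2011 lies inside one region, so the job would get a single map task.
        System.out.println(regionsForScan(regions, "201101", "201102").size());
        // All of 2010 spans three regions, so three map tasks.
        System.out.println(regionsForScan(regions, "201001", "201101").size());
    }
}
```

Running `main` prints 1 and then 3: a one-month scan over month-prefixed keys stays inside a single region, while a year-long scan fans out over several.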
> One nice thing about having random keys is that you can use the defaults
> and just set a filter for your date range. That way you get maximum
> parallelism on the map side. But then you might be constrained by your
> subsequent step if it can't be parallelized as nicely.
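To illustrate the random-key approach, the following sketch approximates random row keys with a multiplicative hash of a hypothetical record id, splits the 32-bit key space into four regions, and applies a month filter to every row. Because the matching rows land in every region, every map task has some work to do. The hashing scheme, split points, and names are assumptions for illustration only, not HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: random-looking keys spread one month's rows over all regions. */
final class RandomKeyModel {

    /** A row with a random-looking key and the month it belongs to (yyyyMM). */
    record Row(String key, String month) {}

    /** Hypothetical key scheme: Knuth's multiplicative hash of an id, as 8 lowercase hex digits. */
    static String randomKey(long id) {
        long hashed = (id * 2654435761L) & 0xffffffffL;
        return String.format("%08x", hashed);
    }

    /** Index of the region containing key; splitPoints must be sorted. */
    static int regionOf(String key, List<String> splitPoints) {
        int region = 0;
        for (String split : splitPoints) {
            if (key.compareTo(split) >= 0) {
                region++;
            }
        }
        return region;
    }

    /** Counts, per region, the rows that pass a "month == targetMonth" filter. */
    static int[] matchesPerRegion(List<Row> rows, List<String> splitPoints, String targetMonth) {
        int[] counts = new int[splitPoints.size() + 1];
        for (Row row : rows) {
            if (row.month().equals(targetMonth)) { // the date-range filter, applied in every region
                counts[regionOf(row.key(), splitPoints)]++;
            }
        }
        return counts;
    }

    /** Hypothetical sample data: n rows cycling through the twelve months of 2010. */
    static List<Row> sampleRows(int n) {
        List<Row> rows = new ArrayList<>();
        for (int id = 0; id < n; id++) {
            String month = String.format("2010%02d", id % 12 + 1);
            rows.add(new Row(randomKey(id), month));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Four regions splitting the hex key space evenly.
        List<String> splitPoints = List.of("40000000", "80000000", "c0000000");
        int[] counts = matchesPerRegion(sampleRows(1200), splitPoints, "201001");
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```

The price, as noted above, is that every map task reads its whole region and discards rows outside the month, and the downstream step then receives output from all of them.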
> -----Original Message-----
> From: Weishung Chung [mailto:[EMAIL PROTECTED]]
> Sent: Monday, January 10, 2011 11:02 AM
> To: [EMAIL PROTECTED]
> Subject: Re: customize partitioning of regionserver
> Thanks for the reply.
> In my use case, I usually have to retrieve a range of data by month and
> operate on it before reinserting it, so it would be nice if I could
> partition by month, but I don't know how the partitioning would affect the
> MapReduce job.
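One way to make a month retrievable as a single contiguous range is to lead the row key with a `yyyyMM` prefix. The sketch below derives the start row (inclusive) and stop row (exclusive) for a one-month scan. The key layout and class name are hypothetical; with the real HBase client these strings would be converted to bytes and used as a `Scan`'s start and stop rows.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

/** Hypothetical row-key layout: a yyyyMM prefix followed by a record id. */
final class MonthKeys {
    private static final DateTimeFormatter PREFIX = DateTimeFormatter.ofPattern("yyyyMM");

    /** Builds a row key such as "201101-r42". */
    static String rowKey(YearMonth month, String recordId) {
        return month.format(PREFIX) + "-" + recordId;
    }

    /** First row of the month (inclusive). */
    static String startRow(YearMonth month) {
        return month.format(PREFIX);
    }

    /** First row of the following month, used as the exclusive stop row. */
    static String stopRow(YearMonth month) {
        return month.plusMonths(1).format(PREFIX);
    }

    /** True if key falls in [start, stop), i.e. inclusive start and exclusive stop. */
    static boolean inRange(String key, String start, String stop) {
        return key.compareTo(start) >= 0 && key.compareTo(stop) < 0;
    }

    public static void main(String[] args) {
        YearMonth dec = YearMonth.of(2010, 12);
        System.out.println(startRow(dec) + " .. " + stopRow(dec)); // 201012 .. 201101
        System.out.println(inRange(rowKey(dec, "r7"), startRow(dec), stopRow(dec))); // true
    }
}
```

This layout makes the monthly retrieve-and-reinsert cycle a simple range scan, but it also keeps each month's rows contiguous, so they tend to sit in one or a few regions, which is the map-side parallelism trade-off raised in the reply.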
> On Mon, Jan 10, 2011 at 12:48 PM, Buttler, David <[EMAIL PROTECTED]> wrote:
> > Not to my knowledge. Partitions are dynamically determined. As your table
> > grows, regions become too large and are split roughly in half. This
> > prevents unbalanced regions. Any predetermined partitioning will
> > fail because you don't know your data as well as you think you do.
> > Dave
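The splitting behavior described in this reply can be sketched as a toy simulation: each region holds a sorted set of keys, and once a region grows past a limit it is split at its middle key. This is a simplified model under stated assumptions, not HBase's implementation. Real HBase decides to split based on store file size (governed by settings such as `hbase.hregion.max.filesize`) rather than a row count, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of region splitting (not HBase's implementation); limits rows, not bytes. */
final class SplitSimulator {
    private final int maxRowsPerRegion;
    private final List<String> startKeys = new ArrayList<>();        // sorted; the first is ""
    private final List<TreeSet<String>> regions = new ArrayList<>(); // keys held by each region

    SplitSimulator(int maxRowsPerRegion) {
        if (maxRowsPerRegion < 1) {
            throw new IllegalArgumentException("maxRowsPerRegion must be >= 1");
        }
        this.maxRowsPerRegion = maxRowsPerRegion;
        startKeys.add("");             // a new table starts as a single region
        regions.add(new TreeSet<>());
    }

    void put(String key) {
        int idx = regionIndexFor(key);
        TreeSet<String> region = regions.get(idx);
        region.add(key);
        if (region.size() > maxRowsPerRegion) {
            splitInHalf(idx);
        }
    }

    /** The region holding key is the last one whose start key is <= key. */
    private int regionIndexFor(String key) {
        int idx = 0;
        for (int i = 1; i < startKeys.size() && key.compareTo(startKeys.get(i)) >= 0; i++) {
            idx = i;
        }
        return idx;
    }

    /** Splits a region at its middle key; the upper half becomes a new region. */
    private void splitInHalf(int idx) {
        TreeSet<String> region = regions.get(idx);
        String mid = null;
        int position = 0;
        for (String k : region) {
            if (position++ == region.size() / 2) {
                mid = k;
                break;
            }
        }
        TreeSet<String> upper = new TreeSet<>(region.tailSet(mid, true));
        region.tailSet(mid, true).clear(); // removes the upper half from the original region
        startKeys.add(idx + 1, mid);
        regions.add(idx + 1, upper);
    }

    List<String> startKeys() {
        return List.copyOf(startKeys);
    }

    List<Integer> regionSizes() {
        List<Integer> sizes = new ArrayList<>();
        for (TreeSet<String> r : regions) {
            sizes.add(r.size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        SplitSimulator sim = new SplitSimulator(4);
        for (char c = 'a'; c <= 'j'; c++) {
            sim.put(String.valueOf(c));
        }
        System.out.println(sim.startKeys());
        System.out.println(sim.regionSizes());
    }
}
```

Inserting the keys `a` through `j` in order with a limit of four rows ends with four regions starting at `""`, `c`, `e`, and `g`. No split points were chosen up front; they fall wherever the data actually accumulates.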
> > -----Original Message-----
> > From: Weishung Chung [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, January 10, 2011 10:14 AM
> > To: [EMAIL PROTECTED]
> > Subject: customize partitioning of regionserver
> > Does HBase have the capability to partition a dataset by range, e.g.
> > partition on a datetime row key by month?
> > Thank you.