Re: customize partitioning of regionserver
Thank you for the explanation. I have a better understanding now :)

On Mon, Jan 10, 2011 at 1:19 PM, Buttler, David <[EMAIL PROTECTED]> wrote:

> The default way HBase sets up M/R jobs is to make one map task per partition
> (region).  So, if all of your months are in one partition then you will only
> have one map task.  To do something different you would have to change the
> way splits are determined for your job, rather than using the default.
> One nice thing about having random keys is that you can use the defaults
> and just set a filter for your date range. That way you get maximum
> parallelism on the map side.  But then you might be constrained by your
> subsequent step if you can't parallelize that nicely.
>
> Dave
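
A minimal sketch of the point above, assuming the default input format creates one split (and so one map task) for every region whose key range overlaps the scan range. It is a plain-Java model, not the HBase API: the `Region` class, the `mapTasksFor` helper, and the month-prefixed row keys are all hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

class RegionSplitSketch {
    // Hypothetical model of a region: covers row keys in [startKey, endKey).
    // An empty string means "unbounded" on that side.
    static final class Region {
        final String startKey;
        final String endKey;
        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Counts the regions that overlap the scan range [scanStart, scanStop).
    // Assumption: one map task is created per overlapping region.
    static int mapTasksFor(List<Region> regions, String scanStart, String scanStop) {
        int count = 0;
        for (Region r : regions) {
            boolean startsBeforeScanEnds =
                scanStop.isEmpty() || r.startKey.compareTo(scanStop) < 0;
            boolean endsAfterScanStarts =
                r.endKey.isEmpty() || r.endKey.compareTo(scanStart) > 0;
            if (startsBeforeScanEnds && endsAfterScanStarts) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Three regions of a table whose row keys start with "yyyy-MM".
        List<Region> regions = Arrays.asList(
            new Region("", "2010-11"),
            new Region("2010-11", "2011-02"),
            new Region("2011-02", ""));

        // Month-ordered keys: December 2010 sits inside a single region.
        System.out.println(mapTasksFor(regions, "2010-12", "2011-01")); // prints 1

        // Random keys: the month is spread over every region, so the job
        // scans the whole table and filters by date inside each map task.
        System.out.println(mapTasksFor(regions, "", "")); // prints 3
    }
}
```

With month-ordered keys the scan touches one region and gets one map task; with random keys every region takes part, at the cost of reading rows that the date filter then discards.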
>
> -----Original Message-----
> From: Weishung Chung [mailto:[EMAIL PROTECTED]]
> Sent: Monday, January 10, 2011 11:02 AM
> To: [EMAIL PROTECTED]
> Subject: Re: customize partitioning of regionserver
>
> Thanks for the reply.
> In my use case, I have to retrieve a range of data, usually by month, and
> operate on it before reinserting it, so it would be nice if I could
> partition by month, but then I don't know how the partitioning would affect
> the mapreduce job.
>
> On Mon, Jan 10, 2011 at 12:48 PM, Buttler, David <[EMAIL PROTECTED]>
> wrote:
>
> > Not to my knowledge.  Partitions are dynamically determined. As your table
> > grows, regions become too large and are split roughly in half.  This
> > prevents unbalanced regions.  Any predetermined partitioning will
> > ultimately fail because you don't know your data as well as you think
> > you do.
> >
> > Dave
> >
> >
> > -----Original Message-----
> > From: Weishung Chung [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, January 10, 2011 10:14 AM
> > To: [EMAIL PROTECTED]
> > Subject: customize partitioning of regionserver
> >
> > Does HBase have the capability to partition a dataset by range like
> > MySQL partitioning, e.g. partition the datetime row key by month?
> > Thank you.
> >
>
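
The dynamic splitting described in the first reply (regions that grow too large are split roughly in half) can be sketched as a toy model. This is plain Java under stated assumptions, not HBase code: the class name, the `insertAll` and `regionIndexFor` helpers, and the row-count threshold `MAX_ROWS_PER_REGION` are hypothetical, and a real cluster decides when to split by store file size rather than by number of rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class RegionGrowthSketch {
    // Hypothetical threshold in rows; used here only to keep the model small.
    static final int MAX_ROWS_PER_REGION = 4;

    // Inserts keys one by one; a region that exceeds the threshold is
    // split at its middle key into a lower and an upper half.
    static List<TreeSet<String>> insertAll(List<String> keys) {
        List<TreeSet<String>> regions = new ArrayList<>();
        regions.add(new TreeSet<>());
        for (String key : keys) {
            int i = regionIndexFor(regions, key);
            TreeSet<String> region = regions.get(i);
            region.add(key);
            if (region.size() > MAX_ROWS_PER_REGION) {
                List<String> sorted = new ArrayList<>(region);
                String mid = sorted.get(sorted.size() / 2);
                TreeSet<String> lower = new TreeSet<>(region.headSet(mid));
                TreeSet<String> upper = new TreeSet<>(region.tailSet(mid));
                regions.set(i, lower);
                regions.add(i + 1, upper);
            }
        }
        return regions;
    }

    // Picks the last region whose first key is <= key; the first region
    // also receives any key smaller than everything seen so far.
    static int regionIndexFor(List<TreeSet<String>> regions, String key) {
        int idx = 0;
        for (int i = 1; i < regions.size(); i++) {
            if (regions.get(i).first().compareTo(key) <= 0) {
                idx = i;
            }
        }
        return idx;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int day = 1; day <= 10; day++) {
            keys.add(String.format("2011-01-%02d", day));
        }
        // Region boundaries emerge from the data instead of being declared.
        for (TreeSet<String> region : insertAll(keys)) {
            System.out.println(region);
        }
    }
}
```

The boundaries end up wherever the data happens to fall, which is why a single month may share a region with its neighbours or be spread across several regions.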