HBase >> mail # user >> customize partitioning of regionserver


Re: customize partitioning of regionserver
Thank you for the explanation. I have a better understanding now :)

On Mon, Jan 10, 2011 at 1:19 PM, Buttler, David <[EMAIL PROTECTED]> wrote:

> By default, HBase sets up M/R jobs with one map task per partition (region).
> So, if all of your months are in one partition, you will have only one map
> task.  To do something different you would have to change the way splits are
> determined for your job, rather than using the default.
> One nice thing about having random keys is that you can use the defaults
> and just set a filter for your date range. That way you get maximum
> parallelism on the map side.  But then you might be constrained by your
> subsequent step if you can't parallelize that nicely.
>
> Dave
>
> -----Original Message-----
> From: Weishung Chung [mailto:[EMAIL PROTECTED]]
> Sent: Monday, January 10, 2011 11:02 AM
> To: [EMAIL PROTECTED]
> Subject: Re: customize partitioning of regionserver
>
> Thanks for the reply.
> In my use case, I have to retrieve a range of data, usually by month, and
> operate on it before reinserting it, so it would be nice if I could
> partition by month, but I don't know how the partitioning would affect the
> MapReduce job.
>
> On Mon, Jan 10, 2011 at 12:48 PM, Buttler, David <[EMAIL PROTECTED]> wrote:
>
> > Not to my knowledge.  Partitions are dynamically determined. As your table
> > grows, regions become too large and are split roughly in half.  This
> > prevents unbalanced regions.  Any predetermined partitioning will ultimately
> > fail because you don't know your data as well as you think you do.
> >
> > Dave
> >
> >
> > -----Original Message-----
> > From: Weishung Chung [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, January 10, 2011 10:14 AM
> > To: [EMAIL PROTECTED]
> > Subject: customize partitioning of regionserver
> >
> > Does HBase have the capability to partition a dataset by range, like MySQL
> > partitioning, e.g. partition on a datetime row key by month?
> > Thank you.
> >
>
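
The point made in the thread, that the default setup yields one map task per region, can be illustrated with a small toy model. This is only a sketch under stated assumptions: it does not use the HBase API, the helper names (`regions_from_split_points`, `overlapping_regions`) are hypothetical, and the row keys are assumed to begin with a yyyyMMdd date so that a month corresponds to a contiguous key range.

```python
def regions_from_split_points(split_points):
    """Build (start, end) key ranges from sorted split points.

    An empty start key means "from the beginning of the table";
    an end key of None means "to the end of the table".
    """
    starts = [""] + list(split_points)
    ends = list(split_points) + [None]
    return list(zip(starts, ends))


def overlapping_regions(regions, start_row, stop_row):
    """Return the regions whose key range intersects [start_row, stop_row)."""
    hits = []
    for region_start, region_end in regions:
        begins_before_stop = region_start < stop_row
        ends_after_start = region_end is None or region_end > start_row
        if begins_before_stop and ends_after_start:
            hits.append((region_start, region_end))
    return hits


# Hypothetical table that happens to have split on month boundaries.
regions = regions_from_split_points(["20101101", "20101201", "20110101"])

# Scan December 2010 only: start row inclusive, stop row exclusive.
december = overlapping_regions(regions, "20101201", "20110101")

# Scan the whole table (as one would with random keys plus a date filter).
everything = overlapping_regions(regions, "", "\xff")

# One map task per overlapping region in this toy model.
print("map tasks for December scan:", len(december))
print("map tasks for full scan:", len(everything))
```

In this model the one-month scan touches a single region and so gets a single map task, while the full scan touches every region. That is the trade-off described above: date-ordered keys keep a month together but limit map-side parallelism, whereas random keys spread the work across all regions at the cost of scanning and filtering everything.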