Re: How to speedup Hbase query throughput
Am I right to assume that all of your data is in HBase, i.e., you don't
keep anything in just HDFS files?


On Thu, May 19, 2011 at 8:15 AM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> I wanted to do some more investigation before posting to the list, but it
> seems relevant to this conversation...
> Is it possible that major compactions don't always localize the data blocks?
>  Our cluster had a bunch of regions full of historical analytics data that
> were already major compacted, then we added a new datanode/regionserver.  We
> have a job that triggers major compactions at a minimum of once per week by
> hashing the region name and giving it a time slot.  It's been several weeks
> and the original nodes each have ~480 GB used in HDFS, while the new node has
> only 240 GB.  Regions are scattered pretty randomly and evenly among the
> regionservers.
> The job calls hBaseAdmin.majorCompact(hRegionInfo.getRegionName());
> My guess is that if a region is already major compacted and no new data has
> been added to it, then compaction is skipped.  That's definitely an
> essential feature during typical operation, but it's a problem if you're
> relying on major compaction to balance the cluster.
> Matt
> On Thu, May 19, 2011 at 4:42 AM, Michel Segel <[EMAIL PROTECTED]> wrote:
>> I had asked the question about how he created random keys... Hadn't seen a
>> response.
>> Sent from a remote device. Please excuse any typos...
>> Mike Segel
>> On May 18, 2011, at 11:27 PM, Stack <[EMAIL PROTECTED]> wrote:
>> > On Wed, May 18, 2011 at 5:11 PM, Weihua JIANG <[EMAIL PROTECTED]> wrote:
>> >> All the DNs almost have the same number of blocks. Major compaction
>> >> makes no difference.
>> >>
>> >
>> > I would expect major compaction to even the number of blocks across
>> > the cluster and it'd move the data for each region local to the
>> > regionserver.
>> >
>> > The only explanation that I can see is that the hot DNs must be
>> > carrying the hot blocks (the client queries are not random).  I do not
>> > know what else it could be.
>> >
>> > St.Ack
>> >

Joseph Echeverria
Cloudera, Inc.