HBase user mailing list: How to speedup Hbase query throughput


Weihua JIANG 2011-04-26, 02:59
Ted Dunning 2011-04-26, 03:36
Weihua JIANG 2011-04-26, 05:27
Ted Dunning 2011-04-26, 05:35
Ted Dunning 2011-04-26, 03:37
Weihua JIANG 2011-04-26, 05:30
Stack 2011-04-26, 03:38
Weihua JIANG 2011-04-26, 05:04
Chris Tarnas 2011-04-26, 05:30
Weihua JIANG 2011-04-26, 05:36
Jean-Daniel Cryans 2011-04-26, 17:59
Weihua JIANG 2011-04-27, 01:02
Stack 2011-04-27, 16:53
Weihua JIANG 2011-04-28, 00:01
Weihua JIANG 2011-04-28, 07:55
Jean-Daniel Cryans 2011-04-28, 21:56
Stack 2011-04-28, 23:34
Weihua JIANG 2011-05-17, 06:18
Ted Dunning 2011-05-17, 13:50
Weihua JIANG 2011-05-17, 13:57
Stack 2011-05-17, 14:33
Michael Segel 2011-05-17, 14:47
Weihua JIANG 2011-05-18, 03:03
Stack 2011-05-18, 14:50
Weihua JIANG 2011-05-19, 00:11
Stack 2011-05-19, 04:27
Michel Segel 2011-05-19, 11:42
Matt Corgan 2011-05-19, 15:15
Joey Echeverria 2011-05-19, 15:23
Matt Corgan 2011-05-19, 15:35
Re: How to speedup Hbase query throughput
I'm surprised the major compactions didn't balance the cluster better.
I wonder if you've stumbled upon a bug in HBase that's causing it to
leak old HFiles.

Is the total amount of data in HDFS what you expect?

-Joey
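Joey's question can be answered with simple arithmetic: the raw DFS usage summed over all datanodes should be close to the logical data size multiplied by the replication factor (3 by default in HDFS). The following is a minimal sketch under that assumption; the class and method names are hypothetical, and the figures in `main` are made up, not taken from the thread. Keep in mind that anything else stored in HDFS (write-ahead logs, files outside HBase) also counts toward raw usage.

```java
import java.util.Arrays;

/**
 * Hypothetical sanity check for "is the total amount of data in HDFS what
 * you expect?". Sizes are plain numbers in whatever unit you choose.
 */
public class HdfsUsageCheck {

    /** Raw usage summed over all datanodes (every stored replica counts). */
    static long totalRawUsage(long[] perNodeUsed) {
        return Arrays.stream(perNodeUsed).sum();
    }

    /**
     * Observed raw usage divided by expected raw usage, where expected raw
     * usage is logicalSize * replication. A value well above 1.0 suggests
     * files you did not account for, such as old HFiles never cleaned up.
     */
    static double usageRatio(long[] perNodeUsed, long logicalSize, int replication) {
        if (logicalSize <= 0 || replication <= 0) {
            throw new IllegalArgumentException("logicalSize and replication must be positive");
        }
        return totalRawUsage(perNodeUsed) / ((double) logicalSize * replication);
    }

    public static void main(String[] args) {
        // Made-up figures (GB): three datanodes, 300 GB of table data, replication 3.
        long[] perNodeUsed = {300, 300, 300};
        System.out.println(usageRatio(perNodeUsed, 300, 3)); // prints 1.0
    }
}
```

A ratio near 1.0 points away from leaked files; a ratio well above it is worth chasing with a per-directory listing.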

On Thu, May 19, 2011 at 8:35 AM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> that's right
>
>
> On Thu, May 19, 2011 at 8:23 AM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
>
>> Am I right to assume that all of your data is in HBase, i.e., you don't
>> keep anything in plain HDFS files?
>>
>> -Joey
>>
>> On Thu, May 19, 2011 at 8:15 AM, Matt Corgan <[EMAIL PROTECTED]> wrote:
>> > I wanted to do some more investigation before posting to the list, but it
>> > seems relevant to this conversation...
>> >
>> > Is it possible that major compactions don't always localize the data
>> > blocks? Our cluster had a bunch of regions full of historical analytics
>> > data that were already major compacted; then we added a new
>> > datanode/regionserver. We have a job that triggers major compactions at a
>> > minimum of once per week by hashing the region name and giving it a time
>> > slot. It's been several weeks, and the original nodes each have ~480 GB
>> > used in HDFS, while the new node has only 240 GB. Regions are scattered
>> > pretty randomly and evenly among the regionservers.
>> >
>> > The job calls hBaseAdmin.majorCompact(hRegionInfo.getRegionName());
>> >
>> > My guess is that if a region is already major compacted and no new data
>> > has been added to it, then compaction is skipped. That's definitely an
>> > essential feature during typical operation, but it's a problem if you're
>> > relying on major compaction to balance the cluster.
>> >
>> > Matt
>> >
>> >
>> > On Thu, May 19, 2011 at 4:42 AM, Michel Segel <[EMAIL PROTECTED]> wrote:
>> >
>> >> I had asked the question about how he created random keys... Hadn't seen
>> >> a response.
>> >>
>> >> Sent from a remote device. Please excuse any typos...
>> >>
>> >> Mike Segel
>> >>
>> >> On May 18, 2011, at 11:27 PM, Stack <[EMAIL PROTECTED]> wrote:
>> >>
>> >> > On Wed, May 18, 2011 at 5:11 PM, Weihua JIANG <[EMAIL PROTECTED]> wrote:
>> >> >> All the DNs have almost the same number of blocks. Major compaction
>> >> >> makes no difference.
>> >> >>
>> >> >
>> >> > I would expect major compaction to even out the number of blocks
>> >> > across the cluster and to move the data for each region local to its
>> >> > regionserver.
>> >> >
>> >> > The only explanation that I can see is that the hot DNs must be
>> >> > carrying the hot blocks (the client queries are not random). I do not
>> >> > know what else it could be.
>> >> >
>> >> > St.Ack
>> >> >
>> >>
>> >
>>
>>
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>>
>
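Matt's job, as quoted above, triggers a major compaction for every region at least once a week by hashing the region name into a time slot. He does not say which hash or slot size he uses, so the following is only a minimal sketch under stated assumptions: it hashes the region-name bytes with `Arrays.hashCode` and uses one-hour slots (168 per week). `WeeklyCompactionSlots`, `slotFor`, `isDue` and `hourOfWeek` are hypothetical names, and the region name in `main` is made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch of a "hash the region name into a weekly slot"
 * schedule. The hash function and one-hour slot size are assumptions.
 */
public class WeeklyCompactionSlots {

    static final int HOURS_PER_WEEK = 7 * 24;

    /** Maps a region name to an hour-of-week slot in [0, 168). */
    static int slotFor(byte[] regionName) {
        // floorMod keeps the slot non-negative even for negative hash codes.
        return Math.floorMod(Arrays.hashCode(regionName), HOURS_PER_WEEK);
    }

    /** Hour of the week for an epoch timestamp; hour 0 is simply the epoch's weekday and hour. */
    static int hourOfWeek(long epochMillis) {
        return (int) Math.floorMod(epochMillis / 3_600_000L, (long) HOURS_PER_WEEK);
    }

    /** True if the region's slot matches the given hour of the week. */
    static boolean isDue(byte[] regionName, int hourOfWeek) {
        return slotFor(regionName) == Math.floorMod(hourOfWeek, HOURS_PER_WEEK);
    }

    public static void main(String[] args) {
        byte[] name = "mytable,rowkey123,1305800000000.0123abcd."
                .getBytes(StandardCharsets.UTF_8);
        int now = hourOfWeek(System.currentTimeMillis());
        System.out.println("slot=" + slotFor(name) + " due now=" + isDue(name, now));
        // In Matt's job, a due region is then passed to
        // hBaseAdmin.majorCompact(hRegionInfo.getRegionName()).
    }
}
```

Hashing spreads the compaction load roughly evenly over the week without any shared state, which is presumably the point of the design. Note that it only decides *when* a compaction is requested; if Matt's guess is right, a request for an already compacted, unchanged region may still do no rewriting.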

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
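Stack's expectation that a major compaction moves each region's data local to its regionserver follows from HDFS's default block placement: when the writer runs on a datanode, the first replica of each block is written to that node. A region that moved to the new node but whose files were never rewritten therefore keeps its blocks on the old nodes, which would fit Matt's 480 GB vs. 240 GB observation if his guess about skipped compactions holds. The following is a minimal sketch of a byte-weighted locality measure under a hypothetical data model (`RegionLocality`, `Block`, and `localityFraction` are illustrative names, not an HBase API); the block layout in `main` is made up.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model: each HFile block is listed with the hosts holding a
 * replica. Locality is the fraction of a region's bytes that have a
 * replica on the host serving the region.
 */
public class RegionLocality {

    /** One HDFS block: its size and the datanode hosts holding a replica. */
    static final class Block {
        final long sizeBytes;
        final Set<String> replicaHosts;

        Block(long sizeBytes, Set<String> replicaHosts) {
            this.sizeBytes = sizeBytes;
            this.replicaHosts = replicaHosts;
        }
    }

    /** Byte-weighted locality; returns 1.0 for an empty region (nothing is remote). */
    static double localityFraction(List<Block> blocks, String serverHost) {
        long total = 0;
        long local = 0;
        for (Block b : blocks) {
            total += b.sizeBytes;
            if (b.replicaHosts.contains(serverHost)) {
                local += b.sizeBytes;
            }
        }
        return total == 0 ? 1.0 : (double) local / total;
    }

    public static void main(String[] args) {
        // Made-up layout: the region now served by "rs4" still has all
        // replicas on rs1..rs3 because its files were never rewritten.
        List<Block> blocks = List.of(
                new Block(64, Set.of("rs1", "rs2", "rs3")),
                new Block(64, Set.of("rs1", "rs2", "rs3")));
        System.out.println(localityFraction(blocks, "rs4")); // prints 0.0
        System.out.println(localityFraction(blocks, "rs1")); // prints 1.0
    }
}
```

A persistently low value for regions on the new node, even after their scheduled compaction slot has passed, would support Matt's hypothesis; a high value would point back toward Stack's hot-block explanation.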
Matt Corgan 2011-05-19, 19:41
Weihua JIANG 2011-05-20, 00:08
Michel Segel 2011-05-20, 06:15
Segel, Mike 2011-05-20, 15:35