HBase, mail # user - How to speedup Hbase query throughput


Re: How to speedup Hbase query throughput
Joey Echeverria 2011-05-19, 15:39
I'm surprised the major compactions didn't balance the cluster better.
I wonder if you've stumbled upon a bug in HBase that's causing it to
leak old HFiles.

Is the total amount of data in HDFS what you expect?

-Joey

On Thu, May 19, 2011 at 8:35 AM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> that's right
>
>
> On Thu, May 19, 2011 at 8:23 AM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
>
>> Am I right to assume that all of your data is in HBase, i.e., you don't
>> keep anything in just HDFS files?
>>
>> -Joey
>>
>> On Thu, May 19, 2011 at 8:15 AM, Matt Corgan <[EMAIL PROTECTED]> wrote:
>> > I wanted to do some more investigation before posting to the list, but it
>> > seems relevant to this conversation...
>> >
>> > Is it possible that major compactions don't always localize the data
>> > blocks? Our cluster had a bunch of regions full of historical analytics
>> > data that were already major compacted, then we added a new
>> > datanode/regionserver. We have a job that triggers major compactions at
>> > a minimum of once per week by hashing the region name and giving it a
>> > time slot. It's been several weeks and the original nodes each have
>> > ~480 GB used in HDFS, while the new node has only 240 GB. Regions are
>> > scattered pretty randomly and evenly among the regionservers.
>> >
>> > The job calls hBaseAdmin.majorCompact(hRegionInfo.getRegionName());
>> >
>> > My guess is that if a region is already major compacted and no new data
>> > has been added to it, then compaction is skipped. That's definitely an
>> > essential feature during typical operation, but it's a problem if you're
>> > relying on major compaction to balance the cluster.
>> >
>> > Matt
>> >
>> >
>> > On Thu, May 19, 2011 at 4:42 AM, Michel Segel <[EMAIL PROTECTED]> wrote:
>> >
>> >> I had asked the question about how he created random keys... Hadn't seen a
>> >> response.
>> >>
>> >> Sent from a remote device. Please excuse any typos...
>> >>
>> >> Mike Segel
>> >>
>> >> On May 18, 2011, at 11:27 PM, Stack <[EMAIL PROTECTED]> wrote:
>> >>
>> >> > On Wed, May 18, 2011 at 5:11 PM, Weihua JIANG <[EMAIL PROTECTED]> wrote:
>> >> >> All the DNs have almost the same number of blocks. Major compaction
>> >> >> makes no difference.
>> >> >>
>> >> >
>> >> > I would expect major compaction to even the number of blocks across
>> >> > the cluster and it'd move the data for each region local to the
>> >> > regionserver.
>> >> >
>> >> > The only explanation that I can see is that the hot DNs must be
>> >> > carrying the hot blocks (the client queries are not random).  I do not
>> >> > know what else it could be.
>> >> >
>> >> > St.Ack
>> >> >
>> >>
>> >
>>
>>
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>>
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
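
For readers following along, the scheduling idea Matt describes in the quoted text (hash the region name, give it a time slot, compact at least once per week) could be sketched roughly as below. This is only an illustration under stated assumptions, not Matt's actual job: the class name `CompactionSlots`, the one-slot-per-hour granularity, the use of `Arrays.hashCode`, and the sample region name are all hypothetical. The sketch deliberately has no HBase dependency; a real job would pass a due region to hBaseAdmin.majorCompact(hRegionInfo.getRegionName()) as quoted above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CompactionSlots {
    // Assumption: one slot per hour of the week (7 * 24 = 168 slots).
    static final int SLOTS_PER_WEEK = 7 * 24;
    static final long MILLIS_PER_HOUR = 3_600_000L;

    // Maps a region name to a stable slot in [0, SLOTS_PER_WEEK).
    // floorMod keeps the result non-negative even for negative hashes.
    static int slotFor(byte[] regionName) {
        return Math.floorMod(Arrays.hashCode(regionName), SLOTS_PER_WEEK);
    }

    // Slot that a given instant falls into: whole hours since the epoch,
    // wrapped to the 168-hour week.
    static int currentSlot(long epochMillis) {
        long hours = epochMillis / MILLIS_PER_HOUR;
        return (int) Math.floorMod(hours, (long) SLOTS_PER_WEEK);
    }

    // True when the region's slot matches the slot of the given instant,
    // which happens for exactly one hour in every week.
    static boolean isDue(byte[] regionName, long epochMillis) {
        return slotFor(regionName) == currentSlot(epochMillis);
    }

    public static void main(String[] args) {
        // Hypothetical region name, for illustration only.
        byte[] name = "analytics,row-0001,1305800000000"
                .getBytes(StandardCharsets.UTF_8);
        long now = System.currentTimeMillis();
        System.out.println("slot=" + slotFor(name) + " dueNow=" + isDue(name, now));
        // A real job would request a major compaction here when isDue(...) is true.
    }
}
```

Note that a schedule like this only decides when a compaction is requested. Whether the request actually rewrites the files of an already-compacted region is exactly the open question in this thread.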