HBase >> mail # user >> How to speedup Hbase query throughput


Weihua JIANG 2011-04-26, 02:59
Ted Dunning 2011-04-26, 03:36
Weihua JIANG 2011-04-26, 05:27
Ted Dunning 2011-04-26, 05:35
Ted Dunning 2011-04-26, 03:37
Weihua JIANG 2011-04-26, 05:30
Stack 2011-04-26, 03:38
Weihua JIANG 2011-04-26, 05:04
Chris Tarnas 2011-04-26, 05:30
Weihua JIANG 2011-04-26, 05:36
Jean-Daniel Cryans 2011-04-26, 17:59
Weihua JIANG 2011-04-27, 01:02
Stack 2011-04-27, 16:53
Weihua JIANG 2011-04-28, 00:01
Weihua JIANG 2011-04-28, 07:55
Jean-Daniel Cryans 2011-04-28, 21:56
Stack 2011-04-28, 23:34
Weihua JIANG 2011-05-17, 06:18
Ted Dunning 2011-05-17, 13:50
Weihua JIANG 2011-05-17, 13:57
Stack 2011-05-17, 14:33
Michael Segel 2011-05-17, 14:47
Weihua JIANG 2011-05-18, 03:03
Stack 2011-05-18, 14:50
Weihua JIANG 2011-05-19, 00:11
Stack 2011-05-19, 04:27
Michel Segel 2011-05-19, 11:42
Matt Corgan 2011-05-19, 15:15
Re: How to speedup Hbase query throughput
Am I right to assume that all of your data is in HBase, i.e. you don't
keep anything in plain HDFS files?

-Joey

On Thu, May 19, 2011 at 8:15 AM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> I wanted to do some more investigation before posting to the list, but it
> seems relevant to this conversation...
>
> Is it possible that major compactions don't always localize the data blocks?
>  Our cluster had a bunch of regions full of historical analytics data that
> were already major compacted, then we added a new datanode/regionserver.  We
> have a job that triggers major compactions at a minimum of once per week by
> hashing the region name and giving it a time slot.  It's been several weeks
> and the original nodes each have ~480GB used in HDFS, while the new node has
> only 240GB.  Regions are scattered pretty randomly and evenly among the
> regionservers.
>
> The job calls hBaseAdmin.majorCompact(hRegionInfo.getRegionName());
>
> My guess is that if a region is already major compacted and no new data has
> been added to it, then compaction is skipped.  That's definitely an
> essential feature during typical operation, but it's a problem if you're
> relying on major compaction to balance the cluster.
>
> Matt
>
>
> On Thu, May 19, 2011 at 4:42 AM, Michel Segel <[EMAIL PROTECTED]> wrote:
>
>> I had asked the question about how he created random keys... Hadn't seen a
>> response.
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On May 18, 2011, at 11:27 PM, Stack <[EMAIL PROTECTED]> wrote:
>>
>> > On Wed, May 18, 2011 at 5:11 PM, Weihua JIANG <[EMAIL PROTECTED]> wrote:
>> >> All the DNs almost have the same number of blocks. Major compaction
>> >> makes no difference.
>> >>
>> >
>> > I would expect major compaction to even the number of blocks across
>> > the cluster and it'd move the data for each region local to the
>> > regionserver.
>> >
>> > The only explanation that I can see is that the hot DNs must be
>> > carrying the hot blocks (The client queries are not random).  I do not
>> > know what else it could be.
>> >
>> > St.Ack
>> >
>>
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
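
For reference, a minimal sketch of the kind of job Matt describes above: hash
each region name into a weekly time slot and call HBaseAdmin.majorCompact() on
the regions whose slot matches the current run. This is not Matt's actual job;
the table name, the hourly slot layout, and the command-line arguments are
illustrative assumptions, and it uses the HBase 0.90-era client API current at
the time of this thread.

    // Slot-based major-compaction job: each region name is hashed into one of
    // SLOTS_PER_WEEK buckets, and a run for slot N compacts only the regions in
    // bucket N, so every region gets major-compacted once per week.
    import java.util.Arrays;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.HServerAddress;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;

    public class SlottedMajorCompactor {
      private static final int SLOTS_PER_WEEK = 7 * 24;  // one slot per hour (assumed)

      public static void main(String[] args) throws Exception {
        String tableName = args[0];                  // e.g. "analytics" (hypothetical)
        int currentSlot = Integer.parseInt(args[1]); // 0 .. SLOTS_PER_WEEK-1 for this run

        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTable table = new HTable(conf, tableName);
        try {
          // Region -> hosting server map for every region of the table (0.90 API).
          Map<HRegionInfo, HServerAddress> regions = table.getRegionsInfo();
          for (HRegionInfo hRegionInfo : regions.keySet()) {
            byte[] regionName = hRegionInfo.getRegionName();
            // Stable hash of the region name, mapped to a slot.
            int slot = (Arrays.hashCode(regionName) & Integer.MAX_VALUE) % SLOTS_PER_WEEK;
            if (slot == currentSlot) {
              // Asynchronously requests a major compaction of just this region,
              // the same call Matt mentions above.
              admin.majorCompact(regionName);
            }
          }
        } finally {
          table.close();
        }
      }
    }

As Matt points out, such a request can effectively be a no-op when the region
already has a single store file and no new data, so a job like this cannot be
relied on to re-localize blocks after adding a new datanode/regionserver.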
Matt Corgan 2011-05-19, 15:35
Joey Echeverria 2011-05-19, 15:39
Matt Corgan 2011-05-19, 19:41
Weihua JIANG 2011-05-20, 00:08
Michel Segel 2011-05-20, 06:15
Segel, Mike 2011-05-20, 15:35