Weihua JIANG
2011-04-26, 02:59
Ted Dunning
2011-04-26, 03:36
Ted Dunning
2011-04-26, 03:37
Stack
2011-04-26, 03:38
Weihua JIANG
2011-04-26, 05:04
Weihua JIANG
2011-04-26, 05:27
Weihua JIANG
2011-04-26, 05:30
Chris Tarnas
2011-04-26, 05:30
Ted Dunning
2011-04-26, 05:35
Weihua JIANG
2011-04-26, 05:36
Jean-Daniel Cryans
2011-04-26, 17:59
Weihua JIANG
2011-04-27, 01:02
Stack
2011-04-27, 16:53
Weihua JIANG
2011-04-28, 00:01
Weihua JIANG
2011-04-28, 07:55
Jean-Daniel Cryans
2011-04-28, 21:56
Stack
2011-04-28, 23:34
Weihua JIANG
2011-05-17, 06:18
Ted Dunning
2011-05-17, 13:50
Weihua JIANG
2011-05-17, 13:57
Stack
2011-05-17, 14:33
Michael Segel
2011-05-17, 14:47
Weihua JIANG
2011-05-18, 03:03
Stack
2011-05-18, 14:50
Weihua JIANG
2011-05-19, 00:11
Stack
2011-05-19, 04:27
Michel Segel
2011-05-19, 11:42
Matt Corgan
2011-05-19, 15:15
Joey Echeverria
2011-05-19, 15:23
Matt Corgan
2011-05-19, 15:35
Joey Echeverria
2011-05-19, 15:39
Matt Corgan
2011-05-19, 19:41
Weihua JIANG
2011-05-20, 00:08
Michel Segel
2011-05-20, 06:15
Segel, Mike
2011-05-20, 15:35
-
How to speedup Hbase query throughput
Weihua JIANG 2011-04-26, 02:59
Hi all,

We want to implement a bill query system. We have 20M users, and the bill for each user per month contains about 10 records of 0.6 KB each. We want to store user bills for 6 months. Of course, user queries focus on the latest month's reports, but there is no hot spot among the users being queried.

We use CDH3U0 with 6 servers (each with 24G mem and three 1T disks) as data nodes and region servers (besides the ZK, namenode and hmaster servers). RS heap is 8G and DN heap is 12G. HFile max size is 1G. The block cache is 0.4.

The row key is month+user_id. Each record is stored as a cell. So, a month report per user is a row in HBase.

Currently, to store bill records, we can achieve about 30K records/second.

However, the query performance is quite poor. We can only achieve about 600~700 month_report/second. That is, each region server can only serve about 100 rows/second. Block cache hit ratio is about 20%.

Do you have any advice on how to improve the query performance?

Below is some metrics info reported by region server:

2011-04-26T10:56:12 hbase.regionserver: RegionServer=regionserver50820, blockCacheCount=40969, blockCacheEvictedCount=216359, blockCacheFree=671152504, blockCacheHitCachingRatio=20, blockCacheHitCount=67936, blockCacheHitRatio=20, blockCacheMissCount=257675, blockCacheSize=2743351688, compactionQueueSize=0, compactionSize_avg_time=0, compactionSize_num_ops=7, compactionTime_avg_time=0, compactionTime_num_ops=7, flushQueueSize=0, flushSize_avg_time=0, flushSize_num_ops=0, flushTime_avg_time=0, flushTime_num_ops=0, fsReadLatency_avg_time=46, fsReadLatency_num_ops=257905, fsSyncLatency_avg_time=0, fsSyncLatency_num_ops=1726, fsWriteLatency_avg_time=0, fsWriteLatency_num_ops=0, memstoreSizeMB=0, regions=169, requests=82.1, storefileIndexSizeMB=188, storefiles=343, stores=169

2011-04-26T10:56:22 hbase.regionserver: RegionServer=regionserver50820, blockCacheCount=42500, blockCacheEvictedCount=216359, blockCacheFree=569659040, blockCacheHitCachingRatio=20, blockCacheHitCount=68418, blockCacheHitRatio=20, blockCacheMissCount=259206, blockCacheSize=2844845152, compactionQueueSize=0, compactionSize_avg_time=0, compactionSize_num_ops=7, compactionTime_avg_time=0, compactionTime_num_ops=7, flushQueueSize=0, flushSize_avg_time=0, flushSize_num_ops=0, flushTime_avg_time=0, flushTime_num_ops=0, fsReadLatency_avg_time=44, fsReadLatency_num_ops=259547, fsSyncLatency_avg_time=0, fsSyncLatency_num_ops=1736, fsWriteLatency_avg_time=0, fsWriteLatency_num_ops=0, memstoreSizeMB=0, regions=169, requests=92.2, storefileIndexSizeMB=188, storefiles=343, stores=169

2011-04-26T10:56:32 hbase.regionserver: RegionServer=regionserver50820, blockCacheCount=39238, blockCacheEvictedCount=221509, blockCacheFree=785944072, blockCacheHitCachingRatio=20, blockCacheHitCount=69043, blockCacheHitRatio=20, blockCacheMissCount=261095, blockCacheSize=2628560120, compactionQueueSize=0, compactionSize_avg_time=0, compactionSize_num_ops=7, compactionTime_avg_time=0, compactionTime_num_ops=7, flushQueueSize=0, flushSize_avg_time=0, flushSize_num_ops=0, flushTime_avg_time=0, flushTime_num_ops=0, fsReadLatency_avg_time=39, fsReadLatency_num_ops=261070, fsSyncLatency_avg_time=0, fsSyncLatency_num_ops=1746, fsWriteLatency_avg_time=0, fsWriteLatency_num_ops=0, memstoreSizeMB=0, regions=169, requests=128.77777, storefileIndexSizeMB=188, storefiles=343, stores=169

We also tried disabling the block cache, and the performance seems to be even a little bit better. And if we use a configuration of 6 DN servers + 3 RS servers, we get better throughput, at about 1000 month_report/second. I am confused. Can anyone explain the reason?

Thanks
Weihua
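
For a rough sense of scale, the figures in this post can be turned into a back-of-the-envelope comparison of data volume and block cache capacity. The sketch below is only an estimate under stated assumptions: 0.6 KB is taken as 600 bytes; key and column overhead, compression and HDFS replication are ignored; and the cache capacity of one region server is taken as blockCacheSize + blockCacheFree from the first metrics sample.

```python
# Back-of-the-envelope sizing from the numbers in the post.
# Assumptions (not from the post): 1 KB = 1000 bytes; HBase key/column
# overhead, compression and HDFS replication are ignored.
USERS = 20_000_000
RECORDS_PER_USER_MONTH = 10
RECORD_BYTES = 600          # "0.6K-byte"
MONTHS = 6
REGION_SERVERS = 6

# Cache capacity of one region server, taken from the first metrics
# sample as blockCacheSize + blockCacheFree.
CACHE_BYTES_PER_RS = 2_743_351_688 + 671_152_504

month_bytes = USERS * RECORDS_PER_USER_MONTH * RECORD_BYTES
total_bytes = month_bytes * MONTHS
cache_bytes = CACHE_BYTES_PER_RS * REGION_SERVERS

print(f"one month of bills : {month_bytes / 1e9:.0f} GB")
print(f"six months of bills: {total_bytes / 1e9:.0f} GB")
print(f"total block cache  : {cache_bytes / 1e9:.1f} GB")
print(f"cache / all data   : {cache_bytes / total_bytes:.1%}")
print(f"cache / one month  : {cache_bytes / month_bytes:.1%}")
```

Under these assumptions the block caches together could hold only a few percent of the six-month data set, so reads spread evenly over all users would be expected to miss the cache most of the time.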
-
Re: How to speedup Hbase query throughput
Ted Dunning 2011-04-26, 03:36
Change your key to user_month.

That will put all of the records for a user together so you will only need a single disk operation to read all of your data. Also, test the option of putting multiple months in a single row.
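
The effect of the key change suggested here can be sketched with plain sorted strings, since HBase keeps rows in lexicographic row-key order. The key formats, the "_" separator and the user and month values below are hypothetical and chosen only for illustration; they are not the poster's actual schema.

```python
# Compare how two row-key layouts sort. HBase stores rows in
# lexicographic key order, so keys that sort next to each other are
# stored next to each other. The key formats are hypothetical.
months = ["201101", "201102", "201103"]
users = ["u0001", "u0002", "u0003"]

month_user = sorted(m + "_" + u for m in months for u in users)
user_month = sorted(u + "_" + m for m in months for u in users)

def positions(keys, user):
    """Indexes, in sorted order, of the rows that belong to `user`."""
    return [i for i, k in enumerate(keys) if user in k.split("_")]

print("month+user:", positions(month_user, "u0002"))
print("user+month:", positions(user_month, "u0002"))
```

With the month first, one user's rows are spread across the key space; with the user first, they sit next to each other, which is what would let a single read pick up several months at once.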
-
Re: How to speedup Hbase query throughput
Ted Dunning 2011-04-26, 03:37
Because of your key organization you are blowing away your cache anyway so it isn't doing you any good.

On Mon, Apr 25, 2011 at 7:59 PM, Weihua JIANG <[EMAIL PROTECTED]> wrote:
> And we also tried to disable block cache, it seems the performance is
> even a little bit better. And it we use the configuration 6 DN servers
> + 3 RS servers, we can get better throughput at about 1000
> month_report/second. I am confused. Can any one explain the reason?
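
One way to read this remark is that a cache helps little when reads are spread evenly over far more blocks than it can hold. The sketch below simulates a plain LRU cache under uniformly random reads. It is a simplification and not HBase's block cache implementation, and the block counts are made-up values.

```python
import random
from collections import OrderedDict

def lru_hit_ratio(num_blocks, cache_blocks, accesses, seed=0):
    """Simulate an LRU cache under uniformly random block reads."""
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = 0
    for _ in range(accesses):
        block = rng.randrange(num_blocks)
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return hits / accesses

# Hypothetical sizes: the cache holds 10% of the blocks.
ratio = lru_hit_ratio(num_blocks=1000, cache_blocks=100, accesses=50_000)
print(f"hit ratio: {ratio:.2f}")
```

In this toy model the hit ratio settles near the fraction of blocks that fit in the cache, and every miss also evicts a block that is unlikely to be reused soon.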
-
Re: How to speedup Hbase query throughput
Stack 2011-04-26, 03:38
> Currently, to store bill records, we can achieve about 30K record/second.

Can you use bulk load? See http://hbase.apache.org/bulk-loads.html

> However, the query performance is quite poor. We can only achieve
> about 600~700 month_report/second. That is, each region server can
> only serve query for about 100 row/second. Block cache hit ratio is
> about 20%.

This is random accesses? Why random accesses and not scans?

> Do you have any advice on how to improve the query performance?

See above cited performance section from website book.

> Below is some metrics info reported by region server:

This is hard to read but I don't see anything obnoxious.

> And we also tried to disable block cache, it seems the performance is
> even a little bit better. And it we use the configuration 6 DN servers
> + 3 RS servers, we can get better throughput at about 1000
> month_report/second. I am confused. Can any one explain the reason?

Sounds like you are doing all random reads? Do you have to?

St.Ack
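
Since the raw metrics lines are hard to read, one option is to parse them and look at the change between two samples rather than at the cumulative counters. The sketch below does this for the 10:56:12 and 10:56:22 samples from the original post, shortened to the counters it uses. The parser is a hypothetical helper, not part of HBase, and it assumes every field is a numeric "key=value" pair separated by ", ".

```python
# Parse two region server metrics lines (shortened to the counters
# used here; values copied from the 10:56:12 and 10:56:22 samples)
# and compute rates over the 10-second interval between them.
def parse_metrics(line):
    """Turn 'a=1, b=2' into {'a': 1.0, 'b': 2.0}."""
    pairs = (item.split("=", 1) for item in line.split(", "))
    return {key: float(value) for key, value in pairs}

first = parse_metrics(
    "blockCacheHitCount=67936, blockCacheMissCount=257675, "
    "fsReadLatency_num_ops=257905"
)
second = parse_metrics(
    "blockCacheHitCount=68418, blockCacheMissCount=259206, "
    "fsReadLatency_num_ops=259547"
)
INTERVAL_SECONDS = 10

hits = second["blockCacheHitCount"] - first["blockCacheHitCount"]
misses = second["blockCacheMissCount"] - first["blockCacheMissCount"]
fs_reads = second["fsReadLatency_num_ops"] - first["fsReadLatency_num_ops"]

print(f"interval hit ratio: {hits / (hits + misses):.1%}")
print(f"fs reads per second: {fs_reads / INTERVAL_SECONDS:.0f}")
```

Per-interval figures like these show what the server is doing right now, which the cumulative hit ratio can hide.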
-
Re: How to speedup Hbase query throughput
Weihua JIANG 2011-04-26, 05:04
The queries are all random reads. The scenario is that a user wants to query his own monthly bill report, e.g. to query what happened on his bill in March, or Feb, etc. Since every user may want to do so, we can't predict who will be the next to ask for such a monthly bill report.
-
Re: How to speedup Hbase query throughput
Weihua JIANG 2011-04-26, 05:27
Changing the key to user_month may not be useful to me since, for each query, we only need to get one month's report for a user instead of all the data stored for that user.

Putting multiple months of data into a single row may be useful, but I am not sure. I will run some experiments when I have time.

2011/4/26 Ted Dunning <[EMAIL PROTECTED]>:
> Change your key to user_month.
>
> That will put all of the records for a user together so you will only need a
> single disk operation to read all of your data. Also, test the option of
> putting multiple months in a single row.
-
Re: How to speedup Hbase query throughput
Weihua JIANG 2011-04-26, 05:30
So, you mean I should disable the block cache and send all queries directly to DFS? Then the query latency may be high.

And what block cache hit ratio is considered acceptable? I mean, above what ratio is the block cache beneficial?

2011/4/26 Ted Dunning <[EMAIL PROTECTED]>:
> Because of your key organization you are blowing away your cache anyway so
> it isn't doing you any good.
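
A simple way to think about the hit-ratio question is an expected-latency model in which each read either hits the cache or goes to the filesystem. This is only a sketch and not an answer given in the thread. The miss cost is the fsReadLatency_avg_time value of 44 from the metrics, read as milliseconds (an assumption about the unit), and the hit cost of 0.1 ms is a made-up placeholder.

```python
# Simple expected-latency model for a read that either hits the block
# cache or goes to the filesystem. All inputs are assumptions for
# illustration: the miss cost is the fsReadLatency_avg_time value from
# the metrics read as milliseconds, and the hit cost is a made-up
# small constant.
def expected_read_ms(hit_ratio, hit_ms=0.1, miss_ms=44.0):
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms

for hit_ratio in (0.0, 0.2, 0.5, 0.9):
    print(f"hit ratio {hit_ratio:.0%}: {expected_read_ms(hit_ratio):.1f} ms")
```

The model ignores the cost of cache churn (allocating and evicting blocks that are never reused), which may be part of why disabling the cache looked slightly faster in the test.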
-
Re: How to speedup Hbase query throughput
Chris Tarnas 2011-04-26, 05:30
For your query tests, are they all from a single thread? Have you tried reading from multiple threads/processes in parallel - that sounds more like your use case.

-chris

On Apr 25, 2011, at 10:04 PM, Weihua JIANG <[EMAIL PROTECTED]> wrote:
> The query is all random read. The scenario is that a user want to
> query his own monthly bill report, e.g. to query what happened on his
> bill in March, or Feb, etc. Since every user may want to do so, we
> can't predict who will be the next to ask for such monthly bill
> report.
-
Re: How to speedup Hbase query throughput
Ted Dunning 2011-04-26, 05:35
user_month might still be helpful on average if a user looks for one month and then another a short time later. This is because your cache could be primed by the first query.

But you know your application best, of course.

On Mon, Apr 25, 2011 at 10:27 PM, Weihua JIANG <[EMAIL PROTECTED]> wrote:
> Changing key to user_month may not be useful to me since, for each
> query, we only need to get one month report for a user instead of all
> the data stored for a user.
>
> Putting multiple month data into a single row may be useful, but not
> sure. I will perform some experimentation when I have time.
-
Re: How to speedup Hbase query throughputWeihua JIANG 2011-04-26, 05:36
I use two machines (each with 30 threads) to act as clients. Both
servers and clients are connected via giganet. Thanks Weihua 2011/4/26 Chris Tarnas <[EMAIL PROTECTED]>: > For your query tests, are they all from a single thread? Have you tried reading from multiple threads/processes in parallel - that sounds more like your use case. > > -chris > > > > On Apr 25, 2011, at 10:04 PM, Weihua JIANG <[EMAIL PROTECTED]> wrote: > >> The query is all random read. The scenario is that a user want to >> query his own monthly bill report, e.g. to query what happened on his >> bill in March, or Feb, etc. Since every user may want to do so, we >> can't predict who will be the next to ask for such monthly bill >> report. >> >> 2011/4/26 Stack <[EMAIL PROTECTED]>: >>>> Currently, to store bill records, we can achieve about 30K record/second. >>>> >>> >>> Can you use bulk load? See http://hbase.apache.org/bulk-loads.html >>> >>>> However, the query performance is quite poor. We can only achieve >>>> about 600~700 month_report/second. That is, each region server can >>>> only serve query for about 100 row/second. Block cache hit ratio is >>>> about 20%. >>>> >>> >>> This is random accesses? Why random accesses and not scans? >>> >>> >>>> Do you have any advice on how to improve the query performance? >>>> >>> >>> See above cited performance section from website book. 
>>> >>> >>>> Below is some metrics info reported by region server: >>>> 2011-04-26T10:56:12 hbase.regionserver: >>>> RegionServer=regionserver50820, blockCacheCount=40969, >>>> blockCacheEvictedCount=216359, blockCacheFree=671152504, >>>> blockCacheHitCachingRatio=20, blockCacheHitCount=67936, >>>> blockCacheHitRatio=20, blockCacheMissCount=257675, >>>> blockCacheSize=2743351688, compactionQueueSize=0, >>>> compactionSize_avg_time=0, compactionSize_num_ops=7, >>>> compactionTime_avg_time=0, compactionTime_num_ops=7, flushQueueSize=0, >>>> flushSize_avg_time=0, flushSize_num_ops=0, flushTime_avg_time=0, >>>> flushTime_num_ops=0, fsReadLatency_avg_time=46, >>>> fsReadLatency_num_ops=257905, fsSyncLatency_avg_time=0, >>>> fsSyncLatency_num_ops=1726, fsWriteLatency_avg_time=0, >>>> fsWriteLatency_num_ops=0, memstoreSizeMB=0, regions=169, >>>> requests=82.1, storefileIndexSizeMB=188, storefiles=343, stores=169 >>>> 2011-04-26T10:56:22 hbase.regionserver: >>>> RegionServer=regionserver50820, blockCacheCount=42500, >>>> blockCacheEvictedCount=216359, blockCacheFree=569659040, >>>> blockCacheHitCachingRatio=20, blockCacheHitCount=68418, >>>> blockCacheHitRatio=20, blockCacheMissCount=259206, >>>> blockCacheSize=2844845152, compactionQueueSize=0, >>>> compactionSize_avg_time=0, compactionSize_num_ops=7, >>>> compactionTime_avg_time=0, compactionTime_num_ops=7, flushQueueSize=0, >>>> flushSize_avg_time=0, flushSize_num_ops=0, flushTime_avg_time=0, >>>> flushTime_num_ops=0, fsReadLatency_avg_time=44, >>>> fsReadLatency_num_ops=259547, fsSyncLatency_avg_time=0, >>>> fsSyncLatency_num_ops=1736, fsWriteLatency_avg_time=0, >>>> fsWriteLatency_num_ops=0, memstoreSizeMB=0, regions=169, >>>> requests=92.2, storefileIndexSizeMB=188, storefiles=343, stores=169 >>>> 2011-04-26T10:56:32 hbase.regionserver: >>>> RegionServer=regionserver50820, blockCacheCount=39238, >>>> blockCacheEvictedCount=221509, blockCacheFree=785944072, >>>> blockCacheHitCachingRatio=20, blockCacheHitCount=69043, 
>>>> blockCacheHitRatio=20, blockCacheMissCount=261095, >>>> blockCacheSize=2628560120, compactionQueueSize=0, >>>> compactionSize_avg_time=0, compactionSize_num_ops=7, >>>> compactionTime_avg_time=0, compactionTime_num_ops=7, flushQueueSize=0, >>>> flushSize_avg_time=0, flushSize_num_ops=0, flushTime_avg_time=0, >>>> flushTime_num_ops=0, fsReadLatency_avg_time=39, >>>> fsReadLatency_num_ops=261070, fsSyncLatency_avg_time=0, >>>> fsSyncLatency_num_ops=1746, fsWriteLatency_avg_time=0, >>>> fsWriteLatency_num_ops=0, memstoreSizeMB=0, regions=169, >>>> requests=128.77777, storefileIndexSizeMB=188, storefiles=343, >>>> stores=169
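As a cross-check on the metrics quoted above, the reported blockCacheHitRatio can be recomputed from the raw hit and miss counters. This is only a minimal sketch: the helper names parse_metrics and hit_ratio are hypothetical, and it assumes the metrics line is a comma-separated list of key=value pairs, as in the first sample.

```python
def parse_metrics(line):
    """Parse 'key=value, key=value' pairs into a dict of floats.

    Non-numeric values (for example the RegionServer name) are skipped.
    """
    metrics = {}
    for part in line.split(","):
        key, sep, value = part.strip().partition("=")
        if not sep:
            continue
        try:
            metrics[key] = float(value)
        except ValueError:
            pass
    return metrics


def hit_ratio(metrics):
    """Fraction of block cache lookups that were hits."""
    hits = metrics["blockCacheHitCount"]
    misses = metrics["blockCacheMissCount"]
    return hits / (hits + misses)


# Counters copied from the 2011-04-26T10:56:12 sample.
sample = ("RegionServer=regionserver50820, blockCacheHitCount=67936, "
          "blockCacheHitRatio=20, blockCacheMissCount=257675, "
          "fsReadLatency_avg_time=46, requests=82.1")

m = parse_metrics(sample)
ratio = hit_ratio(m)
print(f"recomputed hit ratio: {ratio:.1%}")
```

The recomputed value agrees with the reported integer ratio of 20, so roughly four out of five block lookups in this sample went to HDFS.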
-
Re: How to speedup Hbase query throughputJean-Daniel Cryans 2011-04-26, 17:59
> servers). RS heap is 8G and DN is 12G.
I haven't done much testing changing the DN heap, but in my experience it's not really of use to have 12GB there since the data never goes through the DN. Max 2GB maybe; give the rest to the region server or even the OS cache (i.e. don't allocate some GBs on purpose).

From what I see in your other responses, it appears that most of your performance testing was done in a black-box fashion. Did you even try looking into where the bottleneck is? If not, then how could we tell you why 3 RS would be faster than 6, apart from making educated guesses?

As far as I can tell, you might want to see if some region server is serving most of the load or not. If it is, is it because of poor region balancing (all the hottest regions at the same place) or because of poor key design (all the reads hit only one region)? That's just one thing to look at.

J-D
-
Re: How to speedup Hbase query throughputWeihua JIANG 2011-04-27, 01:02
The advice on the DN heap configuration is quite valuable to us.

As for our test, I enabled HPROF on the client side to see what happens. According to HPROF, the client spent 96% of its time in epollWait (about 79% waiting for RS responses and 17% communicating with ZK).

I tried to enable HPROF on the RS, but failed. When I added the HPROF agent in hbase-env.sh, RS startup reported an error saying that HPROF can't be loaded twice. But I am sure I only enabled it once, and I don't know where the problem is. So, for the RS, I have to work in a black-box style and analyze only the RS performance metrics.

In our test, the balance is OK. With 6 RS in service and a total TPS of 600, the status on the HMaster web page shows each RS handling about 100 requests.

Thanks
Weihua
-
Re: How to speedup Hbase query throughputStack 2011-04-27, 16:53
On Tue, Apr 26, 2011 at 6:02 PM, Weihua JIANG <[EMAIL PROTECTED]> wrote:
> I tried to enable HPROF on RS, but failed. If I added the HPROF agent
> in hbase-env.sh, RS startup reports an error said HPROF can't be
> loaded twice. But, I am sure I only enabled it once. I don't know
> where the problem is.

This sounds like 'HBASE-3561 OPTS arguments are duplicated'. Are you running 0.90.2?

St.Ack
-
Re: How to speedup Hbase query throughputWeihua JIANG 2011-04-28, 00:01
I am using CDH3U0. It is HBase 0.90.1, I think.
Thanks
Weihua
-
Re: How to speedup Hbase query throughputWeihua JIANG 2011-04-28, 07:55
After solving HBASE-3561, I successfully ran hprof for RS and DN.
Since the block cache is useless in my case, I disabled it. I reran my test with 14 RS+DNs and 1 client with 200 threads. But the throughput is still only about 700. No scalability is shown in this case.

Below are the hot spots in RS:

CPU SAMPLES BEGIN (total = 1469756) Thu Apr 28 15:43:35 2011
rank   self  accum   count trace method
   1 44.33% 44.33%  651504 300612 sun.nio.ch.EPollArrayWrapper.epollWait
   2 19.88% 64.21%  292221 301351 sun.nio.ch.EPollArrayWrapper.epollWait
   3  8.88% 73.09%  130582 300554 sun.nio.ch.EPollArrayWrapper.epollWait
   4  4.43% 77.52%   65106 301248 sun.nio.ch.EPollArrayWrapper.epollWait
   5  4.43% 81.95%   65104 301249 sun.nio.ch.EPollArrayWrapper.epollWait
   6  4.43% 86.38%   65100 301247 sun.nio.ch.EPollArrayWrapper.epollWait
   7  4.43% 90.81%   65061 301266 sun.nio.ch.EPollArrayWrapper.epollWait
   8  4.32% 95.13%   63465 301565 sun.nio.ch.EPollArrayWrapper.epollWait
   9  2.31% 97.43%   33894 301555 sun.nio.ch.EPollArrayWrapper.epollWait
  10  1.76% 99.19%   25841 301588 sun.nio.ch.EPollArrayWrapper.epollWait
  11  0.48% 99.67%    7025 301443 sun.nio.ch.EPollArrayWrapper.epollWait
  12  0.02% 99.69%     341 301568 sun.nio.ch.NativeThread.current
  13  0.01% 99.71%     187 301535 org.apache.hadoop.hbase.io.hfile.HFile$Reader.indexSize
  14  0.01% 99.72%     186 301538 org.apache.hadoop.hbase.io.hfile.HFile$Reader.indexSize
  15  0.01% 99.73%     170 301625 org.apache.hadoop.util.DataChecksum.update
  16  0.01% 99.74%     164 301579 sun.nio.ch.EPollArrayWrapper.epollWait
  17  0.01% 99.75%     149 300938 sun.nio.ch.EPollArrayWrapper.epollWait

TRACE 300612:
    sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
    sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
    org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:305)
    java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    java.lang.Thread.run(Thread.java:619)
TRACE 301351:
    sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
    sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
    org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    java.io.DataInputStream.read(DataInputStream.java:132)
    org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
    org.apache.hadoop.hdfs.DFSClient$BlockReader.readChunk(DFSClient.java:1389)
    org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:237)
    org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
    org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
    org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
    org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1243)
TRACE 300554:
    sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
    sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
TRACE 301248:
    sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
    sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    org.apache.hadoop.hbase.ipc.HBaseServer$Responder.run(HBaseServer.java:588)
TRACE 301249:
    sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
    sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
    org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:414)
TRACE 301247:
    sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
    sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:498)
    org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)
    org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
    org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
    org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
TRACE 301266:
    sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
    sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    sun.nio.ch.Select
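Because nearly every row in the profile above is epollWait, it can help to group the samples by the thread that is waiting rather than by the top frame. The sketch below does that for the six traces whose stacks are fully listed. The "idle" and "hdfs_read" labels are an interpretation, not something hprof reports: it assumes that selector loops in the RPC listener, reader, responder, ZooKeeper and Jetty threads are simply waiting for work, while trace 301351 is a handler blocked on a DFSClient block read.

```python
TOTAL_SAMPLES = 1469756

# (trace id -> sample count), copied from the CPU SAMPLES table.
SAMPLES = {
    300612: 651504,
    301351: 292221,
    300554: 130582,
    301248: 65106,
    301249: 65104,
    301247: 65100,
}

# Assumed role of each trace, based on the calling frames in its TRACE section.
TRACE_ROLE = {
    300612: "idle",       # HBaseServer$Listener$Reader.run
    300554: "idle",       # ZooKeeper ClientCnxn$SendThread.run
    301248: "idle",       # HBaseServer$Responder.run
    301249: "idle",       # HBaseServer$Listener.run
    301247: "idle",       # Jetty SelectorManager / acceptor
    301351: "hdfs_read",  # DFSClient$BlockReader.read waiting on a socket
}


def share_by_role(samples, roles, total):
    """Sum each role's fraction of all profiler samples."""
    shares = {}
    for trace, count in samples.items():
        role = roles[trace]
        shares[role] = shares.get(role, 0.0) + count / total
    return shares


shares = share_by_role(SAMPLES, TRACE_ROLE, TOTAL_SAMPLES)
for role, share in sorted(shares.items()):
    print(f"{role}: {share:.1%}")
```

Under these assumptions, about two thirds of the samples are selector threads parked in epollWait, and the largest remaining group is time spent waiting for the DataNode to return block data.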
-
Re: How to speedup Hbase query throughputJean-Daniel Cryans 2011-04-28, 21:56
Seems to be a case of HDFS-347.
J-D
-
Re: How to speedup Hbase query throughputStack 2011-04-28, 23:34
Yes, you could try applying hdfs-347 to your hdfs as J-D suggests. Do your numbers change if you run your client from more than one machine?

St.Ack
-
Re: How to speedup Hbase query throughputWeihua JIANG 2011-05-17, 06:18
I have not applied hdfs-347, but I have done some other experiments.

I increased the client threads to 2000 to put enough pressure on the cluster, and I disabled the RS block cache. The total TPS is still low (with Month+User as row key, it is about 1300 for 10 RS+DN; with User+Month it is 700).

I used BTrace to log the time spent on each HTable.get on the RS. It shows that most GETs take 20~50ms and that many GETs need >1000ms. Almost all of this time is spent in DFSClient$BlockReader reading data from the DN. But the network usage is not high (<100Mb/s, and we have a gigabit network), so the network is not the problem.

Since a socket connection is created for each DFS block read, I used netstat to count the TCP connections on port 50010 (the DN listen port) for each RS+DN server. There are always one or two DNs with a high connection count (>200) while the other DNs have a low count (<20). The high-connection DNs also have high disk I/O usage (about 100%) while the other DNs have low disk I/O. This phenomenon lasts for days, and it is always the same machines that are hot.

The high connection count mainly comes from local region server requests (~80%).

According to the DFSClient source code, it prefers the local DN when fetching a block. But why are certain machines so popular? All my servers have almost the same configuration.
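The netstat measurement described above can be sketched as follows. This is an illustration only: the function name count_dn_connections, the IP addresses and the sample lines are hypothetical, and it assumes `netstat -tn` style output (protocol, two queue columns, local address, foreign address, state). It counts established connections whose local end is the DN port and splits them by whether the peer is on the same machine.

```python
from collections import Counter

DN_PORT = "50010"  # DataNode data transfer port mentioned in the thread


def count_dn_connections(netstat_lines, local_ips):
    """Count ESTABLISHED connections to the DN port, split into local and remote peers."""
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers and non-TCP lines
        local_addr, peer_addr, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        if local_addr.rsplit(":", 1)[-1] != DN_PORT:
            continue  # not a connection served by the DataNode
        peer_ip = peer_addr.rsplit(":", 1)[0]
        counts["local" if peer_ip in local_ips else "remote"] += 1
    return counts


# Hypothetical sample output for a server whose own address is 10.0.0.5.
SAMPLE = [
    "tcp 0 0 10.0.0.5:50010 10.0.0.5:41234 ESTABLISHED",
    "tcp 0 0 10.0.0.5:50010 10.0.0.5:41235 ESTABLISHED",
    "tcp 0 0 10.0.0.5:50010 10.0.0.7:52311 ESTABLISHED",
    "tcp 0 0 10.0.0.5:60020 10.0.0.9:40000 ESTABLISHED",
    "tcp 0 0 10.0.0.5:50010 10.0.0.8:40001 TIME_WAIT",
]

result = count_dn_connections(SAMPLE, local_ips={"10.0.0.5"})
print(dict(result))
```

Running the same count on every RS+DN server and comparing the totals is one way to reproduce the hot-DataNode observation and the local-versus-remote split.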
-
Re: How to speedup Hbase query throughputTed Dunning 2011-05-17, 13:50
Are your keys arranged so that you have a problem with a hot region?
-
Re: How to speedup Hbase query throughputWeihua JIANG 2011-05-17, 13:57
No. The key is generated randomly. In theory, it should be distributed to all the RSs equally.

Thanks
Weihua
-
Re: How to speedup Hbase query throughput
Stack 2011-05-17, 14:33
Nice analysis.

Can you figure out the most popular blocks requested? You could work out which files they belong to by grepping for the blocks in the namenode log.

It is odd that you have this sort of request profile if your loading was even. I'd expect the DN distribution to be even.

Sounds like HDFS-347 would help for sure.

St.Ack
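Stack's suggestion to find the most popular blocks can be sketched as a small tally over log lines. This is a minimal Java sketch, not something posted in the thread. It assumes only that HDFS block names show up in the logs as blk_ followed by a (possibly negative) number. The class name BlockTally and its methods are hypothetical, and which log lines to feed in (for example, datanode read requests) is left open.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: tallies HDFS block ids found in arbitrary log lines.
class BlockTally {
    // Block names look like blk_<signed number>; a generation-stamp suffix
    // such as _1001 may follow and is deliberately not part of the match.
    private static final Pattern BLOCK_ID = Pattern.compile("blk_-?\\d+");

    /** Counts how often each block id appears in the given log lines. */
    static Map<String, Integer> countBlocks(List<String> logLines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : logLines) {
            Matcher m = BLOCK_ID.matcher(line);
            while (m.find()) {
                counts.merge(m.group(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns up to n block ids, most frequently seen first (ties by name). */
    static List<String> topBlocks(Map<String, Integer> counts, int n) {
        List<String> ids = new ArrayList<>(counts.keySet());
        ids.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ids.subList(0, Math.min(n, ids.size())));
    }
}
```

The ids returned by topBlocks could then be grepped in the namenode log to see which files they belong to.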
-
RE: How to speedup Hbase query throughput
Michael Segel 2011-05-17, 14:47
Sorry to jump in on the tail end.

What do you mean when you say that the key is generated randomly? Are you taking a key and then applying a SHA-1 hash?

Which node is serving your -ROOT- and .META. tables?

Have you applied the GC hints recommended by Todd L in his blog?

Also, you said:
'And almost all these times are spent on DFSClient$BlockReader to read data from DN.'

What speed disks are you using, and how many disks per node? (You could be blocked on disk I/O.)

-Mike
-
Re: How to speedup Hbase query throughput
Weihua JIANG 2011-05-18, 03:03
The -ROOT- and .META. tables are not served by these hot region servers.

I generate the keys randomly. I verified the distribution on the client side by grepping the .META. table and recording which region server serves each query. It shows that each RS serves almost the same number of query requests.

For the GC hints, can you give me a link? I only found Todd's posts about GC tuning for writes. In my case I only perform queries, so the ones I found do not seem to help.

Thanks
Weihua
-
Re: How to speedup Hbase query throughput
Stack 2011-05-18, 14:50
Are there more blocks on these hot DNs than there are on the cool ones? If you run a major compaction and then run your tests, does it make a difference?

St.Ack
-
Re: How to speedup Hbase query throughput
Weihua JIANG 2011-05-19, 00:11
All the DNs have almost the same number of blocks. Major compaction makes no difference.

Thanks
Weihua
-
Re: How to speedup Hbase query throughput
Stack 2011-05-19, 04:27
On Wed, May 18, 2011 at 5:11 PM, Weihua JIANG <[EMAIL PROTECTED]> wrote:
> All the DNs almost have the same number of blocks. Major compaction
> makes no difference.

I would expect major compaction to even out the number of blocks across the cluster, and it would move the data for each region local to its regionserver.

The only explanation that I can see is that the hot DNs must be carrying the hot blocks (the client queries are not random). I do not know what else it could be.

St.Ack
-
Re: How to speedup Hbase query throughput
Michel Segel 2011-05-19, 11:42
I had asked the question about how he created random keys... Hadn't seen a response.

Sent from a remote device. Please excuse any typos...

Mike Segel
-
Re: How to speedup Hbase query throughput
Matt Corgan 2011-05-19, 15:15
I wanted to do some more investigation before posting to the list, but it seems relevant to this conversation...

Is it possible that major compactions don't always localize the data blocks? Our cluster had a bunch of regions full of historical analytics data that were already major compacted, and then we added a new datanode/regionserver. We have a job that triggers major compactions at a minimum of once per week by hashing the region name and giving it a time slot. It's been several weeks and the original nodes each have ~480 GB used in HDFS, while the new node has only 240 GB. Regions are scattered pretty randomly and evenly among the regionservers.

The job calls hBaseAdmin.majorCompact(hRegionInfo.getRegionName());

My guess is that if a region is already major compacted and no new data has been added to it, then compaction is skipped. That's definitely an essential feature during typical operation, but it's a problem if you're relying on major compaction to balance the cluster.

Matt
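The weekly scheduling Matt describes, hashing the region name into a time slot, might look roughly like the sketch below. His actual job is not shown in the thread, so everything here is an assumption: the CRC32 hash, the hour-of-week slot granularity, and the names CompactionSlots, slotFor and isDue are hypothetical. A real job would call the majorCompact method quoted above for each region that is due.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: spread major compactions over the week by region name.
class CompactionSlots {
    static final int HOURS_PER_WEEK = 7 * 24;

    /** Maps a region name to a stable hour-of-week slot in [0, HOURS_PER_WEEK). */
    static int slotFor(String regionName) {
        CRC32 crc = new CRC32();
        crc.update(regionName.getBytes(StandardCharsets.UTF_8));
        // getValue() is never negative, so the remainder is a valid slot index.
        return (int) (crc.getValue() % HOURS_PER_WEEK);
    }

    /** True when the region's slot equals the current hour of the week. */
    static boolean isDue(String regionName, int currentHourOfWeek) {
        return slotFor(regionName) == currentHourOfWeek;
    }
}
```

Because the slot depends only on the region name, a scheduler that runs once an hour and checks isDue for every region would visit each region exactly once per week, provided it really iterates over all regions.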
-
Re: How to speedup Hbase query throughput
Joey Echeverria 2011-05-19, 15:23
Am I right to assume that all of your data is in HBase, i.e. you don't keep anything in just HDFS files?

-Joey

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
-
Re: How to speedup Hbase query throughput
Matt Corgan 2011-05-19, 15:35
That's right.
-
Re: How to speedup Hbase query throughput
Joey Echeverria 2011-05-19, 15:39
I'm surprised the major compactions didn't balance the cluster better. I wonder if you've stumbled upon a bug in HBase that's causing it to leak old HFiles.

Is the total amount of data in HDFS what you expect?

-Joey
-
Re: How to speedup Hbase query throughput
Matt Corgan 2011-05-19, 19:41
I think I traced this to a bug in my compaction scheduler that would have missed scheduling about half the regions, hence the 240 GB vs 480 GB. To confirm: major compaction will always run when asked, even if the region is already major compacted, the table settings haven't changed, and it was last major compacted on that same server. [Potential HBase optimization here for clusters with many cold regions.] So my theory about not localizing blocks is false.

Weihua - why do you think your throughput doubled when you went from user+month to month+user keys? Are your queries using an even distribution of months? I'm not exactly clear on your schema or query pattern.
-
Re: How to speedup Hbase query throughput
Weihua JIANG 2011-05-20, 00:08
Sorry for missing the background.
We assume a user is more interested in his latest bills than in his old bills. Thus, the query generator works as follows:
1. Randomly generate a number and reverse it to form the user id.
2. Randomly generate a month, prioritized according to the above assumption (recent months are favored).
3. Ask HBase to query this user + month.
Thanks
Weihua
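The three-step query generator described in this message could look roughly like the sketch below. The 20M user count comes from the original post. The month labels, the weights, and the zero-padding before reversal are assumptions, since the thread does not say how months are encoded or how strongly recent months are favored.

```python
import random

NUM_USERS = 20_000_000  # from the original post: 20M users
# Hypothetical month labels, oldest first (six months are stored).
MONTHS = ["201012", "201101", "201102", "201103", "201104", "201105"]
# Assumed skew: each month is queried more often than the one before it.
MONTH_WEIGHTS = [1, 1, 2, 4, 8, 16]


def random_user_id(rng):
    """Step 1: pick a random number and reverse its digits."""
    n = rng.randrange(NUM_USERS)
    # Zero-pad so every id has the same width before reversing.
    return f"{n:08d}"[::-1]


def random_month(rng):
    """Step 2: pick a month, favoring recent ones."""
    return rng.choices(MONTHS, weights=MONTH_WEIGHTS, k=1)[0]


def next_query(rng):
    """Step 3: the (user_id, month) pair to look up in HBase."""
    return random_user_id(rng), random_month(rng)
```

Passing in a seeded `random.Random` instance keeps a benchmark run repeatable, so two key layouts (user+month versus month+user) can be compared against the same query stream.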
-
Re: How to speedup Hbase query throughput
Michel Segel 2011-05-20, 06:15
Ok.
This is why I asked you earlier about how you were generating your user ids.
You're not going to get a good distribution.
First, random numbers usually aren't that random.
How many users do you want to simulate?
Try this... Create n type 5 UUIDs. These are name-based UUIDs: the input is hashed with the SHA-1 algorithm, and the digest is then truncated to the right number of bits.
This will give you a more realistic random distribution of user ids. Note that you will have to remember the user ids! They will also be alphanumeric.
Then you can use your 'month' as part of your key. However... I have to question your design again. Billing by month means that you will only have 12 months of data, and the data generation really isn't random, meaning you don't generate your data out of sequence.
Just a suggestion... It sounds like you're trying to simulate queries where users get created midstream and don't always stick around. So when you create a user, you can also simulate his start/join date and his end date and then generate his 'billing' information. I would suggest that instead of using a random number for the billing month, you actually create your own timestamp...
I am also assuming that you are generating the data first and then running queries against a static data set?
If this is true, and you create the uuids first and then the billing data, you'll get a better random data set that is going to be more realistic...
Having said all of this... You have a couple of options.
First, you can make your key month+userid, assuming you only have 12 months of data.
Or you can make your key userid+month. This has the additional benefit of collocating your user's data.
Or you could choose a third option... You are trying to retrieve a user's billing data. This could be an object. So you could store the bill as a column in a table where the column id is the timestamp of the bill.
If you want the last date first, you can do a simple trick... If you are using months, make the column id (99 - month) so that your data is in reverse order.
Sent from a remote device. Please excuse any typos...
Mike Segel
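As a rough illustration of two suggestions in this message, the sketch below derives type 5 UUIDs for simulated users and builds the "99 - month" column id. The namespace, the `user-<i>` naming scheme, and the function names are all hypothetical, and the column id assumes a month number from 1 to 12 within a single year.

```python
import uuid

# Hypothetical namespace for the simulation; any fixed UUID would do.
BILLING_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "billing.example.com")


def make_user_ids(n):
    """Derive n type 5 (SHA-1, name-based) UUIDs from sequential names."""
    return [str(uuid.uuid5(BILLING_NAMESPACE, f"user-{i}")) for i in range(n)]


def reverse_month_qualifier(month):
    """Column id '99 - month', zero-padded so it sorts as a string."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return f"{99 - month:02d}"
```

Because type 5 UUIDs are a deterministic function of the namespace and the name, the same list of ids can be regenerated from the counter instead of being stored, which is one way to "remember" them. With the qualifier, December maps to `87` and January to `98`, so the latest month sorts first.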
-
Re: How to speedup Hbase query throughput
Segel, Mike 2011-05-20, 15:35
Not sure why this didn't make the list...
Sent from a remote device. Please excuse any typos...
Mike Segel
|