|
Jacob Isaac
2010-05-27, 19:09
Jean-Daniel Cryans
2010-05-27, 20:27
Jacob Isaac
2010-05-27, 21:57
Jean-Daniel Cryans
2010-05-27, 22:27
Vidhyashankar Venkatarama...
2010-05-28, 17:12
Jean-Daniel Cryans
2010-05-28, 17:15
Jacob Isaac
2010-05-28, 19:28
Jean-Daniel Cryans
2010-05-28, 19:42
Jacob Isaac
2010-05-28, 20:13
Vidhyashankar Venkatarama...
2010-05-28, 20:16
Jean-Daniel Cryans
2010-05-28, 20:20
Jacob Isaac
2010-05-28, 20:36
Jacob Isaac
2010-05-28, 23:11
Jean-Daniel Cryans
2010-05-29, 02:16
jacob@...)
2010-05-29, 03:25
Jean-Daniel Cryans
2010-05-29, 04:04
Stack
2010-05-29, 17:53
Stack
2010-05-29, 19:04
Jacob Isaac
2010-05-30, 00:52
Jacob Isaac
2010-05-30, 01:36
Stack
2010-05-30, 14:04
Stack
2010-05-30, 14:08
Jacob Isaac
2010-05-30, 16:22
Jacob Isaac
2010-05-30, 16:29
Stack
2010-05-31, 15:37
Vidhyashankar Venkatarama...
2010-06-01, 23:20
Jonathan Gray
2010-06-02, 00:10
Vidhyashankar Venkatarama...
2010-06-02, 15:24
Jacob Isaac
2010-06-02, 16:39
Stack
2010-06-02, 16:55
Jacob Isaac
2010-06-02, 20:17
|
-
Performance at large number of regions/node
Jacob Isaac 2010-05-27, 19:09
Hi
Wanted to find the group's experience on HBase performance with an increasing number of regions/node. Also wanted to find out if there is an optimal number of regions one should aim for?

We are currently using:

17-node HBase (0.20.4) cluster on a 20-node Hadoop (0.20.2) cluster
16G RAM per node, 4G RAM for HBase
space available for (Hadoop + HBase) ~1.5T per node

We are currently loading 2 tables, each with ~100m rows, resulting in ~4000 regions (using the default hbase.hregion.max.filesize=256m), and half that number of regions when we double hbase.hregion.max.filesize to 512m. The two runs did not differ in the time taken, though (~9 hrs).

With the current load we are only using 10% of the available disk space; full utilization would result in an increased number of regions, hence wanted to find the group's experience/suggestions in this regard.

~Jacob
-
Re: Performance at large number of regions/node
Jean-Daniel Cryans 2010-05-27, 20:27
With beefy nodes, don't be afraid of using bigger regions... and LZO.
At StumbleUpon we have 1GB maxfilesize on our >13B rows table and LZO enabled on every table. The number of regions per node is a factor of so many things... size of rows, access pattern, hardware, etc. FWIW, I would say that you should definitely not try to host as much data as available per machine, not even 50%. In fact, do the calculation of how much data you think you need to serve from the block cache. By default, it can grow to as much as 20% of the available RAM so in your case it's a bit less than 1GB, times the number of region servers so ~15GB available cluster-wide. You can tweak hfile.block.cache.size for more caching.

J-D
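The block cache calculation J-D describes can be sketched as follows. This is only rough arithmetic under stated assumptions: the 4G heap and 17 region servers come from the first mail, the 20% fraction is the default quoted above, and the class and method names are made up for illustration.

```java
// Rough block cache estimate; names are hypothetical, figures come from the thread.
final class BlockCacheEstimate {
    /** Bytes of block cache on one region server for a given heap and cache fraction. */
    static long perServerBytes(long heapBytes, double cacheFraction) {
        return (long) (heapBytes * cacheFraction);
    }

    /** Cluster-wide cache: per-server cache times the number of region servers. */
    static long clusterBytes(long heapBytes, double cacheFraction, int regionServers) {
        return perServerBytes(heapBytes, cacheFraction) * regionServers;
    }

    public static void main(String[] args) {
        long heap = 4L * 1024 * 1024 * 1024; // 4G of RAM given to HBase (first mail)
        double fraction = 0.2;               // default cache fraction quoted by J-D
        int servers = 17;                    // region servers in the cluster
        double gib = 1024.0 * 1024 * 1024;
        System.out.printf("per server: %.2f GB%n", perServerBytes(heap, fraction) / gib);
        System.out.printf("cluster:    %.2f GB%n", clusterBytes(heap, fraction, servers) / gib);
    }
}
```

With 17 servers the total lands slightly under the ~15GB quoted, which is consistent with "a bit less than 1GB" per server.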
-
Re: Performance at large number of regions/node
Jacob Isaac 2010-05-27, 21:57
Thanks J-D
Currently we are trying to find/optimize our load/write times, although in prod we expect a 25/75 (writes/reads) ratio. We are using a long table model with only one column; row size is typically ~4-5k.

As to your suggestion on not using even 50% of disk space - I agree and was planning to use only ~30-40% (1.5T of 4T) for HDFS, and as I reported earlier:

4000 regions @ 256m per region (with 3 replications) on 20 nodes == 150G per node == 10% utilization

While using 1GB as maxfilesize, did you have to adjust other params such as hbase.hstore.compactionThreshold and hbase.hregion.memstore.flush.size? There is an interesting observation by Jonathan Gray documented/reported in HBASE-2375 - wondering whether that issue gets compounded when using 1G as the hbase.hregion.max.filesize.

Thx
Jacob
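Jacob's utilization figure (4000 regions at 256m, 3 replicas, 20 nodes) can be reproduced with the small sketch below. It assumes every region is full at the maximum file size, so it is an upper bound, and it treats 1.5T as 1536G; the class and method names are hypothetical.

```java
// Upper-bound disk usage per node; assumes every region has reached max filesize.
final class DiskUtilization {
    /** Bytes stored per node once replication is applied and data is spread evenly. */
    static long bytesPerNode(long regions, long regionBytes, int replication, int nodes) {
        return regions * regionBytes * replication / nodes;
    }

    /** Fraction of the per-node disk budget that the data occupies. */
    static double utilization(long usedBytes, long availableBytes) {
        return (double) usedBytes / availableBytes;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long used = bytesPerNode(4000, 256L * 1024 * 1024, 3, 20);
        long available = 1536 * gib; // ~1.5T per node for Hadoop + HBase
        System.out.printf("per node: %d GB, utilization: %.1f%%%n",
                used / gib, 100 * utilization(used, available));
    }
}
```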
-
Re: Performance at large number of regions/node
Jean-Daniel Cryans 2010-05-27, 22:27
Well we do have a couple of other configs for high write throughput:
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>15</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>8</value>
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>60</value>
</property>
<property>
  <name>hbase.regions.percheckin</name>
  <value>100</value>
</property>

The last one is for restarts. Uploading very fast, you will more likely hit all the upper limits (blocking store files and memstore) and this will lower your throughput. Those configs relax that. Also, for speedier uploads we disable writing to the WAL: http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/Put.html#setWriteToWAL(boolean). If the job fails or any machine fails you'll have to restart it or figure the whole, and you absolutely need to force flushes when the MR is done.

J-D
-
Re: Performance at large number of regions/node
Vidhyashankar Venkatarama... 2010-05-28, 17:12
I am not sure if I understood this right, but does changing hfile.block.cache.size also help?
-
Re: Performance at large number of regions/node
Jean-Daniel Cryans 2010-05-28, 17:15
Like I said in my first email, it helps for random reading when lots
of RAM is available to HBase. But it won't help the write throughput.

J-D
-
Re: Performance at large number of regions/node
Jacob Isaac 2010-05-28, 19:28
Did a run yesterday, posted the relevant parameters below.
Did not see any difference in throughput or total run time (~9 hrs).

I am consistently getting about 5k rows/sec, each row around ~4-5k, using a 17-node HBase on a 20-node HDFS cluster.

How does it compare? Can I juice it more?

~Jacob

<property>
  <name>hbase.regionserver.handler.count</name>
  <value>60</value>
</property>
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value>
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>100663296</value>
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>15</value>
</property>
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>4</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>8</value>
</property>
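To see what the flush size and block multiplier above amount to, here is a minimal sketch. It assumes that updates to a region block once its memstore reaches flush size times the multiplier, and it ignores compactions and store file overhead when counting flushes; the class and method names are hypothetical.

```java
// Thresholds implied by the posted config; a sketch, not HBase's own code.
final class MemstoreLimits {
    /** Memstore size at which updates to a region are blocked (assumed: flush size x multiplier). */
    static long blockingBytes(long flushSizeBytes, int blockMultiplier) {
        return flushSizeBytes * blockMultiplier;
    }

    /** Flushes needed to write one region's worth of data, ignoring compactions. */
    static long flushesToFillRegion(long maxFileSizeBytes, long flushSizeBytes) {
        return (maxFileSizeBytes + flushSizeBytes - 1) / flushSizeBytes; // ceiling division
    }

    public static void main(String[] args) {
        long flushSize = 100663296L;    // hbase.hregion.memstore.flush.size (96MB)
        long maxFileSize = 1073741824L; // hbase.hregion.max.filesize (1GB)
        int multiplier = 8;             // hbase.hregion.memstore.block.multiplier
        long mib = 1024L * 1024;
        System.out.println("updates block at " + blockingBytes(flushSize, multiplier) / mib + " MB per region");
        System.out.println("flushes per full region: " + flushesToFillRegion(maxFileSize, flushSize));
    }
}
```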
-
Re: Performance at large number of regions/node
Jean-Daniel Cryans 2010-05-28, 19:42
If the table was already created, changing hbase.hregion.max.filesize
and hbase.hregion.memstore.flush.size won't be considered; those are the default values for new tables. You can set it in the shell too, see the "alter" command.

Also, did you restart HBase? Did you push the configs to all nodes? Did you disable writing to the WAL? If not, because durability is still important to you but you want to upload as fast as you can, I would recommend changing this too:

hbase.regionserver.hlog.blocksize 134217728
hbase.regionserver.maxlogs 128

I forgot you had quite largish values, so that must affect the log rolling a _lot_.

Finally, did you LZO the table? From experience, it will only do good: http://wiki.apache.org/hadoop/UsingLzoCompression

And finally (for real this time), how are you uploading to HBase? How many clients? Are you even using the write buffer?

http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/HTable.html#setAutoFlush(boolean)

J-D
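As a rough illustration of the two WAL settings J-D suggests, the sketch below multiplies them out. It assumes that the outstanding WAL volume per region server is bounded by roughly blocksize times maxlogs before flushes are forced; the exact roll and flush triggers are not spelled out in this thread, and the class and method names are hypothetical.

```java
// Approximate WAL volume a region server may accumulate; a sketch under stated assumptions.
final class WalCapacity {
    /** Upper bound on outstanding WAL bytes, taken as log block size times max logs. */
    static long maxOutstandingBytes(long hlogBlockSizeBytes, int maxLogs) {
        return hlogBlockSizeBytes * maxLogs;
    }

    public static void main(String[] args) {
        long blockSize = 134217728L; // hbase.regionserver.hlog.blocksize suggested above (128MB)
        int maxLogs = 128;           // hbase.regionserver.maxlogs suggested above
        long gib = 1024L * 1024 * 1024;
        System.out.println("outstanding WAL bound: " + maxOutstandingBytes(blockSize, maxLogs) / gib + " GB");
    }
}
```

Larger log blocks mean each log file holds more edits, so the same volume of writes should trigger fewer log rolls.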
-
Re: Performance at large number of regions/node
Jacob Isaac 2010-05-28, 20:13
Hi J-D
The run was done on a reformatted HDFS.

Disabling the WAL is not an option for us because this will be our normal mode of operation and durability is important to us. 'Upload' was a poor choice of words on my part - it is more like periodic/continuous writes.

hbase.regionserver.maxlogs was 256, although hbase.regionserver.hlog.blocksize was the default.

Did not use compression. And autoflush is the default (true).

Each of the 20 nodes is running a custom server program that's reading from and writing to HBase, with a max of 6 write threads per node and 1 thread reading. Also wanted to point out that in the current tests we are writing to two tables and reading from only one.

~Jacob
-
Re: Performance at large number of regions/node
Vidhyashankar Venkatarama... 2010-05-28, 20:16
Jacob,
Just curious: Is your observed upload throughput that of bulk importing or using the HBase API?

Thanks
Vidhya
-
Re: Performance at large number of regions/node
Jean-Daniel Cryans 2010-05-28, 20:20
On Fri, May 28, 2010 at 1:13 PM, Jacob Isaac <[EMAIL PROTECTED]> wrote:
> Hi J-D
>
> hbase.regionserver.maxlogs was 256 although
> hbase.regionserver.hlog.blocksize was the default.
>
> Did not use compression. And autoflush is default (true)

You should, and if you are uploading in big batches then disable autoflush, then make sure you commit.

> Each of the 20 node is running custom server program that's reading and
> writing to HBase
> Max of 6 write threads per node and 1 thread reading
> Also wanted to point out that in the current tests we are writing to two
> tables and reading from only one

That's very important information! Currently reading is slower than writing in HBase (unless you are scanning lots of rows at a time, using scanner caching), so this might easily be your bottleneck, right? Do you benchmark all the steps by any chance? Can you show, at the end of your (for lack of a better word) "testing", the average/total time it took to read/write?

J-D
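To get a feel for how much batching a disabled autoflush would give, here is a back-of-the-envelope sketch. It assumes a 2MB client write buffer (hbase.client.write.buffer, believed to be the default but not confirmed in this thread), ~4.5k rows, and the reported 5k rows/sec spread evenly over 20 nodes x 6 write threads; the class and method names are hypothetical.

```java
// How many puts fit in the client write buffer, and how long one writer takes to fill it.
final class WriteBufferFill {
    /** Number of rows of the given size that fit in the write buffer. */
    static long rowsPerFlush(long bufferBytes, long rowBytes) {
        return bufferBytes / rowBytes;
    }

    /** Seconds one writer needs to fill the buffer at the given per-writer rate. */
    static double secondsToFill(long bufferBytes, long rowBytes, double rowsPerSecPerWriter) {
        return rowsPerFlush(bufferBytes, rowBytes) / rowsPerSecPerWriter;
    }

    public static void main(String[] args) {
        long buffer = 2L * 1024 * 1024;       // assumed write buffer size (2MB)
        long row = 4608;                      // ~4.5k per row
        double perWriter = 5000.0 / (20 * 6); // assumed even split of 5k rows/sec over all writers
        System.out.println("rows per buffer flush: " + rowsPerFlush(buffer, row));
        System.out.printf("seconds to fill per writer: %.1f%n", secondsToFill(buffer, row, perWriter));
    }
}
```

Under these assumptions each buffered flush would carry a few hundred rows instead of one row per call with autoflush on.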
-
Re: Performance at large number of regions/node
Jacob Isaac 2010-05-28, 20:36
Vidhya - This is using HBase API.
J-D - I do have timing info for inserts and gets - let me process the data and will post the results.

~Jacob.
-
Re: Performance at large number of regions/nodeJacob Isaac 2010-05-28, 23:11
Here is the summary of the runs

puts (~4-5k per row)
regionsize   #rows         Total time (ms)
1G           82282053*2    301943742
512M         82287593*2    313119378
256M         82246314*2    433200105

gets (~4-5k per row)
regionsize   #rows         Total time (ms)
1G           82427685      90116726
512M         82421943      94878466
256M         82395487      108160178

Note: for the 256m run hbase.hregion.memstore.flush.size=64m,
and for the other two runs hbase.hregion.memstore.flush.size=96m

Regarding disabling autoflush - since there are a large number of writes (~4k per row) happening, we would have hit the hbase.client.write.buffer size every few seconds.

~Jacob
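As a rough cross-check of the run summary above, the per-row cost can be derived by dividing each total time by its row count. The sketch below is only illustrative: it assumes the "*2" in the puts table means each row was written to both tables, and that the totals are summed over all client threads (neither is confirmed in the thread). The variable and function names are made up for the example.

```python
# Per-row time from the run summary: total time (ms) / number of rows.
# Assumption: "*2" in the puts table means every row went to two tables.
puts = {
    "1G": (82282053 * 2, 301943742),
    "512M": (82287593 * 2, 313119378),
    "256M": (82246314 * 2, 433200105),
}
gets = {
    "1G": (82427685, 90116726),
    "512M": (82421943, 94878466),
    "256M": (82395487, 108160178),
}


def ms_per_row(rows, total_ms):
    """Average milliseconds spent per row."""
    return total_ms / rows


put_ms = {size: ms_per_row(*run) for size, run in puts.items()}
get_ms = {size: ms_per_row(*run) for size, run in gets.items()}

for size in puts:
    print(f"{size:>4}: put {put_ms[size]:.2f} ms/row, get {get_ms[size]:.2f} ms/row")
```

Under those assumptions the puts come out at a little under 2 ms per row for the 1G and 512M runs, and the 256M run is the slowest of the three for both puts and gets. Note that the 256M run also used a different memstore flush size, so the region size is not the only variable.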
-
Re: Performance at large number of regions/nodeJean-Daniel Cryans 2010-05-29, 02:16
Looks like you spend 1/6 of your time doing the gets, good to know.
For autoflush=false, if you fit the 4-5KB in a single Put, then it won't help, as 1 put = 1 rpc. Batching them together almost always improves performance. The default buffer size is 2MB btw.

LZO should give you another big boost, at least if your data can be compressed in any way. Also watch out for stuff that takes a lot of time in your code, like instantiating lots of HTables (reuse the same one as much as you can inside a single thread), use finals, etc. I saw a good bunch of people shooting themselves in the foot by writing poorly performing code; crazy how running the same slowish thing 800M times ends up taking hours!

J-D
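To put the write-buffer discussion in numbers, here is a back-of-the-envelope sketch of how often a 2MB client buffer would fill. It uses figures quoted in this thread (~4-5k per row, ~5k rows/sec, 20 nodes with up to 6 write threads each) and assumes that the 5k rows/sec is cluster-wide, that the load is spread evenly, and that each writer thread has its own buffer. Those last three points are assumptions, not measurements.

```python
# How often would a 2MB client write buffer fill up?
WRITE_BUFFER_BYTES = 2 * 1024 * 1024  # default buffer size mentioned by J-D
ROW_BYTES = int(4.5 * 1024)           # assumed average row size (~4-5k)
CLUSTER_ROWS_PER_SEC = 5000           # assumed to be the cluster-wide rate
WRITER_THREADS = 20 * 6               # 20 nodes, up to 6 write threads each

rows_per_flush = WRITE_BUFFER_BYTES // ROW_BYTES
rows_per_sec_per_writer = CLUSTER_ROWS_PER_SEC / WRITER_THREADS
seconds_between_flushes = rows_per_flush / rows_per_sec_per_writer

print(f"rows per flush: {rows_per_flush}")
print(f"seconds between flushes per writer: {seconds_between_flushes:.1f}")
```

With these inputs each writer would send a few hundred rows per flush instead of one row per call, and would flush roughly every ten seconds.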
-
Re: Performance at large number of regions/nodejacob@...) 2010-05-29, 03:25
Our data can be characterized as a list of sets and 1 row == element of a set. Our puts and gets work on a set at a time. Our sets typically range from 1~1000 elements and a few can range from 1k-20k elements.

Can't guarantee it is a perfect codebase, but we do use HTablePool for reusing HTable.

What I wanted out of this discussion was to find out whether I am in the ballpark of what I can juice out of HBase or I am way off the mark.

~Jacob
-
Re: Performance at large number of regions/nodeJean-Daniel Cryans 2010-05-29, 04:04
> What I wanted out of this discussion was to find out whether I am in the
> ballpark of what I can juice out of HBase or I am way off the mark.

I understand... but this is a distributed system we're talking about. Unless I have the same code, hbase/hadoop version, configuration, number of nodes, cpu, RAM, # of HDDs, OS, network equipment, data set, etc... it's really hard to assess right? For starters, I don't think you specified the number of drives you have per machine, and HBase is mostly IO-bound.

FWIW, here's our experience. At StumbleUpon, we uploaded our main data set consisting of 13B*2 rows on 20 machines (2xi7, 24GB (8 for HBase), 4x 1TB JBOD) with MapReduce (using 8 maps per machine) pulling from a MySQL cluster (we were selecting large ranges in batches), inserting at an average rate of 150-200k rows per second, peaks at 1M. Our rows are a few bytes, mostly integers and some text. We did it in the time with HBase 0.20.3 + the parallel-put patch we wrote here (available in trunk) with the configuration I pasted previously. For that upload the WAL was disabled and ALL our tables are LZOed (can't stress enough the importance of compressing your tables!) and 1GB max file size.

My guess is yes you can juice it out more, first by using LZO ;)

Also, are your machines even stressed during the test? Do you monitor? Could you increase the number of clients?

Sorry I can't give you a very clear answer, but without using a common benchmark to compare numbers we're pretty much all in the dark. YCSB is one, but IIRC it needs some patches to work efficiently (Todd Lipcon from Cloudera has them in his github).

J-D
-
Re: Performance at large number of regions/nodeStack 2010-05-29, 17:53
On Fri, May 28, 2010 at 4:11 PM, Jacob Isaac <[EMAIL PROTECTED]> wrote:
> Here is the summary of the runs
>
> puts (~4-5k per row)
> regionsize #rows Total time (ms)
> 1G 82282053*2 301943742
> 512M 82287593*2 313119378
> 256M 82246314*2 433200105
>

So about 0.3ms per 5k write (presuming 100M writes?)?

> gets ((~4-5k per row)
> regionsize #rows Total time (ms)
> 1G 82427685 90116726
> 512M 82421943 94878466
> 256M 82395487 108160178
>

How many reads are you doing? Are they going on concurrent with the writing?

> Note : for the 256m run the hbase.hregion.memstore.flush.size=64m
> and for the other two runs the hbase.hregion.memstore.flush.size=96m
>

I wonder what is going on in a typical regionserver log during these runs? Do you see lots of blocking going on? (We'll block if memory is full or compaction has been overrun -- with the 96M flushing you might be generating lots of store files provoking lots of compacting -- as per the issue you cite earlier by jgray.) If you look in master UI, is there a steady state of requests across all regionservers? Or do they fall to zero a lot? (Would be good to check regionserver log at these times.)

Are the servers loaded at all?

Thanks Jacob,
St.Ack
-
Re: Performance at large number of regions/nodeStack 2010-05-29, 19:04
On Sat, May 29, 2010 at 10:53 AM, Stack <[EMAIL PROTECTED]> wrote:
> On Fri, May 28, 2010 at 4:11 PM, Jacob Isaac <[EMAIL PROTECTED]> wrote:
>> Here is the summary of the runs
>>
>> puts (~4-5k per row)
>> regionsize #rows Total time (ms)
>> 1G 82282053*2 301943742
>> 512M 82287593*2 313119378
>> 256M 82246314*2 433200105
>>
>
> So about 0.3ms per 5k write (presuming 100M writes?)?
>

I just tried loading 100M 1k rows into a 4 regionserver cluster where each node had two clients writing at any one time and it took just over an hour. If you tell me more about your loading job and if reading is happening concurrently, I can try and mock it here so we can compare (no lzo and all defaults on my cluster).

St.Ack
-
Re: Performance at large number of regions/nodeJacob Isaac 2010-05-30, 00:52
Wow !! That's almost twice the throughput I got with less than 1/4 the cluster size.

The general flow of the loading program is

1. Reading/processing data from source (a local file on the machine)
2. Writing data to HBase
3. Reading the data from HBase and processing it.

Steps 1 and 2 happen on the same node.
Step 3 may or may not be on the same machine that wrote it.

Yes, the reads and writes are happening concurrently, and another thing to note is that the read for a particular set is almost immediately after it is written.

In the master UI - there is a steady # of requests (typically around ~500 requests/RS). I must admit we have not monitored it to say that's the steady rate throughout the 9 hr run - we manually refreshed the UI during the first two hrs and that's been the observation.

The average load on these machines is ~5 as reported by top/htop and the datacenter monitoring UI.

The typical messages I see in the RS logs are listed below, and the typical pattern is a few of them in a sudden burst, periodically every 1-3 min:

Finished snapshotting, commencing flushing stores -
Started memstore flush for region
Finished memstore flush
Starting compaction on region
compaction completed on region
Failed openScanner
removing old hlog file
hlogs to remove out of total
Updates disabled for region,

~jacob
-
Re: Performance at large number of regions/nodeJacob Isaac 2010-05-30, 01:36
Hi J-D
We have 8 drives (~500G per drive - total 4T) per machine.

The metrics from my run indicate that for writes I achieve around
1 row (5k) in 2ms => 500 rows (5k) in 1 sec => 2.5 MB/sec

and from your observation at StumbleUpon

200k rows (presuming 100 bytes per row)/sec => 20 MB/sec

Wow !! That's an order of magnitude difference.
I am sure disabling WAL during the writes is giving you a significant boost.

Are you reading the data at the same time as you are writing?

Thx
Jacob
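The conversion used above is rows per second times bytes per row. A small sketch, using only figures quoted in this thread (2 ms per 5k row, ~5k rows/sec reported for the whole run, and the presumed 100 bytes/row at 200k rows/sec). Treating 1k as 1000 bytes is an assumption made to match the rounded numbers in the message, and the function name is made up for the example.

```python
def mb_per_sec(rows_per_sec, bytes_per_row):
    """Throughput in MB/sec, taking 1 MB as 1,000,000 bytes."""
    return rows_per_sec * bytes_per_row / 1_000_000


# One row every 2 ms, i.e. a single sequential stream of puts.
single_stream = mb_per_sec(1000 / 2, 5000)

# The ~5k rows/sec reported earlier, assumed here to be cluster-wide.
cluster_wide = mb_per_sec(5000, 5000)

# StumbleUpon figure, presuming 100 bytes per row.
stumbleupon = mb_per_sec(200_000, 100)

print(f"single stream: {single_stream} MB/sec")
print(f"cluster-wide:  {cluster_wide} MB/sec")
print(f"StumbleUpon:   {stumbleupon} MB/sec")
```

The 2.5 MB/sec figure corresponds to one row every 2 ms. If the ~5k rows/sec reported earlier is the cluster-wide rate, the byte throughput would be in the same range as the StumbleUpon figure even though the row rate is far lower, so it may be worth checking which of the two numbers is being compared.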
-
Re: Performance at large number of regions/nodeStack 2010-05-30, 14:04
On Sat, May 29, 2010 at 5:52 PM, Jacob Isaac <[EMAIL PROTECTED]> wrote:
> Wow !! That's almost twice the throughput I got with less than 1/4 the
> cluster size.
>

I'm just writing.

> The general flow of the loading program is
>
> 1. Reading/processing data from source (a local file on the machine)
> 2. Writing data to HBase
> 3. Reading the data from HBase and processing it.
>
> steps 1 and 2 happen on the same node

OK. So all 17 nodes have a file local?

The data is keyed? Are the keys sorted? The writing is not necessarily to the local node, right? We'll write to the region responsible for the key which could be anywhere out on the cluster.

> step 3 may or may not be on the same machine that wrote it.
>

This is probably what's taking the time.

When you read, it's random access? Does the processing take much time? You can't scan and process a batch of documents at a time?

> Yes the reads and writes are happening concurrently
> and another thing to note is that the read for a particular set is
> almost immediately after it is written
>

You'd think then that the data would be up in the memstore still, or at least, it would be ideal if, when most of the reads came in, they'd find the data in memstore and not have to go to the filesystem (Reading from our memstore is not the best apparently, speed-wise -- it needs some work -- but still better than going to the filesystem).

> In the master UI - there is steady # of request (typically around ~
> 500 request/RS).
> I must admit we have not monitored it to say that's the steady rate
> throughout the 9 hr run -
> we have manually refresh the UI during the first two hrs and that's
> been the observation.
>

OK. Steady is good.

> The average load on these machines ~5 as reported by top/htop and
> datacenter monitoring UI .
>

OK. Can you figure more about the load? Is it mostly cpu or is it i/o?

> The typical messages I see in the RS logs are -
>
> and the typical pattern is few of them in a sudden burst and
> periodically every 1-3 min
>
> Finished snapshotting, commencing flushing stores -
> Started memstore flush for region
> Finished memstore flush
> Starting compaction on region
> compaction completed on region
> Failed openScanner
> removing old hlog file
> hlogs to remove out of total
> Updates disabled for region,
>

You see any blocking because too many storefiles or because regionserver has hit the global memory limit?

If not, it might help upping your storefile size from 96M. Perhaps double it so less frequent flushes (more likely the reads will find the data out of memory).

What rate would make you happy?

St.Ack
-
Re: Performance at large number of regions/nodeStack 2010-05-30, 14:08
On Sat, May 29, 2010 at 6:36 PM, Jacob Isaac <[EMAIL PROTECTED]> wrote:
> The metrics from my run indicate that I achieve around
> for writes -
> around 1 row(5k) in 2ms => 500 rows(5K) in 1 sec => 2.5 Mb/sec
>
> and from your the observation at StumbleUpon
>
> 200k rows (presuming 100 bytes per row)/sec => 20Mb/sec
> Wow !! that an order of difference
> I am sure disabling WAL during the writes is giving you a significant boost.
>

There is also the compression that J-D has been suggesting.

> Are you reading the data at the same time as you are writing?
>

I don't think so.

Is there any locality to the reading you are doing or is it pure random access?

St.Ack
-
Re: Performance at large number of regions/nodeJacob Isaac 2010-05-30, 16:22
On Sun, May 30, 2010 at 7:04 AM, Stack <[EMAIL PROTECTED]> wrote:
> On Sat, May 29, 2010 at 5:52 PM, Jacob Isaac <[EMAIL PROTECTED]> wrote:
> > Wow !! That's almost twice the throughput I got with less than 1/4 the
> > cluster size.
> >
> I'm just writing.
>

That is true. And I hear reading is not as efficient as writing?

> > The general flow of the loading program is
> >
> > 1. Reading/processing data from source (a local file on the machine)
> > 2. Writing data to HBase
> > 3. Reading the data from HBase and processing it.
> >
> > steps 1 and 2 happen on the same node
>
> OK. So all 17 nodes have a file local?
>

Our source data (files) is uniformly distributed across all 20*8 disks.

> The data is keyed? Are the keys sorted? The writing is not
> necessarily to the local node, right? We'll write to the region
> responsible for the key which could be anywhere out on the cluster.
>

As explained in one of my earlier emails - we do gets and puts on a given set.
A set can contain anywhere from 1~20k elements (but 95% < 1000 elements).
Key is a composite key <SHA1>:<element #>
So it is pretty random and we see good distribution happening very soon.

> > step 3 may or may not be on the same machine that wrote it.
> >
> This is probably what's taking the time.
>
> When you read, it's random access? Does the processing take much
> time? You can't scan and process a batch of documents at a time?
>

Our writes and reads are pretty random (we rely on HBase handling the distribution), except that we read a set almost immediately after it is written.

Since our gets are for a set - we are scanning a bunch of rows at a time. Working on multiple sets at a time - don't know whether that would help?

> > Yes the reads and writes are happening concurrently
> > and another thing to note is that the read for a particular set is
> > almost immediately after it is written
> >
> You'd think then that the data would be up in the memstore still, or
> at least, it would be ideal that if when most of the reads came in,
> that they'd find the data in memstore and not have to go to the
> filesystem (Reading from our memstore is not the best apparently,
> speed-wise -- it needs some work -- but still better than going to the
> filesystem).
>

The Failed openScanner messages seem to suggest some region name cache is getting stale with so many splits taking place.

> > In the master UI - there is steady # of request (typically around ~
> > 500 request/RS).
> > I must admit we have not monitored it to say that's the steady rate
> > throughout the 9 hr run -
> > we have manually refresh the UI during the first two hrs and that's
> > been the observation.
> >
> OK. Steady is good.
>
> > The average load on these machines ~5 as reported by top/htop and
> > datacenter monitoring UI .
> >
>
> OK. Can you figure more about the load. Is it mostly cpu or is it i/o?
>
> > The typical messages I see in the RS logs are -
> >
> > and the typical pattern is few of them in a sudden burst and
> > periodically every 1-3 min
> >
> > Finished snapshotting, commencing flushing stores -
> > Started memstore flush for region
> > Finished memstore flush
> > Starting compaction on region
> > compaction completed on region
> > Failed openScanner
> > removing old hlog file
> > hlogs to remove out of total
> > Updates disabled for region,
> >
> You see any blocking because too many storefiles or because
> regionserver has hit the global memory limit?
>

We do see "Forced flushing of XXXX because global memstore limit of 1.6g ..." every 3-4 min.

> If not, it might help upping your storefile size from 96M. Perhaps
> double it so less frequent flushes (more likely the reads will find
> the data out of memory).
>
> What rate would make you happy?
>

:-) I think from an acceptable threshold - we are good!!
We are trying to size up our capacity handling metrics and wanted to get a sense that we are not way off the mark.

Also was looking for ideas and suggestions that we may have missed.

~Jacob
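The 1.6g in the forced-flush log message above is consistent with 40% of the 4G given to HBase. The sketch below assumes hbase.regionserver.global.memstore.upperLimit is at 0.4 (believed to be the default in 0.20, but not stated anywhere in this thread) and uses the 96M flush size from the earlier runs to estimate how many regions can hold a full memstore before forced flushing starts.

```python
# Rough estimate: how many full memstores fit under the global limit?
HEAP_BYTES = 4 * 1024**3       # 4G of RAM for HBase, as stated in the thread
UPPER_LIMIT = 0.4              # assumed global memstore upperLimit setting
FLUSH_SIZE = 96 * 1024**2      # hbase.hregion.memstore.flush.size = 96M

global_limit_bytes = HEAP_BYTES * UPPER_LIMIT
global_limit_gb = global_limit_bytes / 1024**3
full_memstores = int(global_limit_bytes // FLUSH_SIZE)

print(f"global memstore limit: {global_limit_gb:.1f}g")
print(f"regions that can reach a full 96M memstore: {full_memstores}")
```

If many more regions than that are taking random writes on one regionserver at the same time, the global limit would likely be hit before individual memstores reach their flush size, which would fit with the forced flushes seen every few minutes.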
-
Re: Performance at large number of regions/nodeJacob Isaac 2010-05-30, 16:29
On Sun, May 30, 2010 at 7:08 AM, Stack <[EMAIL PROTECTED]> wrote:
> On Sat, May 29, 2010 at 6:36 PM, Jacob Isaac <[EMAIL PROTECTED]> wrote:
> > The metrics from my run indicate that I achieve around
> > for writes -
> > around 1 row(5k) in 2ms => 500 rows(5K) in 1 sec => 2.5 Mb/sec
> >
> > and from your the observation at StumbleUpon
> >
> > 200k rows (presuming 100 bytes per row)/sec => 20Mb/sec
> > Wow !! that an order of difference
> > I am sure disabling WAL during the writes is giving you a significant
> > boost.
> >
>
> There is also the compression that J-D has been suggesting.
>

Will look into compression - it's been on my to-try list.

> > Are you reading the data at the same time as you are writing?
> >
>
> I don't think so.
>
> Is there any locality to the reading you are doing or is it pure random
> access.
>

That's right, it is purely random from both perspectives -
1. Which of our servers will issue the read.
2. Which RS will have the data.

> St.Ack
-
Re: Performance at large number of regions/nodeStack 2010-05-31, 15:37
On Sun, May 30, 2010 at 9:22 AM, Jacob Isaac <[EMAIL PROTECTED]> wrote:
> On Sun, May 30, 2010 at 7:04 AM, Stack <[EMAIL PROTECTED]> wrote:
> Our writes and reads are pretty random (we rely on HBase handling the
> distribution)
> except that we read a set almost immediately after it written.
>
> Since our gets is for a set - we are scanning a bunch of rows at a time.
> working on multiple sets at a time - don't know whether that would help?
>

So, you are scanning (looks like you can given your key type assuming the sha-1 is the set identifier).

> The Failed openScanner messages seems to suggest some region name cache is
> getting stale with so many splits taking place.
>

Paste the exception.

> Do see 'Forced flushing of XXXX because global memstore limit of 1.6g ...."
> every 3-4 min
>

Do these periods last a while or are they short? You think the scenario described by Jon Gray over in HBASE-2375?

> We are trying to size up our capacity handling metrics and
> wanted to get a sense that we not way off the mark.
>

Well, you seem to have the basics right and you seem to have a good handle on how the systems interact. All that is left, it would seem is to try lzo as J-D suggests.

Good stuff Jacob,
St.Ack

> Also was looking for ideas and suggestions that we may have missed.
>
> ~Jacob
>
> St.Ack
>>
>>
>> > ~jacob
>> >
>> >
>> > On Sat, May 29, 2010 at 12:04 PM, Stack <[EMAIL PROTECTED]> wrote:
>> >> On Sat, May 29, 2010 at 10:53 AM, Stack <[EMAIL PROTECTED]> wrote:
>> >>> On Fri, May 28, 2010 at 4:11 PM, Jacob Isaac <[EMAIL PROTECTED]> wrote:
>> >>>> Here is the summary of the runs
>> >>>>
>> >>>> puts (~4-5k per row)
>> >>>> regionsize   #rows         Total time (ms)
>> >>>> 1G           82282053*2    301943742
>> >>>> 512M         82287593*2    313119378
>> >>>> 256M         82246314*2    433200105
>> >>>>
>> >>>
>> >>> So about 0.3ms per 5k write (presuming 100M writes?)?
>> >>>
>> >>
>> >> I just tried loading 100M 1k rows into a 4 regionserver cluster where
>> >> each node had two clients writing at any one time and it took just
>> >> over an hour. If you tell me more about your loading job and if
>> >> reading is happening concurrently, I can try and mock it here so we
>> >> can compare (no lzo and all defaults on my cluster).
>> >>
>> >> St.Ack
>> >>
>> >
>> >
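The per-row cost implied by the run summary quoted above can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: it takes "#rows" as a per-table count with "*2" meaning two tables, and it treats "Total time (ms)" as comparable across the three runs (the thread does not say whether it is wall-clock time or time summed over clients). Under those assumptions the cost comes out at roughly 1.8 to 2.6 ms per row, with the 256M region size the slowest.

```python
# Figures copied from the run summary in the thread.
# Assumption: "#rows" is per table and "*2" means two tables were loaded.
runs = {
    "1G": (82282053 * 2, 301943742),
    "512M": (82287593 * 2, 313119378),
    "256M": (82246314 * 2, 433200105),
}


def ms_per_row(total_rows, total_ms):
    """Average milliseconds spent per row for one run."""
    return total_ms / total_rows


per_row = {size: ms_per_row(rows, ms) for size, (rows, ms) in runs.items()}

for size, value in per_row.items():
    print(f"{size:>4} regions: {value:.2f} ms per row")
```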
-
Re: Performance at large number of regions/nodeVidhyashankar Venkatarama... 2010-06-01, 23:20
I have a related question: I also tried a simple load experiment using HBase's Java API. (The nodes do only loading, nothing else. The client programs generate random data on the fly to load, so there are no reads of the input data.)

120m rows, 15KB each. 2 column families.
5 region servers; ran around 4 or 5 clients per node on the 5 nodes that run the region servers.

2MB block size, 2 gigs region size, WAL disabled, auto flush disabled, 2MB write buffer, major compactions disabled.

The other configs are quite similar to the configs discussed in this thread.

And I get a throughput of around 1.5 MB per second per node (500 rows per second for the entire cluster). Do these values seem reasonable?

Thanks
Vidhya

On 5/29/10 6:36 PM, "Jacob Isaac" <[EMAIL PROTECTED]> wrote:

Hi J-D

We have 8 drives (~500G per drive - total 4T) per machine

The metrics from my run indicate that for writes I achieve
around 1 row (5k) in 2ms => 500 rows (5k) in 1 sec => 2.5 MB/sec

and from your observation at StumbleUpon

200k rows (presuming 100 bytes per row)/sec => 20 MB/sec
Wow!! That's an order of magnitude difference.
I am sure disabling WAL during the writes is giving you a significant boost.

Are you reading the data at the same time as you are writing?

Thx
Jacob

On Fri, May 28, 2010 at 9:04 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>> What I wanted out of this discussion was to find out whether I am in the
>> ballpark of what I can juice out of HBase or I am way off the mark.
>>
>
> I understand... but this is a distributed system we're talking about.
> Unless I have the same code, hbase/hadoop version, configuration,
> number of nodes, cpu, RAM, # of HDDs, OS, network equipment, data set,
> etc... it's really hard to assess right? For starters, I don't think
> you specified the number of drives you have per machine, and HBase is
> mostly IO-bound.
>
> FWIW, here's our experience. At StumbleUpon, we uploaded our main data
> set consisting of 13B*2 rows on 20 machines (2xi7, 24GB (8 for HBase),
> 4x 1TB JBOD) with MapReduce (using 8 maps per machine) pulling from a
> MySQL cluster (we were selecting large ranges in batches), inserting
> at an average rate of 150-200k rows per second, peaks at 1M. Our rows
> are a few bytes, mostly integers and some text. We did it in the time
> with HBase 0.20.3 + the parallel-put patch we wrote here (available in
> trunk) with the configuration I pasted previously. For that upload the
> WAL was disabled and ALL our tables are LZOed (can't stress enough the
> importance of compressing your tables!) and 1GB max file size.
>
> My guess is yes you can juice it out more, first by using LZO ;)
>
> Also, are your machines even stressed during the test? Do you monitor?
> Could you increase the number of clients?
>
> Sorry I can't give you a very clear answer, but without using a common
> benchmark to compare numbers we're pretty much all in the dark. YCSB
> is one, but IIRC it needs some patches to work efficiently (Todd
> Lipcon from Cloudera has them in his github).
>
> J-D
>
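The throughput figures traded in this message can be put on a common footing with a small conversion. The sketch below is only arithmetic on numbers quoted in the thread; it assumes decimal units (1 MB = 1,000,000 bytes) to match the round figures, and the helper name `mb_per_sec` is made up for illustration. The 100-bytes-per-row figure for StumbleUpon is Jacob's presumption, not a measured value.

```python
KB = 1000        # decimal units, matching the round numbers in the thread
MB = 1000 * KB


def mb_per_sec(rows_per_sec, bytes_per_row):
    """Convert a row rate and a row size into MB of user data per second."""
    return rows_per_sec * bytes_per_row / MB


# Vidhya: 500 rows/s of 15KB across the cluster, spread over 5 region servers.
vidhya_cluster = mb_per_sec(500, 15 * KB)
vidhya_per_node = vidhya_cluster / 5

# Jacob: one 5k row every 2 ms, i.e. 500 rows per second.
jacob = mb_per_sec(1000 / 2, 5 * KB)

# StumbleUpon: 200k rows/s, presuming 100 bytes per row.
stumbleupon = mb_per_sec(200_000, 100)

print(vidhya_cluster, vidhya_per_node, jacob, stumbleupon)
```

The row rates differ by a factor of 400 (500 versus 200k rows per second), while the byte rates differ by less than a factor of 10, so row size matters a great deal when comparing these numbers.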
-
RE: Performance at large number of regions/nodeJonathan Gray 2010-06-02, 00:10
This is significantly lower than the top write speeds I've seen, like an order of magnitude. And you are running on 4 disks per node so should be way faster. One thing to keep in mind though is HBase does not support concurrent compactions so we don't always fully utilize multi-disk setups. Multiple compactions should be included in the next major release.
What's going on in your logs, especially region servers? Do you see blocking of updates? Anything unusual like flushes that aren't from hitting the flush size?

Are you starting from an empty table? Are your insertion keys random? Your 1.5MB/sec/node comes from a steady-state insertion load once the table is evenly distributed across nodes? How many regions at this time and do you see even or uneven load across RS?

What I remember was being on the order of 1/2 or 1/4 the raw write throughput of the drives, something in that range though I'm forgetting the details. There's no architectural reason not to be in that range or better. In these calculations, however, all the writes to disk were being used in the calculation (io used for flushes, compactions, etc). Your calculation is based on the actual size of the data, though behind the scenes HBase is writing this multiple times.

Did you change the MemStore flush size? You're going to end up doing a ton of compactions if you are flushing small MemStores but have a big max region size. The flush size is one factor. The total heap on each RS and the number of regions per RS will also impact the sizes of flushed files. Each time you do a compaction, you rewrite data, this kills io.

There are lots of changes coming up in the next release. Follow along HBASE-2375 and related jiras for the compaction/split/flush improvements being worked on.

JG
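The point that HBase writes the same data several times behind the scenes can be illustrated with a rough model. This is a deliberately simple sketch, not a description of HBase internals: the function `disk_write_rate` and its parameters are hypothetical, the number of compaction rewrites is a guess the reader must supply, and compression is ignored.

```python
def disk_write_rate(user_mb_per_sec, compaction_rewrites,
                    wal_enabled=False, replication=1):
    """Estimate MB/s written to disk for a given rate of user data.

    Model (an assumption, not measured behaviour): every byte is written
    once when its memstore is flushed, once more for each compaction that
    rewrites it, and once to the write-ahead log if the WAL is enabled.
    The total is multiplied by the HDFS replication factor.
    """
    copies = 1 + compaction_rewrites + (1 if wal_enabled else 0)
    return user_mb_per_sec * copies * replication


# 1.5 MB/s of user data per node, WAL disabled, and a made-up figure of
# 3 compaction rewrites per byte.
print(disk_write_rate(1.5, compaction_rewrites=3))
```

Under this model, a user-data rate has to be scaled up by the number of copies before it can be compared against the raw write throughput of the drives.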
-
Re: Performance at large number of regions/nodeVidhyashankar Venkatarama... 2010-06-02, 15:24
I increased the flush size to 800M.... Now, 2 things have started to happen: (I forgot to add in my previous mails that the data is uncompressed since the data is chosen at random anyways and this is the likely size of the compressed data we will be handling)..
(a) Flush sizes aren't reached because the global memstore is hitting the max more often. I will change the max and min global memstore and see what happens.

2010-06-02 15:07:28,424 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of DocData,68372423,1275489341661 because global memstore limit of 1.2g exceeded; currently 1.2g and flushing till 1.0g
2010-06-02 15:07:28,424 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for region DocData,68372423,1275489341661. Current region memstore size 164.1m
2010-06-02 15:07:29,516 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://b5120231.yst.yahoo.net:4600/hbase/DocData/55952207/CONTENT/5506278993030353376, entries=5700, sequenceid=17022, memsize=87.6m, filesize=87.2m to DocData,68372423,1275489341661
2010-06-02 15:07:30,345 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://b5120231.yst.yahoo.net:4600/hbase/DocData/55952207/bigColumn/1303331353726747525, entries=627000, sequenceid=17022, memsize=76.5m, filesize=29.9m to DocData,68372423,1275489341661
2010-06-02 15:07:30,346 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~164.1m for region DocData,68372423,1275489341661 in 1922ms, sequence id=17022, compaction requested=true

(b) Compactions still keep happening (the flushes happen frequently anyway).

2010-06-02 15:07:30,346 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region DocData,68372423,1275489341661/55952207 because: regionserver/74.6.71.48:60020.cacheFlusher
2010-06-02 15:07:30,346 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region DocData,68372423,1275489341661
2010-06-02 15:07:30,349 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of CONTENT: 1.9g; Skipped 1 file(s), size: 1362960491
2010-06-02 15:07:30,349 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 3 file(s) into /hbase/DocData/compaction.dir/55952207, seqid=17022
2010-06-02 15:07:50,935 DEBUG org.apache.hadoop.hbase.regionserver.Store: Completed compaction of CONTENT; new storefile is hdfs://b5120231.yst.yahoo.net:4600/hbase/DocData/55952207/CONTENT/4437999784964187538; store size is 1.9g
2010-06-02 15:07:50,937 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of bigColumn: 643.5m; Skipped 1 file(s), size: 460567753
2010-06-02 15:07:50,937 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 3 file(s) into /hbase/DocData/compaction.dir/55952207, seqid=17022
2010-06-02 15:08:01,173 DEBUG org.apache.hadoop.hbase.regionserver.Store: Completed compaction of bigColumn; new storefile is hdfs://b5120231.yst.yahoo.net:4600/hbase/DocData/55952207/bigColumn/8483730575294442675; store size is 643.5m
2010-06-02 15:08:01,175 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region DocData,68372423,1275489341661 in 30sec
2010-06-02 15:08:15,825 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=4.2187347MB (4423664), Free=895.61255MB (939117840),

This was a reply that I sent to Jonathan but not to the entire group:

Forgot to add this: Do I increase the hbase.hstore.compactionThreshold (right now, 3) as well? The block multiplier is already 8.

>> What's going on in your logs, especially region servers? Do you see blocking of updates?

Yes, I do see blocking of updates. Minor compactions are running. I now realize I had only been watching the iowait field (consistently around 2-3%) shown by sar, but that wouldn't catch the actual io performed due to minor compactions (since updates are blocked anyway).

The memory flush size is around 100M. Do I increase it to around 1 gig or something? Flushes are mostly hitting the full size of 100M.

Are you asking to verify if load is balanced? I am running the clients in such a way that they cover mutually exclusive ranges, and each client inserts rows sequentially in order. How will it matter if I choose keys at random or sequentially? (I think I am missing something here.)

Region splits happen only when region sizes get too high. But what row range do the regions get split into after a split? Can you give an example? (I should look into the logs to see if I can get an answer.)

Total heap size is 3 gigs, around the max that I can use for 32-bit java.

Thanks
Vidhya
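Why an 800M flush size is never reached here can be seen from the numbers in the log excerpt. The sketch below is a back-of-the-envelope check, not HBase's actual flush logic. It uses the "1.2g" global limit and the 164.1m region memstore size as logged (both are rounded in the log), and the second helper assumes all actively written regions carry about the same memstore size, which may not hold.

```python
GLOBAL_LIMIT_MB = 1.2 * 1024   # "global memstore limit of 1.2g" from the log
FLUSH_SIZE_MB = 800            # per-region flush size reported in the message
OBSERVED_MEMSTORE_MB = 164.1   # region memstore size at the forced flush


def regions_that_can_fill(global_limit_mb, flush_size_mb):
    """How many memstores can reach the flush size before the global limit trips."""
    return int(global_limit_mb // flush_size_mb)


def implied_active_regions(global_limit_mb, observed_memstore_mb):
    """Rough number of equally loaded regions sharing the global limit.

    Assumption: every active region holds about the observed amount.
    """
    return global_limit_mb / observed_memstore_mb


print(regions_that_can_fill(GLOBAL_LIMIT_MB, FLUSH_SIZE_MB))
print(round(implied_active_regions(GLOBAL_LIMIT_MB, OBSERVED_MEMSTORE_MB), 1))
```

With these figures only one memstore could grow to 800M before the global limit forces a flush, so as soon as a region server takes writes for more than one region at a time, the global limit rather than the per-region flush size decides when flushes happen.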
-
Re: Performance at large number of regions/nodeJacob Isaac 2010-06-02, 16:39
On Mon, May 31, 2010 at 8:37 AM, Stack <[EMAIL PROTECTED]> wrote:
> On Sun, May 30, 2010 at 9:22 AM, Jacob Isaac <[EMAIL PROTECTED]> wrote:
> > On Sun, May 30, 2010 at 7:04 AM, Stack <[EMAIL PROTECTED]> wrote:
> > Our writes and reads are pretty random (we rely on HBase handling the
> > distribution)
> > except that we read a set almost immediately after it written.
> >
> > Since our gets is for a set - we are scanning a bunch of rows at a time.
> > working on multiple sets at a time - don't know whether that would help?
> >
>
> So, you are scanning (looks like you can given your key type assuming
> the sha-1 is the set identifier).
>

That's right sha-1 is my set identifier.
and yes I scan since I know the start-row and end-row - I create a Scan object with startRow and stopRow.

> > The Failed openScanner messages seems to suggest some region name cache is
> > getting stale with so many splits taking place.
> >
>
> Paste the exception.
>

2010-05-26 20:37:57,161 ERROR [IPC Server handler 20 on 60020] regionserver.HRegionServer(864): Failed openScanner
org.apache.hadoop.hbase.NotServingRegionException: XXXXX-table-name-XXXXX,249B69DED2DB14DCFD894C7F4A01F282DAAA87D6:00114,1274912608415
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2278)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1857)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2010-05-26 20:37:57,408 ERROR [IPC Server handler 4 on 60020] regionserver.HRegionServer(864): Failed openScanner
org.apache.hadoop.hbase.NotServingRegionException: XXXXX-table-name-XXXXX,D70FDA6553BE1A63EF62A7CA0585344A9819CA43:00190,1274914545783
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2278)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1857)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

> > Do see 'Forced flushing of XXXX because global memstore limit of 1.6g ...."
> > every 3-4 min
> >
> Do these periods last a while or are they short?
>

hbase.regionserver.global.memstore.upperLimit=0.40
hbase.regionserver.global.memstore.lowerLimit=0.35
Hence the forced flushing only lasts typically between 10-30 secs taking the global.memstore from 1.6g (0.40) to 1.4g (0.35)
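The startRow/stopRow scan described above can be sketched as a key-range computation. The row-key layout used here, a 40-character SHA-1 hex string, a colon, then a zero-padded index, is an assumption inferred from the region names in the stack trace, and `set_scan_range` is a hypothetical helper, not part of the HBase API. The two byte strings it returns are what one would hand to a Scan as start row (inclusive) and stop row (exclusive).

```python
def set_scan_range(set_id_hex):
    """Return (start_row, stop_row) covering every row of one set.

    Assumed key layout: b"<SHA-1 hex>:<index>". The stop row is exclusive,
    so it is built by bumping the last byte of the prefix (':' becomes ';').
    """
    prefix = set_id_hex.upper().encode("ascii") + b":"
    stop_row = prefix[:-1] + bytes([prefix[-1] + 1])
    return prefix, stop_row


start_row, stop_row = set_scan_range("249B69DED2DB14DCFD894C7F4A01F282DAAA87D6")

# A row belonging to the set falls inside the half-open range.
row = b"249B69DED2DB14DCFD894C7F4A01F282DAAA87D6:00114"
print(start_row <= row < stop_row)
```

Because every row of a set shares the same prefix, the whole set sits in one contiguous key range, though that range can still straddle a region boundary after a split.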
-
Re: Performance at large number of regions/nodeStack 2010-06-02, 16:55
On Wed, Jun 2, 2010 at 9:39 AM, Jacob Isaac <[EMAIL PROTECTED]> wrote:
> That's right sha-1 is my set identifier.
> and yes I scan since I know the start-row and end-row - I create a Scan
> object with startRow and stopRow.
>

Good.

>> > The Failed openScanner messages seems to suggest some region name cache is
>> > getting stale with so many splits taking place.
>> >
>>
>> Paste the exception.
>>
>
> 2010-05-26 20:37:57,161 ERROR [IPC Server handler 20 on 60020] regionserver.HRegionServer(864): Failed openScanner
> org.apache.hadoop.hbase.NotServingRegionException: XXXXX-table-name-XXXXX,249B69DED2DB14DCFD894C7F4A01F282DAAA87D6:00114,1274912608415
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2278)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1857)
>     at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> 2010-05-26 20:37:57,408 ERROR [IPC Server handler 4 on 60020] regionserver.HRegionServer(864): Failed openScanner
> org.apache.hadoop.hbase.NotServingRegionException: XXXXX-table-name-XXXXX,D70FDA6553BE1A63EF62A7CA0585344A9819CA43:00190,1274914545783
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2278)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1857)
>     at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>

Does it recover eventually? In other words, after a while can you successfully fetch the startkey from the above region? Has it been deployed?

If you track the history of this region, can you see what is going on? Grep it in the master log. What is happening around the time of the above message? This is 0.20.4, right, so you should be getting the benefit of the smarter close, where we only put up the barriers that provoke the NSRE after we've flushed the bulk of memory, whereas before we'd put them up around the flush and close.

Otherwise, could up the retries.

>> > Do see 'Forced flushing of XXXX because global memstore limit of 1.6g ...."
>> > every 3-4 min
>> >
>> Do these periods last a while or are they short?
>>
>
> hbase.regionserver.global.memstore.upperLimit=0.40
> hbase.regionserver.global.memstore.lowerLimit=0.35
> Hence the forced flushing only lasts typically between 10-30 secs taking
> the global.memstore from 1.6g (0.40) to 1.4g (0.35)

10-30 seconds is a pretty long time. Seems like hbase is struggling to clear its memstores promptly enough. Is that how you'd read the logs?

The above NSRE might be because the close is held up by a flush that is queued behind others going on inside the regionserver at the time.

St.Ack
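The 1.6g and 1.4g thresholds, and what "10-30 seconds" means as a rate, follow from the two settings quoted above. The sketch assumes a 4G region server heap, the figure given at the start of the thread, and measures only the net drop in memstore size; writes keep arriving during a forced flush, so the actual flush rate would be higher than this net figure.

```python
HEAP_GB = 4.0        # assumption: 4G of RAM for HBase, as stated earlier in the thread
UPPER_LIMIT = 0.40   # hbase.regionserver.global.memstore.upperLimit
LOWER_LIMIT = 0.35   # hbase.regionserver.global.memstore.lowerLimit

upper_gb = HEAP_GB * UPPER_LIMIT   # forced flushing starts here
lower_gb = HEAP_GB * LOWER_LIMIT   # forced flushing stops here


def net_drain_mb_per_sec(from_gb, to_gb, seconds):
    """Net MB/s by which the global memstore shrinks during a forced flush."""
    return (from_gb - to_gb) * 1024 / seconds


fast = net_drain_mb_per_sec(upper_gb, lower_gb, 10)
slow = net_drain_mb_per_sec(upper_gb, lower_gb, 30)

print(f"limits: {upper_gb:.1f}g down to {lower_gb:.1f}g")
print(f"net drain: {slow:.1f} to {fast:.1f} MB/s")
```

Shedding about 0.2g in 10 to 30 seconds works out to a net drain of roughly 7 to 20 MB/s per region server.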
-
Re: Performance at large number of regions/nodeJacob Isaac 2010-06-02, 20:17
On Wed, Jun 2, 2010 at 9:55 AM, Stack <[EMAIL PROTECTED]> wrote:
> On Wed, Jun 2, 2010 at 9:39 AM, Jacob Isaac <[EMAIL PROTECTED]> wrote:
> > That's right sha-1 is my set identifier.
> > and yes I scan since I know the start-row and end-row - I create a Scan
> > object with startRow and stopRow.
> >
>
> Good.
>
> >> > The Failed openScanner messages seems to suggest some region name cache is
> >> > getting stale with so many splits taking place.
> >> >
> >>
> >> Paste the exception.
> >>
> >
> > 2010-05-26 20:37:57,161 ERROR [IPC Server handler 20 on 60020] regionserver.HRegionServer(864): Failed openScanner
> > org.apache.hadoop.hbase.NotServingRegionException: XXXXX-table-name-XXXXX,249B69DED2DB14DCFD894C7F4A01F282DAAA87D6:00114,1274912608415
> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2278)
> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1857)
> >     at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> > 2010-05-26 20:37:57,408 ERROR [IPC Server handler 4 on 60020] regionserver.HRegionServer(864): Failed openScanner
> > org.apache.hadoop.hbase.NotServingRegionException: XXXXX-table-name-XXXXX,D70FDA6553BE1A63EF62A7CA0585344A9819CA43:00190,1274914545783
> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2278)
> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1857)
> >     at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> >
>
> Does it recover eventually? In other words, after a while you can
> successfully fetch the startkey from above region? Its been deployed?
>

I don't see any application level errors - It's from the RegionManager.metaScanner as the log snippets below indicate.

> If you track history of this region, can you see what is going on?
> Grep it in master log. What is happening around the time of the above
> message? This is 0.20.4 right, so you should be getting the benefit
> of smarter close where we'll only but up the barriers that provoke the
> NSRE after we've done flushing of the bullk of memory where before
> we'd put it up around the flush and close.
>
> Otherwise, could up the retries.
>

Result of a grep for region which eventually NSRE on Master-
-----------------
hadoop$>grep '46B0876B87DB493034FB353C67FB267B0C4342D0:00270' /hadoop/hbase/logs/hbase-hadoop-master-node-13.log.2010-05-27
2010-05-27 02:16:22,307 INFO [IPC Server handler 14 on 60000] master.ServerManager(441): Processing MSG_REPORT_SPLIT_INCLUDES_DAUGHTERS: XXXXX-table-name-1-XXXXX,46B0876B87DB493034FB353C67FB267B0C4342D0:00270,1274943417046: Daughters; XXXXX-table-name-1-XXXXX,46B0876B87DB493034FB353C67FB267B0C4342D0:00270,1274951772247, XXXXX-table-name-1-XXXXX,47567C9D81CDF3898C78B641B829272A2755BE31:00013,1274951772247 from XXXXX-node-20-XXXXX,60020,1274934804792; 1 of 1
2010-05-27 02:16:22,468 INFO [IPC Server handler 19 on 60000] master.RegionManager(337): Assigning region XXXXX-table-name-1-XXXXX,46B0876B87DB493034FB353C67FB267B0C4342D0:00270,1274951772247 to XXXXX-node-20-XXXXX,60020,1274934804792
2010-05-27 02:16:23,327 INFO [IPC Server handler 4 on 60000] master.ServerManager(441): Processing MSG_REPORT_OPEN: XXXXX-table-name-1-XXXXX,46B0876B87DB493034FB353C67FB267B0C4342D0:00270,1274951772247 from XXXXX-node-20-XXXXX,60020,1274934804792; 1 of 1
2010-05-27 02:16:23,327 INFO [HMaster] master.ProcessRegionOpen(70): XXXXX-table-name-1-XXXXX,46B0876B87DB493034FB353C67FB267B0C4342D0:00270,1274951772247 open on 10.10.24.35:60020
2010-05-27 02:16:23,328 INFO [HMaster] master.ProcessRegionOpen(80): Updated row XXXXX-table-name-1-XXXXX,46B0876B87DB493034FB353C67FB267B0C4342D0:00270,1274951772247 in region .META.,,1 with startcode=1274934804792, server=10.10.24.35:60020
2010-05-27 02:17:34,060 DEBUG [RegionManager.metaScanner] master.BaseScanner(465): XXXXX-table-name-1-XXXXX,46B0876B87DB493034FB353C67FB267B0C4342D0:00270,1274951772247/1680882591 no longer has references to XXXXX-table-name-1-XXXXX,46B0876B87DB493034FB353C67FB267B0C4342D0:00270,1274943417046
2010-05-27 02:17:34,061 DEBUG [RegionManager.metaScanner] master.BaseScanner(465): XXXXX-table-name-1-XXXXX,47567C9D81CDF3898C78B641B829272A2755BE31:00013,1274951772247/1767038200 no longer has references to XXXXX-table-name-1-XXXXX,46B0876B87DB493034FB353C67FB267B0C4342D0:00270,1274943417046
2010-05-27 02:17:34,062 INFO [RegionManager.metaScanner] master.BaseScanner(304): Deleting region XXXXX-table-name-1-XXXXX,46B0876B87DB493034FB353C67FB267B0C4342D0:00270,1274943417046 (encoded=771830170) because daughter splits no longer hold references

and on the regionserver

hadoop$>grep '46B0876B87DB493034FB353C67FB267B0C4342D0:00270' /hadoop/hbase/logs/hbase-hadoop-regionserver-XXXXX-node-20-XXXXX.log.2010-05-27
2010-05-27 02:16:07,218 INFO [IPC Server handler 15 on 60020] regionserver.MemStoreFlusher(376): Forced flushing of XXXXX-table-name-1-XXXXX,46B0876B87DB493034FB353C67FB267B0C4342D0:00270,1274943417046 because global memstore limit of 1.6g exceeded; currently 1.5g and flushing till 1.4g
2010-05-27 02:16:07,218 DEBUG [IPC Server handler 15 on 60020] regionserver.HRegion(950): Started memstore flush for region XX