-
Help with continuous loading configuration
Amit Jain 2011-11-16, 23:06
Hello,

We're doing a proof-of-concept study to see if HBase is a good fit for an application we're planning to build. The application will be recording a continuous stream of sensor data throughout the day and the data needs to be online immediately. Our test cluster consists of 16 machines, each with 16 cores and 32GB of RAM and 8TB local storage running CDH3u2. We're using the HBase client Put class, and have set the table "auto flush" to false and the write buffer size to 12MB.

Here are the region server JVM options:

export HBASE_REGIONSERVER_OPTS="-Xmx28g -Xms28g -Xmn128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"

And here are the property settings that we're using in the hbase-site.xml file:

hbase.rootdir=hdfs://master:9000/hbase
hbase.regionserver.handler.count=20
hbase.cluster.distributed=true
hbase.zookeeper.quorum=zk01,zk02,zk03
hfile.block.cache.size=0
hbase.hregion.max.filesize=1073741824
hbase.regionserver.global.memstore.upperLimit=0.79
hbase.regionserver.global.memstore.lowerLimit=0.70
hbase.hregion.majorcompaction=0
hbase.hstore.compactionThreshold=15
hbase.hstore.blockingStoreFiles=20
hbase.rpc.timeout=0
zookeeper.session.timeout=3600000

It's taking about 24 hours to load 4TB of data which isn't quite fast enough for our application. Is there a more optimal configuration that we can use to improve loading performance?

- Amit
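As a rough check of the figures in this message, the sketch below turns "4TB in 24 hours" into an average ingest rate and works out how much heap the two global memstore limits correspond to with a 28g heap. It assumes decimal terabytes and that all 16 machines take writes evenly; neither is stated in the post, and the function names are illustrative only.

```python
# Back-of-the-envelope numbers based on the settings quoted in the post.
TB = 10**12   # assumption: decimal terabytes (the post does not say which unit)
MB = 10**6

def ingest_rate_mb_per_s(total_bytes, hours):
    """Average ingest rate in decimal MB/s over the whole load."""
    return total_bytes / (hours * 3600) / MB

def memstore_limits_gib(heap_gib, upper_limit, lower_limit):
    """Heap share covered by the global memstore upper and lower limits."""
    return heap_gib * upper_limit, heap_gib * lower_limit

# 4 TB in 24 hours, spread over 16 machines (assumed to share writes evenly).
cluster_rate = ingest_rate_mb_per_s(4 * TB, 24)
per_machine = cluster_rate / 16

# -Xmx28g with upperLimit=0.79 and lowerLimit=0.70.
upper_gib, lower_gib = memstore_limits_gib(28, 0.79, 0.70)

print(f"cluster: {cluster_rate:.1f} MB/s, per machine: {per_machine:.1f} MB/s")
print(f"memstore upper limit: {upper_gib:.2f} GiB, lower limit: {lower_gib:.2f} GiB")
```

Seen this way, the observed load averages only a few MB/s per machine, which suggests looking at blocking, flushing, and compaction behaviour rather than raw disk or network bandwidth.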
-
Re: Help with continuous loading configuration
lars hofhansl 2011-11-16, 23:14
Hi Amit,

12MB write buffer might be a bit high.

How are you generating your keys? You might hot spot a single region server if (for example) you create monotonically increasing keys. When you look at the HBase monitoring page, do you see a single region server getting all the requests?

Anything weird in the GC logs? Do the logs all look similar?

-- Lars
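The hot-spotting concern with monotonically increasing keys can be illustrated with a small simulation. This is only a sketch: the 8-region table, the 8-digit key format, and the `region_for` and `hashed` helpers are hypothetical and do not come from the thread or from HBase itself.

```python
import bisect
import hashlib
from collections import Counter

# Hypothetical table: 8 regions splitting the 8-digit key space evenly.
SPLIT_POINTS = [f"{i * 12_500_000:08d}" for i in range(1, 8)]

def region_for(key):
    """Index of the region whose key range contains `key`."""
    return bisect.bisect_right(SPLIT_POINTS, key)

def hashed(key):
    """Replace the key with a fixed-width value derived from its SHA-256 hash."""
    digest = hashlib.sha256(key.encode("ascii")).hexdigest()
    return f"{int(digest, 16) % 100_000_000:08d}"

# 1000 consecutive keys, as a counter or timestamp would produce.
sequential = [f"{n:08d}" for n in range(50_000_000, 50_001_000)]

sequential_load = Counter(region_for(k) for k in sequential)
hashed_load = Counter(region_for(hashed(k)) for k in sequential)

print("regions hit by sequential keys:", len(sequential_load))
print("regions hit by hashed keys:", len(hashed_load))
```

All of the consecutive keys fall into one region, while the hashed versions spread over the table. The trade-off is that hashing destroys key ordering, so range scans over the original keys are no longer possible.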
-
Re: Help with continuous loading configuration
Amit Jain 2011-11-16, 23:26
Hi Lars,

The keys are arriving in random order. The HBase monitoring page shows evenly distributed load across all of the region servers. I didn't see anything weird in the gc logs, no mention of any failures. I'm a little unclear about what the optimal values for the following properties should be:

hbase.hstore.compactionThreshold
hbase.hstore.blockingStoreFiles

Is there some rule of thumb that I can use to determine good values for these properties?

- Amit
-
Re: Help with continuous loading configuration
Stack 2011-11-16, 23:35
On Wed, Nov 16, 2011 at 3:26 PM, Amit Jain <[EMAIL PROTECTED]> wrote:
> Hi Lars,
>
> The keys are arriving in random order. The HBase monitoring page shows
> evenly distributed load across all of the region servers.

What kind of ops rates are you seeing? Are they running nice and smooth across all servers? No stuttering? What do your regionserver logs look like?

Are you presplitting your table or just letting HBase run and do the splits?

> I didn't see anything weird in the gc logs, no mention of any failures.
> I'm a little unclear about what the optimal values for the following
> properties should be:
>
> hbase.hstore.compactionThreshold

Default is 3. Look in the regionserver logs. See how many files you have on average per region column family (you could also look in the filesystem). Are we constantly rewriting them? If the load is mostly write-only, you might raise this, putting off compactions until more files are around (but judging by the regionserver logs, if the write rate is high, we might be having trouble keeping up with this default threshold anyway).

> hbase.hstore.blockingStoreFiles

The higher this is, the bigger the price you'll pay if a server crashes, because this will be the upper bound on how many WAL logs we need to split for the server before its regions come back online again. Leave it at the default for now, I'd say.

> Is there some rule of thumb that I can use to determine good values for
> these properties?

Have you checked out this section of the book? http://hbase.apache.org/book.html#performance

Are you filling the machines? Are they burning CPU? Or IO-bound? If not, perhaps open the front gate wider by upping the number of concurrent handlers.

St.Ack
-
Re: Help with continuous loading configuration
lars hofhansl 2011-11-16, 23:36
hbase.hstore.blockingStoreFiles is the maximum number of store files HBase will allow before it will block writes in order to catch up with compacting files. Default is 7. If this is too low you'll see warnings about blocking writers in the logs. I found that for some test load I had, I needed to increase this to 20 along with changing hbase.hregion.memstore.block.multiplier to 4 (this allows the memstore to grow larger, be careful with this :) ).

hbase.hstore.compactionThreshold is the number of store files that will trigger a compaction. Changing this won't help with throughput...

But I'll let somebody else jump in with more operational experience.
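The two blocking conditions described in this message can be written down as a simplified model. This is a sketch and not HBase's actual implementation: the 64 MB flush size and the multiplier of 2 used for comparison are assumed defaults for this HBase generation, and both function names are made up for illustration.

```python
MB = 1024 * 1024

# Assumption: 64 MB is the memstore flush size on this HBase generation.
ASSUMED_FLUSH_SIZE = 64 * MB

def memstore_block_bytes(flush_size_bytes, block_multiplier):
    """Region memstore size at which updates are blocked (flush size x multiplier)."""
    return flush_size_bytes * block_multiplier

def writes_blocked(store_files, blocking_store_files):
    """Simplified: blocked once a store has more files than blockingStoreFiles."""
    return store_files > blocking_store_files

# Multiplier 2 (assumed default) versus the value of 4 mentioned in the message.
default_block = memstore_block_bytes(ASSUMED_FLUSH_SIZE, 2)
raised_block = memstore_block_bytes(ASSUMED_FLUSH_SIZE, 4)
print(default_block // MB, "MB vs", raised_block // MB, "MB per region")

# A store holding 25 files is over both the default of 7 and a limit of 20.
print(writes_blocked(25, 7), writes_blocked(25, 20), writes_blocked(7, 7))
```

Raising the multiplier lets each region buffer more data before blocking, which is why it comes with a warning: the extra memory is multiplied by the number of actively written regions on a server.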
-
Re: Help with continuous loading configuration
Amit Jain 2011-11-17, 00:09
Hi Stack,

Thanks for the feedback. Comments inline ...

On Wed, Nov 16, 2011 at 3:35 PM, Stack <[EMAIL PROTECTED]> wrote:
> What kind of ops rates are you seeing? They are running nice and
> smooth across all servers? No stuttering? Whats your regionserver
> logs look like?
>
> Are you presplitting your table or just letting hbase run and do up the
> splits?

As far as I can tell, the operations look smooth across all servers. We're not doing any pre-splitting, just letting HBase do the splits.

> > hbase.hstore.compactionThreshold
>
> Default is 3. Look in regionserver logs. See how many files you have
> on average by region columnfamily (you could also look in filesystem).
> Are we constantly rewriting them? If write only load mostly, you
> might up this putting off compactions till more files around (but
> looking in regionserver logs, if high write rate, we might be having
> trouble keeping up with this default threshold anyways?).

Well, it looks like half of the regions are in the 25-32 file range and the other half just have 1 or 2 files. This was when we ran it with a compactionThreshold of 15.

How can I tell by looking at the region server logs if we're seeing a "high write rate"? We've got 48 clients sending load, 12 region servers total. We're pushing the system pretty hard.

> > hbase.hstore.blockingStoreFiles
>
> The higher this is, the bigger the price you'll pay if a server
> crashes because this will be the upper bound on how many WAL logs we
> need to split for the server before its regions come back on line
> again. Leave it default I'd say for now.

Ok, we'll leave it default for now.

> You've checked out this section of the book:
> http://hbase.apache.org/book.html#performance
>
> Are you filling the machines? Are they burning cpu? Or io-bound?
> If not, perhaps open the front gate wider by upping the number of
> concurrent handlers.

I have read through that section of the HBase book. There is plenty of CPU available. How do I up the number of concurrent handlers? Increase hbase.regionserver.handler.count?

- Amit
-
Re: Help with continuous loading configuration
Matt Corgan 2011-11-17, 00:30
You can set put.setWriteToWAL(false) to skip the write-ahead logging, which slows down puts significantly. But you will lose data if a regionserver crashes with data in its memstore.
-
Re: Help with continuous loading configuration
Amit Jain 2011-11-17, 00:37
We would prefer not to do this. It's important that we have all of the historical data without any loss. But thanks for the suggestion.

- Amit
-
Re: Help with continuous loading configuration
Stack 2011-11-17, 03:58
On Wed, Nov 16, 2011 at 4:09 PM, Amit Jain <[EMAIL PROTECTED]> wrote:
> As far as I can tell, the operations look smooth across all servers. We're
> not doing any pre-splitting, just letting HBase do the splits.

So, how many requests per second per server?

How many column families? What size are the puts on average?

> Well, it looks like half of the regions are in the 25-32 file range and the
> other half just have 1 or 2 files. This was when we ran it with a
> compactionThreshold of 15.

So, is this the count even after the load comes off? Maybe compactions will get a chance to cut in and shrink them.

> How can I tell by looking at the region server logs if we're seeing a "high
> write rate" ?

Look at the UI for basic ops/second.

> I have read through that section of the HBase book. There is plenty of CPU
> available. How do I up the number of concurrent handlers? Increase
> hbase.regionserver.handler.count ?

Yes. You have it pretty low at the moment.

What kind of performance are you looking for?

Post your configs so we can look at them. Post a bit of your regionserver log and your table schema.

St.Ack
-
RE: Help with continuous loading configuration
Ramkrishna S Vasudevan 2011-11-17, 05:07
Hi Amit,

As you said, the regions may be distributed evenly across the region servers. But if the puts are reaching only one particular RS at any point in time, that will surely overload it. Can you check whether that is happening?

As Stack pointed out, what is your schema and how is your row key designed?

Regards
Ram
-
Re: Help with continuous loading configuration
Doug Meil 2011-11-17, 19:49
Hi Amit,

Per the rowkey comment, you'll want to review this: http://hbase.apache.org/book.html#rowkey.design

(I apologize if this is a dup message, I'm having some periodic email issues)
-
Re: Help with continuous loading configuration
Amit Jain 2011-11-17, 22:20
Hi Stack,

Right now we're just testing. There's a single table with just one column family and the size of each put is about 5KB. We made some of the changes that you suggested (including upping handler count to 50) and have restarted the test. Attached are the config files that we're using (hbase-env.sh, hbase-site.xml, and hdfs-site.xml). I've also included a screen shot of the HBase admin console after about two hours of operation. We're shooting for getting 10TB of data loaded in about 48 hours.

- Amit
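The target stated here (10TB in about 48 hours, puts of about 5KB) can be converted into the put rate the cluster would need to sustain. The sketch assumes decimal units, exactly 5,000-byte puts, and an even spread over the 12 region servers mentioned earlier in the thread; the function name is illustrative.

```python
TB = 10**12   # assumption: decimal units throughout
MB = 10**6
KB = 10**3

def required_puts_per_second(total_bytes, hours, put_bytes):
    """Sustained put rate needed to load total_bytes within the given hours."""
    return total_bytes / (hours * 3600) / put_bytes

# Target: 10 TB in 48 hours with ~5 KB puts, across the whole cluster.
target = required_puts_per_second(10 * TB, 48, 5 * KB)

# Assumed even spread over the 12 region servers mentioned in the thread.
per_server = target / 12

# Roughly how many 5 KB puts fit in the 12 MB client write buffer.
puts_per_buffer = (12 * MB) // (5 * KB)

# Ratio of the target rate to the observed rate (4 TB in 24 hours).
speedup = (10 * TB / 48) / (4 * TB / 24)

print(f"target: {target:.0f} puts/s, per server: {per_server:.0f} puts/s")
print(f"puts per buffer flush: {puts_per_buffer}, speedup needed: {speedup:.2f}x")
```

Under these assumptions the goal is about a quarter faster than the rate reported in the first message, so it is a matter of tuning rather than an order-of-magnitude gap.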
-
Re: Help with continuous loading configurationAmit Jain 2011-11-17, 22:26
Hi Ram,
For this test, the data is synthetically generated and the keys are just random fixed-width integers. We're loading into a single table with one column family. The real data would be less uniform, but we just want to get an idea of whether or not it is feasible.

- Amit

On Wed, Nov 16, 2011 at 9:07 PM, Ramkrishna S Vasudevan <[EMAIL PROTECTED]> wrote:
>
> Hi Amit
>
> As you said the regions may be distributed evenly across RS, if you can see
> if the puts are reaching to a particular RS only at any point of time it
> will surely overload the RS.
>
> As Stack pointed out, what is your schema and how is your row key designed?
>
> Regards
> Ram
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Stack
> Sent: Thursday, November 17, 2011 9:29 AM
> To: [EMAIL PROTECTED]
> Cc: lars hofhansl
> Subject: Re: Help with continuous loading configuration
>
> On Wed, Nov 16, 2011 at 4:09 PM, Amit Jain <[EMAIL PROTECTED]> wrote:
> > On Wed, Nov 16, 2011 at 3:35 PM, Stack <[EMAIL PROTECTED]> wrote:
> >
> >> On Wed, Nov 16, 2011 at 3:26 PM, Amit Jain <[EMAIL PROTECTED]> wrote:
> >> > Hi Lars,
> >> >
> >> > The keys are arriving in random order. The HBase monitoring page shows
> >> > evenly distributed load across all of the region servers.
> >>
> >> What kind of ops rates are you seeing? They are running nice and
> >> smooth across all servers? No stuttering? Whats your regionserver
> >> logs look like?
> >>
> >> Are you presplitting your table or just letting hbase run and do up the
> >> splits?
> >>
> >
> > As far as I can tell, the operations look smooth across all servers. We're
> > not doing any pre-splitting, just letting HBase do the splits.
> >
>
> So, how many requests per second per server.
>
> How many column families? What size are the puts on average?
>
> > Well, it looks like half of the regions are in the 25-32 file range and the
> > other half just have 1 or 2 files. This was when we ran it with a
> > compactionThreshold of 15.
> >
>
> So, its this count even after the load comes off? Maybe compactions
> get a chance to cut in and it should shrink them.
>
> > How can I tell by looking at the region server logs if we're seeing a "high
> > write rate" ?
>
> Look at UI for basic ops/second.
>
> > I have read through that section of the HBase book. There is plenty of CPU
> > available. How do I up the number of concurrent handlers? Increase
> > hbase.regionserver.handler.count ?
> >
>
> Yes. You have it pretty low at the moment.
>
> What kinda of performance are you looking for?
>
> Post your configs so we can look at them. Post a bit of your
> regionserver log and your table schema.
> St.Ack
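
Since the keys in this test are random fixed-width integers and the table is not pre-split, one way to picture pre-splitting is to compute evenly spaced split points over the key space and supply them when the table is created. The sketch below is only an illustration under stated assumptions: the thread does not say how the integers are encoded, so it assumes zero-padded decimal strings, and the names `split_keys`, `region_for`, and the default `key_width` are hypothetical.

```python
import bisect


def split_keys(num_regions: int, key_width: int = 10) -> list[str]:
    """Return num_regions - 1 evenly spaced split points.

    Assumes (hypothetically) that row keys are zero-padded decimal
    strings of exactly key_width digits, so they sort lexicographically
    in the same order as their numeric values.
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    key_space = 10 ** key_width
    if num_regions > key_space:
        raise ValueError("more regions than distinct keys")
    return [
        str(i * key_space // num_regions).zfill(key_width)
        for i in range(1, num_regions)
    ]


def region_for(key: str, splits: list[str]) -> int:
    """Index of the region a key falls into.

    A split point is treated as the inclusive start key of the next
    region, so a key equal to a split point belongs to the later region.
    """
    return bisect.bisect_right(splits, key)


if __name__ == "__main__":
    splits = split_keys(num_regions=4, key_width=4)
    print(splits)
    print(region_for("2499", splits), region_for("2500", splits))
```

With uniformly random keys, each of the resulting ranges should receive roughly the same share of puts from the start, rather than all writes landing on one region until the first splits happen. The real, less uniform data would need split points chosen from its actual key distribution.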
-
Re: Help with continuous loading configuration Doug Meil 2011-11-18, 13:59
Hi Amit,

Per Ram's comment about rowkey design, you'll want to read this:

http://hbase.apache.org/book.html#rowkey.design

On 11/17/11 12:07 AM, "Ramkrishna S Vasudevan" <[EMAIL PROTECTED]> wrote:

>
>Hi Amit
>
>As you said the regions may be distributed evenly across RS, if you can see
>if the puts are reaching to a particular RS only at any point of time it
>will surely overload the RS.
>
>As Stack pointed out, what is your schema and how is your row key designed ?
>
>Regards
>Ram
>
>-----Original Message-----
>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Stack
>Sent: Thursday, November 17, 2011 9:29 AM
>To: [EMAIL PROTECTED]
>Cc: lars hofhansl
>Subject: Re: Help with continuous loading configuration
>
>On Wed, Nov 16, 2011 at 4:09 PM, Amit Jain <[EMAIL PROTECTED]> wrote:
>> On Wed, Nov 16, 2011 at 3:35 PM, Stack <[EMAIL PROTECTED]> wrote:
>>
>>> On Wed, Nov 16, 2011 at 3:26 PM, Amit Jain <[EMAIL PROTECTED]> wrote:
>>> > Hi Lars,
>>> >
>>> > The keys are arriving in random order. The HBase monitoring page shows
>>> > evenly distributed load across all of the region servers.
>>>
>>> What kind of ops rates are you seeing? They are running nice and
>>> smooth across all servers? No stuttering? Whats your regionserver
>>> logs look like?
>>>
>>> Are you presplitting your table or just letting hbase run and do up the
>>> splits?
>>>
>>
>> As far as I can tell, the operations look smooth across all servers. We're
>> not doing any pre-splitting, just letting HBase do the splits.
>>
>
>So, how many requests per second per server.
>
>How many column families? What size are the puts on average?
>
>> Well, it looks like half of the regions are in the 25-32 file range and the
>> other half just have 1 or 2 files. This was when we ran it with a
>> compactionThreshold of 15.
>>
>
>So, its this count even after the load comes off? Maybe compactions
>get a chance to cut in and it should shrink them.
>
>> How can I tell by looking at the region server logs if we're seeing a "high
>> write rate" ?
>
>Look at UI for basic ops/second.
>
>> I have read through that section of the HBase book. There is plenty of CPU
>> available. How do I up the number of concurrent handlers? Increase
>> hbase.regionserver.handler.count ?
>>
>
>Yes. You have it pretty low at the moment.
>
>What kinda of performance are you looking for?
>
>Post your configs so we can look at them. Post a bit of your
>regionserver log and your table schema.
>St.Ack
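
As an illustration of the row-key concern raised by Ram and Doug: the synthetic keys in this test are random, but real sensor data keyed by something like sensor and time could send most puts to one region server at a time. A commonly used way to avoid that is to prefix each key with a small hash-derived bucket ("salting"). The sketch below is not taken from the thread; the key layout, the `sensor_id` and `timestamp_ms` fields, the `|` separator, and the bucket count are all hypothetical assumptions.

```python
import hashlib


def salted_key(sensor_id: str, timestamp_ms: int, buckets: int = 16) -> str:
    """Build a row key of the form '<bucket>|<sensor_id>|<timestamp>'.

    The bucket is derived from a hash of the sensor id, so all readings
    from one sensor share a prefix (and stay sorted by time), while
    different sensors are spread across up to `buckets` key ranges.
    The layout and field names are assumptions for illustration only.
    """
    if not 1 <= buckets <= 100:
        raise ValueError("buckets must fit in the two-digit prefix")
    if timestamp_ms < 0:
        raise ValueError("timestamp_ms must be non-negative")
    digest = hashlib.md5(sensor_id.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    # Zero-padding keeps lexicographic order equal to numeric order.
    return f"{bucket:02d}|{sensor_id}|{timestamp_ms:013d}"


if __name__ == "__main__":
    print(salted_key("sensor-0042", 1321568760000))
```

The trade-off is on the read side: a scan over a time range for many sensors has to be issued once per bucket, so the bucket count is usually kept close to the number of region servers (16 in this cluster).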