-
HBase scan performance decreases over time.
David Koch 2012-11-03, 15:12
Hello,
Every now and then we need to flatten our cluster and re-import all data from log files (changes in data format, etc.). Afterwards we notice a significant increase in scan performance. As data is added and shuffled around between region servers, performance goes down again over time (say, over a couple of weeks). Are there any routine operations that one should run manually, or settings to activate in the HBase configuration, to keep the data well distributed? We use HBase 0.92 as part of a Cloudera4 cluster.
Thank you,
/David
-
Re: HBase scan performance decreases over time.
Ted Yu 2012-11-03, 15:42
Can you tell us how often you run major compaction after the import? Have you noticed imbalanced read/write requests in the cluster, meaning that a subset of region servers receives the bulk of the writes?
We do some manual movement of regions when the above happens.
Cheers
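One rough way to check for the imbalance described above is to compare each region server's request count against an even split. This is only a minimal Python sketch, assuming the per-server counts have already been collected by hand (for example from the master web UI). The function name, the server names, the counts, and the 1.5x threshold are all made up for illustration and do not come from HBase.

```python
def find_hot_servers(request_counts, threshold=1.5):
    """Return the servers whose request count exceeds threshold x the mean."""
    if not request_counts:
        return []
    mean = sum(request_counts.values()) / len(request_counts)
    if mean == 0:
        return []
    return sorted(
        server
        for server, count in request_counts.items()
        if count > threshold * mean
    )


# Hypothetical write-request counts per region server (not real data).
example_counts = {"rs1": 9000, "rs2": 500, "rs3": 400, "rs4": 100}
print(find_hot_servers(example_counts))
```

Servers flagged this way would be the candidates for the manual region movement mentioned above.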
-
Re: HBase scan performance decreases over time.
David Koch 2012-11-03, 19:50
Hello Ted,
We never initiate major compaction manually. I have not looked at I/O balance between nodes in detail. We have noticed that after running for a couple of weeks HBase seems to spend hours pushing blocks between nodes in order to optimize things. We add data daily in one ~30 GB push to several tables. Sometimes nodes get added to the running system.
Where can I get more information on how to carry out performance-related HBase administrative tasks?
Thank you,
/David
-
Re: HBase scan performance decreases over time.
Ted Yu 2012-11-03, 20:14
Have you looked at http://hbase.apache.org/book.html#performance ?
Thanks
-
Re: HBase scan performance decreases over time.
Michael Segel 2012-11-05, 13:04
There's an HDFS bandwidth setting which is set to 10 MB/s.
Way too low for even 1 GbE.
Have you modified this setting yet?
-Mike
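To put those two figures side by side, here is a back-of-envelope Python sketch. It assumes decimal units, takes the ~30 GB daily push mentioned earlier in the thread as the amount of data to move, and ignores protocol overhead. How much data actually gets moved between nodes is not known from the thread, so the numbers only illustrate the ratio between the two rates.

```python
def transfer_hours(data_bytes, rate_bytes_per_sec):
    """Hours needed to move data_bytes at a sustained rate."""
    return data_bytes / rate_bytes_per_sec / 3600


GB = 10**9  # decimal gigabyte (assumption)
MB = 10**6  # decimal megabyte (assumption)

daily_push = 30 * GB        # the ~30 GB daily push mentioned in the thread
throttle = 10 * MB          # the 10 MB/s figure quoted above
gbe_line_rate = 10**9 / 8   # 1 Gbit/s in bytes per second, i.e. 125 MB/s

print(f"at 10 MB/s: {transfer_hours(daily_push, throttle):.2f} h")
print(f"at 1 GbE line rate: {transfer_hours(daily_push, gbe_line_rate):.2f} h")
```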
-
Re: HBase scan performance decreases over time.
Asaf Mesika 2012-11-05, 18:14
Where is this setting located?
-
Re: HBase scan performance decreases over time.
Michael Segel 2012-11-05, 18:49
hdfs-site.xml
It's an HDFS setting that may impact the balancing of HBase as well. (I'm sure someone can give a better response by looking at the code.)
-
Re: HBase scan performance decreases over time.
Leonid Fedotov 2012-11-05, 18:52
There is a property, dfs.balance.bandwidthPerSec, in hdfs-site.xml:

<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>6250000</value>
  <description>
    Specifies the maximum amount of bandwidth that each datanode can utilize for the balancing purpose in term of the number of bytes per second.
  </description>
</property>

Thank you!
Sincerely,
Leonid Fedotov
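Since the property value is given in bytes per second, it can help to convert it to more familiar units. A small Python sketch, assuming decimal units (1 MB = 10^6 bytes); the helper names are made up for illustration and are not part of Hadoop:

```python
def describe_bandwidth(bytes_per_sec):
    """Convert a bytes-per-second value to (MB/s, Mbit/s), decimal units."""
    mb_per_sec = bytes_per_sec / 10**6
    mbit_per_sec = bytes_per_sec * 8 / 10**6
    return mb_per_sec, mbit_per_sec


def bytes_per_sec_for(mbit_per_sec):
    """Bytes-per-second value corresponding to a target rate in Mbit/s."""
    return int(mbit_per_sec * 10**6 / 8)


mb, mbit = describe_bandwidth(6250000)
print(f"6250000 B/s = {mb} MB/s = {mbit} Mbit/s")
```

The second helper goes the other way, for working out what to put in the value element for a chosen rate.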