-
One weird problem of my MR job upon hbase table.
Liu, Raymond 2013-01-05, 03:37

Hi

I have run into a weird issue where some map tasks lag behind:

I have a small hadoop/hbase cluster with 1 master node and 4 regionserver nodes. All nodes have 16 CPUs, with map and reduce slots set to 24.

A few tables were created with regions distributed evenly across the regionserver nodes (say 16 regions per region server). Each region holds almost the same number of KVs of very similar size. All tables have been major_compacted to ensure data locality.

I have an MR job which simply does a local region scan in every map task (so 16 map tasks per regionserver node).

In theory, every map task should finish in a similar time.

In practice, some regions on the same region server always lag far behind, taking 150~250% of the average time of the other map tasks.

If this happened on one single region server for every table, I would suspect a disk issue or something else dragging down that region server's performance.

But the weird thing is that, for any single table, almost all of the map tasks on one particular regionserver lag behind, yet for different tables the lagging regionserver is different! The regions and region sizes are evenly distributed, which I have double-checked many times. (I even tried setting the replica count to 4 to ensure every node has a local copy of the data.)

Say for table 1, all map tasks on regionserver node 2 are slow, while for table 2, maybe all map tasks on regionserver node 3 are slow. For table 1 it is always regionserver node 2 that is slow, regardless of cluster restarts, and the slowest map task is always the very same one. It does not go away even if I run a major compaction again.

So, could anyone give me a clue as to what might lead to this weird behavior? Any wild guess is welcome!

(BTW, I did not see this issue a few days ago with the same tables. I did restart the cluster and make a few changes to the config files during that period, but restoring the config files doesn't help.)

Best Regards,
Raymond Liu
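The "150~250% of the other map tasks" comparison in this message can be made concrete with a few lines of Python. This is only a sketch: the function name `lagging_hosts`, the 1.5 threshold, and the sample durations are made up for illustration and do not come from the poster's job.

```python
from collections import defaultdict
from statistics import mean


def lagging_hosts(task_seconds, threshold=1.5):
    """Return {host: ratio} for hosts whose mean map-task time is at least
    `threshold` times the mean over the tasks of all other hosts."""
    by_host = defaultdict(list)
    for host, seconds in task_seconds:
        by_host[host].append(seconds)

    result = {}
    for host, times in by_host.items():
        # Durations of every task that did NOT run on this host.
        others = [s for h, ts in by_host.items() if h != host for s in ts]
        if not others:
            continue  # nothing to compare against
        ratio = mean(times) / mean(others)
        if ratio >= threshold:
            result[host] = round(ratio, 2)
    return result


# Hypothetical (host, seconds) pairs, e.g. copied by hand from the job history.
sample = [
    ("rs1", 100), ("rs1", 104),
    ("rs2", 190), ("rs2", 210),
    ("rs3", 98), ("rs4", 102),
]
print(lagging_hosts(sample))  # prints {'rs2': 1.98}
```

Running this once per table would show whether the lagging host really changes from table to table, as described above.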
-
Re: One weird problem of my MR job upon hbase table.
Doug Meil 2013-01-07, 13:37

Hi there,

The HBase RefGuide has a comprehensive case study on such a case. This might not be the exact problem, but the diagnostic approach should help.

http://hbase.apache.org/book.html#casestudies.slownode
-
Re: One weird problem of my MR job upon hbase table.
Michael Segel 2013-01-07, 16:59

Where did he mention he was attempting to bond the ports?
Sorry if I missed it?
-
RE: One weird problem of my MR job upon hbase table.
Liu, Raymond 2013-01-14, 08:26

Hi

For feedback: after a lot of profiling work, I think I have found the most promising cause of my problem.

It is not because one disk is slow or something like that (I do have slow disks on different region servers, but the lagging pattern does not seem related to the disk slowness pattern).

It seems to me this issue is caused by HDFS blocks not being distributed evenly across the disks of the same region server.

For a specific table, I do have regions spread evenly across the region servers. But the HFile data blocks that belong to the local regions on the slow region server are not distributed evenly across its disks. Say disk1 has 45 blocks while disk4 has 30 blocks, etc. On the whole, though, if you add up all the blocks from all tables on this region server, including both the local regions' blocks and the replica blocks of remote regions, they are evenly distributed.

Thus I guess HDFS did try to even out the data blocks across the disks. But since it does not know which block belongs to which region, and replica data blocks keep coming in, even a round-robin strategy cannot even out the "local region" data blocks across the disks. It seems this can hardly be avoided?

Thus there are hotspot disks, and with a defined scan sequence this leads to lagging regions, which leads to lagging map tasks on this region server.

This is the best guess I have so far. But I still don't know why this issue suddenly appeared on my cluster, or why I didn't observe it before...
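The placement effect guessed at in this message can be illustrated with a small simulation. It is a minimal sketch, not HDFS code: it assumes, as the message does, that a DataNode simply cycles through its disks in block arrival order. The function name `place_round_robin`, the "local"/"other" tags, the 4 disks, and the one-in-three share of local blocks are all hypothetical.

```python
import random
from collections import Counter


def place_round_robin(block_tags, num_disks):
    """Assign blocks to disks in arrival order, cycling through the disks.

    Returns (total_per_disk, local_per_disk) as Counters keyed by disk index.
    """
    total, local = Counter(), Counter()
    for i, tag in enumerate(block_tags):
        disk = i % num_disks  # plain round robin, blind to what the block is
        total[disk] += 1
        if tag == "local":
            local[disk] += 1
    return total, local


rng = random.Random(42)  # fixed seed so repeated runs give the same stream
# Hypothetical arrival stream: "local" marks blocks of the scanned table's
# local regions, "other" marks everything else (other tables, remote replicas).
tags = [rng.choice(["local", "other", "other"]) for _ in range(480)]

total, local = place_round_robin(tags, num_disks=4)
print("all blocks per disk:  ", [total[d] for d in range(4)])
print("local blocks per disk:", [local[d] for d in range(4)])
```

Under these assumptions the total count per disk is always balanced, while the local-block count per disk depends purely on where the local blocks happened to fall in the arrival order. In the extreme case where every fourth arriving block is local, all local blocks land on a single disk.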
-
Re: One weird problem of my MR job upon hbase table.
Ted Yu 2013-01-05, 03:45

Did you use TableInputFormat in your MR job ?
Did you use the one from mapred or mapreduce ?

What version of HBase are you using ?

Did you take a look at Ganglia to see if there is any bottleneck in your cluster ?

You mentioned a few changes upon config file shortly before this problem appeared, can you let us know which parameters you modified ?

Cheers
-
RE: One weird problem of my MR job upon hbase table.
Liu, Raymond 2013-01-05, 04:08

Hi Ted

Thanks for your reply.

> Did you use TableInputFormat in your MR job ?

No, a custom one which does the same split work, but the input for each map task is the split, and the map task opens the HTable and reads the specific region by itself.

> Did you use the one from mapred or mapreduce ?

Everything related is from mapreduce.

> What version of HBase are you using ?

0.94.1

> Did you take a look at Ganglia to see if there is any bottleneck in your cluster ?

I didn't, but I did check CPU and disk usage simply with dstat -cdnm; no CPU, disk, or network IO bottleneck was observed.

> You mentioned a few changes upon config file shortly before this problem
> appeared, can you let us know which parameters you modified ?

Mainly increasing dfs.datanode.handler.count / hbase.regionserver.handler.count from the default to around 30 etc. This was done on every node, and I changed it back later. Hmm...
-
Re: One weird problem of my MR job upon hbase table.
Ted Yu 2013-01-05, 04:36

Since a custom InputFormat was used, I assume you have verified that the map tasks ran on the region server which hosts the regions being scanned.

If you were doing aggregation through this MR job, you can consider using AggregateProtocol.

Cheers
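The locality check suggested here can be sketched as a comparison of two mappings. This is an illustration only: the dictionaries are assumed to be collected by hand (for example from the job history and the HBase master UI), and the names `short_host` and `non_local_tasks` as well as the sample hostnames are hypothetical.

```python
def short_host(name):
    """Normalize a hostname: lower-case it and keep only the first DNS label."""
    return name.lower().split(".")[0]


def non_local_tasks(task_hosts, region_hosts):
    """Return the sorted region names whose map task ran on a host other
    than the region server currently serving that region."""
    mismatched = []
    for region, task_host in task_hosts.items():
        serving = region_hosts.get(region)
        # An unknown serving host is reported too, since locality is unproven.
        if serving is None or short_host(serving) != short_host(task_host):
            mismatched.append(region)
    return sorted(mismatched)


# Hypothetical data: region -> host that ran its map task,
# and region -> region server that hosts the region.
task_hosts = {"region-a": "rs1.example.com", "region-b": "rs3"}
region_hosts = {"region-a": "RS1", "region-b": "rs2.example.com"}
print(non_local_tasks(task_hosts, region_hosts))  # prints ['region-b']
```

Hostnames are normalized before comparison because the two sources may report the same machine as a short name in one place and a fully qualified name in the other.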
-
RE: One weird problem of my MR job upon hbase table.
Liu, Raymond 2013-01-05, 04:45

> Since a custom InputFormat was used, I assume you have verified that the map
> tasks ran on the region server which hosts the regions being scanned.

Yes, this InputFormat's behavior was verified with the same data before. And, btw, I tried replacing it with the original TableInputFormat, with a similar result.

And if I run the same job again, with the help of the native filesystem's cache (my data is small enough to be fully cached by the native filesystem, and, as with TableInputFormat, the HBase block cache is disabled), the lagging issue is gone. Thus something must be wrong in the disk IO part. But why do different tables behave differently, yet consistently...

> If you were doing aggregation through this MR job, you can consider using
> AggregateProtocol.

;) What troubles me is not that this MR job behaves abnormally, but why: it seems nothing special ever changed in my cluster, and then it suddenly went wrong...