Liu, Raymond 2013-01-05, 03:37
The HBase RefGuide has a comprehensive case study on a similar problem. It
might not be the exact problem, but the diagnostic approach should help.
On 1/4/13 10:37 PM, "Liu, Raymond" <[EMAIL PROTECTED]> wrote:
>I am seeing a strange issue where some map tasks lag behind the rest:
>I have a small Hadoop/HBase cluster with 1 master node and 4 regionserver
>nodes. Each node has 16 CPUs, with map and reduce slots set to 24.
>A few tables were created with regions distributed evenly across the
>regionservers (say 16 regions per regionserver). Each region also has
>almost the same number of KVs, all of very similar size. Every table has
>been major compacted to ensure data locality.
>I have an MR job which simply does a local region scan in each map task (so
>16 map tasks for each regionserver node).
>In theory, every map task should finish in a similar amount of time.
>In practice, the map tasks for some regions on the same regionserver always
>lag far behind, taking 150~250% of the average time of the other map tasks.
>If this happened on a single regionserver for every table, I would
>suspect a disk issue or something else dragging down the performance
>of that regionserver.
>The weird thing is that, for any single table, almost all of the map
>tasks on one particular regionserver lag behind, but for a
>different table the lagging regionserver is different! The
>regions and region sizes are evenly distributed, which I have
>double-checked many times. (I even tried setting replicas to 4 to ensure
>every node has a local copy of the data.)
>Say for table 1, all map tasks on regionserver node 2 are slow, while for
>table 2, maybe all map tasks on regionserver node 3 are slow. For table 1,
>it is always regionserver node 2 that is slow, regardless of cluster
>restarts, and the slowest map task is always the very same one. It
>does not go away even if I run a major compaction again.
>Could anyone give me a clue as to what might lead to
>this weird behavior? Any wild guess is welcome!
>(BTW, I did not see this issue a few days ago with the same table.
>I did restart the cluster and make a few changes to the config files during
>that period, but restoring the config files did not help.)
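
One rough way to confirm the pattern described above (a different regionserver lagging for each table) is to group map task durations by host and compare each host's mean against the median host. The sketch below is only an illustration: the host names and durations are made up, the function name `slow_hosts` is hypothetical, and the real numbers would have to be copied out of the job history for each table's run. The 1.5 threshold mirrors the 150% figure mentioned in the original message.

```python
from collections import defaultdict
from statistics import mean, median


def slow_hosts(task_times, threshold=1.5):
    """Return {host: ratio} for hosts whose mean map task time is at least
    `threshold` times the median of all per-host means.

    task_times: iterable of (host, seconds) pairs, one per map task.
    """
    by_host = defaultdict(list)
    for host, seconds in task_times:
        by_host[host].append(seconds)

    # Mean task duration per host.
    host_means = {host: mean(times) for host, times in by_host.items()}

    # Use the median host as the baseline so that one slow host
    # does not inflate the reference value.
    baseline = median(host_means.values())

    return {
        host: m / baseline
        for host, m in host_means.items()
        if m / baseline >= threshold
    }


# Hypothetical durations (seconds) for one table's scan job.
table1 = [
    ("rs1", 100), ("rs1", 100),
    ("rs2", 200), ("rs2", 200),
    ("rs3", 100), ("rs3", 100),
    ("rs4", 100), ("rs4", 100),
]
print(slow_hosts(table1))
```

Running this once per table and comparing the flagged hosts would show whether the slow regionserver really moves with the table rather than staying fixed, which would point away from a per-node hardware problem.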