HBase, mail # user - One weird problem of my MR job upon hbase table.


Re: One weird problem of my MR job upon hbase table.
Michael Segel 2013-01-07, 16:59
Where did he mention he was attempting to bond the ports?
Sorry if I missed it?

On Jan 7, 2013, at 7:37 AM, Doug Meil <[EMAIL PROTECTED]> wrote:

>
> Hi there,
>
> The HBase RefGuide has a comprehensive case study on a similar problem.
> This might not be the exact problem, but the diagnostic approach should
> help.
>
> http://hbase.apache.org/book.html#casestudies.slownode
>
>
>
>
>
> On 1/4/13 10:37 PM, "Liu, Raymond" <[EMAIL PROTECTED]> wrote:
>
>> Hi
>>
>> I've run into a weird issue where some map tasks lag behind:
>>
>> I have a small hadoop/hbase cluster with 1 master node and 4 regionserver
>> nodes, all with 16 CPUs and with map and reduce slots set to 24.
>>
>> A few tables were created with regions distributed evenly across the
>> region servers (say 16 regions per region server). Each region also has
>> almost the same number of KVs, of very similar size. All tables had
>> major_compact run to ensure data locality.
>>
>> I have an MR job which simply does a local region scan in every map task
>> (so 16 map tasks for each regionserver node).
>>
>> In theory, every map task should finish in a similar time.
>>
>> But in practice, the map tasks for some regions on the same region server
>> always lag far behind, taking 150~250% of the average time of the other
>> map tasks.
>>
>> If this happened on a single region server for every table, I might
>> suspect a disk issue or some other cause dragging down the performance
>> of that region server.
>>
>> But the weird thing is that, although for any single table almost all
>> the map tasks on the same single regionserver lag behind, for a
>> different table the lagging regionserver is different! And the regions
>> and region sizes are evenly distributed, which I have double-checked many
>> times. (I even tried setting replica to 4 to ensure every node has a
>> local copy of the data.)
>>
>> Say for table 1, all map tasks on regionserver node 2 are slow, while for
>> table 2, maybe all map tasks on regionserver node 3 are slow. With table
>> 1, it will always be regionserver node 2 which is slow regardless of
>> cluster restarts, and the slowest map task will always be the very same
>> one. And it won't go away even if I do a major compact again.
>>
>> So, could anyone give me a clue as to what might possibly lead to this
>> weird behavior? Any wild guess is welcome!
>>
>> (BTW, I didn't encounter this issue a few days ago with the same table.
>> I did restart the cluster and make a few changes to the config files
>> during that period, but restoring the config files didn't help.)
>>
>>
>> Best Regards,
>> Raymond Liu
>>
>>
>
>
>
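
One way to make the "slow regionserver differs per table" pattern described in the original message easier to see is to group map task durations by table and host and compare each host against the table-wide median. The following is only a rough sketch, not something from the thread: the function name `slow_hosts`, the `(table, host, seconds)` tuple layout, and the 1.5x threshold (taken loosely from the "150~250%" figure) are all assumptions for illustration, and the durations would have to be collected by hand from the job's task reports.

```python
from collections import defaultdict
from statistics import mean, median


def slow_hosts(tasks, threshold=1.5):
    """Flag hosts whose mean map-task duration is at least `threshold`
    times the median duration of all map tasks for the same table.

    `tasks` is an iterable of (table, host, seconds) tuples; this layout
    is hypothetical and not tied to any Hadoop or HBase API.
    Returns {table: [(host, ratio), ...]} with hosts sorted by name.
    """
    # Group raw durations by table first, since each table is a separate job.
    by_table = defaultdict(list)
    for table, host, seconds in tasks:
        by_table[table].append((host, seconds))

    flagged = {}
    for table, rows in by_table.items():
        # The table-wide median is robust to a handful of stragglers.
        baseline = median(seconds for _, seconds in rows)

        by_host = defaultdict(list)
        for host, seconds in rows:
            by_host[host].append(seconds)

        slow = []
        for host, durations in sorted(by_host.items()):
            ratio = mean(durations) / baseline
            if ratio >= threshold:
                slow.append((host, round(ratio, 2)))
        if slow:
            flagged[table] = slow
    return flagged
```

If the same host is flagged for every table, a node-level cause (disk, network) seems more likely; if the flagged host changes from table to table, as reported above, that would point more toward something specific to where each table's data actually lives.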