Re: CDH4.4 and HBASE-8912 issue
I can't see anything wrong in your logs, but the fact that you trigger this
issue by running the balancer makes me think that some of your RS may have a
problem. Here is what I would do in this situation:

1. Make sure that system time, OS configuration, and Hadoop/HBase configuration
are in sync on all servers
2. Try to isolate the issue (start HMaster first, then add
regionservers one by one to determine whether one of the regionservers
causes this issue)
3. Check what Hadoop says about the HBase data (hadoop fsck /hbase -files
-locations -blocks)
4. Check whether any of your regions have issues (hbase hbck
-details)
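
For step 2, a quick way to see whether one regionserver is behind the aborts is to tally the servers named in the master's "Unexpected state" messages. Below is a minimal sketch, assuming the message format matches the master log quoted in this thread; the function name tally_unexpected_states and the regular expression are only an illustration and are not part of HBase.

```python
import re
from collections import Counter

# Pattern modelled on the "Unexpected state" message quoted in this thread.
# Assumption: other HBase versions may word this message differently.
UNEXPECTED_STATE = re.compile(
    r"Unexpected state :\s*(?P<region>\S+)\s+state=(?P<state>\w+),"
    r"\s*ts=(?P<ts>\d+),\s*server=(?P<server>\S+)"
)


def tally_unexpected_states(log_text):
    """Count distinct 'Unexpected state' events per regionserver."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    # The FATAL line and the exception line repeat the same message,
    # so deduplicate on (region, state, ts, server) before counting.
    events = {
        (m["region"], m["state"], m["ts"], m["server"])
        for m in UNEXPECTED_STATE.finditer(flat)
    }
    return Counter(server for _, _, _, server in events)


if __name__ == "__main__":
    # Sample taken from the master log in this thread.
    sample = (
        "2013-10-21 12:27:51,577 FATAL org.apache.hadoop.hbase.master.HMaster: "
        "Unexpected state : "
        "mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,"
        "1380545986996.45e518b477eeac50872de5a73d74f05b. "
        "state=PENDING_OPEN, ts=1382344071576, "
        "server=testhadoop-102.example.com,60020,1382339032897 "
        ".. Cannot transit it to OFFLINE."
    )
    for server, count in tally_unexpected_states(sample).most_common():
        print(server, count)
```

If one server dominates the tally across several master aborts, that regionserver is the first one to leave out when you bring the cluster up node by node.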

Good luck :)
On Mon, Oct 21, 2013 at 11:24 AM, Boris Emelyanov <[EMAIL PROTECTED]> wrote:

>  On 21.10.2013 12:17, Samir Ahmic wrote:
>
> Hi, Boris
>
>  Did you check the RS logs? There should be an exception explaining why
> the assignment failed. Can you paste that exception?
>
>  Cheers :)
>
>
> On Mon, Oct 21, 2013 at 9:53 AM, Boris Emelyanov <[EMAIL PROTECTED]> wrote:
>
>>
>> >Boris, what does hbck say?
>> >
>> >We have had this issue a couple of times before. To fix it I had to stop the cluster, run the offline meta repair tool,
>> >and delete the zk-store on each zk quorum node.
>> >The offline meta repair tool will not work if there are inconsistencies in HBase, so you had better try hbase hbck
>> >-fixAll first.
>> >
>> >Best regards,
>> >Vladimir Rodionov
>> >Principal Platform Engineer
>> >Carrier IQ, www.carrieriq.com
>>
>> >e-mail: vrodionov@... <http://gmane.org/get-address.php?address=vrodionov%2dSvj7bELwklqcm8Fc2pXOzQ%40public.gmane.org>
>>
>> Hbck says "0 inconsistencies detected".
>> I stopped hbase cluster, deleted zk-database on all quorum nodes, ran "hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair",
>> and got "INFO util.HBaseFsck: Success! .META. table rebuilt.".
>> After that, cluster continued crashing during auto-loadbalancing.
>>
>>
>>
>>  --
>> Best regards,
>>
>> Boris Emelyanov.
>>
>>
>  Hi, Samir! Thank you for your answers!
>
> Actually, as far as I can tell, the assignment did not fail.
> Here are my logs (time may be slightly out of sync):
>
> on master:
>
> 2013-10-21 12:27:51,541 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan
> for
> mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.
>
> destination server is testhadoop-102.example.com,60020,1382339032897
> 2013-10-21 12:27:51,541 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan
> for region
> mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.;
> plan=hri=mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.,
> src=, dest=testhadoop-102.example.com,60020,1382339032897
> 2013-10-21 12:27:51,541 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
> mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.
> to testhadoop-102.example.com,60020,1382339032897
> 2013-10-21 12:27:51,576 FATAL org.apache.hadoop.hbase.master.HMaster:
> Master server abort: loaded coprocessors are: []
> 2013-10-21 12:27:51,577 FATAL org.apache.hadoop.hbase.master.HMaster:
> Unexpected state :
> mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.
> state=PENDING_OPEN, ts=1382344071576, server=testhadoop-102.example.com,60020,1382339032897
> .. Cannot transit it to OFFLINE.
> java.lang.IllegalStateException: Unexpected state :
> mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b.
> state=PENDING_OPEN, ts=1382344071576, server=testhadoop-102.example.com,60020,1382339032897
> .. Cannot transit it to OFFLINE.
>
> on affected regionserver:
>
> 2013-10-21 12:27:52,561 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region:
> mytable,fd27d27d27d27d27d27d27d27d27d27d27d27d18,1380545986996.45e518b477eeac50872de5a73d74f05b..