HBase, mail # user - Lost regions question


Re: Lost regions question
Leonid Fedotov 2013-04-15, 18:00
Try running "hbase hbck -fix".
It should do the job.
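
For reference, a minimal sketch of the usual diagnose-then-repair sequence.  The plain "hbase hbck" and "hbase hbck -fix" invocations are standard; the -details flag is an assumption about the exact options available in your HBase version, so check the tool's usage output first:

    # report inconsistencies without changing anything
    hbase hbck
    # same check, with per-region detail
    hbase hbck -details
    # attempt automatic repair of region assignment and meta problems
    hbase hbck -fix

Running the read-only check first is the safer order, since -fix modifies .META. and region assignments in place.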

Thank you!

Sincerely,
Leonid Fedotov

On Apr 12, 2013, at 9:56 AM, Brennon Church wrote:

> hbck does show the hdfs files there without associated regions.  I probably could have recovered had I noticed just after this happened, but given that we've been running like this for over a week, and that there is the potential for collisions between the missing and new data, I'm probably just going to manually reinsert it all using the hdfs files (see the sketch following this thread).
>
> Hadoop version is 1.0.1, btw.
>
> Thanks.
>
> --Brennon
>
> On 4/11/13 11:05 PM, Ted Yu wrote:
>> Brennon:
>> Have you run hbck to diagnose the problem ?
>>
>> Since the issue might have involved hdfs, browsing DataNode log(s) may
>> provide some clue as well.
>>
>> What hadoop version are you using ?
>>
>> Cheers
>>
>> On Thu, Apr 11, 2013 at 10:58 PM, ramkrishna vasudevan <
>> [EMAIL PROTECTED]> wrote:
>>
>>> When you say that the parent regions got reopened, does that mean that you
>>> did not lose any data (i.e., no data became unreadable)?  The reason I am
>>> asking is that if, after the parent was split into daughters, the data was
>>> written to the daughters, and the daughters' files could not be opened,
>>> you could have ended up unable to read that data.
>>>
>>> Some logs could tell us what made the parent get reopened rather than the
>>> daughters.  Another thing I would like to ask: was the cluster brought
>>> down abruptly by killing the RS?
>>>
>>> Which version of HBase?
>>>
>>> Regards
>>> Ram
>>>
>>>
>>>
>>>
>>> On Fri, Apr 12, 2013 at 11:20 AM, Brennon Church <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I had an interesting problem come up recently.  We have a few thousand
>>>> regions across 8 datanode/regionservers.  I made a change, increasing the
>>>> heap size for hadoop from 128M to 2048M, which ended up bringing the
>>>> cluster to a complete halt after about an hour.  I reverted back to 128M
>>>> and turned things back on again, but didn't realize at the time that I
>>>> came up with 9 fewer regions than I started with.  Upon further
>>>> investigation, I found that all 9 missing regions were from splits that
>>>> occurred while the cluster was running after making the heap change and
>>>> before it came to a halt.  There was a 10th region (5 splits were
>>>> involved in total) that managed to get recovered.  The really odd thing
>>>> is that in the case of the other 9 regions, the original parent regions,
>>>> which as far as I can tell from the logs had been deleted, were re-opened
>>>> upon restarting things once again.  The daughter regions were gone.
>>>> Interestingly, I found the orphaned datablocks still intact, and in at
>>>> least some cases I have been able to extract the data from them and will
>>>> hopefully re-add it to the tables.
>>>>
>>>> My question is this: does anyone know, based on the rather muddled
>>>> description I've given above, what could possibly have happened here?
>>>> My best guess is that the bad state hdfs was in caused some critical
>>>> component of the split process to be missed, which resulted in a
>>>> reference to the parent regions sticking around and losing the references
>>>> to the daughter regions.
>>>>
>>>> Thanks for any insight you can provide.
>>>>
>>>> --Brennon
>>>>
>>>>
>>>>
>>>>
>
>
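
For the manual re-insert described above, a minimal sketch using the stock HBase tools.  The HDFS paths and table name here are hypothetical, and the exact class names should be checked against the 0.9x release in use:

    # inspect an orphaned HFile and print its key/values
    hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f \
        hdfs://namenode/hbase/mytable/1234deadbeef/cf/recoveredfile
    # bulk-load a directory of recovered HFiles back into the table
    # (expects column-family subdirectories under the given directory)
    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
        hdfs://namenode/recovered/mytable mytable

Note that the collision concern above still applies: bulk-loaded cells keep their original timestamps, so reads will prefer newer writes to the same rows over the recovered data.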
