HBase, mail # user - Eternal RIT problem when RS tries to access wrong region-folder on HDFS


Re: Eternal RIT problem when RS tries to access wrong region-folder on HDFS
Dimitri Goldin 2013-05-03, 14:34
Hi Kevin,

On 05/03/2013 02:57 PM, Kevin O'dell wrote:
 > That is interesting.  I have seen this before, can you please send a
 > hadoop fs -lsr /hbase/documents?  This is going to be caused by a bad
 > split.  I will let you know what files you need to delete to safely
 > recover from this error.

Thanks for the reply. Earlier today I also determined that it has to
do with a failed region-split and already tried to solve it
on my own.

I found a total of three reference files and two hfiles in the folder.
Unfortunately, the documents table contains more than 5k regions, so it
seems a little impractical to send the full listing to the list. Please
let me know if you'd still like to see it and I will send it to you
directly.
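
For a table of this size, the listing could be cut down to just the
reference files before sharing it. The following is only a sketch under
two assumptions: that the path is the last whitespace-separated field of
each line of the listing, and that a reference file is named
<hfile>.<parent-region> with both parts being 32 hex characters, as in
the file names shown in this message. The function name
reference_paths is made up for illustration.

```python
import re

# Assumption: reference files are named <hfile>.<parent-region>, both parts
# being 32 lowercase hex characters, as in the file names in this thread.
REFERENCE_NAME = re.compile(r"[0-9a-f]{32}\.[0-9a-f]{32}")


def reference_paths(listing_lines):
    """Return paths from listing lines whose file name looks like a reference."""
    found = []
    for line in listing_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        path = fields[-1]  # assumption: the path is the last field on the line
        name = path.rsplit("/", 1)[-1]
        if REFERENCE_NAME.fullmatch(name):
            found.append(path)
    return found
```

The saved output of the recursive listing could be read line by line and
passed to reference_paths(), which should leave only a handful of paths
to post instead of the whole table.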

Original contents of /hbase/documents/79c619508659018ff3ef0887611eb8f7/d*:
=
0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b
47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b
4f01ecd052ce464d81e79a62ea227d6b (116MB)
4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b
eb7dbb09701d4353be24ca82481c4a7e (951MB)
=
* d is the only column family

Additionally, there was an 'almost empty' recovered.edits referencing
the old parent region and containing only a CACHEFLUSH.

As mentioned, '5b9c16898a371de58f31f0bdf86b1f8b' no longer existed,
.tmp was empty, and the .META. entry did not contain any splitA/splitB
columns. So I backed up the original region folder, removed the
reference files, and kept 4f01ecd052ce464d81e79a62ea227d6b
and eb7dbb09701d4353be24ca82481c4a7e for now to get the table working
again.
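
The check behind that cleanup can be written down as a small dry run.
This is a sketch only, not an HBase tool: it assumes a reference file is
named <hfile>.<parent-region>, treats a reference as dangling when its
parent region is not among the region folders that still exist, and
deletes nothing. The names split_reference and plan_cleanup are made up
for illustration.

```python
def split_reference(name):
    """Return (hfile, parent_region) for a reference-style name, else None."""
    hfile, dot, parent = name.partition(".")
    if dot and hfile and parent:
        return hfile, parent
    return None


def plan_cleanup(store_files, existing_regions):
    """Sort store file names into files to keep and dangling references.

    A reference is dangling when its parent region (the part after the dot)
    is not in existing_regions. Nothing is deleted; this only builds a plan
    that should be reviewed, with a backup taken first.
    """
    keep, dangling = [], []
    for name in store_files:
        ref = split_reference(name)
        if ref is not None and ref[1] not in existing_regions:
            dangling.append(name)
        else:
            keep.append(name)
    return keep, dangling


# File names from the listing in this message; the parent region
# 5b9c16898a371de58f31f0bdf86b1f8b is absent from the existing regions.
store_files = [
    "0707b1ec4c6b41cf9174e0d2a1785fe9.5b9c16898a371de58f31f0bdf86b1f8b",
    "47511faae81b4452afd3ca206e28346f.5b9c16898a371de58f31f0bdf86b1f8b",
    "4f01ecd052ce464d81e79a62ea227d6b",
    "4f01ecd052ce464d81e79a62ea227d6b.5b9c16898a371de58f31f0bdf86b1f8b",
    "eb7dbb09701d4353be24ca82481c4a7e",
]
keep, dangling = plan_cleanup(store_files, {"79c619508659018ff3ef0887611eb8f7"})
```

With these inputs, keep holds the two plain hfiles and dangling holds
the three reference files, which matches what was removed by hand.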

I am still trying to locate log entries from the split, but haven't
found them yet.

Do you think this was an appropriate measure? Please let me know if
you had a different approach in mind and I'll see if I can use the
backed-up region. Also, do you have any idea under which circumstances
this might occur? Is there a JIRA I can follow and maybe contribute
observations from logs to?

Thanks a lot,
Dimitry
--
----------------------------------
Dimitry Goldin
Software Developer

Neofonie GmbH
Robert-Koch-Platz 4
10115 Berlin

T: +49 30 246 27

[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
http://www.neofonie.de

Handelsregister
Berlin-Charlottenburg: HRB 67460

Geschäftsführung:
Thomas Kitlitschko