Chris Waterson 2012-12-09, 04:30
Kevin O'dell 2012-12-09, 23:00
Chris Waterson 2012-12-09, 23:29
Kevin O'dell 2012-12-10, 01:08
Tom Brown 2012-12-10, 18:07
Chris Waterson 2012-12-10, 23:03
Kyle McGovern 2012-12-12, 05:26
Kyle McGovern 2012-12-10, 03:09
lars hofhansl 2012-12-11, 05:10
-
hbase corruption - missing region files in HDFS
Chris Waterson 2012-12-09, 04:30
Hello! I've gotten myself into trouble where I'm missing files on HDFS that HBase thinks ought to be there. In particular, running "hbase hbck" yields the below message: two regions are "not deployed on any region server" (because there is no file in HDFS for the region), and "there is a hole in the region chain".
(FWIW, I suspect that this problem is due to a recent incident where we ran the cluster out of disk space.)

I'm running 0.92.1, and have been staggering around trying to figure out what procedure I ought to use to correct the problem, but my Google-fu is too poor to have yielded results. Any pointers would be appreciated!

thanks,
chris

```
ERROR: Region referrers,com.free-hdwallpapers.www/wallpapers/animals/mici/595718.jpg|com.free-hdwallpapers.www/wallpaper/animals/husky/270579,1354964606745.0c54fe59c58ddd6b34042ec98171bff7. not deployed on any region server.
ERROR: Region referrers,com.free-hdwallpapers.www/wallpapers/anime/mici/78285.jpg|com.free-hdwallpapers.www/wallpaper/anime/wolf-furry/90641,1354964606745.d2451e8db0f2b9546cc42c6d260a2ab8. not deployed on any region server.
ERROR: There is a hole in the region chain between com.free-hdwallpapers.www/wallpapers/animals/mici/595718.jpg|com.free-hdwallpapers.www/wallpaper/animals/husky/270579 and com.free-hdwallpapers.www/wallpapers/entertainment/mici/11840.jpg|com.free-hdwallpapers.www/wallpaper/entertainment/new-moon-bella-and-edward/12951. You need to create a new regioninfo and region dir in hdfs to plug the hole.
```
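You can check Chris's diagnosis that a region has no files in HDFS by mapping the region name that hbck prints to the directory HBase would read. The minimal Scala sketch below makes two assumptions. The first is the "<table>,<startKey>,<regionId>.<encodedName>." naming visible in the errors above. The second is a root directory of /hbase, inferred from the paths quoted later in this thread. "RegionNames", "parse" and "regionDir" are hypothetical names, not HBase API.

```scala
// Sketch: split an hbck region name of the form
//   <table>,<startKey>,<regionId>.<encodedName>.
// and derive the region directory under an assumed root of /hbase.
object RegionNames {
  final case class RegionName(table: String, startKey: String, regionId: String, encoded: String)

  def parse(name: String): Option[RegionName] = {
    // The table name ends at the first comma; the "<id>.<encoded>." suffix
    // starts after the last comma, so a start key may itself contain commas.
    val first = name.indexOf(',')
    val last = name.lastIndexOf(',')
    if (first < 0 || last <= first) None
    else {
      val suffix = name.substring(last + 1).stripSuffix(".")
      suffix.split('.') match {
        case Array(id, enc) if enc.length == 32 =>
          Some(RegionName(name.substring(0, first), name.substring(first + 1, last), id, enc))
        case _ => None
      }
    }
  }

  // Hypothetical helper: assumes the layout <rootDir>/<table>/<encodedName>.
  def regionDir(r: RegionName, rootDir: String = "/hbase"): String =
    s"$rootDir/${r.table}/${r.encoded}"

  def main(args: Array[String]): Unit = {
    val n = "referrers,com.free-hdwallpapers.www/wallpapers/animals/mici/595718.jpg|" +
      "com.free-hdwallpapers.www/wallpaper/animals/husky/270579,1354964606745.0c54fe59c58ddd6b34042ec98171bff7."
    parse(n).foreach(r => println(regionDir(r)))
  }
}
```

For the first region above, it prints /hbase/referrers/0c54fe59c58ddd6b34042ec98171bff7. You can then inspect that path with "hadoop fs -ls".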
-
Re: hbase corruption - missing region files in HDFS
Kevin O'dell 2012-12-09, 23:00
Can you run "hbase hbck -fixMeta -fixAssignments"?
This should assign those regions to region servers and fix the hole.

-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera
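Kevin's suggestion targets the "hole in the region chain" error. Here is a simplified model of what that error means; it is not hbck's actual implementation. When a table's regions are sorted by start key, they should tile the key space. Each region's end key should equal the next region's start key, the first start key should be empty, and the last end key should be empty. The minimal Scala sketch below reports the gaps. "RegionChain" and "holes" are hypothetical names, and keys are compared as Strings rather than as the raw bytes HBase uses.

```scala
// Simplified model of an hbck-style "hole" check; not hbck's own code.
object RegionChain {
  // A region covers [startKey, endKey); "" means unbounded (first/last region).
  final case class Region(startKey: String, endKey: String)

  // Returns (from, to) key ranges that no region covers.
  def holes(regions: Seq[Region]): Seq[(String, String)] = {
    val sorted = regions.sortBy(_.startKey)
    // Gap before the first region if it does not start at the empty key.
    val head = sorted.headOption.filter(_.startKey.nonEmpty).map(r => ("", r.startKey))
    // Gap where a region ends before its successor starts (overlaps are ignored here).
    val inner = sorted.zip(sorted.drop(1)).collect {
      case (a, b) if a.endKey.nonEmpty && a.endKey.compareTo(b.startKey) < 0 => (a.endKey, b.startKey)
    }
    // Gap after the last region if it does not end at the empty key.
    val tail = sorted.lastOption.filter(_.endKey.nonEmpty).map(r => (r.endKey, ""))
    head.toSeq ++ inner ++ tail.toSeq
  }

  def main(args: Array[String]): Unit =
    println(holes(Seq(Region("", "b"), Region("d", ""))))
}
```

For example, with regions ["", "b") and ["d", "") it reports a single hole from "b" to "d".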
-
Re: hbase corruption - missing region files in HDFS
Chris Waterson 2012-12-09, 23:29
Well, I upgraded to 0.92.2, since the version I was running on (0.92.1) didn't have those options for "hbck".
That helped. It took me a while to realize that I had to make the root filesystem writable so that "hbck -repair" could create a directory for itself. Once that was done, it at least ran through to completion.

But the problem persisted: there were regions in META whose files didn't exist on the filesystem. One poor region server was assigned the sad task of attempting to open the non-existent directory, which it slavishly retried over and over, filling its log with FileNotFoundException stack traces. For example:

```
2012-12-09 00:14:33,315 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=referrers,com.free-hdwallpapers.www/wallpapers/animals/mici/595718.jpg|com.free-hdwallpapers.www/wallpaper/animals/husky/270579,1354964606745.0c54fe59c58ddd6b34042ec98171bff7.
java.io.FileNotFoundException: File does not exist: /hbase/referrers/2cb553c74d52ddcbf31940f6c7128c63/main/33f1fd9efb944c4e982ba719cd7dde84
```

etc., etc. In particular, the directory "/hbase/referrers/2cb553...c63" from the trace above simply did not exist at all in HDFS.

So I took matters into my own hands. I created the missing "/hbase/referrers/2cb553...c63" directory and its subdirectory "main", and attempted to create a zero-length file "33f1fd...e84". This changed the firehose of exceptions from FileNotFoundException to CorruptHFileException.

So I wrote a small program to emit a valid, empty HFile, and placed copies of it at each HDFS path for which a FileNotFoundException was being thrown. After I had created three or four of them, the exceptions stopped.

I then ran "hbck -repair" again, and upon completion it declared victory.

Again, I suspect that I got myself into this problem because I ran a machine out of disk space. It's likely that most folks are more clever than me, and so this problem hasn't arisen before.
:)
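Chris found the missing files by reading the FileNotFoundException traces. The minimal Scala sketch below automates that step. It collects each distinct missing path once and splits it into the table, region directory, column family and file name. It assumes the "File does not exist: <path>" message format shown in his log excerpt and the /hbase/<table>/<encodedRegion>/<family>/<hfile> layout of the quoted path. "MissingFiles", "missingPaths" and "decompose" are hypothetical names.

```scala
// Sketch: list the distinct HFile paths a region server failed to find,
// and split each into its table / region / family / file components.
object MissingFiles {
  final case class StoreFilePath(table: String, region: String, family: String, hfile: String)

  // Assumes the message text seen in the log excerpt in this thread.
  private val NotFound =
    """java\.io\.FileNotFoundException: File does not exist:\s*(\S+)""".r

  // Each path once, in order of first appearance.
  def missingPaths(log: String): List[String] =
    NotFound.findAllMatchIn(log).map(_.group(1)).toList.distinct

  // Expects <rootDir>/<table>/<encodedRegion>/<family>/<hfile>.
  def decompose(path: String, rootDir: String = "/hbase"): Option[StoreFilePath] =
    if (!path.startsWith(rootDir + "/")) None
    else path.stripPrefix(rootDir + "/").split('/') match {
      case Array(table, region, family, hfile) => Some(StoreFilePath(table, region, family, hfile))
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    // Reads a region server log from the file named on the command line.
    val source = scala.io.Source.fromFile(args(0))
    val log = try source.mkString finally source.close()
    for (p <- missingPaths(log); s <- decompose(p))
      println(s"$p (region ${s.region}, family ${s.family})")
  }
}
```

Applied to the trace above, it yields the single path /hbase/referrers/2cb553c74d52ddcbf31940f6c7128c63/main/33f1fd9efb944c4e982ba719cd7dde84. Its region directory (2cb553c74d52ddcbf31940f6c7128c63) is not the encoded name of the region being opened (0c54fe59c58ddd6b34042ec98171bff7).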
-
Re: hbase corruption - missing region files in HDFS
Kevin O'dell 2012-12-10, 01:08
Chris,
Thank you for the very descriptive update.

-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera
-
Re: hbase corruption - missing region files in HDFS
Tom Brown 2012-12-10, 18:07
Chris,
I really appreciate your detailed fix description! I've run into similar problems (due to old hardware and bad sectors) and could never figure out how to fix a broken table. Hbck always seemed to just make things worse until I would give up and recreate the table.

Can you publish the utility you used to create valid, empty HFiles?

--Tom
-
Re: hbase corruption - missing region files in HDFS
Chris Waterson 2012-12-10, 23:03
You bet; see below. It's a Scala script, and will run as-is if you've got Scala installed. It should be easy to translate to Java, however.
chris

```scala
#!/bin/sh
exec scala -cp `hbase classpath` "$0" "$@"
!#
// Creates a file "/tmp/hfile.dat" that's an empty HFile.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.io.hfile.HFile

object HFileTool {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration
    // Write to the local filesystem; the result is copied into HDFS afterwards.
    val path = new Path("file:///tmp/hfile.dat")
    val writer = HFile.getWriterFactory(conf).createWriter(path.getFileSystem(conf), path)
    // Closing the writer without appending any cells leaves a valid, empty HFile.
    writer.close()
  }
}
```
-
Re: hbase corruption - missing region files in HDFS
Kyle McGovern 2012-12-12, 05:26
Logged https://issues.apache.org/jira/browse/HBASE-7335
We are running CDH4, which ships HBase 0.92.1. We have a copy of the table in an inconsistent state, if any other output not on the JIRA is needed.
-
Re: hbase corruption - missing region files in HDFS
Kyle McGovern 2012-12-10, 03:09
We recently had a very similar issue on a couple of our clusters. What ended up happening was a split failed and there was a leftover file in the region telling it where the new split region was located. The destination region folder/file did not exist, so our region server would try endlessly to read a file that didn't exist. The end result was exhaustion of open file descriptors for the region server due to the number of connections it was making. Our fix was to remove the bad "split file" and assign the region again.

```
15:38:21 # hdfs dfs -ls -R /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a
drwxr-xr-x - root hadoop 0 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/.oldlogs
-rw-r--r-- 3 root hadoop 124 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/.oldlogs/hlog.1354760917669
-rw-r--r-- 3 root hadoop 352 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/.regioninfo
drwxr-xr-x - root hadoop 0 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW
-rw-r--r-- 3 root hadoop 554522 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/195cc6d2cc384b39bd5ad30e95385bd8
-rw-r--r-- 3 root hadoop 4558378 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/1c42fa9bc26a4550a439f4bd31bb08b0
-rw-r--r-- 3 root hadoop 3498028 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/28a356081046422b8c057bc20c0ae658
-rw-r--r-- 3 root hadoop 1948108 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/3353dc2d99184fe4b9d73f39503dfbc7
-rw-r--r-- 3 root hadoop 4390731 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/4ce59f31c1b74db5804953fa7967f791
-rw-r--r-- 3 root hadoop 3116921421 2012-12-07 12:22 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/5313858989b24752ae31322333de02e0
-rw-r--r-- 3 root hadoop 5395692 2012-12-07 12:22 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/54c11a7e4f9d4ebfafaf2b93d3c9e954
-rw-r--r-- 3 root hadoop 5981971640 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/5d965eba35df44d2851a8186fe6e8cc8
-rw-r--r-- 3 root hadoop 23 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/5d965eba35df44d2851a8186fe6e8cc8.7d4f7401d2fe7a813778248970b03515
-rw-r--r-- 3 root hadoop 2251800 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/673b36462014480cb7d91088412b85a7
-rw-r--r-- 3 root hadoop 408794 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/73261dd86f634f2086ec745642425d7c
-rw-r--r-- 3 root hadoop 2676245 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/769728d25b5b4e78be6b36f9716a82c4
-rw-r--r-- 3 root hadoop 1262744 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/81f414cb3fe449f6a80310dd38ea467f
-rw-r--r-- 3 root hadoop 940502 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/8f818b3c45344ad68c0b4afc7fe20bbb
-rw-r--r-- 3 root hadoop 3492843 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/ae7cb412e5da4a908b0f2ea4d5cd5c76
-rw-r--r-- 3 root hadoop 2894474 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/b6ee14a0a75341d0aa58187fb6159a41
-rw-r--r-- 3 root hadoop 14257782 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/bd4fff3291d647eb9cc533d66f9685a3
-rw-r--r-- 3 root hadoop 4880699 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/c4d3f1c8511743579588162616beeea1
-rw-r--r-- 3 root hadoop 35238595 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/c69a406d54b1492ba52cd296de8320a1
-rw-r--r-- 3 root hadoop 23 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/c69a406d54b1492ba52cd296de8320a1.7d4f7401d2fe7a813778248970b03515
-rw-r--r-- 3 root hadoop 3181138002 2012-12-07 12:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/cad9f4cc0ef54a7896a3a47253250e71
-rw-r--r-- 3 root hadoop 1747856 2012-12-07 12:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/cca2ad1698984a73abd9c58c78945be0
-rw-r--r-- 3 root hadoop 6264897732 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/d876f1f4734e4778b2efa527ef1ef3ee
-rw-r--r-- 3 root hadoop 463704 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/f2efc4a6ec054a62a44f664cc0b01c0a
-rw-r--r-- 3 root hadoop 686868 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/f34384ae8c1d4e16afb79cb41bf6cf74
-rw-r--r-- 3 root hadoop 838234 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/fc1dc425cf324beaa283ef82fdc073e3
```

For example, if I remove /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/c69a406d54b1492ba52cd296de8320a1.7d4f7401d2fe7a813778248970b03515 and /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/5d965eba35df44d2851a8186fe6e8cc8.7d4f7401d2fe7a813778248970b03515, the region will successfully assign and hbck does not show errors for this region anymore. The contents of the file appear to just be a split key.
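Kyle's leftovers are the two 23-byte files whose names have the form <hfile>.<encoded region name>. Each sits next to a full-size HFile with the same first part. The minimal Scala sketch below flags such names in a listing. It assumes both parts are 32 lowercase hexadecimal characters, as in the listing above; this also appears to match how HBase names the reference files it writes during a split. "SplitLeftovers" and "references" are hypothetical names. The sketch only reports candidates and deletes nothing.

```scala
// Sketch: spot "<hfile>.<encodedRegion>" names, like the 23-byte leftovers
// in Kyle's listing, among the entries of a store directory.
object SplitLeftovers {
  // Assumption: both parts are 32 lowercase hex characters.
  private val RefName = "([0-9a-f]{32})\\.([0-9a-f]{32})".r

  // Accepts bare file names or full paths; returns (hfile, region) pairs.
  def references(entries: Seq[String]): Seq[(String, String)] =
    entries
      .map(p => p.substring(p.lastIndexOf('/') + 1))
      .collect { case RefName(hfile, region) => (hfile, region) }

  def main(args: Array[String]): Unit = {
    val dir = "/hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/"
    val listing = Seq(
      dir + "5d965eba35df44d2851a8186fe6e8cc8",
      dir + "5d965eba35df44d2851a8186fe6e8cc8.7d4f7401d2fe7a813778248970b03515",
      dir + "c69a406d54b1492ba52cd296de8320a1.7d4f7401d2fe7a813778248970b03515")
    references(listing).foreach { case (hfile, region) =>
      println(s"$hfile.$region refers to region $region")
    }
  }
}
```

Run against the full listing above, it flags exactly the two files Kyle removed. Both point at the same region, 7d4f7401d2fe7a813778248970b03515.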
-
Re: hbase corruption - missing region files in HDFS
lars hofhansl 2012-12-11, 05:10
This sounds like a bug. Which version of HBase is this?
Could you file a bug? Thanks.

-- Lars

________________________________
From: Kyle McGovern <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Sunday, December 9, 2012 7:09 PM
Subject: Re: hbase corruption - missing region files in HDFS

We recently had a very similar issue on a couple of our clusters. What ended up happening was that a split failed and left a file in the region telling it where the new split region was located. The destination region folder/file did not exist, so our region server would try endlessly to read a file that didn't exist. The end result was exhaustion of the region server's open file descriptors due to the number of connections it was making. Our fix was to remove the bad "split file" and assign the region again.

15:38:21 # hdfs dfs -ls -R /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a
drwxr-xr-x - root hadoop 0 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/.oldlogs
-rw-r--r-- 3 root hadoop 124 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/.oldlogs/hlog.1354760917669
-rw-r--r-- 3 root hadoop 352 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/.regioninfo
drwxr-xr-x - root hadoop 0 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW
-rw-r--r-- 3 root hadoop 554522 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/195cc6d2cc384b39bd5ad30e95385bd8
-rw-r--r-- 3 root hadoop 4558378 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/1c42fa9bc26a4550a439f4bd31bb08b0
-rw-r--r-- 3 root hadoop 3498028 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/28a356081046422b8c057bc20c0ae658
-rw-r--r-- 3 root hadoop 1948108 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/3353dc2d99184fe4b9d73f39503dfbc7
-rw-r--r-- 3 root hadoop 4390731 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/4ce59f31c1b74db5804953fa7967f791
-rw-r--r-- 3 root hadoop 3116921421 2012-12-07 12:22 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/5313858989b24752ae31322333de02e0
-rw-r--r-- 3 root hadoop 5395692 2012-12-07 12:22 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/54c11a7e4f9d4ebfafaf2b93d3c9e954
-rw-r--r-- 3 root hadoop 5981971640 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/5d965eba35df44d2851a8186fe6e8cc8
-rw-r--r-- 3 root hadoop 23 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/5d965eba35df44d2851a8186fe6e8cc8.7d4f7401d2fe7a813778248970b03515
-rw-r--r-- 3 root hadoop 2251800 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/673b36462014480cb7d91088412b85a7
-rw-r--r-- 3 root hadoop 408794 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/73261dd86f634f2086ec745642425d7c
-rw-r--r-- 3 root hadoop 2676245 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/769728d25b5b4e78be6b36f9716a82c4
-rw-r--r-- 3 root hadoop 1262744 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/81f414cb3fe449f6a80310dd38ea467f
-rw-r--r-- 3 root hadoop 940502 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/8f818b3c45344ad68c0b4afc7fe20bbb
-rw-r--r-- 3 root hadoop 3492843 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/ae7cb412e5da4a908b0f2ea4d5cd5c76
-rw-r--r-- 3 root hadoop 2894474 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/b6ee14a0a75341d0aa58187fb6159a41
-rw-r--r-- 3 root hadoop 14257782 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/bd4fff3291d647eb9cc533d66f9685a3
-rw-r--r-- 3 root hadoop 4880699 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/c4d3f1c8511743579588162616beeea1
-rw-r--r-- 3 root hadoop 35238595 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/c69a406d54b1492ba52cd296de8320a1
-rw-r--r-- 3 root hadoop 23 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/c69a406d54b1492ba52cd296de8320a1.7d4f7401d2fe7a813778248970b03515
-rw-r--r-- 3 root hadoop 3181138002 2012-12-07 12:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/cad9f4cc0ef54a7896a3a47253250e71
-rw-r--r-- 3 root hadoop 1747856 2012-12-07 12:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/cca2ad1698984a73abd9c58c78945be0
-rw-r--r-- 3 root hadoop 6264897732 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/d876f1f4734e4778b2efa527ef1ef3ee
-rw-r--r-- 3 root hadoop 463704 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/f2efc4a6ec054a62a44f664cc0b01c0a
-rw-r--r-- 3 root hadoop 686868 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/f34384ae8c1d4e16afb79cb41bf6cf74
-rw-r--r-- 3 root hadoop 838234 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/fc1dc425cf324beaa283ef82fdc073e3

For example, if I remove /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/c69a406d54b1492ba52cd296de8320a1.7d4f7401d2fe7a813778248970b03515 and /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/5d965eba35df44d2851a8186fe6e8cc8.7d4f7401d2fe7a813778248970b03515, the region assigns successfully and hbck no longer shows errors for this region. The contents of the file appear to be just a split key.
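Kyle's fix depends on spotting the leftover split files by name. Below is a minimal Python sketch, assuming the pattern visible in his listing: a store file name, a dot, then a 32-character hex suffix (both suspect files are also 23 bytes). It scans saved `hdfs dfs -ls -R` output and lists candidate files. The function name `find_split_reference_files` and the `REF_NAME` pattern are hypothetical helpers, not part of HBase or Hadoop, and the parser assumes paths contain no spaces.

```python
import re
from typing import List, Tuple

# Hypothetical heuristic (an assumption, not an HBase rule): a leftover split
# file is named "<32 hex chars>.<32 hex chars>", like the two 23-byte files
# in Kyle's listing.
REF_NAME = re.compile(r"^[0-9a-f]{32}\.[0-9a-f]{32}$")


def find_split_reference_files(listing: str) -> List[Tuple[str, int]]:
    """Scan `hdfs dfs -ls -R` output and return (path, size) for every
    regular file whose basename matches REF_NAME."""
    found = []
    for line in listing.splitlines():
        fields = line.split()
        # A listing entry has 8 fields: permissions, replication, owner,
        # group, size, date, time, path. The length check skips the shell
        # prompt line; the "-" prefix check skips directories ("d...").
        if len(fields) != 8 or not fields[0].startswith("-"):
            continue
        size, path = int(fields[4]), fields[7]
        if REF_NAME.match(path.rsplit("/", 1)[-1]):
            found.append((path, size))
    return found


if __name__ == "__main__":
    region = "/hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a"
    sample = f"""\
15:38:21 # hdfs dfs -ls -R {region}
drwxr-xr-x - root hadoop 0 2012-12-07 13:27 {region}/RAW
-rw-r--r-- 3 root hadoop 352 2012-12-07 13:27 {region}/.regioninfo
-rw-r--r-- 3 root hadoop 35238595 2012-12-07 12:01 {region}/RAW/c69a406d54b1492ba52cd296de8320a1
-rw-r--r-- 3 root hadoop 23 2012-12-07 12:01 {region}/RAW/c69a406d54b1492ba52cd296de8320a1.7d4f7401d2fe7a813778248970b03515
"""
    # Print size and path of each candidate so tiny files stand out.
    for path, size in find_split_reference_files(sample):
        print(size, path)
```

The sketch only flags candidates and deletes nothing. Following Kyle's description, each flagged file would be reviewed (and ideally copied aside) before removing it with `hdfs dfs -rm` and assigning the region again.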
lars hofhansl 2012-12-11, 05:10