Re: hbase-0.89/trunk: org.apache.hadoop.fs.ChecksumException: Checksum error
Andrey Stepachev 2010-09-22, 09:29
One more note: this database was on 0.20.6 before; then I started 0.89
over it (but the table with the wrong checksum was created in HBase 0.89).

2010/9/22 Andrey Stepachev <[EMAIL PROTECTED]>:
> 2010/9/22 Ryan Rawson <[EMAIL PROTECTED]>:
>> Why are you using such expensive disks?  RAID + HDFS = lower
>> performance than non-RAID.
>
> It was a database server before we migrated to HBase; it was designed
> for PostgreSQL. Now, with compression and the nature of HBase, our
> database is 12GB instead of 180GB in pg.
> So this server was not designed for HBase.
> In production (0.20.6) we use much lighter servers (3) with simple dual
> SATA drives.
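
For context on how that compression is configured: in HBase it is a per-column-family setting chosen at table creation. A minimal sketch against the 0.89-era admin API (the family name "f" and the GZ codec are assumptions; the thread does not say which codec was used):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;

// Sketch: create a table with a compressed column family.
// The GZ codec is an assumption, not what this thread used.
public class CreateCompressedTable {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
    HTableDescriptor desc = new HTableDescriptor("tmp.bsn.main");
    HColumnDescriptor family = new HColumnDescriptor("f");
    family.setCompressionType(Compression.Algorithm.GZ);
    desc.addFamily(family);
    admin.createTable(desc);
  }
}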
>
>>
>> How's your RAM? How are your network switches? NICs? etc.
>> Anything along the data path can introduce errors.
>
> No, everything is on one machine: 17GB RAM (5GB for HBase).
>
>>
>> In this case we did the right thing and threw exceptions, but it looks
>> like your client continues to call next() despite getting
>> exceptions... can you check your client code to verify this?
>
> Hm, I checked, but I use only a simple wrapper around ResultScanner:
> http://pastebin.org/1074628. It should bail out on any exception (except
> ScannerTimeoutException).
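
The pastebin contents are not preserved; as a rough illustration only, a fail-fast wrapper in that spirit might look like the sketch below (the class name and structure are assumptions, not the actual pasted code):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;

// Hypothetical fail-fast wrapper: any error closes the scanner and
// propagates, so a caller cannot keep calling next() on a bad scanner.
public class FailFastScanner {
  private final ResultScanner scanner;

  public FailFastScanner(ResultScanner scanner) {
    this.scanner = scanner;
  }

  public Result next() throws IOException {
    try {
      return scanner.next();
    } catch (IOException e) {
      scanner.close(); // never touch this scanner again after an error
      throw e;
    }
  }

  public void close() {
    scanner.close();
  }
}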
>
>>
>> On Wed, Sep 22, 2010 at 2:14 AM, Andrey Stepachev <[EMAIL PROTECTED]> wrote:
>>> HP ProLiant, RAID 10 with 4 SAS 15k drives, SmartArray 6i, 2 CPUs / 4 cores.
>>>
>>> 2010/9/22 Ryan Rawson <[EMAIL PROTECTED]>:
>>>> Generally, checksum errors are due to hardware faults of one kind or another.
>>>>
>>>> What is your hardware like?
>>>>
>>>> On Wed, Sep 22, 2010 at 2:08 AM, Andrey Stepachev <[EMAIL PROTECTED]> wrote:
>>>>> But why is it bad? Splits/compactions? I made my own RetryResultIterator
>>>>> which reopens the scanner on timeout. But what is the best way to reopen
>>>>> a scanner? Can you point me to where I can find all these exceptions? Or
>>>>> is there maybe already some sort of recoverable iterator?
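
The RetryResultIterator source is not included in the thread; a minimal sketch of the reopen-on-timeout idea against the 0.89-era client API (the internals here are guesses, only the class name comes from the post) could be:

import java.io.IOException;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.ScannerTimeoutException;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: on a scanner lease timeout, reopen the scanner and resume
// just past the last row that was successfully returned.
public class RetryResultIterator {
  private final HTable table;
  private final Scan scan;
  private ResultScanner scanner;
  private byte[] lastRow;

  public RetryResultIterator(HTable table, Scan scan) throws IOException {
    this.table = table;
    this.scan = scan;
    this.scanner = table.getScanner(scan);
  }

  public Result next() throws IOException {
    while (true) {
      try {
        Result r = scanner.next();
        if (r != null) {
          lastRow = r.getRow();
        }
        return r;
      } catch (ScannerTimeoutException e) {
        scanner.close();
        Scan resume = new Scan(scan); // copy the original scan settings
        if (lastRow != null) {
          // smallest row key strictly greater than lastRow
          resume.setStartRow(Bytes.add(lastRow, new byte[] { 0 }));
        }
        scanner = table.getScanner(resume);
        // loop and retry on the reopened scanner
      }
    }
  }

  public void close() {
    scanner.close();
  }
}

Note that only ScannerTimeoutException is retried; any other IOException (such as the ChecksumException in this thread) still propagates, matching the bail-out behavior described above.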
>>>>>
>>>>> 2010/9/22 Ryan Rawson <[EMAIL PROTECTED]>:
>>>>>> Ah, OK, I think I get it... basically at this point your scanner is bad
>>>>>> and iterating on it again won't work. The scanner should probably close
>>>>>> itself so you don't get tons of additional exceptions, but currently we
>>>>>> don't do that.
>>>>>>
>>>>>> There is probably a better fix for this; I'll ponder it.
>>>>>>
>>>>>> On Wed, Sep 22, 2010 at 1:57 AM, Ryan Rawson <[EMAIL PROTECTED]> wrote:
>>>>>>> Very strange... it looks like a bad block ended up in your scanner and
>>>>>>> subsequent next() calls were failing due to that short read.
>>>>>>>
>>>>>>> Did you have to kill the regionserver, or did things recover and
>>>>>>> continue normally?
>>>>>>>
>>>>>>> -ryan
>>>>>>>
>>>>>>> On Wed, Sep 22, 2010 at 1:37 AM, Andrey Stepachev <[EMAIL PROTECTED]> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I get an org.apache.hadoop.fs.ChecksumException for a table under
>>>>>>>> heavy writes in standalone mode.
>>>>>>>> The table tmp.bsn.main was created at 2010-09-22 10:42:28,860, and
>>>>>>>> then 5 threads write data to it.
>>>>>>>> At some moment the exception is thrown.
>>>>>>>>
>>>>>>>> Andrey.
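
The writer code is not shown in the thread; a rough reconstruction of the described load, with invented row keys, family, and values (only the table name and the 5 threads come from the post), might be:

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch of the described load: 5 concurrent writers against tmp.bsn.main.
// Row keys, family "f", qualifier "q", and values are all made up.
public class WriteLoad {
  public static void main(String[] args) {
    final Configuration conf = new HBaseConfiguration();
    for (int t = 0; t < 5; t++) {
      final int id = t;
      new Thread(new Runnable() {
        public void run() {
          try {
            // HTable is not thread-safe: give each thread its own instance
            HTable table = new HTable(conf, "tmp.bsn.main");
            for (long i = 0; ; i++) {
              Put put = new Put(Bytes.toBytes("row-" + id + "-" + i));
              put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(i));
              table.put(put);
            }
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }).start();
    }
  }
}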
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>