HBase >> mail # dev >> probable there is a bug in HLog Implementation


Probably there is a bug in the HLog implementation
Hi all,
     After reading the source code of HLog, I'm wondering whether this is a bug.
     For example, suppose only one region is active and the max log size is a fraction of
the region size.
     A flush begins, and region A acquires a sequence number, say N.
     Insert operations can continue while we flush the cache.
     The flush operation completes and deletes region A's entry in lastSeqWritten (the map
of regions to the most recent sequence/edit id in their memstore).
     When the flush completes, the current sequence number may be N+5, because five log
messages were added to the log for region A during the flush operation.
     Region A goes on accepting updates, and a new entry is inserted into
lastSeqWritten for region A, but in the current HLog implementation the value
is N+6.
     But I think the value corresponding to region A in lastSeqWritten should
be N, not N+6.
     N+6 means that all edits in region A with a sequence number smaller than N+6
are already persistent on disk, but that is not the case.
     Edits N+1, N+2, N+3, N+4 and N+5, the five new edits, may still be in the memstore of
region A.
     So the value should be N, the sequence number associated with the flush when it
begins and when it completes.
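
     To make the point above concrete, here is a small sketch. It is not HBase code; the
class and method names are made up for illustration, N is given the arbitrary value 100,
and it assumes the recorded value is read the way I describe it (every edit below it is
treated as persisted). It lists the memstore edits that would wrongly be treated as
persisted for each recorded value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a hypothetical helper, not part of HLog.
class AtRiskEditsSketch {
    // Returns the edits that are still in the memstore but have a sequence id
    // below the value recorded in lastSeqWritten, i.e. edits that would be
    // treated as already persisted under the reading described above.
    static List<Long> atRisk(List<Long> memstoreSeqIds, long recorded) {
        List<Long> result = new ArrayList<>();
        for (long seq : memstoreSeqIds) {
            if (seq < recorded) {
                result.add(seq);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long n = 100L; // arbitrary stand-in for N
        // Edits N+1..N+5 arrived during the flush, N+6 right after it.
        List<Long> memstore =
                Arrays.asList(n + 1, n + 2, n + 3, n + 4, n + 5, n + 6);
        System.out.println("recorded N+6: " + atRisk(memstore, n + 6));
        System.out.println("recorded N:   " + atRisk(memstore, n));
    }
}
```

     With N+6 recorded, the five edits N+1 to N+5 show up as at risk; with N recorded,
none do.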
     The above procedure leaves a chance of data loss,
     though in the current implementation the chance of data loss is small.
     So I think it's a bug.
     The fix is easy: when the flush completes, just set the value for region A
in lastSeqWritten to N instead of removing the entry.
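
     A toy model of the bookkeeping may help to compare the two behaviours. This is only
a sketch under my reading of the code, not the real HLog class: the class name, the
keepEntryOnFlush switch and the simplified method signatures are all made up for
illustration, and the real implementation has locking and other details that are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; names and signatures are hypothetical, not the HLog API.
class LastSeqWrittenSketch {
    // region name -> sequence id recorded for that region
    final Map<String, Long> lastSeqWritten = new HashMap<>();
    long logSeqNum = 0;
    // false: remove the entry on flush completion (behaviour described above)
    // true:  set the entry to the flush sequence number N (proposed fix)
    final boolean keepEntryOnFlush;

    LastSeqWrittenSketch(boolean keepEntryOnFlush) {
        this.keepEntryOnFlush = keepEntryOnFlush;
    }

    // Appends one edit; records its sequence id only if the region has no entry.
    long append(String region) {
        long seq = ++logSeqNum;
        lastSeqWritten.putIfAbsent(region, seq);
        return seq;
    }

    // The flush acquires its own sequence number N.
    long startCacheFlush() {
        return ++logSeqNum;
    }

    void completeCacheFlush(String region, long flushSeq) {
        if (keepEntryOnFlush) {
            lastSeqWritten.put(region, flushSeq);
        } else {
            lastSeqWritten.remove(region);
        }
    }

    // Replays the scenario and returns (recorded value - N) for region A.
    static long offsetFromFlushSeq(boolean keepEntryOnFlush) {
        LastSeqWrittenSketch log = new LastSeqWrittenSketch(keepEntryOnFlush);
        String a = "A";
        log.append(a);                    // an older edit, covered by the flush
        long n = log.startCacheFlush();   // flush begins with sequence number N
        for (int i = 0; i < 5; i++) {
            log.append(a);                // edits N+1..N+5 arrive during the flush
        }
        log.completeCacheFlush(a, n);
        log.append(a);                    // first edit after the flush: N+6
        return log.lastSeqWritten.get(a) - n;
    }

    public static void main(String[] args) {
        System.out.println("remove entry: N+" + offsetFromFlushSeq(false));
        System.out.println("set to N:     N+" + offsetFromFlushSeq(true));
    }
}
```

     In this model, removing the entry leaves N+6 recorded for region A after the next
edit, while setting it to N keeps N recorded.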
     If you want a data loss scenario, I can give you one.

     If I missed something, please let me know.

--
Best Regards
Anty Rao