Re: NotServingRegionException during memstore flush
By tailing the HMaster logs while our attempted flush was running, I
noticed that the region the flush fails on is actually deleted from META
shortly after its split is reported:

> 12/04/07 02:44:39 INFO master.ServerManager: Received REGION_SPLIT:
> visitor-summaries-a2,\x00\x00\x9E\xDB\x00\x00\x00\x00\x00\x01!\xD0\x00\x00\x01&\x15\x08\xE7\x10\x00\x0Dp\xD0,1332659099765.ba5a0a3b1e6b530b04ed826027af8561.:
> Daughters;
> visitor-summaries-a2,\x00\x00\x9E\xDB\x00\x00\x00\x00\x00\x01!\xD0\x00\x00\x01&\x15\x08\xE7\x10\x00\x0Dp\xD0,1333781077583.a36a803260dfe2bb328299c0004bbcc6.,
> visitor-summaries-a2,\x00\x00\x9E\xDB\x00\x00\x00\x00\x00\x06C\xD7\x00\x00\x013\x18&\xAA8\x00\x04\x11\x04,1333781077583.75bbaadb472087d0cfc056b2d9fc894c.
> from ip-10-6-110-74.ec2.internal,60020,1326728928266
> 12/04/07 02:49:21 INFO catalog.MetaEditor: Deleted daughter reference
> visitor-summaries-a2,\x00\x00\x9E\xDB\x00\x00\x00\x00\x00\x01!\xD0\x00\x00\x01&\x15\x08\xE7\x10\x00\x0Dp\xD0,1333781077583.a36a803260dfe2bb328299c0004bbcc6.,
> qualifier=splitA, from parent
> visitor-summaries-a2,\x00\x00\x9E\xDB\x00\x00\x00\x00\x00\x01!\xD0\x00\x00\x01&\x15\x08\xE7\x10\x00\x0Dp\xD0,1332659099765.ba5a0a3b1e6b530b04ed826027af8561.
> 12/04/07 02:49:21 INFO catalog.MetaEditor: Deleted daughter reference
> visitor-summaries-a2,\x00\x00\x9E\xDB\x00\x00\x00\x00\x00\x06C\xD7\x00\x00\x013\x18&\xAA8\x00\x04\x11\x04,1333781077583.75bbaadb472087d0cfc056b2d9fc894c.,
> qualifier=splitB, from parent
> visitor-summaries-a2,\x00\x00\x9E\xDB\x00\x00\x00\x00\x00\x01!\xD0\x00\x00\x01&\x15\x08\xE7\x10\x00\x0Dp\xD0,1332659099765.ba5a0a3b1e6b530b04ed826027af8561.
> 12/04/07 02:49:21 INFO catalog.MetaEditor: Deleted region
> visitor-summaries-a2,\x00\x00\x9E\xDB\x00\x00\x00\x00\x00\x01!\xD0\x00\x00\x01&\x15\x08\xE7\x10\x00\x0Dp\xD0,1332659099765.ba5a0a3b1e6b530b04ed826027af8561.
> from META
> 12/04/07 02:49:21 INFO master.CatalogJanitor: Scanned 19036 catalog row(s)
> and gc'd 1 unreferenced parent region(s)

A couple of questions:

1) I'm guessing I can make this better by raising the split size
(hbase.hregion.max.filesize) and doing the splits myself after the
flushes are complete.  But that looks to be a cluster-wide setting.  Is
there a way to set it only on certain tables?  (A rough sketch of what I
think the per-table route would look like is below.)  I don't want to have
to manage splitting for all tables, especially since other teams have
tables in this same cluster.

2) For some reason these flush calls are taking a very long time.  I
thought they were supposed to be asynchronous?  It took 45 minutes for this
flush to eventually fail, during which time the HMaster was mostly quiet
(until near the end, when the region was deleted).  While it was running I
could see in the HMaster UI that the flushes were happening, so the call
appears to wait for the flushes to finish.

I am on cdh3u2 (0.90.4).
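
For reference, here is roughly what I think the per-table route would look
like, assuming HTableDescriptor.setMaxFileSize behaves as I expect in the
0.90-era Java client (untested sketch; the table name and size below are
just examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RaiseSplitSizeForTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    byte[] table = Bytes.toBytes("visitor-summaries-a2");  // example table

    // Per-table override of the split threshold, instead of touching the
    // cluster-wide hbase.hregion.max.filesize.  100 GB here effectively
    // means "split manually".
    HTableDescriptor desc = admin.getTableDescriptor(table);
    desc.setMaxFileSize(100L * 1024 * 1024 * 1024);

    // I believe the table has to be offline for modifyTable to take effect
    // on 0.90; adjust if your version applies it online.
    admin.disableTable(table);
    admin.modifyTable(table, desc);
    admin.enableTable(table);
  }
}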

On Sat, Apr 7, 2012 at 1:18 AM, Bryan Beaudreault
<[EMAIL PROTECTED]> wrote:

> Hello all,
>
> I have a job that does heavy writing into HBase.  The most recent run was
> 94 million records, each being put to two tables: one table stores a
> KeyValue per record, while the other table batches them up into bundles of
> up to a few thousand per bundle.  This latest run took about 25 minutes.
>
> We are currently in a phase of development where we need to do these
> migrations often, and we noticed that enabling the WAL slows the job down
> about 6-8x.  In the interest of speed, we have disabled the WAL and added
> the following safeguards:
>
> 1) At the beginning of the job we check for any dead servers.  At the end
> of the job we check again, and compare.  If there is a new dead server, we
> retry the job (the jobs are idempotent/reentrant).
>
> 2) At the end of the job, if no servers were lost, we force a memstore
> flush on the tables that were saved to, using HBaseAdmin.flush(String
> tableName).  We then poll the HServerLoad.RegionLoad for all regions of the
> tables we flushed, checking the memStoreSizeMB and waiting until it reaches
> 0 (with a time limit, after which the job fails).
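
For context, the flush-and-wait from safeguard 2) looks roughly like the
following (a simplified sketch, assuming the 0.90-era client API; the exact
accessors, e.g. getRegionsLoad() and getNameAsString(), may differ slightly
between releases):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HServerInfo;
import org.apache.hadoop.hbase.HServerLoad;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class FlushAndWait {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    String tableName = "visitor-summaries-a2";  // example table

    // Ask the region servers to flush; this returns before the flushes finish.
    admin.flush(tableName);

    long deadline = System.currentTimeMillis() + 60L * 60 * 1000;  // time limit
    while (System.currentTimeMillis() < deadline) {
      long pendingMb = 0;
      ClusterStatus status = admin.getClusterStatus();
      for (HServerInfo server : status.getServerInfo()) {
        HServerLoad load = server.getLoad();
        // getRegionsLoad() is a Map keyed by region name in some releases;
        // adjust the iteration to match your client version.
        for (HServerLoad.RegionLoad region : load.getRegionsLoad()) {
          // Only count regions belonging to the table we flushed.
          if (region.getNameAsString().startsWith(tableName + ",")) {
            pendingMb += region.getMemStoreSizeMB();
          }
        }
      }
      if (pendingMb == 0) {
        return;  // everything flushed
      }
      Thread.sleep(10 * 1000);  // poll every 10 seconds
    }
    throw new RuntimeException("memstores not flushed within the time limit");
  }
}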