RE: [HBase 0.92.1] Too many store files to compact, compaction moving slowly
Ramkrishna.S.Vasudevan 2012-05-14, 05:48
It was a test scenario, so we have not yet thought of any workaround.
Coming to the question of store files growing: I don't remember offhand the
size of the store files that were created at the time. I will check and get
back to you. But it is worth looking into; as you say, individual files
growing in size is not normal.

Regards
Ram
> -----Original Message-----
> From: Shrijeet Paliwal [mailto:[EMAIL PROTECTED]]
> Sent: Monday, May 14, 2012 10:50 AM
> To: [EMAIL PROTECTED]
> Subject: Re: [HBase 0.92.1] Too many store files to compact,
> compaction moving slowly
>
> Hello Ram,
> https://issues.apache.org/jira/browse/HBASE-5161 does sound like it.
> First, it was a heavy write scenario; second, the region server was
> low on memory. In your case of the 400GB region, what was the size of
> the individual store files? Were they big as well? While I can see why
> the number of store files would grow, I am not able to understand why
> the size of individual store files keeps growing.
>
> Lastly, what did you do with your 400GB region? Any workaround?
>
> -Shrijeet
>
> On Sun, May 13, 2012 at 9:29 PM, Ramkrishna.S.Vasudevan <
> [EMAIL PROTECTED]> wrote:
> >
> > Hi Shrijeet
> > Regarding your last question about the region growing bigger, the
> > following could be one reason.
> >
> > Since your compactions were slow and you were also trying to split
> > some very big store files, every split would have created a set of
> > reference files. Meanwhile, as more writes happened, more store files
> > were being flushed.
> >
> > In the compaction algorithm, whenever reference files are found, those
> > files should be compacted. But even though there are reference files,
> > we try to take the latest files to compact, so the reference files
> > keep losing the race to get compacted, i.e. their priority keeps going
> > down (see the sketch below).
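
A minimal sketch of the effect described above, in illustrative Java (this
is not HBase's actual selection code; the class, fields, and file counts are
invented for the example). Because selection always prefers the newest
files, an old reference file never makes the cut while flushes keep
arriving:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only -- NOT HBase's real compaction selection code.
// Shows how always picking the newest files starves an old reference file.
public class CompactionStarvationSketch {

    static class StoreFile {
        final long seqId;          // newer flushes get higher sequence ids
        final boolean isReference; // half-file reference left by a split
        StoreFile(long seqId, boolean isReference) {
            this.seqId = seqId;
            this.isReference = isReference;
        }
    }

    // Pick up to maxFiles candidates, always preferring the newest ones.
    static List<StoreFile> select(List<StoreFile> candidates, int maxFiles) {
        List<StoreFile> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingLong((StoreFile f) -> f.seqId).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(maxFiles, sorted.size())));
    }

    public static void main(String[] args) {
        List<StoreFile> files = new ArrayList<>();
        files.add(new StoreFile(1, true)); // old reference file from a split
        long seq = 2;
        for (int round = 0; round < 5; round++) {
            // Heavy writes: new flushes arrive before each compaction round.
            for (int i = 0; i < 10; i++) files.add(new StoreFile(seq++, false));
            List<StoreFile> picked = select(files, 10);
            boolean refPicked = picked.stream().anyMatch(f -> f.isReference);
            System.out.println("round " + round + ": reference picked = " + refPicked);
            files.removeAll(picked);
            files.add(new StoreFile(seq++, false)); // the compacted output
        }
        // Prints "reference picked = false" every round: the reference file
        // keeps losing the race, the starvation described in HBASE-5161.
    }
}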
> >
> > Please refer to HBASE-5161; it could be your case. In our case the
> > region in fact went up to 400GB, but it was a heavy write scenario.
> >
> > Regards
> > Ram
> >
> >
> > > -----Original Message-----
> > > From: Shrijeet Paliwal [mailto:[EMAIL PROTECTED]]
> > > Sent: Monday, May 14, 2012 4:43 AM
> > > To: [EMAIL PROTECTED]
> > > Subject: [HBase 0.92.1] Too many store files to compact,
> > > compaction moving slowly
> > >
> > > Hi,
> > >
> > > HBase version : 0.92.1
> > > Hadoop version: 0.20.2-cdh3u0
> > >
> > > Relevant configurations:
> > > * hbase.regionserver.fileSplitTimeout : 300000
> > > * hbase.hstore.compactionThreshold : 3
> > > * hbase.hregion.max.filesize : 2147483648
> > > * hbase.hstore.compaction.max : 10
> > > * hbase.hregion.majorcompaction: 864000000000
> > > * HBASE_HEAPSIZE : 4000
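
(For reference, a sketch of where such settings live in a stock deployment:
the hbase.* properties belong in hbase-site.xml, while HBASE_HEAPSIZE is set
in hbase-env.sh. Values are exactly the ones listed above.)

<!-- hbase-site.xml (sketch) -->
<property>
  <name>hbase.regionserver.fileSplitTimeout</name>
  <value>300000</value>
</property>
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>3</value>
</property>
<property>
  <name>hbase.hstore.compaction.max</name>
  <value>10</value>
</property>
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>2147483648</value>
</property>
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>864000000000</value>
</property>

# hbase-env.sh
export HBASE_HEAPSIZE=4000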
> > >
> > > Somehow[1] a user has got his table into a complicated state. The
> > > table has 299 regions, out of which roughly 28 regions have a huge
> > > number of store files in them, as many as 2300 files (snapshot:
> > > http://pastie.org/pastes/3907336/text)! To add to the complication,
> > > the individual store files are as big as 14GB.
> > >
> > > Now I am trying to rebalance the data in this table. I tried doing
> > > manual splits, but the split requests were failing with the error
> > > "Took too long to split the files and create the references,
> > > aborting split". To get around this, I increased
> > > hbase.regionserver.fileSplitTimeout.
> > >
> > > From this point splits happened. I went ahead, identified 10 regions
> > > which had too many store files, and split them. After the splits,
> > > daughter regions were created with references to all the store files
> > > in the parent region, and compactions started happening. Since
> > > hbase.hstore.compaction.max is 10, each minor compaction sweep picks
> > > at most 10 files; with 2000+ files (taking one region as an example)
> > > that means about 200 sweeps of minor compaction. Each sweep is
> > > running slow (a couple of hours), since the individual files (in
> > > each set of 10) are too big.
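
To put rough numbers on that (a back-of-envelope sketch using only the
figures from this thread; the class and variable names are made up):

public class CompactionBackOfEnvelope {
    public static void main(String[] args) {
        // Figures from this thread; a rough model, not HBase's actual behavior.
        int storeFiles = 2000;       // "2000+" files in one region
        int filesPerSweep = 10;      // hbase.hstore.compaction.max
        double hoursPerSweep = 2.0;  // "couple of hours" per sweep
        int sweeps = storeFiles / filesPerSweep;     // ~200 sweeps
        double days = sweeps * hoursPerSweep / 24.0; // ~17 days for one region
        System.out.printf("%d sweeps, ~%.0f days%n", sweeps, days);
    }
}

On those numbers, a single 2000-file region needs on the order of two weeks
of continuous minor compaction, which is why the compactions appear to be
moving so slowly.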