Shrijeet Paliwal 2012-05-13, 23:12
Stack 2012-05-14, 19:29
Shrijeet Paliwal 2012-05-14, 21:11
Stack 2012-05-14, 21:46
Shrijeet Paliwal 2012-05-14, 22:03
Stack 2012-05-14, 22:26
Ramkrishna.S.Vasudevan 2012-05-14, 04:29
Re: [HBase 0.92.1] Too many stores files to compact, compaction moving slowly
Shrijeet Paliwal 2012-05-14, 05:20
https://issues.apache.org/jira/browse/HBASE-5161 does sound like it. First,
it was a heavy write scenario; second, the region server was low on memory.
In your case of the 400GB region, what was the size of the individual store
files? Were they big as well? While I can see why the number of store files
will grow, I am not able to understand why the size of individual store
files keeps growing.
Lastly, what did you do with your 400GB region? Any workaround?
On Sun, May 13, 2012 at 9:29 PM, Ramkrishna.S.Vasudevan <
[EMAIL PROTECTED]> wrote:
> Hi Shrijeet
> Regarding your last question about the region growing bigger, the
> following could be one reason.
> You said your compactions are slow and that you were trying to split
> some very big store files; every split would have created a set of
> reference files.
> Meanwhile, as more writes happen, more store files are flushed.
> In the compaction algorithm, whenever reference files are found, those
> files will be tried for compaction. But what happens is that, even though
> there are reference files, we try to take the latest files to compact, and
> the reference files keep losing the race to get compacted, i.e. their
> priority keeps going down.
> Please refer to HBASE-5161. It could be your case. In our case the region
> in fact went up to 400GB, but it was a heavy write scenario.
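
The starvation described above can be illustrated with a toy model. This is only a sketch of the behaviour as stated in the message (the newest files are picked first, so older reference files wait), not HBase's actual selection code; the function name, the file tuples, and the flush and compaction numbers are all made up for illustration.

```python
def simulate_store(reference_files, rounds, flushes_per_round, max_per_compaction):
    """Toy model: files are kept oldest -> newest, and each round compacts
    only the newest `max_per_compaction` files into one new file."""
    files = [("ref", i) for i in range(reference_files)]
    next_id = reference_files
    for _ in range(rounds):
        # Writes keep arriving, so new store files are flushed every round.
        for _ in range(flushes_per_round):
            files.append(("flush", next_id))
            next_id += 1
        # Assumed policy: take the latest files, merge them into one.
        files = files[:-max_per_compaction]
        files.append(("compacted", next_id))
        next_id += 1
    return files


def count_refs(files):
    return sum(1 for kind, _ in files if kind == "ref")


# Heavy writes: flushes alone fill every compaction, references never move.
heavy = simulate_store(reference_files=20, rounds=50,
                       flushes_per_round=10, max_per_compaction=10)
# Light writes: the selection reaches back far enough to rewrite references.
light = simulate_store(reference_files=5, rounds=1,
                       flushes_per_round=3, max_per_compaction=10)

print("references left under heavy writes:", count_refs(heavy))
print("references left under light writes:", count_refs(light))
```

In this model the reference files survive only while flushes keep every compaction busy, which matches the "heavy write scenario" caveat in the message.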
> > -----Original Message-----
> > From: Shrijeet Paliwal [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, May 14, 2012 4:43 AM
> > To: [EMAIL PROTECTED]
> > Subject: [HBase 0.92.1] Too many stores files to compact, compaction
> > moving slowly
> > Hi,
> > HBase version : 0.92.1
> > Hadoop version: 0.20.2-cdh3u0
> > Relevant configurations:
> > * hbase.regionserver.fileSplitTimeout : 300000
> > * hbase.hstore.compactionThreshold : 3
> > * hbase.hregion.max.filesize : 2147483648
> > * hbase.hstore.compaction.max : 10
> > * hbase.hregion.majorcompaction: 864000000000
> > * HBASE_HEAPSIZE : 4000
> > Somehow a user has got his table into a complicated state. The table
> > has 299 regions, out of which roughly 28 regions have a huge number of
> > store files in them, as high as 2300 files (snapshot:
> > http://pastie.org/pastes/3907336/text)! To add to the complication,
> > the individual store files are as big as 14GB.
> > Now I am in pursuit of balancing the data in this table. I tried doing
> > manual splits, but the split requests were failing with the error "Took
> > too long to split the files and create the references, aborting split".
> > To get around this I increased hbase.regionserver.fileSplitTimeout.
> > From this point splits happened. I went ahead and identified 10 regions
> > which had too many store files and split them. After the splits, daughter
> > regions were created with references to all the store files in the parent
> > region, and compactions started happening. A minor compaction takes at
> > most 10 files (hbase.hstore.compaction.max). Since there are 2000+ files
> > (taking one instance as an example), it will do 200 sweeps of minor
> > compaction.
> > Each sweep is running slowly (a couple of hours), since the individual
> > files (in the set of 10) are too big.
> > Now coming to the questions:
> > A] Given we can afford downtime for this table (and for the cluster if
> > needed), can I do something *better* than manual splits and allowing
> > compactions to complete? (I am picturing a tool which scans all the HDFS
> > directories under the table and launches a distributed *compact and split
> > if needed* job, or something along those lines.)
> > B] If not (A), can I temporarily tweak some configurations (other than
> > the heap given to the region server) to get the table to a healthy state?
> > C] How come we managed to get individual files as big as 15GB when our
> > max region size has been configured to be 2GB?
> > My theory is that during the writes all requests consistently went to the
> > same region server and we managed to flush faster than we could compact.
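
The compaction arithmetic in the original message (about 2000 store files, at most 10 files per minor compaction, a couple of hours per sweep) can be checked with a short back-of-the-envelope sketch. The helper name and the 2-hour figure are assumptions for illustration; real passes will not all take the same time, and the sketch ignores new flushes arriving meanwhile.

```python
import math


def passes_to_merge(num_files, max_files_per_pass, target_files=1):
    """Passes needed when each pass merges up to `max_files_per_pass`
    files into a single file, until `target_files` remain."""
    if max_files_per_pass < 2:
        raise ValueError("a pass must merge at least two files")
    if num_files <= target_files:
        return 0
    # Every full pass replaces N files with 1, a net reduction of N - 1.
    return math.ceil((num_files - target_files) / (max_files_per_pass - 1))


STORE_FILES = 2000        # "2000 + files" from the message
MAX_FILES_PER_PASS = 10   # hbase.hstore.compaction.max from the message
HOURS_PER_PASS = 2.0      # assumption: "couple of hours" read as 2 hours

# Passes needed just to rewrite every original file once.
first_sweep_passes = math.ceil(STORE_FILES / MAX_FILES_PER_PASS)
# Passes needed to get all the way down to a single file.
total_passes = passes_to_merge(STORE_FILES, MAX_FILES_PER_PASS)

print("passes to touch every file once:", first_sweep_passes)
print("passes to reach one file:", total_passes)
print("days for the first sweep alone:", first_sweep_passes * HOURS_PER_PASS / 24)
```

The 200 figure in the message corresponds to the first sweep only; merging down to a single file takes somewhat more passes, and later passes work on larger inputs, so they would likely run longer than the first ones.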
Ramkrishna.S.Vasudevan 2012-05-14, 05:48