Thanks for all the replies on this.
Based on the feedback, and particularly given the high number of splits a server can maintain, I'll leave the splits in place. I'm not keen on merging tablets because of the impact on query performance.
From: Eric Newton [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 9 April 2013 11:57
To: [EMAIL PROTECTED]
Subject: Re: Removing splits [SEC=UNCLASSIFIED]
Is there a maximum number of splits a table can have?
There are a few theoretical limits to the number of tablets you can have.
1) a row cannot be split over tablets: if you only have a billion rows, you can only have a billion tablets
2) tablet servers track some bits of overhead about a tablet in memory: typically this is only 1-2K per tablet, so a gigabyte JVM would only be able to hold 500K-1M tablets per server.
3) there's a limit to the number of files/directories that can be stored in your NameNode. More tablets tend to create more files and directories.
Performance is likely to be poor at these limits, and it would not be helpful to approach them.
I have seen stable clusters with over 500K tablets.
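As a rough check of the memory figure in point 2, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate: it assumes the whole heap is available for per-tablet overhead, which a real tablet server would not allow, and the function name max_tablets is made up for illustration.

```python
# Upper-bound estimate of tablets per server from heap size and
# per-tablet memory overhead. Assumes (unrealistically) that the
# entire heap is spent on tablet bookkeeping.

KIB = 1024
GIB = 1024 ** 3


def max_tablets(heap_bytes: int, overhead_bytes_per_tablet: int) -> int:
    """Return how many tablets fit if each one costs a fixed overhead."""
    if overhead_bytes_per_tablet <= 0:
        raise ValueError("overhead per tablet must be positive")
    return heap_bytes // overhead_bytes_per_tablet


# 1 GB heap with 2K and 1K of overhead per tablet respectively.
low = max_tablets(GIB, 2 * KIB)
high = max_tablets(GIB, 1 * KIB)
print(low, high)
```

With a 1 GB heap this gives roughly half a million to a million tablets, in line with the 500K-1M range quoted above. In practice the usable number would be lower, since the heap is shared with caches and in-memory data.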
How can splits be removed once they are no longer required? I can't see any command in the API.
With version 1.4, you can merge tablets together. In the shell, you can merge ranges, or have the shell merge ranges based on size.
With version 1.5, you will be able to merge METADATA tablets together.