> Is there a maximum number of splits a table can have?
There are a few theoretical limits to the number of tablets you can have.
1) a row cannot be split across tablets: if you have only a billion rows,
you can have at most a billion tablets
2) tablet servers track some overhead about each tablet in memory:
typically this is only 1-2K per tablet, so a JVM with a one-gigabyte heap
could hold at most about 500K-1M tablets per server.
3) there's a limit to the number of files/directories that can be stored in
your NameNode. More tablets tend to create more files and directories.
Performance is likely to be poor at these limits, and it would not be
helpful to approach them.
I have seen stable clusters with over 500K tablets.
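
As a rough check on the memory figure in point 2, here is a minimal sketch of the arithmetic. It assumes the entire heap is available for per-tablet overhead, which is not realistic, so treat the result as an upper bound; the function name is only illustrative.

```python
# Back-of-the-envelope ceiling on tablets per tablet server, assuming
# 1-2 KB of in-memory overhead per tablet (the figure quoted above) and
# assuming, unrealistically, that the whole heap is free for that overhead.
KIB = 1024
GIB = 1024 ** 3

def max_tablets(heap_bytes, overhead_bytes_per_tablet):
    """Return how many tablets fit if each costs the given overhead."""
    return heap_bytes // overhead_bytes_per_tablet

low = max_tablets(GIB, 2 * KIB)   # 2 KB per tablet
high = max_tablets(GIB, 1 * KIB)  # 1 KB per tablet
print(low, high)  # 524288 1048576
```

That is where the 500K-1M range comes from; a real server would hit other limits well before this.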
> How can splits be removed once they are no longer required? I can't see
> any command in the API.
With version 1.4, you can merge tablets together. In the shell, you can
merge ranges, or have the shell merge ranges based on size.
With version 1.5, you will be able to merge METADATA tablets together.