Seekable interface and CompressInputStream question
java8964 java8964 2012-12-22, 02:13
I have a question about the Seekable interface. I am currently using the CDH3 release, with Hadoop 0.20.2. As I understand it, in this release CompressInputStream throws UnsupportedException from the methods it inherits from the Seekable interface, because they are not implemented.
My question is: does Seekable mean that the underlying InputStream supports splitting? If an InputStream is seekable, then it should be splittable, right?
If so, I assume that a future Hadoop release will have CompressInputStream implement Seekable. But my understanding is that some compression formats can be split and some cannot. Suppose the data file is a gzip file and I get a CompressInputStream that does support Seekable, with the Gzip codec: I would assume it is splittable, but in fact it is not. How do I write a generic InputFormat that supports both splittable and unsplittable compressed input streams in this case? Or is my understanding wrong, and are Seekable and splitting entirely different things?
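
To make the question concrete, here is a minimal sketch of the situation I am worried about. All the types below (SeekableStream, Codec, SplittableCodec, GzipLikeCodec, BlockCodec, CompressedStream) are hypothetical stand-ins I made up for illustration, not Hadoop's actual API. The sketch assumes that "can be split" is a property of the codec and is checked separately from whether the stream type exposes seek methods.

```java
// Hypothetical stand-ins, NOT Hadoop's API: they only model the two capabilities.
class SplitCheck {
    /** Capability 1: the stream type exposes a seek method. */
    interface SeekableStream {
        void seek(long pos);
    }

    /** A compression codec, identified by name for the demo. */
    interface Codec {
        String name();
    }

    /** Capability 2 (assumed marker): decoding may start mid-file at a block boundary. */
    interface SplittableCodec extends Codec { }

    /** Models a format like gzip: one continuous stream, no usable restart points. */
    static final class GzipLikeCodec implements Codec {
        public String name() { return "gzip-like"; }
    }

    /** Models a block-oriented format whose blocks can be decoded independently. */
    static final class BlockCodec implements SplittableCodec {
        public String name() { return "block-based"; }
    }

    /** Declares seek() through its type, but that says nothing about splitting. */
    static final class CompressedStream implements SeekableStream {
        final Codec codec;

        CompressedStream(Codec codec) { this.codec = codec; }

        public void seek(long pos) {
            // Mirrors the behaviour described above: declared but not implemented.
            throw new UnsupportedOperationException("seek not supported");
        }
    }

    /** Decision a generic input format could make; a null codec means uncompressed. */
    static boolean isSplitable(Codec codec) {
        return codec == null || codec instanceof SplittableCodec;
    }

    public static void main(String[] args) {
        Codec[] codecs = { new GzipLikeCodec(), new BlockCodec() };
        for (Codec c : codecs) {
            System.out.println(c.name() + ": splittable=" + isSplitable(c));
        }
    }
}
```

In this sketch the gzip-like stream still has the seekable type, yet isSplitable returns false for it, because the check looks only at the codec. Whether Hadoop is meant to work this way is exactly what I am asking.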