Michel Segel 2012-02-29, 12:10
Edward Capriolo 2012-02-29, 15:52
Niels Basjes 2012-02-29, 16:00

Note that the solution I created (HADOOP-7076) does not require any changes to the input files:
it can split ANY gzipped file as-is.
The downside is that this costs some extra performance, because each task
has to decompress, and then throw away, the first part of the file that
comes before its own split.
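To make that trade-off concrete, here is a minimal Java sketch of the general idea. It is not the HADOOP-7076 code. It assumes the whole file fits in memory as a byte[] and that splits are given as ranges of compressed bytes. It also uses a simplified ownership rule: a line belongs to the split containing the compressed position reached just before the line was read. The names GzipSplitSketch, readSplit and gzip are hypothetical. Every split decompresses from byte 0, so later splits pay for everything they skip.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSplitSketch {

    /** Tracks how many compressed bytes have been pulled from the underlying stream. */
    static final class CountingInputStream extends FilterInputStream {
        long count = 0;

        CountingInputStream(InputStream in) { super(in); }

        @Override public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;
            return b;
        }

        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }

        @Override public long skip(long n) throws IOException {
            long s = super.skip(n);
            count += s;
            return s;
        }
    }

    /** Returns the lines owned by the split covering compressed bytes [splitStart, splitEnd). */
    static List<String> readSplit(byte[] gz, long splitStart, long splitEnd) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(gz));
        boolean lastSplit = splitEnd >= gz.length;
        List<String> owned = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(counter), StandardCharsets.UTF_8))) {
            while (true) {
                long pos = counter.count; // approximate: the readers buffer ahead
                if (!lastSplit && pos >= splitEnd) break; // the rest belongs to later splits
                String line = reader.readLine();
                if (line == null) break;
                // Lines before splitStart are decompressed only to be discarded:
                // this is the extra work every split except the first pays for.
                if (pos >= splitStart) owned.add(line);
            }
        }
        return owned;
    }

    /** Writes the lines as one ordinary gzip file (no special preparation). */
    static byte[] gzip(List<String> lines) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(new GZIPOutputStream(bos), StandardCharsets.UTF_8)) {
            for (String l : lines) w.write(l + "\n");
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 20000; i++) lines.add("record-" + i + "-" + (i * 7919L % 10007));
        byte[] gz = gzip(lines);
        int numSplits = 4;
        List<String> all = new ArrayList<>();
        for (int s = 0; s < numSplits; s++) {
            long start = gz.length * (long) s / numSplits;
            long end = gz.length * (long) (s + 1) / numSplits;
            List<String> part = readSplit(gz, start, end);
            System.out.println("split " + s + ": [" + start + ", " + end + ") -> " + part.size() + " lines");
            all.addAll(part);
        }
        System.out.println("all lines recovered: " + all.equals(lines));
    }
}
```

The position counter is only approximate because GZIPInputStream and the readers buffer ahead. That is acceptable here because every split reads the same bytes in the same way and applies the same rule, so each line lands in exactly one split.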

The other two ways of splitting gzipped files require either
- creating some kind of "compression index" before actually using the file, or
- creating the file in a format that is generated in such a way that it is
really a set of concatenated gzipped files (HADOOP-7909).
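The second option relies on the fact that a gzip file may consist of several complete gzip members written back to back, which a standard reader such as java.util.zip.GZIPInputStream decodes as one continuous stream. The Java sketch below only illustrates that property. It is not the HADOOP-7909 format. It assumes the writer remembers each member's start offset (in effect a small index), and the names ConcatGzipSketch, MemberFile, writeMembers and decompress are hypothetical. It needs Java 9+ for readAllBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConcatGzipSketch {

    /** The file bytes plus the offset at which each gzip member starts. */
    static final class MemberFile {
        final byte[] bytes;
        final List<Long> memberOffsets;

        MemberFile(byte[] bytes, List<Long> memberOffsets) {
            this.bytes = bytes;
            this.memberOffsets = memberOffsets;
        }
    }

    /** Writes the lines as independent gzip members of at most linesPerMember lines each. */
    static MemberFile writeMembers(List<String> lines, int linesPerMember) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        List<Long> offsets = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerMember) {
            offsets.add((long) file.size()); // this member starts here
            ByteArrayOutputStream member = new ByteArrayOutputStream();
            try (Writer w = new OutputStreamWriter(new GZIPOutputStream(member), StandardCharsets.UTF_8)) {
                for (String l : lines.subList(i, Math.min(i + linesPerMember, lines.size()))) {
                    w.write(l + "\n");
                }
            }
            member.writeTo(file); // append the complete member to the file
        }
        return new MemberFile(file.toByteArray(), offsets);
    }

    /** Decompresses bytes [start, end) on their own, e.g. a single member. */
    static String decompress(byte[] data, int start, int end) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data, start, end - start))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 10; i++) lines.add("record-" + i);
        MemberFile f = writeMembers(lines, 4); // members hold records 0-3, 4-7, 8-9

        // A split starting at the second member decodes it without touching the first.
        int start = f.memberOffsets.get(1).intValue();
        int end = f.memberOffsets.get(2).intValue();
        System.out.print(decompress(f.bytes, start, end));

        // An ordinary gzip reader still sees the whole file as one stream.
        System.out.println(decompress(f.bytes, 0, f.bytes.length).equals(String.join("\n", lines) + "\n"));
    }
}
```

Because each member is a complete gzip stream, a task whose split begins at a member boundary can start decompressing right there instead of decompressing and discarding everything before it.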

