

Re: Should splittable Gzip be a "core" hadoop feature?
Hi,

On Wed, Feb 29, 2012 at 16:52, Edward Capriolo <[EMAIL PROTECTED]> wrote:
...

> But being able to generate split info for them and processing them
> would be good as well. I remember that was a hot thing to do with lzo
> back in the day. The pain of once overing the gz files to generate the
> split info is detracting but it is nice to know it is there if you
> want it.
>

Note that the solution I created (HADOOP-7076) does not require any
preprocessing.
It can split ANY gzipped file as-is.
The downside is that this costs some additional performance: each task
has to decompress the part of the file that precedes its split and then
discard that output.
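
To make the cost concrete, here is a minimal sketch of the
decompress-and-discard idea in Python. It is an illustration only, not the
HADOOP-7076 code: the function name read_split and the choice to express
split boundaries as offsets into the compressed bytes are assumptions made
for this example, and record boundaries are ignored.

```python
import gzip
import zlib


def read_split(compressed: bytes, start: int, end: int) -> bytes:
    """Return the bytes decompressed while consuming compressed[start:end].

    Hypothetical helper for illustration; offsets refer to the compressed data.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    # Wasted work: gzip has no random access, so everything before `start`
    # is decompressed only to rebuild the decoder state, then thrown away.
    decoder.decompress(compressed[:start])
    # Useful work: the output produced by this split's own byte range.
    out = decoder.decompress(compressed[start:end])
    if end >= len(compressed):
        out += decoder.flush()
    return out


# Build a sample gzip file and read it as four independent "splits".
payload = b"".join(b"record %d\n" % i for i in range(50000))
compressed = gzip.compress(payload)
n_splits = 4
bounds = [len(compressed) * i // n_splits for i in range(n_splits + 1)]
parts = [read_split(compressed, bounds[i], bounds[i + 1]) for i in range(n_splits)]
```

Each call starts decoding at byte 0, so the last split decompresses almost
the whole file to produce only its own quarter of the output. That repeated
prefix work is the performance cost described above.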

The other two ways of splitting gzipped files require either
- creating some kind of "compression index" before actually using the file
(HADOOP-6153), or
- creating the file in a format that is generated in such a way that it is
really a set of concatenated gzipped files (HADOOP-7909).
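
The second option relies on the fact that a gzip file may consist of
several gzip members written back to back. The sketch below, in Python,
shows only that property and is not the HADOOP-7909 implementation: the
helper write_members and its offsets list are hypothetical names for this
example, and how a real reader would locate member boundaries is left out.

```python
import gzip


def write_members(chunks):
    """Compress each chunk as its own gzip member and concatenate them.

    Hypothetical helper; returns the file bytes and each member's start offset.
    """
    blob = b""
    offsets = []
    for chunk in chunks:
        offsets.append(len(blob))
        blob += gzip.compress(chunk)
    return blob, offsets


chunks = [b"alpha\n" * 1000, b"beta\n" * 1000, b"gamma\n" * 1000]
blob, offsets = write_members(chunks)

# The concatenation is still a valid gzip file for ordinary tools.
whole = gzip.decompress(blob)

# Each member can also be decompressed on its own, without touching the
# bytes that come before it.
ends = offsets[1:] + [len(blob)]
members = [gzip.decompress(blob[s:e]) for s, e in zip(offsets, ends)]
```

The trade-off is that the file has to be written this way up front, which
is why this approach does not help with gzip files that already exist.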

--
Best regards / Met vriendelijke groeten,

Niels Basjes