Re: Doubts on compressed file
Hi,

Yes, all files are split into block-size chunks in HDFS. HDFS is
agnostic about a file's content and its attributes (such as
compression); handling those is left to the file reader logic.
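To make the block-chunking concrete, here is a minimal sketch (mine, not
part of the original reply) that lists the block locations of a gzip file
in HDFS; the path and sizes are made-up examples, the FileSystem and
BlockLocation APIs are standard Hadoop. A .gz file larger than the block
size simply reports several blocks, just like an uncompressed file would.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowGzipBlocks {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/data/big-file.gz");   // hypothetical example path

    FileStatus status = fs.getFileStatus(file);
    // HDFS reports one BlockLocation per block, regardless of compression;
    // e.g. a ~300 MB .gz file with a 128 MB block size shows three blocks.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.printf("offset=%d length=%d hosts=%s%n",
          block.getOffset(), block.getLength(), String.join(",", block.getHosts()));
    }
  }
}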

When a GZip reader initializes, it reads the file over its whole length,
across all the blocks the file may have, which HDFS lets you do
transparently by simply requesting the length of data to read. HDFS ends
up reading the blocks serially for you, and your application just has to
take care of reading the actual gzip data without worrying about block
split boundaries.
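As a rough illustration of that reader side (my sketch, not from the
original reply): Hadoop's CompressionCodecFactory resolves the gzip codec
from the .gz suffix, and the decompressing stream it returns reads the file
end to end while HDFS fetches the underlying blocks one after another. The
path is a made-up example; the codec and FileSystem calls are standard
Hadoop APIs.

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class ReadGzipFromHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/data/big-file.gz");   // hypothetical example path

    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    CompressionCodec codec = factory.getCodec(file);   // GzipCodec, resolved from the .gz suffix

    InputStream raw = fs.open(file);
    InputStream in = (codec == null) ? raw : codec.createInputStream(raw);
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
      long count = 0;
      while (reader.readLine() != null) {
        count++;   // decompressed records arrive here; HDFS block boundaries are invisible
      }
      System.out.println("lines read: " + count);
    }
  }
}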

On Wed, Nov 7, 2012 at 5:52 PM, Ramasubramanian Narayanan
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> If a gzip file is loaded into HDFS, will it get split into blocks and
> stored in HDFS?
>
> I understand that a single mapper works with a GZip file as it reads the entire
> file from beginning to end... In that case, if the GZip file size is larger
> than 128 MB, will it get split into blocks and stored in HDFS?
>
> regards,
> Rams
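
On the single-mapper point in the quoted question: that follows from gzip
not being a splittable codec, which is a separate decision from how HDFS
stores the blocks. A rough sketch of the check that FileInputFormat-style
code performs (my illustration; the path is made up, the codec classes are
standard Hadoop):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.SplittableCompressionCodec;

public class GzipSplittableCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    CompressionCodec codec = factory.getCodec(new Path("/data/big-file.gz"));   // hypothetical path
    // Plain files (codec == null) and splittable codecs such as bzip2 can be
    // divided across map tasks; GzipCodec is not splittable, so the whole
    // file goes to one mapper even though HDFS stores it as multiple blocks.
    boolean splittable = (codec == null) || (codec instanceof SplittableCompressionCodec);
    System.out.println("splittable for MapReduce: " + splittable);   // false for .gz
  }
}

So storing the file in multiple HDFS blocks and feeding it to a single
mapper are independent: the former always happens, the latter depends on
the codec.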

--
Harsh J