Re: Ideal file size
It does not matter much what the file size is, because the file is
split into blocks, and blocks are what the NN tracks.

For larger deployments you can go with a larger block size like 256MB
or even 512MB.  Generally, the bigger the file the better; split
calculation is very input-format dependent, however.
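For what it's worth, here is a minimal sketch of what rolling a sequence
file over at a target size could look like, using the Hadoop 2+
SequenceFile.Writer options API. The output path, the 256MB block size and
the ~1GB ROLL_SIZE threshold are illustrative assumptions only, not
recommendations for your cluster (on Hadoop 1.x the block-size property is
dfs.block.size rather than dfs.blocksize):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class RollingSequenceFileWriter {

    // Illustrative roll-over threshold: a handful of HDFS blocks per file
    // keeps files comfortably "large" without letting one file grow forever.
    private static final long ROLL_SIZE = 4L * 256 * 1024 * 1024; // ~1GB = 4 x 256MB blocks

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Ask for a larger block size for the files this client creates (256MB here).
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

        int part = 0;
        SequenceFile.Writer writer = open(conf, part);
        for (long i = 0; i < 10000000L; i++) {
            writer.append(new LongWritable(i), new Text("record-" + i));
            // getLength() reports the bytes written so far; roll to a new file
            // once the current one reaches the target size.
            if (writer.getLength() >= ROLL_SIZE) {
                writer.close();
                writer = open(conf, ++part);
            }
        }
        writer.close();
    }

    private static SequenceFile.Writer open(Configuration conf, int part) throws IOException {
        return SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(new Path("/data/events/part-" + part + ".seq")),
                SequenceFile.Writer.keyClass(LongWritable.class),
                SequenceFile.Writer.valueClass(Text.class));
    }
}

The point is simply that each rolled file ends up as a few large blocks, so
the NameNode only has to track a handful of block objects per file.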

On Wed, Jun 6, 2012 at 10:00 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> We have a continuous flow of data into the sequence file. I am wondering what
> would be the ideal file size before the file gets rolled over. I know too many
> small files are not good, but could someone tell me what the ideal size
> would be such that it doesn't overload the NameNode.