Hadoop, mail # user - Ideal file size


Re: Ideal file size
Edward Capriolo 2012-06-06, 14:55
It does not matter what the file size is, because the file is
split into blocks, which is what the NameNode (NN) tracks.

For larger deployments you can go with a large block size like 256MB
or even 512MB. Generally, the bigger the file the better. Split
calculation is very input-format dependent, however.
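
To put rough numbers on the block-count point, here is a minimal sketch of the arithmetic. It assumes each file occupies ceil(file size / block size) blocks; the class and method names (BlockEstimate, blocksPerFile, totalBlocks) are hypothetical and not part of Hadoop, and the 1 GB volume and roll sizes are made-up inputs for illustration.

```java
// Hypothetical helper, not a Hadoop API: estimates how many blocks
// the NameNode would have to track for a given roll-over size.
final class BlockEstimate {
    static final long MB = 1024L * 1024L;

    // Blocks for one file: ceil(fileBytes / blockBytes). An empty file has none.
    static long blocksPerFile(long fileBytes, long blockBytes) {
        if (fileBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative, block size positive");
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Total blocks when totalBytes of data is rolled into files of rollBytes each
    // (the last file holds whatever is left over).
    static long totalBlocks(long totalBytes, long rollBytes, long blockBytes) {
        if (rollBytes <= 0) {
            throw new IllegalArgumentException("roll size must be positive");
        }
        long fullFiles = totalBytes / rollBytes;
        long remainder = totalBytes % rollBytes;
        return fullFiles * blocksPerFile(rollBytes, blockBytes)
                + blocksPerFile(remainder, blockBytes);
    }

    public static void main(String[] args) {
        long total = 1024 * MB;  // assumed: 1 GB of incoming data
        long block = 256 * MB;   // block size from the range mentioned above

        // Rolled as a single 1 GB file
        System.out.println(totalBlocks(total, 1024 * MB, block)); // prints 4
        // Rolled every 1 MB: 1024 small files, one block each
        System.out.println(totalBlocks(total, 1 * MB, block));    // prints 1024
    }
}
```

Under these assumptions the same 1 GB of data costs 4 blocks as one large file and 1024 blocks as 1 MB files, which is why rolling at a multiple of the block size tends to be easier on the NN than rolling often.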

On Wed, Jun 6, 2012 at 10:00 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> We have a continuous flow of data into the sequence file. I am wondering what
> would be the ideal file size before the file gets rolled over. I know too many
> small files are not good, but could someone tell me what would be the ideal
> size such that it doesn't overload the NameNode?