The exact file size does not matter much, because each file is split
into blocks, and the blocks are what the NameNode (NN) tracks.
For larger deployments you can go with a large block size like 256MB
or even 512MB. Generally, the bigger the file the better. Split
calculation is very input-format dependent, however.
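
To put rough numbers on that, here is a small sketch of the arithmetic. It
assumes the NameNode holds one namespace object per file plus one per block,
which is a simplification (directories and replica locations are ignored).
The function names are made up for illustration and are not part of any
Hadoop API.

```python
MB = 1024 * 1024


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding on large sizes."""
    return (a + b - 1) // b


def blocks_for_file(file_size_bytes, block_size_bytes):
    """Number of blocks a single file of the given size is split into."""
    return ceil_div(file_size_bytes, block_size_bytes)


def namespace_objects(total_bytes, file_size_bytes, block_size_bytes):
    """Approximate object count (files + blocks) tracked by the NameNode
    when total_bytes of data is stored as equally sized files.

    Assumption: one object per file and one per block; the last file is
    treated as full size, so this slightly overestimates.
    """
    num_files = ceil_div(total_bytes, file_size_bytes)
    blocks_per_file = blocks_for_file(file_size_bytes, block_size_bytes)
    return num_files * (1 + blocks_per_file)
```

Under these assumptions, 1 TB stored as 1MB files with a 256MB block size
comes to about 2.1 million objects, while the same 1 TB stored as 1GB files
comes to 5,120. So rolling the file over at a small multiple of the block
size, or larger, keeps the object count low.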
On Wed, Jun 6, 2012 at 10:00 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> We have continuous flow of data into the sequence file. I am wondering what
> would be the ideal file size before file gets rolled over. I know too many
> small files are not good but could someone tell me what would be the ideal
> size such that it doesn't overload NameNode.