When you seek to a position within an HDFS file, the client does not start
at the first block and walk through the blocks one by one.
Instead, DFSClient skips ahead to the block whose offset and length cover
your seek position.
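
To illustrate the idea, here is a minimal sketch in Python. It is not the
actual DFSClient code; it only assumes the client already holds a sorted list
of (offset, length) pairs for the file's blocks. The names Block, find_block
and BLOCK_SIZE are hypothetical and chosen for this example.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Block(NamedTuple):
    offset: int  # byte offset of the block's first byte within the file
    length: int  # number of bytes stored in the block


def find_block(blocks: List[Block], pos: int) -> int:
    """Return the index of the block whose range [offset, offset + length)
    contains pos. Blocks must be sorted by offset."""
    starts = [b.offset for b in blocks]
    # Binary search for the last block starting at or before pos,
    # so no earlier block has to be read or even touched.
    i = bisect_right(starts, pos) - 1
    if i < 0 or pos >= blocks[i].offset + blocks[i].length:
        raise ValueError(f"position {pos} is outside the file")
    return i


# Hypothetical file: two full 128 MB blocks plus a 10 MB tail block.
BLOCK_SIZE = 128 * 1024 * 1024
blocks = [
    Block(0, BLOCK_SIZE),
    Block(BLOCK_SIZE, BLOCK_SIZE),
    Block(2 * BLOCK_SIZE, 10 * 1024 * 1024),
]

# Seeking into the tail block goes straight to index 2.
target = find_block(blocks, 2 * BLOCK_SIZE + 5)
```

The read then starts inside that one block, at position (pos - offset); the
blocks before it are never read.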
On Mon, Apr 1, 2013 at 12:55 AM, Rahul Bhattacharjee <
[EMAIL PROTECTED]> wrote:
> It has been written in many places that, to avoid a huge number of disk
> seeks, we store big blocks in HDFS, so that once we seek to the location,
> the data transfer rate dominates and there are no more seeks. I am not
> sure I have understood this correctly.
> My question is: no matter what block size we choose, the data is finally
> written to the computer's HDD, which is formatted with a block size in
> KBs. Also, when writing to the local FS (not HDFS), it is not guaranteed
> that the blocks we write are contiguous, so there would be disk seeks
> anyway. The assumption of HDFS would only hold if the underlying FS
> guarantees to write the data in contiguous blocks.
> Can someone explain a bit?