HDFS user mailing list

Re: Block size in HDFS
I am also relatively new to Hadoop, so others should feel free to correct me
if I am wrong.

NN keeps track of a file by its "inode" and the blocks associated with that
inode. In your case, since your file size is smaller than the block size, NN
will have only ONE block associated with this inode (assuming a replication
factor of one). To track the 1KB file, the only memory the NN needs is roughly
what it takes to store this inode-to-block relation.
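
To make the bookkeeping above concrete, here is a minimal Python sketch of a toy model, not the real NameNode code: `ToyNameNode`, its `inodes` dict and `split_into_blocks` are hypothetical names made up for illustration. It shows that a file smaller than the block size maps to exactly one block whose recorded length is the file's size, not 64MB.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BLOCK_SIZE = 64 * 1024 * 1024  # 64MB, the block size configured in this thread


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the length of each block a file of `file_size` bytes is split into.

    Every block except possibly the last is full; the last one holds only the
    remaining bytes, so a small file is never padded up to block_size.
    An empty file needs no blocks in this toy model.
    """
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


@dataclass
class ToyNameNode:
    """Hypothetical in-memory namespace: inode id -> list of (block id, length)."""
    inodes: dict[int, list[tuple[int, int]]] = field(default_factory=dict)
    next_block_id: int = 0

    def add_file(self, inode_id: int, file_size: int) -> None:
        blocks = []
        for length in split_into_blocks(file_size):
            blocks.append((self.next_block_id, length))
            self.next_block_id += 1
        self.inodes[inode_id] = blocks


nn = ToyNameNode()
nn.add_file(inode_id=1, file_size=1024)  # the 1KB file from the question
print(nn.inodes[1])  # one block of 1024 bytes
```

For comparison, `split_into_blocks(130 * 1024 * 1024)` yields two full 64MB blocks plus one 2MB block: only the last block is partial, and it only records the bytes it actually holds.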

As far as the DN is concerned, all it knows is that there is data residing
on its disk under a specific name (which corresponds to the block name that
the NN knows). It does not know (or care) what the block size is or where
else the block is replicated. Hence, for the DN it is just another file, and
it consumes only as much space as is required to store that file (1KB in
your case).
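
The DN side can be imitated with an ordinary local file. This Python sketch is only a simulation under stated assumptions: the temporary directory stands in for a DN data directory and the block file name `blk_1001` is made up for illustration. It writes the 1KB payload and checks its size on the local file system.

```python
import os
import tempfile

# Hypothetical stand-in for a DataNode data directory; a real DN uses the
# directories it is configured with, not a temp dir.
with tempfile.TemporaryDirectory() as data_dir:
    # Illustrative block file name; the block id is invented for this example.
    block_path = os.path.join(data_dir, "blk_1001")
    payload = b"x" * 1024  # the 1KB file's contents

    with open(block_path, "wb") as f:
        f.write(payload)

    # Size of the data actually stored in the block file.
    logical_size = os.stat(block_path).st_size

print(logical_size)  # 1024 bytes, nothing close to 64MB
```

On POSIX systems `os.stat(...).st_blocks * 512` would show the space the local file system actually allocated, which is typically rounded up to its own block size (often a few KB), still far below 64MB.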

So, it does not cost either the NN or DN 64MB to store a 1KB file.

John George

On 6/10/11 11:47 AM, "Pedro Costa" <[EMAIL PROTECTED]> wrote:

> So, I don't understand how a 1KB file can cost a block of 64MB. Can
> anyone explain it to me?
> On Fri, Jun 10, 2011 at 5:13 PM, Philip Zeyliger <[EMAIL PROTECTED]> wrote:
>> On Fri, Jun 10, 2011 at 9:08 AM, Pedro Costa <[EMAIL PROTECTED]> wrote:
>>> Does this mean that, when HDFS reads a 1KB file from the disk, it will
>>> put the data in a 64MB block?
>> No.
>>> On Fri, Jun 10, 2011 at 5:00 PM, Philip Zeyliger <[EMAIL PROTECTED]> wrote:
>>>> On Fri, Jun 10, 2011 at 8:42 AM, Pedro Costa <[EMAIL PROTECTED]> wrote:
>>>>> But how can I say that a 1KB file will only use 1KB of disk space if
>>>>> the block size is configured as 64MB? In my view, if a 1KB file uses a
>>>>> 64MB block, the file will occupy 64MB on the disk.
>>>> A block of HDFS is the unit of distribution and replication, not the
>>>> unit of storage.  HDFS uses the underlying file systems for physical
>>>> storage.
>>>> -- Philip
>>>>> How can you disassociate a 64MB HDFS data block from a disk block?
>>>>> On Fri, Jun 10, 2011 at 5:01 PM, Marcos Ortiz <[EMAIL PROTECTED]> wrote:
>>>>>> On 06/10/2011 10:35 AM, Pedro Costa wrote:
>>>>>> Hi,
>>>>>> If I configure HDFS to use 64 MB blocks and store a 1KB file in HDFS,
>>>>>> will this file occupy 64MB in HDFS?
>>>>>> Thanks,
>>>>>> HDFS is not very efficient at storing small files, because each file is
>>>>>> stored in a block (of 64 MB in your case), and the block metadata is
>>>>>> held in memory by the NN. But you should know that this 1KB file will
>>>>>> only use 1KB of disk space.
>>>>>> For small files, you can use Hadoop archives.
>>>>>> Regards
>>>>>> --
>>>>>> Marcos Luís Ortíz Valmaseda
>>>>>>  Software Engineer (UCI)
>>>>>>  http://marcosluis2186.posterous.com
>>>>>>  http://twitter.com/marcosluis2186
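
Marcos's point that small files are expensive for the NN's memory rather than for disk can be put into rough numbers. This back-of-the-envelope Python sketch assumes the commonly quoted rule of thumb of about 150 bytes of NN heap per file object and per block object; that figure is an assumption, not something stated in this thread, real costs vary by Hadoop version, and directories are ignored.

```python
BYTES_PER_OBJECT = 150         # assumed rule of thumb: NN heap per file or block object
BLOCK_SIZE = 64 * 1024 * 1024  # 64MB blocks, as in the thread


def namenode_objects(num_files: int, file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Count namespace objects: one per file plus one per block of each file."""
    blocks_per_file = (file_size + block_size - 1) // block_size  # integer ceiling
    return num_files * (1 + blocks_per_file)


def estimated_heap_bytes(num_files: int, file_size: int) -> int:
    """Rough NN heap estimate under the BYTES_PER_OBJECT assumption."""
    return namenode_objects(num_files, file_size) * BYTES_PER_OBJECT


# The same ~1GB of data stored two ways.
small = estimated_heap_bytes(num_files=1_000_000, file_size=1024)     # a million 1KB files
large = estimated_heap_bytes(num_files=1, file_size=1_000_000 * 1024)  # one large file
print(small, large)
```

Under these assumptions, a million 1KB files cost the NN hundreds of MB of heap, while the same bytes as one large file need only a few KB of metadata; this is the situation in which Marcos suggests Hadoop archives.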