HDFS >> mail # user >> Clarification on T file


Re: Clarification on T file
Hey Maninder,

In some ways, TFiles are similar to SequenceFiles.

On Fri, Apr 20, 2012 at 8:19 PM, maninder batth
<[EMAIL PROTECTED]> wrote:
> My requirements are to save variable-sized binary records and to be able to
> query them later. So I was looking at TFile and had some doubts.
>
> 1. Is the datablock in a TFile fixed-size or variable-size? If it is
> fixed, what happens when a record cannot fit in the datablock? Would you
> normally fill the empty space with zeros, or spread the record over two
> datablocks?
>
> 2. Is there any downside to having variable-sized datablocks?

A data block is completed (and a new one started) only when its current
size, checked at the end of an append, is >= min-size-of-block (the
minimum block size given to the writer).

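Hence a data block isn't "fixed" in size. If the block is still below
the minimum, the next record is written into it whole, and only then is
the condition checked again (which may trigger block completion). So a
record is never split across two blocks, and no padding is added.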
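A minimal Java sketch of the rule above, written as a standalone simulation rather than the actual `org.apache.hadoop.io.file.tfile.TFile.Writer` code. The class and method names (`BlockSplitSketch`, `assignBlocks`) are hypothetical, record sizes stand in for the uncompressed bytes appended, and it assumes any leftover records are written out as a final, possibly undersized block when the writer is closed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of TFile's data-block boundary rule; not the Hadoop API.
public class BlockSplitSketch {

    /** Groups record sizes into blocks; each inner list holds one block's records. */
    static List<List<Integer>> assignBlocks(int[] recordSizes, int minBlockSize) {
        List<List<Integer>> blocks = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int currentSize = 0;
        for (int size : recordSizes) {
            // The whole record always goes into the current block:
            // it is never split, and no padding is written.
            current.add(size);
            currentSize += size;
            // The size check happens only after the append completes.
            if (currentSize >= minBlockSize) {
                blocks.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
        }
        // Assumption: remaining records become a final block on close.
        if (!current.isEmpty()) {
            blocks.add(current);
        }
        return blocks;
    }

    public static void main(String[] args) {
        int[] records = {40, 30, 50, 10, 90, 5};
        // Prints [[40, 30], [50, 10, 90], [5]]
        System.out.println(assignBlocks(records, 64));
    }
}
```

With a 64-byte minimum, the records land in blocks of 70, 150 and 5 bytes: each block ends as soon as it reaches the minimum, so a large record simply makes its block bigger instead of spilling into the next one.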

> 3. Are the records synced to the file at the boundary of a datablock, or
> are they just written to the file system? That is, is it like a write()
> call in Linux vs. fsync()?

I'm unsure what you mean by a "datablock" here. TFile doesn't work at
the FS level, and the "datablocks" in it are logical. Could you
clarify this question given the answers to (1) and (2)?

--
Harsh J