Re: Clarification on T file
Hey Maninder,

In some ways the TFile is close to SequenceFiles.

On Fri, Apr 20, 2012 at 8:19 PM, maninder batth
<[EMAIL PROTECTED]> wrote:
> My requirements are to save variable-sized binary records and the ability
> to query them later on. So I was looking at TFile and had some doubts.
>
> 1. Is the data block in the TFile a fixed size or variable size? If it is
> fixed, what happens when a record cannot fit in the data block? Would you
> normally fill the empty space with zeros or spread the record over 2
> data blocks?
>
> 2. Is there any downside to having variable-sized data blocks?

A data block is completed only when the current size of the block (checked
at the end of an append) is >= the minimum block size.

Hence a data block isn't "fixed" in size. If the block hasn't reached the
minimum yet, another whole record is written, and then the condition is
checked again (which, once met, triggers block completion and starts a new
block). A record is therefore never split across two data blocks.
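
For illustration, here's a minimal write-side sketch, assuming the
org.apache.hadoop.io.file.tfile.TFile.Writer API; the path, min block size
and record sizes below are made up:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.file.tfile.TFile;

  public class TFileWriteExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      Path path = new Path("/tmp/example.tfile"); // hypothetical path

      FSDataOutputStream out = fs.create(path);
      // minBlockSize is the threshold checked after each append; records
      // are appended whole and never split across two data blocks.
      int minBlockSize = 64 * 1024;
      TFile.Writer writer =
          new TFile.Writer(out, minBlockSize, TFile.COMPRESSION_GZ,
              "memcmp", conf);

      // Variable-sized values: a new data block starts only once the
      // current block has grown to at least minBlockSize.
      for (int i = 0; i < 1000; i++) {
        byte[] key = String.format("key%05d", i).getBytes("UTF-8");
        byte[] value = new byte[100 + (i % 900)]; // varying record lengths
        writer.append(key, value);
      }

      writer.close();
      out.close();
    }
  }

A larger minBlockSize generally means fewer, larger blocks (and so coarser
seek/compression units), which is the main knob to tune here.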

> 3. Are the records synced with the file at the boundary of a data block, or
> are they just written to the file system? The question is like the write()
> call in Linux vs fsync().

I'm unsure what you mean by a "data block" here. TFiles don't operate at
the filesystem level; the "data blocks" inside them are logical structures.
Could you clarify this question given (1) and (2)?
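
To show that the data blocks are purely logical on the read side, here's a
rough scanner sketch (again assuming the TFile.Reader API; the path is made
up). The scanner iterates records in key order and never exposes block
boundaries at all:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.file.tfile.TFile;

  public class TFileReadExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      Path path = new Path("/tmp/example.tfile"); // hypothetical path

      TFile.Reader reader = new TFile.Reader(fs.open(path),
          fs.getFileStatus(path).getLen(), conf);
      TFile.Reader.Scanner scanner = reader.createScanner();

      // Walk every record; block boundaries are handled internally.
      while (!scanner.atEnd()) {
        TFile.Reader.Scanner.Entry entry = scanner.entry();
        byte[] key = new byte[entry.getKeyLength()];
        byte[] value = new byte[entry.getValueLength()];
        entry.getKey(key);
        entry.getValue(value);
        // ... use key/value ...
        scanner.advance();
      }

      scanner.close();
      reader.close();
    }
  }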

--
Harsh J