

Storage file format
Hi All,

I am interested in working on the storage format. (sign up?)

I wrote an HDFS file format that is similar to SequenceFile (row
storage, block management, compression), and I provide an InputFormat
and OutputFormat for it.

Sometimes it gets great performance and sometimes not; it depends on the data.

For Drill, we should implement column storage. This lets a query skip the
columns it does not need and skip some rows within a single column file.
This column storage should be built on a distributed file system such as
HDFS or MapR DFS. I like MapR DFS because of its high availability (HA).
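
To make the column-skipping idea concrete, here is a minimal sketch. It is only an illustration under assumptions of mine, not a proposed format: it uses the local file system instead of HDFS or MapR DFS, stores one file per column, and assumes every value is a fixed-width 64-bit integer with no compression. The names `write_columns`, `read_columns`, and the `.col` suffix are hypothetical.

```python
import os
import struct
import tempfile

# Assumption: every value is a fixed-width little-endian 64-bit integer.
INT64 = struct.Struct("<q")


def write_columns(directory, rows, column_names):
    """Write each column of `rows` to its own file, <directory>/<name>.col."""
    for index, name in enumerate(column_names):
        with open(os.path.join(directory, name + ".col"), "wb") as f:
            for row in rows:
                f.write(INT64.pack(row[index]))


def read_columns(directory, wanted, first_row=0, row_count=None):
    """Read only the `wanted` columns, starting at row `first_row`.

    Files for columns that are not requested are never opened, and the
    rows before `first_row` are skipped with a seek instead of being read.
    """
    result = {}
    for name in wanted:
        with open(os.path.join(directory, name + ".col"), "rb") as f:
            f.seek(first_row * INT64.size)
            if row_count is None:
                data = f.read()
            else:
                data = f.read(row_count * INT64.size)
        result[name] = [value[0] for value in INT64.iter_unpack(data)]
    return result


def demo():
    """Store four rows of three columns, then read two rows of one column."""
    rows = [(1, 10, 100), (2, 20, 200), (3, 30, 300), (4, 40, 400)]
    with tempfile.TemporaryDirectory() as directory:
        write_columns(directory, rows, ["id", "price", "qty"])
        return read_columns(directory, ["price"], first_row=1, row_count=2)
```

Here `demo()` touches only the `price` file and only the bytes for rows 1 and 2. A real format would need variable-width values, compression, and block metadata, so skipping rows would rely on an index rather than simple offset arithmetic.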

We could implement the column storage file format described in the paper
below. I think it is enough for us.

http://arxiv.org/pdf/1105.4252.pdf