Flume, mail # user - Programmatically write files into HDFS with Flume

Chen Song 2013-04-30, 18:00
I am looking for options for writing files into HDFS from Java programs
that meet the following requirements:

1) Transaction support: each file is either written completely and
successfully, or the write fails entirely, leaving no partial file blocks.

2) Compression support/file formats: the compression type or file format
can be specified when writing the contents.
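Requirement 1 is commonly met with a write-then-rename pattern, and requirement 2 by wrapping the output stream in a compressor. The sketch below shows both using only the JDK on a local filesystem, so it runs without a Hadoop cluster; `AtomicWriter` and `writeAtomically` are hypothetical names, and gzip stands in for whatever codec is actually wanted. On HDFS the analogous calls would be `FileSystem.create` for the temporary path, a `CompressionCodec`'s `createOutputStream` for compression, and `FileSystem.rename` to commit (which reports many failures by returning `false`, so its result should be checked).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical helper sketching "all or nothing" writes: data goes to a hidden
 * temporary file next to the target, which is renamed to the final name only
 * after every byte has been written and the stream closed. Readers never see
 * a partial file under the final name.
 */
final class AtomicWriter {
    private AtomicWriter() {}

    static void writeAtomically(Path target, byte[] data, boolean gzip) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Temp file in the same directory so the commit is a same-filesystem rename.
        Path tmp = Files.createTempFile(dir, "." + target.getFileName(), ".tmp");
        boolean committed = false;
        try {
            // Optional compression: gzip here, standing in for any codec.
            try (OutputStream raw = Files.newOutputStream(tmp);
                 OutputStream out = gzip ? new GZIPOutputStream(raw) : raw) {
                out.write(data);
            }
            // Commit: the file appears under its final name in a single step.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            committed = true;
        } finally {
            if (!committed) {
                Files.deleteIfExists(tmp); // roll back: discard the partial file
            }
        }
    }
}
```

A caller would invoke `AtomicWriter.writeAtomically(path, bytes, true)`; if anything fails before the move, the temporary file is deleted and the target name is never created.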

I know how to write data to a file on HDFS by opening an FSDataOutputStream,
as shown here: <http://stackoverflow.com/questions/13457934/writing-to-a-file-in-hdfs-in-hadoop>.
I am just wondering whether there are libraries or out-of-the-box solutions
that provide the support I mentioned above.

I stumbled upon Flume, which provides an HDFS sink that supports
transactions, compression, file rotation, etc. However, it doesn't seem to
provide an API that can be used as a library. The features Flume provides
are tightly coupled with Flume's architectural components (sources,
channels, and sinks) and don't seem to be usable independently. All I need
is the HDFS loading part.
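Since only the loading part is wanted, one option is a small standalone writer that borrows the relevant ideas: append to an in-progress file, roll to a new file after a size threshold, and commit each finished file by renaming it. Flume's HDFS sink appears to use a similar in-use-suffix-then-rename approach, but the sketch below is not its code. `RollingWriter`, its constructor parameters, and the `.tmp` naming are hypothetical; it uses `java.nio.file` on a local filesystem so it runs without a cluster, and on HDFS the stream would instead come from `FileSystem.create` and the commit from `FileSystem.rename`.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical rolling writer: records go to "<prefix>-<n>[.gz].tmp"; once
 * rollBytes (uncompressed) have been written, the file is closed and renamed
 * to its final name without ".tmp". Only completed files carry final names.
 */
final class RollingWriter implements Closeable {
    private final Path dir;
    private final String prefix;
    private final long rollBytes;
    private final boolean gzip;
    private final List<Path> committed = new ArrayList<>();
    private OutputStream out;  // current in-progress stream, or null
    private Path inProgress;   // current ".tmp" path, or null
    private long written;      // uncompressed bytes in the current file
    private int seq;           // sequence number used in file names

    RollingWriter(Path dir, String prefix, long rollBytes, boolean gzip) {
        this.dir = dir;
        this.prefix = prefix;
        this.rollBytes = rollBytes;
        this.gzip = gzip;
    }

    void write(byte[] record) throws IOException {
        if (out == null) {
            open();
        }
        out.write(record);
        written += record.length;
        if (written >= rollBytes) {
            roll();
        }
    }

    private void open() throws IOException {
        String name = prefix + "-" + seq++ + (gzip ? ".gz" : "");
        inProgress = dir.resolve(name + ".tmp");
        OutputStream raw = Files.newOutputStream(inProgress);
        try {
            out = gzip ? new GZIPOutputStream(raw) : raw;
        } catch (IOException e) {
            raw.close(); // don't leak the file handle if the gzip header fails
            throw e;
        }
        written = 0;
    }

    /** Closes the current file and commits it under its final name. */
    private void roll() throws IOException {
        out.close();
        String tmpName = inProgress.getFileName().toString();
        Path finalPath = dir.resolve(tmpName.substring(0, tmpName.length() - ".tmp".length()));
        Files.move(inProgress, finalPath, StandardCopyOption.ATOMIC_MOVE);
        committed.add(finalPath);
        out = null;
        inProgress = null;
    }

    /** Discards the in-progress file, e.g. after a failed write. */
    void abort() throws IOException {
        if (out != null) {
            try {
                out.close();
            } finally {
                Files.deleteIfExists(inProgress);
                out = null;
                inProgress = null;
            }
        }
    }

    List<Path> committedFiles() {
        return Collections.unmodifiableList(committed);
    }

    /** Commits whatever is in progress. */
    @Override
    public void close() throws IOException {
        if (out != null) {
            roll();
        }
    }
}
```

The design keeps rotation and commit separate from any source/channel machinery: a caller feeds records to `write`, calls `close()` to commit the last file, or `abort()` to drop an in-progress file so that no partial output ever appears under a final name.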

Does anyone have some good suggestions?

Chen Song