Flume >> mail # user >> Programmatically write files into HDFS with Flume


Programmatically write files into HDFS with Flume
I am looking for options for Java programs that can write files into HDFS
with the following requirements.

1) Transaction support: each file is either written fully and successfully,
or fails entirely with no partial file blocks left behind.

2) Compression support/file formats: the ability to specify a compression
type or file format when writing content.
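A common way to get the all-or-nothing behavior of requirement 1 on HDFS is to write to a temporary path and rename it into place only on success, since a rename is a single metadata operation on the NameNode (Flume's HDFS sink uses the same idea with its ".tmp" suffix). Below is a minimal sketch of that pattern, using the local filesystem and `java.nio` as a stand-in for the Hadoop `FileSystem` API; the class and method names are mine, not from any library:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class AtomicWrite {
    // Write data to a temporary sibling file, then atomically rename it
    // into place. On HDFS the analogous steps would be FileSystem.create()
    // on "path + .tmp" followed by FileSystem.rename() on success.
    static void writeAtomically(Path target, byte[] data) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try {
            Files.write(tmp, data);                 // may fail part-way
            Files.move(tmp, target,                 // publish in one step
                       StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);              // leave no partial file
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("atomic-demo");
        Path target = dir.resolve("out.txt");
        writeAtomically(target, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.exists(target));
        System.out.println(new String(Files.readAllBytes(target),
                                      StandardCharsets.UTF_8));
    }
}
```

Readers only ever see the final path once it is complete; a crash mid-write leaves at most a ".tmp" file that can be cleaned up.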

I know how to write data into a file on HDFS by opening an FSDataOutputStream,
as shown here<http://stackoverflow.com/questions/13457934/writing-to-a-file-in-hdfs-in-hadoop>.
I am just wondering whether there are libraries or out-of-the-box solutions
that provide the support mentioned above.
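That stream-based approach extends naturally to requirement 2: wrap the raw output stream in a compressing stream before writing. Here is a small sketch using `java.util.zip`'s GZIPOutputStream from the standard library as a stand-in for wrapping an FSDataOutputStream with a Hadoop CompressionCodec; the file name is illustrative only:

```java
import java.io.*;
import java.nio.file.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedWrite {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".gz");

        // Layer a compressing stream over the raw output stream.
        // With Hadoop, the raw stream would be the FSDataOutputStream
        // returned by FileSystem.create(), wrapped by a codec.
        try (OutputStream out =
                new GZIPOutputStream(Files.newOutputStream(file))) {
            out.write("compressed content".getBytes("UTF-8"));
        }

        // Read back through the matching decompressing stream.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)), "UTF-8"))) {
            System.out.println(in.readLine());
        }
    }
}
```

The same layering is why the compression type can be chosen at write time: only the wrapper changes, not the code that produces the bytes.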

I stumbled upon Flume, which provides an HDFS sink that supports
transactions, compression, file rotation, etc. But it doesn't seem to
provide an API for use as a library. The features Flume provides are
tightly coupled with Flume's architectural components (sources, channels,
and sinks) and don't seem to be usable independently. All I need is the
HDFS-loading part.

Does anyone have some good suggestions?

--
Chen Song