I am looking for ways to write files into HDFS from a Java program,
with the following requirements:
1) Transaction Support: Each file is either written completely or fails
entirely, with no partial file blocks left behind.
2) Compression Support/File Formats: The compression type or file format
can be specified when writing the contents.
I know how to write data into a file on HDFS by opening an
FSDataOutputStream. I am just wondering whether there are any libraries or
out-of-the-box solutions that provide the support I mentioned above.
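
For context, the hand-rolled approach I would otherwise fall back on is to write to a temporary file and rename it once it is complete. Below is a minimal sketch of that pattern against the local filesystem, using java.nio and GZIPOutputStream instead of HDFS so that it runs without a cluster. AtomicGzipWriter and writeAtomically are hypothetical names made up for illustration. I assume the HDFS version would have the same shape, with FileSystem.create on a temporary path, a CompressionCodec wrapping the stream, and FileSystem.rename at the end, but I have not verified that mapping here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of Hadoop or Flume.
final class AtomicGzipWriter {

    // Writes gzip-compressed data to `target` so that readers see either the
    // complete file or no file at all.
    static void writeAtomically(Path target, byte[] data) throws IOException {
        // Keep the temp file in the target's directory so the final move
        // stays within one filesystem.
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, ".tmp-", ".part");
        try {
            // Closing the gzip stream flushes the trailer before publishing.
            try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(tmp))) {
                out.write(data);
            }
            // Publish the finished file in a single step.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; removes the partial file on failure.
            Files.deleteIfExists(tmp);
        }
    }
}
```

Calling AtomicGzipWriter.writeAtomically(Paths.get("out.gz"), bytes) either leaves a complete out.gz or nothing under that name. This only covers a single file with one codec, which is why I would prefer an existing library.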
I stumbled upon Flume, which provides an HDFS sink that supports
transactions, compression, file rotation, etc. But it doesn't seem to
provide an API that can be used as a library. The features Flume provides
are tightly coupled with Flume's architectural components, such as sources,
channels, and sinks, and don't seem to be usable independently. All I need
is the HDFS loading part.
Does anyone have some good suggestions?