Flume user mailing list: Programmatically write files into HDFS with Flume


Re: Programmatically write files into HDFS with Flume
Roshan Naik 2013-05-01, 02:25
Are you sure you want to write directly to HDFS from the app that is
generating the data? Often in production, apps such as web servers do not
have direct access to HDFS. I am not sure that the HDFS sink guarantees "either
fully written successfully or failed totally without any partial file
blocks written", since each transaction does not translate into a separate
file. So I think there could be some partially written transactions if a
transaction is aborted.

This level of all-or-nothing support at the file level is planned for what
is currently referred to as the HCatalog sink:
https://issues.apache.org/jira/browse/FLUME-1734

-roshan
On Tue, Apr 30, 2013 at 6:48 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:

> If you just want to write data to HDFS then Flume might not be the best
> thing to use; however, there is a Flume Embedded Agent
> <https://github.com/apache/flume/blob/trunk/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst#embedded-agent>
> that will embed Flume into your application. I don't believe it works yet
> with the HDFS sink, but some tinkering can likely make it work.
>
> - Connor
>
>
> On Tue, Apr 30, 2013 at 11:00 AM, Chen Song <[EMAIL PROTECTED]> wrote:
>
>> I am looking at options for Java programs that can write files into HDFS
>> with the following requirements.
>>
>> 1) Transaction Support: Each file, when being written, is either fully
>> written successfully or failed totally without any partial file blocks
>> written.
>>
>> 2) Compression Support/File Formats: Can specify the compression type or
>> file format when writing contents.
>>
>> I know how to write data into a file on HDFS by opening an
>> FSDataOutputStream, as shown here
>> <http://stackoverflow.com/questions/13457934/writing-to-a-file-in-hdfs-in-hadoop>.
>> I am just wondering if there are libraries or out-of-the-box solutions
>> that provide the support I mentioned above.
>>
>> I stumbled upon Flume, which provides an HDFS sink that supports
>> transactions, compression, file rotation, etc. But it doesn't seem to
>> provide an API to be used as a library. The features Flume provides are
>> tightly coupled with Flume's architectural components, such as sources,
>> channels, and sinks, and don't seem to be usable independently. All I need
>> is the HDFS loading part.
>>
>> Does anyone have some good suggestions?
>>
>> --
>> Chen Song
>>
>>
>
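
One common way to approach the all-or-nothing and compression requirements
from the original question is to write the compressed data under a temporary
name and rename it to the final name only after the write has completed. The
sketch below illustrates that pattern under stated assumptions: it uses
java.nio on a local filesystem rather than HDFS, gzip stands in for whatever
codec is actually needed, and the names AtomicGzipWriter and writeAtomically
are hypothetical. On HDFS the corresponding steps would go through Hadoop's
FileSystem.create and FileSystem.rename instead. This is not a description of
how Flume's HDFS sink works internally.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical helper: illustrates "write to a temporary name, then rename".
 * Local-filesystem sketch only; not HDFS or Flume code.
 */
final class AtomicGzipWriter {

    private AtomicGzipWriter() {
    }

    /**
     * Writes data gzip-compressed to target. The target name appears only
     * after the whole payload has been written and the stream closed.
     */
    static void writeAtomically(Path target, byte[] data) throws IOException {
        // Keep the temp file in the target's directory so the final rename
        // stays within one filesystem.
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(tmp))) {
                out.write(data);
            }
            // Publish the finished file in a single step.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; removes the partial file on failure.
            Files.deleteIfExists(tmp);
        }
    }
}
```

A call such as AtomicGzipWriter.writeAtomically(Paths.get("events.gz"), bytes)
either leaves a complete events.gz behind or throws without creating it.
Whether an existing target is replaced by ATOMIC_MOVE depends on the platform,
so a real implementation would need to decide how to handle name collisions.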