Flume >> mail # user >> Use flume to copy data in local directory (hadoop server) into hdfs


Re: Use flume to copy data in local directory (hadoop server) into hdfs
Luu,

Have you tried using the spooling directory source?

-Jeff
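
[Editor's sketch of the suggestion above, not part of the original thread: a minimal spooling directory source config, reusing the channel and sink names from the config quoted below. Note that files dropped into the spool directory must be complete and immutable; Flume renames each one with a completed suffix after ingesting it.]

```
agent.sources = spoolSource
agent.channels = fileChannel
agent.sinks = hdfsSink

# Spooling directory source: picks up new, fully-written files in /local-dir
agent.sources.spoolSource.type = spooldir
agent.sources.spoolSource.spoolDir = /local-dir
agent.sources.spoolSource.fileSuffix = .COMPLETED
agent.sources.spoolSource.batchSize = 1000
agent.sources.spoolSource.channels = fileChannel
```

Unlike the exec source, spooldir provides delivery guarantees: if the agent restarts, it resumes from files it has not yet marked as completed, rather than re-running a shell command from scratch.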
On Mon, Oct 21, 2013 at 3:25 AM, Cuong Luu <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I need to copy data in a local directory (hadoop server) into hdfs
> regularly and automatically. This is my flume config:
>
> agent.sources = execSource
> agent.channels = fileChannel
> agent.sinks = hdfsSink
>
> agent.sources.execSource.type = exec
>
> agent.sources.execSource.shell = /bin/bash -c
> agent.sources.execSource.command = for i in /local-dir/*; do cat "$i"; done
>
> agent.sources.execSource.restart = true
> agent.sources.execSource.restartThrottle = 3600000
> agent.sources.execSource.batchSize = 100
>
> ...
> agent.sinks.hdfsSink.hdfs.rollInterval = 0
> agent.sinks.hdfsSink.hdfs.rollSize = 262144000
> agent.sinks.hdfsSink.hdfs.rollCount = 0
> agent.sinks.hdfsSink.hdfs.batchSize = 100000
> ...
> agent.channels.fileChannel.type = FILE
> agent.channels.fileChannel.capacity = 100000
> ...
>
> While the hadoop command takes about 30 seconds, Flume takes around 4
> minutes to copy a 1 GB text file into HDFS. I am worried that either my
> config is bad or that Flume is the wrong tool for this case.
>
> What do you think?
>
>