-Re: Use flume to copy data in local directory (hadoop server) into hdfs
Jeong-shik Jang 2013-10-22, 01:09
How about trying a memory channel as well?
I think the hadoop command simply reads from local disk and writes to HDFS, whereas
with a file channel the data is read from disk, written to the channel, read back out,
and then the HDFS sink does additional work on top of that.
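A minimal sketch of the memory-channel variant; the channel name and capacity values here are assumptions, not taken from the original config:

```
# Hypothetical memory channel replacing the file channel
agent.channels = memChannel
agent.channels.memChannel.type = memory
agent.channels.memChannel.capacity = 100000
agent.channels.memChannel.transactionCapacity = 10000

# Rewire the existing source and sink to the new channel
agent.sources.execSource.channels = memChannel
agent.sinks.hdfsSink.channel = memChannel
```

Note the trade-off: a memory channel avoids the extra disk round-trip, but events buffered in it are lost if the agent dies.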
On Tuesday, October 22, 2013, Jeff Lord wrote:
> Have you tried using the spooling directory source?
> > wrote:
>> Hi all,
>> I need to copy data in a local directory (hadoop server) into hdfs
>> regularly and automatically. This is my flume config:
>> agent.sources = execSource
>> agent.channels = fileChannel
>> agent.sinks = hdfsSink
>> agent.sources.execSource.type = exec
>> agent.sources.execSource.shell = /bin/bash -c
>> agent.sources.execSource.command = for i in /local-dir/*; do cat $i; done
>> agent.sources.execSource.restart = true
>> agent.sources.execSource.restartThrottle = 3600000
>> agent.sources.execSource.batchSize = 100
>> agent.sinks.hdfsSink.hdfs.rollInterval = 0
>> agent.sinks.hdfsSink.hdfs.rollSize = 262144000
>> agent.sinks.hdfsSink.hdfs.rollCount = 0
>> agent.sinks.hdfsSink.hdfs.batchSize = 100000
>> agent.channels.fileChannel.type = FILE
>> agent.channels.fileChannel.capacity = 100000
>> While the hadoop command takes about 30 seconds, Flume takes around 4 minutes to
>> copy a 1 GB text file into HDFS. I am worried that either the config is bad or
>> Flume is the wrong tool for this case.
>> What is your opinion?
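For reference, Jeff Lord's spooling-directory suggestion could look like the sketch below. The spool path, HDFS path, and channel directories are assumptions, and the sink type and channel bindings (missing from the config quoted above) are filled in as a Flume agent requires them:

```
agent.sources = spoolSource
agent.channels = fileChannel
agent.sinks = hdfsSink

# Spooling directory source: picks up completed files dropped into spoolDir
agent.sources.spoolSource.type = spooldir
agent.sources.spoolSource.spoolDir = /local-dir          # assumed path
agent.sources.spoolSource.channels = fileChannel

# HDFS sink: type, path, and channel binding are required
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events   # assumed path
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.rollSize = 262144000
agent.sinks.hdfsSink.hdfs.rollInterval = 0
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.channel = fileChannel

agent.channels.fileChannel.type = file
agent.channels.fileChannel.checkpointDir = /var/flume/checkpoint  # assumed
agent.channels.fileChannel.dataDirs = /var/flume/data             # assumed
```

Unlike the exec source, the spooling directory source is reliable: it only reads files that are fully written, and it renames each file (adding a .COMPLETED suffix) once it has been ingested, so files are not re-read after a restart.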