Flume, mail # user - Use flume to copy data in local directory (hadoop server) into hdfs


Cuong Luu 2013-10-21, 10:25
Re: Use flume to copy data in local directory (hadoop server) into hdfs
Jeff Lord 2013-10-21, 15:50
Luu,

Have you tried using the spooling directory source?
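
For reference, a minimal sketch of what a spooling directory setup could look like, reusing the channel and sink names from the config quoted below. The source name `spoolSource` and the `hdfs.path` value are placeholders assumed for illustration, not taken from the original config; the explicit `transactionCapacity` and the batch sizes are likewise illustrative values to tune rather than recommendations. Note that the HDFS sink reads its batch size from `hdfs.batchSize`.

```properties
agent.sources = spoolSource
agent.channels = fileChannel
agent.sinks = hdfsSink

# Spooling directory source: ingests files dropped into spoolDir and
# marks each one as completed once it has been fully read.
# Files must not be modified after they are placed in the directory.
agent.sources.spoolSource.type = spooldir
agent.sources.spoolSource.spoolDir = /local-dir
agent.sources.spoolSource.channels = fileChannel
agent.sources.spoolSource.batchSize = 100

agent.channels.fileChannel.type = file
agent.channels.fileChannel.capacity = 100000
# Assumed value; the sink batch size below must not exceed it.
agent.channels.fileChannel.transactionCapacity = 1000

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = fileChannel
# Placeholder path (assumption); replace with the real HDFS target.
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
# Write plain text instead of the default SequenceFile format.
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.rollInterval = 0
agent.sinks.hdfsSink.hdfs.rollSize = 262144000
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.batchSize = 1000
```

Unlike the exec source, which re-runs `cat` over every file on each restart, the spooling directory source tracks which files it has already consumed, so each file should be delivered once.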

-Jeff
On Mon, Oct 21, 2013 at 3:25 AM, Cuong Luu <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I need to copy data from a local directory (on the Hadoop server) into HDFS
> regularly and automatically. This is my Flume config:
>
> agent.sources = execSource
> agent.channels = fileChannel
> agent.sinks = hdfsSink
>
> agent.sources.execSource.type = exec
>
> agent.sources.execSource.shell = /bin/bash -c
> agent.sources.execSource.command = for i in /local-dir/*; do cat $i; done
>
> agent.sources.execSource.restart = true
> agent.sources.execSource.restartThrottle = 3600000
> agent.sources.execSource.batchSize = 100
>
> ...
> agent.sinks.hdfsSink.hdfs.rollInterval = 0
> agent.sinks.hdfsSink.hdfs.rollSize = 262144000
> agent.sinks.hdfsSink.hdfs.rollCount = 0
> agent.sinks.hdfsSink.batchsize = 100000
> ...
> agent.channels.fileChannel.type = FILE
> agent.channels.fileChannel.capacity = 100000
> ...
>
> While the hadoop command takes 30 seconds, Flume takes around 4 minutes to
> copy a 1 GB text file into HDFS. I am worried that either the config is not
> good or Flume shouldn't be used in this case.
>
> What is your opinion?
>
>
Jeong-shik Jang 2013-10-22, 01:09
ltcuong211 2013-10-24, 15:35
DSuiter RDX 2013-10-24, 17:57
ltcuong211 2013-10-25, 14:25