Flume >> mail # user >> Use flume to copy data in local directory (hadoop server) into hdfs


Re: Use flume to copy data in local directory (hadoop server) into hdfs
Luu,

Have you tried using the spooling directory source?

-Jeff
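
For reference, a minimal spooling directory source configuration might look like the sketch below. It reuses the agent and channel names from the config quoted later in the thread; the directory path and other values are placeholders, not taken from the thread.

```
# Sketch: replace the exec source with a spooling directory source.
# Paths and batch size here are illustrative placeholders.
agent.sources = spoolSource
agent.sources.spoolSource.type = spooldir
agent.sources.spoolSource.spoolDir = /local-dir
agent.sources.spoolSource.fileSuffix = .COMPLETED
agent.sources.spoolSource.batchSize = 1000
agent.sources.spoolSource.channels = fileChannel
```

Note that the spooling directory source expects files to be complete and immutable once they appear in spoolDir; after ingesting a file it renames it with the configured fileSuffix.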
On Mon, Oct 21, 2013 at 3:25 AM, Cuong Luu <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I need to copy data from a local directory (on the Hadoop server) into HDFS
> regularly and automatically. This is my Flume config:
>
> agent.sources = execSource
> agent.channels = fileChannel
> agent.sinks = hdfsSink
>
> agent.sources.execSource.type = exec
>
> agent.sources.execSource.shell = /bin/bash -c
> agent.sources.execSource.command = for i in /local-dir/*; do cat $i; done
>
> agent.sources.execSource.restart = true
> agent.sources.execSource.restartThrottle = 3600000
> agent.sources.execSource.batchSize = 100
>
> ...
> agent.sinks.hdfsSink.hdfs.rollInterval = 0
> agent.sinks.hdfsSink.hdfs.rollSize = 262144000
> agent.sinks.hdfsSink.hdfs.rollCount = 0
> agent.sinks.hdfsSink.hdfs.batchSize = 100000
> ...
> agent.channels.fileChannel.type = FILE
> agent.channels.fileChannel.capacity = 100000
> ...
>
> While the hadoop command takes about 30 seconds, Flume takes around 4
> minutes to copy a 1 GB text file into HDFS. I am wondering whether my
> config is bad, or whether Flume is simply the wrong tool for this case.
>
> What do you think?
>
>
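For context, the exec source command in the quoted config simply concatenates every file in a directory to stdout. The sketch below reproduces that loop against a temporary directory; unlike the original, it quotes `$i` so filenames containing spaces are handled. The paths are illustrative only.

```shell
# Sketch of the exec source command: cat every file in a directory.
# mktemp gives us a throwaway directory to demonstrate with.
dir=$(mktemp -d)
printf 'line one\n' > "$dir/a.txt"
printf 'line two\n' > "$dir/b.txt"
# Quoting "$i" (the original used bare $i) avoids word-splitting.
for i in "$dir"/*; do cat "$i"; done
rm -rf "$dir"
```

Because the shell glob expands in sorted order, the files are emitted alphabetically, which is one reason this approach gives Flume no record of which files it has already shipped, unlike the spooling directory source.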