Flume >> mail # user >> Use flume to copy data in local directory (hadoop server) into hdfs


Re: Use flume to copy data in local directory (hadoop server) into hdfs
Hi Luu,

How about trying the memory channel as well?
I think the hadoop command simply reads the local file and writes it to HDFS,
whereas with the file channel each event may be read, written to disk, and read
back again, and then the HDFS sink does some additional work on top of that.

JS
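
For reference, the memory channel suggested above might be wired in roughly as follows. This is a minimal sketch, not a tested configuration: the channel name memChannel and the capacity numbers are assumptions chosen for illustration, and the parts elided in the original config (such as the HDFS sink settings behind the "...") are left out.

```properties
# Hypothetical channel name; stands in for fileChannel from the original config.
agent.channels = memChannel

# In-memory queue: no disk writes, but queued events are lost if the agent dies.
agent.channels.memChannel.type = memory
# Illustrative sizes (assumptions), counted in events, not bytes.
agent.channels.memChannel.capacity = 100000
agent.channels.memChannel.transactionCapacity = 1000

# Bind the existing source and sink to the new channel.
agent.sources.execSource.channels = memChannel
agent.sinks.hdfsSink.channel = memChannel

# A sink batch cannot be larger than the channel's transactionCapacity.
agent.sinks.hdfsSink.hdfs.batchSize = 1000
```

The trade-off is durability: the file channel survives an agent restart, while the memory channel does not, so this only fits if the source files can be re-read after a failure.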

On Tuesday, October 22, 2013, Jeff Lord wrote:

> Luu,
>
> Have you tried using the spooling directory source?
>
> -Jeff
>
>
> On Mon, Oct 21, 2013 at 3:25 AM, Cuong Luu <[EMAIL PROTECTED]> wrote:
>
>> Hi all,
>>
>> I need to copy data in a local directory (hadoop server) into hdfs
>> regularly and automatically. This is my flume config:
>>
>> agent.sources = execSource
>> agent.channels = fileChannel
>> agent.sinks = hdfsSink
>>
>> agent.sources.execSource.type = exec
>>
>> agent.sources.execSource.shell = /bin/bash -c
>> agent.sources.execSource.command = for i in /local-dir/*; do cat $i; done
>>
>> agent.sources.execSource.restart = true
>> agent.sources.execSource.restartThrottle = 3600000
>> agent.sources.execSource.batchSize = 100
>>
>> ...
>> agent.sinks.hdfsSink.hdfs.rollInterval = 0
>> agent.sinks.hdfsSink.hdfs.rollSize = 262144000
>> agent.sinks.hdfsSink.hdfs.rollCount = 0
>> agent.sinks.hdfsSink.batchsize = 100000
>> ...
>> agent.channels.fileChannel.type = FILE
>> agent.channels.fileChannel.capacity = 100000
>> ...
>>
>> While the hadoop command takes 30 seconds, Flume takes around 4 minutes to
>> copy a 1 GB text file into HDFS. I am not sure whether my config is the
>> problem or whether Flume simply shouldn't be used in this case.
>>
>> What is your opinion?
>>
>>
>
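
Jeff's suggestion of the spooling directory source could look something like the sketch below. It is only an illustration under stated assumptions: the source name spoolSource is hypothetical, /local-dir and fileChannel are reused from the original config, and the channel and sink sections are assumed to stay as they were.

```properties
# Hypothetical source name; takes the place of execSource.
agent.sources = spoolSource

# Watches a directory and ingests each new file that appears in it.
agent.sources.spoolSource.type = spooldir
# Directory from the original exec command.
agent.sources.spoolSource.spoolDir = /local-dir
# Channel name taken from the original config.
agent.sources.spoolSource.channels = fileChannel
```

Unlike the exec source with a one-hour restart, this picks up files as they arrive and does not re-send files it has already processed. Files need to be complete before they are placed in the directory and must not be modified afterwards; by default the source renames each finished file with a .COMPLETED suffix.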