Flume, mail # user - Use flume to copy data in local directory (hadoop server) into hdfs


Cuong Luu 2013-10-21, 10:25
Jeff Lord 2013-10-21, 15:50
Re: Use flume to copy data in local directory (hadoop server) into hdfs
Jeong-shik Jang 2013-10-22, 01:09
Hi Luu,

How about trying the memory channel as well?
The Hadoop command only reads from local disk and writes to HDFS, but with
the file channel the data is read, written to the channel's on-disk log, and
read again before the HDFS sink does its own work.

JS
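JS's memory-channel suggestion could look something like the following minimal sketch. It reuses the component names from the config quoted below (agent, execSource, hdfsSink); the channel name and the capacity values are illustrative, not tuned:

```properties
# Illustrative sketch: replace the file channel with a memory channel.
# Trades durability (events are lost if the agent dies) for throughput.
agent.channels = memChannel
agent.channels.memChannel.type = memory
agent.channels.memChannel.capacity = 100000
agent.channels.memChannel.transactionCapacity = 10000

# Re-point the existing source and sink at the new channel.
agent.sources.execSource.channels = memChannel
agent.sinks.hdfsSink.channel = memChannel
```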

On Tuesday, October 22, 2013, Jeff Lord wrote:

> Luu,
>
> Have you tried using the spooling directory source?
>
> -Jeff
>
>
> On Mon, Oct 21, 2013 at 3:25 AM, Cuong Luu <[EMAIL PROTECTED]> wrote:
>
>> Hi all,
>>
>> I need to copy data in a local directory (hadoop server) into hdfs
>> regularly and automatically. This is my flume config:
>>
>> agent.sources = execSource
>> agent.channels = fileChannel
>> agent.sinks = hdfsSink
>>
>> agent.sources.execSource.type = exec
>>
>> agent.sources.execSource.shell = /bin/bash -c
>> agent.sources.execSource.command = for i in /local-dir/*; do cat $i; done
>>
>> agent.sources.execSource.restart = true
>> agent.sources.execSource.restartThrottle = 3600000
>> agent.sources.execSource.batchSize = 100
>>
>> ...
>> agent.sinks.hdfsSink.hdfs.rollInterval = 0
>> agent.sinks.hdfsSink.hdfs.rollSize = 262144000
>> agent.sinks.hdfsSink.hdfs.rollCount = 0
>> agent.sinks.hdfsSink.hdfs.batchSize = 100000
>> ...
>> agent.channels.fileChannel.type = FILE
>> agent.channels.fileChannel.capacity = 100000
>> ...
>>
>> While the hadoop command takes around 30 seconds, Flume takes around 4
>> minutes to copy a 1 GB text file into HDFS. I am worried about whether my
>> config is bad, or whether I shouldn't use Flume for this case at all.
>>
>> What is your opinion?
>>
>>
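For reference, the one-shot copy being timed above is presumably something like the following (source and destination paths are illustrative):

```shell
# One-shot copy from local disk into HDFS -- the "hadoop command"
# the poster reports taking about 30 seconds for a 1 GB file.
hadoop fs -put /local-dir/bigfile.txt /user/flume/bigfile.txt
```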
>
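Jeff's spooling-directory suggestion might be sketched as follows. The source name and directory path are illustrative; the fileChannel name comes from the config quoted above:

```properties
# Illustrative sketch: watch a directory and ingest completed files
# (spooling directory source, available since Flume 1.3).
agent.sources = spoolSrc
agent.sources.spoolSrc.type = spooldir
agent.sources.spoolSrc.spoolDir = /local-dir
agent.sources.spoolSrc.fileHeader = true
agent.sources.spoolSrc.channels = fileChannel
```

Note that files must be fully written before being placed in the spool directory; the source renames each file with a .COMPLETED suffix once it has been ingested.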
ltcuong211 2013-10-24, 15:35
DSuiter RDX 2013-10-24, 17:57
ltcuong211 2013-10-25, 14:25