HDFS >> mail # user >> Re: Number of concurrent writer to HDFS


Re: Number of concurrent writer to HDFS
I think there is no hard limit on the number of files one can write to at
the same time, because each write stream goes out to its own pipeline of
DataNodes, which are most likely different. It is similar to MapReduce
output, where each task's output is stored as a separate file in HDFS with
no particular limit on the number of files written concurrently.
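To make the one-writer-per-file pattern above concrete, here is a minimal sketch using plain Java threads and the local filesystem; the class name, file names, and writer count are illustrative, not from the thread. With the HDFS client the per-writer call would be `FileSystem.create(path)`, which hands that writer its own output stream (HDFS enforces single-writer-per-file semantics via leases), so writers need no coordination with one another.

```java
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

public class ConcurrentWriters {
    // Each writer owns exactly one output file, so the writers never
    // coordinate with each other -- the same shape as one HDFS output
    // stream per log source.
    static List<String> writeAll(int writers) throws Exception {
        Path dir = Files.createTempDirectory("writers");
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        List<Future<Path>> files = new ArrayList<>();
        for (int i = 0; i < writers; i++) {
            final int id = i;
            files.add(pool.submit(() -> {
                // With the HDFS client this line would be
                // fs.create(new org.apache.hadoop.fs.Path(...)).
                Path out = dir.resolve("source-" + id + ".log");
                Files.write(out, List.of("event from source " + id));
                return out;
            }));
        }
        // Collect the first line of each file, in submission order.
        List<String> firstLines = new ArrayList<>();
        for (Future<Path> f : files) {
            firstLines.add(Files.readAllLines(f.get()).get(0));
        }
        pool.shutdown();
        return firstLines;
    }

    public static void main(String[] args) throws Exception {
        writeAll(4).forEach(System.out::println);
    }
}
```

The point of the sketch is the ownership model, not throughput: because no two writers share a file, adding writers adds open streams (and DataNode pipeline threads) but no lock contention.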

2012/8/7 Nguyen Manh Tien <[EMAIL PROTECTED]>

> @Yanbo, Alex: I want to develop a custom module to write directly to HDFS.
> The collector in Flume aggregates logs from many sources and writes them
> into a few files. So if I want to write to many files (for example, one
> per source), I want to know how many files we can keep open in that case.
>
> Thanks.
> Tien
>
>
> On Mon, Aug 6, 2012 at 9:58 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>
>> Also interested in this question.
>>
>> @Yanbo: while we could use third-party tools to import/gather data into
>> HDFS, I guess the intention here is to write data to HDFS directly. It
>> would be great to hear what the "sensible" limits are on the number of
>> files one can write to at the same time.
>>
>> Thank you in advance,
>>
>> Alex Baranau
>> ------
>> Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
>>
>> On Mon, Aug 6, 2012 at 2:14 AM, Yanbo Liang <[EMAIL PROTECTED]> wrote:
>>
>>> You can use Scribe or Flume to collect log data and integrate it with
>>> Hadoop.
>>>
>>>
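For the Flume route suggested above, an agent with an HDFS sink might be configured roughly as follows. This is a sketch assuming Flume NG (1.x); the agent name, component names, paths, and values are all illustrative:

```properties
# flume.conf (names and values illustrative)
agent.sources = logsrc
agent.channels = mem
agent.sinks = hdfssink

agent.sources.logsrc.type = exec
agent.sources.logsrc.command = tail -F /var/log/app.log
agent.sources.logsrc.channels = mem

agent.channels.mem.type = memory

agent.sinks.hdfssink.type = hdfs
agent.sinks.hdfssink.channel = mem
agent.sinks.hdfssink.hdfs.path = hdfs://namenode:8020/logs/%Y-%m-%d
agent.sinks.hdfssink.hdfs.rollInterval = 300
```

One sink per file being written matches the collector behavior Tien describes: Flume funnels many sources into a few HDFS files, which is why a per-source file layout requires a custom writer instead.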
>>> 2012/8/4 Nguyen Manh Tien <[EMAIL PROTECTED]>
>>>
>>>> Hi,
>>>> I plan to stream log data to HDFS using many writers, each writer
>>>> writing a stream of data to an HDFS file (which may rotate).
>>>>
>>>> I wonder how many concurrent writers I should use?
>>>> If you have that experience, please share it with me: Hadoop cluster
>>>> size, number of writers, replication factor.
>>>>
>>>> Thanks.
>>>> Tien
>>>>
>>>
>>>
>>
>>
>> --
>> Alex Baranau
>> ------
>> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch
>> - Solr
>>
>>
>
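As context for the question the thread leaves open: in practice the ceiling on concurrent writers is usually set by cluster configuration rather than by HDFS itself, since each active write pipeline consumes a transceiver thread and file descriptors on every DataNode in the pipeline. A commonly adjusted setting from this era looks like the following; the value is illustrative, not a recommendation, and the property was later renamed `dfs.datanode.max.transfer.threads`:

```xml
<!-- hdfs-site.xml on every DataNode (value illustrative) -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```

Raising the DataNode process's open-file limit (`ulimit -n`) alongside this is typically needed, since each open block file also consumes a descriptor.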