Flume >> mail # user >> In flume-ng is there any advantages of 2-tier topology in  a cluster of  30-40 nodes?


Re: In flume-ng is there any advantages of 2-tier topology in  a cluster of  30-40 nodes?
Hi

Does someone have any inputs on this?

Just to summarize the questions again....

-- In a cluster with a small number of nodes (say 30-50), is it sufficient
to use only a 1-tier architecture in Flume?
-- How does a 2-tier architecture help in getting better HA in the above
environment?

Regards,
Jagadish

On 01/30/2013 08:13 PM, Jagadish Bihani wrote:
> Hi
>
> Thanks, Alexander, for the reply.
> I have added my thoughts inline.
>
> On 01/30/2013 11:56 AM, Alexander Alten-Lorenz wrote:
>> Hi,
>>
>> If the agents (Tier 1) have access to HDFS, each single client can
>> put data into HDFS. But this doesn't really make sense; instead, you
>> want different files from different hosts in a structured view (maybe
>> a directory per host, with the contents inside split into buckets).
> -- But if the number of clients is small (say 30-40), why doesn't it
> make sense to write directly?
> Ultimately, the purpose is to deliver the source data to HDFS
> directly (say, into a single HDFS directory).
>> When you implement a Tier 2 (maybe 2 or more servers which have access
>> to HDFS), you can have more features like load balancing, HA and
>> mirrored sinks, for example (one sink puts the data into HDFS, the
>> other sink into another system, maybe for backup). For stability and
>> reliability a Tier 2 architecture is recommended. And it makes some
>> things easier ;)
> -- I didn't get how we get HA and load balancing using 2
> tiers. E.g.:
> 1. If HDFS goes down, then in both the 1-tier case and the 2-tier
> case the channel will grow until it reaches its maximum size.
> 2. If in the 1-tier scenario one node goes down, then its data won't
> reach HDFS.
> Similarly, in the 2-tier scenario, if a node from the 1st tier goes
> down, then its data won't reach HDFS.
>
> Could you please elaborate if I am missing something?
>>
>> Cheers,
>>   Alex
>>
>> On Jan 30, 2013, at 7:05 AM, Jagadish Bihani
>> <[EMAIL PROTECTED]> wrote:
>>
>>> Hi
>>>
>>> In our scenario there are around 30 machines from which we want to
>>> put data into HDFS.
>>>
>>> Now the approach we thought of initially was:
>>>
>>> 1. First tier: agents which collect data from the source and then
>>> pass it to an Avro sink.
>>> 2. Second tier: let's call those agents 'collectors'; they collect
>>> data from the first-tier agents and then dump it to HDFS.
>>> (Second-tier agents are fewer in number, say 4:1.)
>>>
>>> Instead of the above topology, I could simply use an HDFS sink in the
>>> first-tier agents. That would serve the purpose.
>>> Also, the number of nodes is small (say 30), so that won't hurt the
>>> HDFS namenode much, compared to a case where the number of nodes
>>> is, say, 1000.
>>>
>>> But apart from that I don't see any advantage in adding the 2nd tier.
>>> Is there any advantage I am missing in terms of failover, HDFS
>>> performance or any other parameter?
>>>
>>> Regards,
>>> Jagadish
>> --
>> Alexander Alten-Lorenz
>> http://mapredit.blogspot.com
>> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>>
>
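
For reference, the load balancing and HA mentioned above for a 2-tier
setup are typically configured on the first-tier agents, by placing two
or more Avro sinks (one per collector) in a sink group. The fragment
below is only a minimal sketch under assumptions: the agent name
`agent1`, the collector host names, and the port 4545 are placeholders
that do not come from this thread, the source definition is omitted, and
the property names should be checked against the user guide for the
Flume version in use.

```properties
# Tier-1 agent fragment (hypothetical agent name: agent1).
# The source definition is omitted; only the channel/sink side is shown.
agent1.channels = c1
agent1.sinks = k1 k2
agent1.sinkgroups = g1

# File channel, so buffered events survive an agent restart.
agent1.channels.c1.type = file

# One Avro sink per collector (host names and port are placeholders).
agent1.sinks.k1.type = avro
agent1.sinks.k1.channel = c1
agent1.sinks.k1.hostname = collector1.example.com
agent1.sinks.k1.port = 4545

agent1.sinks.k2.type = avro
agent1.sinks.k2.channel = c1
agent1.sinks.k2.hostname = collector2.example.com
agent1.sinks.k2.port = 4545

# Sink group: spread events over both collectors and temporarily
# skip a collector that fails.
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = load_balance
agent1.sinkgroups.g1.processor.selector = round_robin
agent1.sinkgroups.g1.processor.backoff = true
```

Setting `processor.type = failover` with a `processor.priority.<sink>`
value per sink would give an active/standby arrangement instead. Either
way, this only covers the loss of a collector: as noted in the thread,
if a first-tier node itself goes down, its data still won't reach HDFS.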