Flume user mailing list: Re: seeking help on flume cluster deployment


Chen Wang 2014-01-09, 23:12
Chen Wang 2014-01-10, 00:56
Saurabh B 2014-01-10, 03:18
Chen Wang 2014-01-10, 03:24
Jeff Lord 2014-01-10, 03:49
Chen Wang 2014-01-10, 04:43
Jeff Lord 2014-01-10, 04:50
Joao Salcedo 2014-01-10, 04:58
Chen Wang 2014-01-10, 05:20
Ashish 2014-01-10, 05:29
Chen Wang 2014-01-10, 05:38
Ashish 2014-01-10, 06:00

Re: seeking help on flume cluster deployment
Ashish,
Interestingly enough, I was initially doing #1 too, and had a working version.
But I finally gave it up because in my bolt I have to flush to HDFS either
when the data reaches a certain size or when a timer fires, which is exactly
what Flume already offers. It also adds the complexity of grouping entries
within the same partition, while with Flume that is a piece of cake.
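
Just to spell out the Flume side of that (a sketch only; the agent, channel,
and path names below are placeholders, not our actual config), the HDFS sink
can roll files by size or on a timer by itself:

agent1.sources = src1
agent1.channels = ch1
agent1.sinks = hdfs1

# Avro source that an RPC client (e.g. from a Storm bolt) can send events to
agent1.sources.src1.type = avro
agent1.sources.src1.bind = 0.0.0.0
agent1.sources.src1.port = 41414
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# HDFS sink: roll a file at ~128 MB or after 300 seconds, whichever comes first
agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.channel = ch1
agent1.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
agent1.sinks.hdfs1.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs1.hdfs.rollSize = 134217728
agent1.sinks.hdfs1.hdfs.rollInterval = 300
agent1.sinks.hdfs1.hdfs.rollCount = 0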

Thank you so much for all of your input. It helped me a lot!
Chen
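
P.S. For anyone who finds this thread later, here is a rough, untested sketch of
the kind of bolt discussed below, using the Flume NG client SDK's load-balancing
RPC client from a Storm bolt. The host names, port, and the "line" tuple field
are placeholders, not our actual setup:

import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Properties;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.FailedException;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class FlumeForwardBolt extends BaseBasicBolt {

    // RpcClient is not serializable, so build it in prepare(), not the constructor.
    private transient RpcClient client;

    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        // Two Avro-source agents behind the load-balancing RPC client
        // (see the "LoadBalancing RPC client" section of the developer guide).
        Properties props = new Properties();
        props.setProperty("client.type", "default_loadbalance");
        props.setProperty("hosts", "h1 h2");
        props.setProperty("hosts.h1", "flume-agent1.example.com:41414");
        props.setProperty("hosts.h2", "flume-agent2.example.com:41414");
        props.setProperty("host-selector", "round_robin");
        props.setProperty("backoff", "true");
        client = RpcClientFactory.getInstance(props);
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // Filter/translate here as needed, then hand the record to Flume.
        String line = input.getStringByField("line");
        Event event = EventBuilder.withBody(line, StandardCharsets.UTF_8);
        try {
            client.append(event);
        } catch (EventDeliveryException e) {
            // Fail the tuple so the spout replays it.
            throw new FailedException(e);
        }
    }

    @Override
    public void cleanup() {
        if (client != null) {
            client.close();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt: nothing is emitted downstream.
    }
}

The agents it points at just need an Avro source listening on the same port,
like the config snippet earlier in this mail.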

On Thu, Jan 9, 2014 at 10:00 PM, Ashish <[EMAIL PROTECTED]> wrote:

> Got it!
>
> My first reaction was to use an HDFS bolt to write data directly to HDFS, but
> I couldn't find an implementation for it. My knowledge of Storm is limited.
> If the data is already flowing through Storm, you have two options:
> 1. Write a bolt to dump the data to HDFS.
> 2. Write a Flume bolt using the RPC client, as recommended in this thread, and
> reuse Flume's capabilities.
>
> If you already have a Flume installation running, #2 is the quickest way to
> get going. Even otherwise, installing and running Flume is like a walk in the
> park :)
>
> You can also watch the related discussion on
> https://issues.apache.org/jira/browse/FLUME-1286. There is some good info
> in the JIRA.
>
> thanks
> ashish
>
>
>
>
> On Fri, Jan 10, 2014 at 11:08 AM, Chen Wang <[EMAIL PROTECTED]> wrote:
>
>> Ashish,
>> Since we already use Storm for other real-time processing, I want to
>> reuse it here. The biggest advantage for me of using Storm in this case is
>> that I can use a Storm spout to read from our socket server continuously,
>> and the Storm framework can ensure it never stops. Meanwhile, I can also
>> easily filter out/translate the data in a bolt before sending it to Flume. For
>> this data stream, my first step right now is to get it into HDFS,
>> but I will add real-time processing soon.
>> Does that make sense to you?
>> Thanks,
>> Chen
>>
>>
>> On Thu, Jan 9, 2014 at 9:29 PM, Ashish <[EMAIL PROTECTED]> wrote:
>>
>>> Why do you need Storm? Are you doing any real-time processing? If not,
>>> IMHO, avoid Storm.
>>>
>>> You can use something like this:
>>>
>>> Socket -> Load Balanced RPC Client -> Flume Topology with HA
>>>
>>> What application-level protocol are you using at the socket level?
>>>
>>>
>>> On Fri, Jan 10, 2014 at 10:50 AM, Chen Wang <[EMAIL PROTECTED]> wrote:
>>>
>>>> Jeff, Joao,
>>>> Thanks for the pointer!
>>>> I think I am getting close here:
>>>> 1. Set up a cluster of Flume agents with redundancy, with an Avro source
>>>> and an HDFS sink.
>>>> 2. Use Storm (not strictly necessary) to read from our socket server, then
>>>> in the bolt use the Flume client (the load-balancing RPC client) to send
>>>> the events to the agents set up in step 1.
>>>>
>>>> That way I get all the benefits of Storm and Flume. Does this setup
>>>> look right to you?
>>>> thank you very much,
>>>> Chen
>>>>
>>>>
>>>> On Thu, Jan 9, 2014 at 8:58 PM, Joao Salcedo <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi Chen,
>>>>>
>>>>> Maybe it would be worth checking this
>>>>>
>>>>> http://flume.apache.org/FlumeDeveloperGuide.html#loadbalancing-rpc-client
>>>>>
>>>>> Regards,
>>>>>
>>>>> Joao
>>>>>
>>>>>
>>>>> On Fri, Jan 10, 2014 at 3:50 PM, Jeff Lord <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Have you taken a look at the load-balancing RPC client?
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 9, 2014 at 8:43 PM, Chen Wang <[EMAIL PROTECTED]
>>>>>> > wrote:
>>>>>>
>>>>>>> Jeff,
>>>>>>> I had read this presentation at the beginning, but didn't find a solution
>>>>>>> to my use case. To simplify my case: I only have one data source (composed
>>>>>>> of 5 socket servers), and I am looking for a fault-tolerant deployment of
>>>>>>> Flume that can read from this single data source and sink to HDFS in a
>>>>>>> fault-tolerant mode: when one node dies, another Flume node can pick up
>>>>>>> and continue.
>>>>>>> Thanks,
>>>>>>> Chen
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 9, 2014 at 7:49 PM, Jeff Lord <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> Chen,
>>>>>>>>
>>>>>>>> Have you taken a look at this presentation on Planning and
Chen Wang 2014-01-11, 01:15
Chen Wang 2014-01-11, 01:47
Chen Wang 2014-01-11, 06:09
Chen Wang 2014-01-09, 22:38