Flume >> mail # user >> Restarts without data loss


Re: Restarts without data loss
Out of interest, there is a JIRA for graceful shutdown: FLUME-1318. Please add your design thoughts to the JIRA.
--Mubarak

On Jul 9, 2012, at 10:36 AM, Brock Noland wrote:

> If you ran the workload with the file channel and then took 10 thread
> dumps, I think we'd have enough to understand what is going on.
>
> Brock
>
> On Mon, Jul 9, 2012 at 11:49 AM, Juhani Connolly
> <[EMAIL PROTECTED]> wrote:
>> It is currently pushing only 10 events per second or so (roughly 250 bytes
>> per event). This is with the data dir and checkpoint in the same directory.
>> Of course, the fact that there is a tail process running and that tomcat is
>> also writing out logs is without a doubt compounding the problem somewhat.
>>
>> I haven't taken a serious look at thread dumps of the file channel since I
>> don't have a thorough understanding of it. However, analysis has involved
>> trying varying numbers of sinks (no throughput difference) and replacing it
>> with the memory channel (which easily handles the 650-ish requests per second
>> we have per server for the particular api, no problems even with a single
>> sink).
>>
>> You say there's heavy fsyncing. With 7200rpm disks, each seek will have an
>> average latency of roughly 4.17ms, so with alternating seeks between the
>> checkpoint and the data dir, if each of those writes happens in order,
>> you're already limited to a best case of barely more than 100 events per
>> second. Our experience so far has shown it to be significantly less.
>>
>> I do believe that batching a bunch of puts or takes together under a single
>> commit, as two seeks followed by writes (or one seek if we can use only a
>> single storage file), could give significant returns when paired with a
>> batching sink/source (which many already do, requesting multiple items at a
>> time).
>>
>> If there is any specific data you would like I would be happy to try and
>> provide it.
>>
>>
>> On 07/09/2012 05:22 PM, Brock Noland wrote:
>>
>> On Mon, Jul 9, 2012 at 8:51 AM, Juhani Connolly
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> - Intended setup with flume was a file channel connected to an avro sink.
>>> With only a single disk available, it is extremely slow. JDBC channel is
>>> also extremely slow, and MemoryChannel will fill up and start refusing puts
>>> as soon as a network issue comes up.
>>
>>
>> Have you taken a few thread dumps or done other analysis? When you say
>> "extremely slow", what do you mean? Configured for no data loss, FileChannel
>> is going to be doing a lot of fsync'ing, so I am not surprised it's slow.
>> That is a property of disks, not of FileChannel. I think we should use group
>> commit, but that shouldn't make it 10x faster.
>>
>> Brock
>>
>>
>>
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
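
The seek arithmetic in Juhani's quoted message, and the expected effect of
batching several events under one commit, can be sketched as a small
back-of-the-envelope model. This is only an illustration under stated
assumptions: the per-seek cost is taken to be the average rotational latency
(half a revolution), head movement and actual write time are ignored, and the
function and parameter names are hypothetical, not part of Flume.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds: half of one revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def max_events_per_second(rpm: float,
                          seeks_per_commit: int = 2,
                          events_per_commit: int = 1) -> float:
    """Best-case throughput if every commit pays `seeks_per_commit` seeks.

    Assumption: one seek to the data dir and one to the checkpoint per
    commit, executed in order, with nothing else competing for the disk.
    """
    commit_ms = seeks_per_commit * avg_rotational_latency_ms(rpm)
    return events_per_commit * 1000.0 / commit_ms


if __name__ == "__main__":
    # One event per commit versus batched commits on a 7200rpm disk.
    for batch in (1, 10, 100):
        ceiling = max_events_per_second(7200, events_per_commit=batch)
        print(f"batch={batch:>3}  ceiling={ceiling:,.0f} events/s")
```

With one event per commit the model gives a ceiling of about 120 events per
second on a 7200rpm disk, which matches the "barely more than 100" estimate
in the thread. In this simplified model the ceiling scales linearly with the
number of events per commit, so it says nothing about whether real group
commit would deliver the same gain.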