Flume >> mail # user >> Re: Fastest way to get data into flume?

Andrew Ehrlich 2014-03-27, 18:08
Mike Keane 2014-03-27, 18:11
Chris Schneider 2014-03-27, 19:30
Jeff Lord 2014-03-27, 19:34
Jimmy 2014-03-27, 19:40
Asim Zafir 2014-03-27, 20:59
ed 2014-03-28, 09:24
Re: Fastest way to get data into flume?
Thanks for the netcat ack setting - I'll give that a shot.
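For reference, the setting being discussed is presumably the NetCat source's `ack-every-event` flag, which controls whether the source writes "OK" back to the client after every event. A minimal sketch of disabling it (agent and component names here are illustrative, not from the thread):

```properties
# Hypothetical agent "a1" with a NetCat source on port 44444
a1.sources = r1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
# Skip sending "OK" back to the client after each event
a1.sources.r1.ack-every-event = false
a1.sources.r1.channels = c1
```

Disabling the per-event acknowledgement trades delivery feedback for throughput, which matches the development/demo use case described below.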

As for data ingestion, we're aiming for ~10000 events per second on our
development VMs, since we want to be able to run the system in time
acceleration (with simulated incoming data). Each event is ~100-300

I've gotten massive speedups with the Thrift source -> memory channel ->
HDFS sink path once I started batching the events sent to the Thrift source
into larger and larger batches.  Since the consistency requirements are
much looser during development and demo, it's easy to raise that batch size
arbitrarily high, so I think that solution will work for now.
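As a rough illustration of the pipeline described above, the batching shows up in two places: the client can batch events to the Thrift source (e.g. via the Flume SDK's `RpcClient.appendBatch()`), and the HDFS sink takes a configurable number of events per transaction. A sketch of such an agent config — all names, ports, and values are illustrative assumptions, not the poster's actual settings:

```properties
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Thrift source; clients can additionally batch on their side
a1.sources.r1.type = thrift
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
a1.sources.r1.channels = c1

# Memory channel: fast, but events are lost on agent restart --
# acceptable for a dev/demo setup with loose consistency requirements
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 10000

# HDFS sink: write large batches per transaction
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.batchSize = 10000
a1.sinks.k1.hdfs.rollInterval = 300
# Needed because hdfs.path uses time escapes and events may lack a timestamp header
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```

The key knobs for throughput here are `hdfs.batchSize` and the channel's `transactionCapacity`; raising both lets each sink transaction drain far more events at once.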

Thank you everybody for the help thinking through this, and for clarifying
what I'm seeing as reasonable / unreasonable.  I think a batched Thrift
source will suffice for now, so I can move on with my project and loop back
when I have better numbers for what I really need on my VM.
On Fri, Mar 28, 2014 at 3:23 AM, ed <[EMAIL PROTECTED]> wrote: