Thanks for the netcat ack setting - I'll give that a shot.
As for data ingestion, we're aiming for ~10,000 events per second on our
development VMs, since we want to be able to run the system under time
acceleration (with simulated incoming data). Each event is ~100-300
I've gotten massive speedups with the Thrift source -> memory channel ->
HDFS sink pipeline as I group the events sent to the Thrift source into
larger and larger batches. Since
the requirements for consistency are much less during development and demo,
it's easy to raise that batch size arbitrarily high, so I think that
solution will work for now.
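
For anyone following along, here is a minimal sketch of the client-side batching idea, in Python. It is only an illustration under assumptions: `batched`, `send_all`, and the `send_batch` callback are hypothetical names, and `send_batch` stands in for whatever call actually ships a list of events to the Thrift source. It is not the Flume client API.

```python
from typing import Callable, Iterable, Iterator, List


def batched(events: Iterable[bytes], batch_size: int) -> Iterator[List[bytes]]:
    """Group a stream of events into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[bytes] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


def send_all(
    events: Iterable[bytes],
    batch_size: int,
    send_batch: Callable[[List[bytes]], None],
) -> int:
    """Send events in batches; return the number of send calls made.

    send_batch is a hypothetical stand-in for the real RPC call.
    """
    calls = 0
    for batch in batched(events, batch_size):
        send_batch(batch)
        calls += 1
    return calls
```

A larger `batch_size` means fewer round trips for the same number of events, which is where the speedup comes from. The trade-off is that more events are in flight (and could be lost or replayed together) if a send fails, which matters less for development and demo runs.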
Thank you, everybody, for helping me think this through and for clarifying
which of the results I'm seeing are reasonable and which are not. I think a
batched Thrift source will be enough for now, so I can move on with my
project and loop back when I have better numbers for what I really need on
my VM.
On Fri, Mar 28, 2014 at 3:23 AM, ed <[EMAIL PROTECTED]> wrote: