David Medinets 2013-01-28, 16:24
Christopher 2013-01-29, 23:49
I had a plain, single-threaded Java program that read an HDFS
SequenceFile of fairly small Sqoop records (probably under 200
bytes each). As each record was read, a Mutation was created and then
written to Accumulo via a BatchWriter. This program was as simple as it
gets: read a record, write a mutation. The Row Id used YYYYMMDD (a
date) so the ingest targeted one tablet. The ingest rate was over 150
million entries per hour for about 19 hours. Everything seemed fine. Over
3.5 billion entries were written. Then the nodes ran out of memory and
the Accumulo nodes went dead. 90% of the servers were lost. And data
poofed out of existence. Only 800M entries are visible now.
We restarted the data node processes and the cluster has been running
garbage collection for over 2 days.
I did not expect this simple approach to cause an issue. From looking
at the log files, I think that at least two compactions were running
while the program was still ingesting those 176 million entries per hour.
The hold times started rising and eventually the system simply ran out of
memory. I have no certainty about this explanation though.
My current thinking is to re-initialize Accumulo and find some way to
programmatically monitor the hold time, then add a delay to the
ingest process whenever the hold time rises over 30 seconds. Does that
I know there are other approaches to ingest and I might give up this
method and use another. I was trying to get some kind of baseline for
analysis reasons with this approach.
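
To make the hold-time idea concrete, here is a rough sketch of the delay
logic only. It assumes the hold time can be obtained somehow and handed in
as a LongSupplier; how to read it from Accumulo is not shown and is still
the open question. The class name HoldTimeThrottle, its parameters, and the
doubling backoff are all hypothetical choices for illustration, not
anything Accumulo provides.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical ingest throttle: blocks the writer while a reported
 * hold time stays above a threshold (e.g. 30 seconds).
 */
public class HoldTimeThrottle {
    // Assumption: the caller supplies the current hold time in milliseconds.
    private final LongSupplier holdTimeMillis;
    private final long thresholdMillis;
    private final long baseDelayMillis;
    private final long maxDelayMillis;

    public HoldTimeThrottle(LongSupplier holdTimeMillis, long thresholdMillis,
                            long baseDelayMillis, long maxDelayMillis) {
        this.holdTimeMillis = holdTimeMillis;
        this.thresholdMillis = thresholdMillis;
        this.baseDelayMillis = baseDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    /** Delay after N consecutive high readings: doubles each time, capped at the max. */
    public long delayFor(int consecutiveHighReadings) {
        if (consecutiveHighReadings <= 0) {
            return 0L;
        }
        int shift = Math.min(consecutiveHighReadings - 1, 20); // bound the shift
        return Math.min(maxDelayMillis, baseDelayMillis << shift);
    }

    /**
     * Sleeps until the hold time is at or below the threshold.
     * Returns the total number of milliseconds slept.
     */
    public long awaitBelowThreshold() {
        long slept = 0L;
        int high = 0;
        while (holdTimeMillis.getAsLong() > thresholdMillis) {
            high++;
            long delay = delayFor(high);
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return slept;
            }
            slept += delay;
        }
        return slept;
    }
}
```

The ingest loop would call awaitBelowThreshold() before writing each batch
of mutations, with the threshold set to 30000 for the 30 second limit
mentioned above. Doubling the delay is only one option; a fixed delay
would work the same way.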
Eric Newton 2013-01-28, 13:53
John Vines 2013-01-28, 14:32
Keith Turner 2013-01-30, 16:30
John Vines 2013-01-30, 16:35
David Medinets 2013-01-30, 16:36