Accumulo dev mailing list: Re: Accumulo v1.4.1 - ran out of memory and lost data (RESOLVED - Data was restored)


David Medinets 2013-01-28, 16:24
Christopher 2013-01-29, 23:49
David Medinets 2013-01-28, 13:28
Eric Newton 2013-01-28, 13:53
John Vines 2013-01-28, 14:32
Keith Turner 2013-01-30, 16:30
John Vines 2013-01-30, 16:35
Re: Accumulo v1.4.1 - ran out of memory and lost data
Yes. Accumulo fully recovered when I restarted the loggers.

On Wed, Jan 30, 2013 at 11:30 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
> Was this resolved?
>
> On Mon, Jan 28, 2013 at 8:28 AM, David Medinets
> <[EMAIL PROTECTED]> wrote:
>> I had a plain, single-threaded Java program that read an HDFS
>> SequenceFile of fairly small Sqoop records (probably under 200
>> bytes each). As each record was read, a Mutation was created and then
>> written to Accumulo via a BatchWriter. The program was as simple as it
>> gets: read a record, write a mutation. The row id used YYYYMMDD (a
>> date), so the ingest targeted a single tablet. The ingest rate was over
>> 150 million entries per hour, sustained for about 19 hours, and
>> everything seemed fine. Over 3.5 billion entries were written. Then the
>> nodes ran out of memory and the Accumulo processes on them died. About
>> 90% of the servers were lost, and data poofed out of existence. Only
>> 800M entries are visible now.
>>
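For illustration, here is a minimal sketch of the ingest pattern described above, written against the Accumulo 1.4.x and Hadoop APIs (not the original program). The instance, ZooKeeper quorum, table, credentials, and column family are placeholders, and the SequenceFile key/value classes are assumed to be Text; an actual Sqoop-produced file may use different Writable types.

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SequenceFileIngest {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Placeholder instance name, ZooKeeper quorum, and credentials
            Connector conn = new ZooKeeperInstance("instance", "zkhost:2181")
                    .getConnector("user", "secret".getBytes());

            // 1.4.x signature: table, max memory (bytes), max latency (ms), write threads
            BatchWriter writer = conn.createBatchWriter("mytable", 50000000L, 60000L, 4);

            // Key/value classes assumed to be Text for this sketch
            SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path(args[0]), conf);
            Text key = new Text();
            Text record = new Text();
            while (reader.next(key, record)) {
                // Date-based row id, so every mutation lands on the same tablet
                Mutation m = new Mutation(new Text("20130128"));
                m.put(new Text("data"), key, new Value(record.toString().getBytes()));
                writer.addMutation(m);
            }
            reader.close();
            writer.close();
        }
    }
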
>> We restarted the data node processes and the cluster has been running
>> garbage collection for over 2 days.
>>
>> I did not expect this simple approach to cause an issue. From looking
>> at the log files, I think at least two compactions were running while
>> we were still ingesting those 176 million entries per hour. The hold
>> times started rising and eventually the system simply ran out of
>> memory. I have no certainty about this explanation, though.
>>
>> My current thinking is to re-initialize Accumulo and find some way to
>> programmatically monitor the hold time, then add a delay to the
>> ingest process whenever the hold time rises over 30 seconds. Does that
>> sound feasible?
>>
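A sketch of that throttling idea: check a hold-time metric before each batch and sleep while it is above the 30-second threshold. Accumulo 1.4 does not expose hold time through a simple public client call, so fetchHoldTimeMillis() below is a hypothetical helper; in practice the value would have to come from the monitor page or the master's statistics.

    public class HoldTimeThrottle {
        private static final long MAX_HOLD_MILLIS = 30 * 1000L; // 30 second threshold
        private static final long BACKOFF_MILLIS = 5 * 1000L;   // pause in 5 second steps

        // Hypothetical helper: report the current maximum hold time across
        // tablet servers (e.g. scraped from the monitor page).
        static long fetchHoldTimeMillis() {
            return 0L; // placeholder
        }

        // Call before each addMutation (or each batch of mutations) to pause
        // ingest while tablet servers are holding writes during compactions.
        static void throttle() throws InterruptedException {
            while (fetchHoldTimeMillis() > MAX_HOLD_MILLIS) {
                Thread.sleep(BACKOFF_MILLIS);
            }
        }
    }
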
>> I know there are other approaches to ingest, and I might give up this
>> method and use another. I was trying to establish some kind of baseline
>> for analysis with this approach.