Re: Accumulo v1.4.1 - ran out of memory and lost data
Yes. Accumulo fully recovered when I restarted the loggers.

On Wed, Jan 30, 2013 at 11:30 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
> Was this resolved?
>
> On Mon, Jan 28, 2013 at 8:28 AM, David Medinets
> <[EMAIL PROTECTED]> wrote:
>> I had a plain, single-threaded Java program that read an HDFS
>> SequenceFile of fairly small Sqoop records (probably under 200 bytes
>> each). As each record was read, a Mutation was created and then
>> written to Accumulo via a BatchWriter. The program was as simple as it
>> gets: read a record, write a mutation. The row id was YYYYMMDD (a
>> date), so the ingest targeted one tablet. The ingest rate was over 150
>> million entries per hour for about 19 hours, and everything seemed
>> fine; over 3.5 billion entries were written. Then the nodes ran out of
>> memory and the Accumulo processes went dead. 90% of the servers were
>> lost, and data poofed out of existence. Only 800 million entries are
>> visible now.
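
[Editor's note: a minimal sketch of the kind of loop described above, assuming the Accumulo 1.4.x client and Hadoop SequenceFile APIs, Text keys/values, and made-up instance, table, and credential names; it is not the author's original program.]

    import java.util.Arrays;

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SequenceFileIngest {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            SequenceFile.Reader reader =
                new SequenceFile.Reader(FileSystem.get(conf), new Path(args[0]), conf);

            Connector conn = new ZooKeeperInstance("instance", "zk1:2181")
                .getConnector("user", "secret".getBytes());
            // 1.4.x signature: table, maxMemory (bytes), maxLatency (ms), write threads
            BatchWriter writer = conn.createBatchWriter("mytable",
                50 * 1024 * 1024L, 60 * 1000L, 4);

            Text key = new Text();
            Text value = new Text();
            while (reader.next(key, value)) {
                // Row id is a date like "20130128", so every mutation lands in one tablet.
                Mutation m = new Mutation(new Text("20130128"));
                m.put(new Text("cf"), key,
                    new Value(Arrays.copyOf(value.getBytes(), value.getLength())));
                writer.addMutation(m);
            }
            writer.close();
            reader.close();
        }
    }

With a date-based row id like this, all writes land on a single tablet server, which is why the hold time (and eventually memory) on that server becomes the bottleneck described below.
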
>>
>> We restarted the data node processes and the cluster has been running
>> garbage collection for over 2 days.
>>
>> I did not expect this simple approach to cause an issue. From looking
>> at the log files, I think at least two compactions were running while
>> we were still ingesting those 176 million entries per hour. The hold
>> times started rising and eventually the system simply ran out of
>> memory. I'm not certain about this explanation, though.
>>
>> My current thinking is to re-initialize Accumulo and find some way to
>> programmatically monitor the hold time, then add a delay to the
>> ingest process whenever the hold time rises above 30 seconds. Does
>> that sound feasible?
>>
>> I know there are other approaches to ingest, and I might give up this
>> method and use another. I was trying to get some kind of baseline for
>> analysis purposes with this approach.