Re: [jira] [Created] (ACCUMULO-665) large values, complex iterator stacks, and RFile readers can consume a surprising amount of memory
He's referring to something like the BooleanLogic iterator stack in the
Wikipedia example. It's a tree of user iterators that are merging streams
of key-value pairs together, so you end up getting many open readers and
possibly many RFile blocks spread out among many HDFS blocks concurrently.
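To make the pattern concrete, here is a minimal, hypothetical sketch of a tree of merging iterators. It uses plain java.util types rather than Accumulo's real iterator interfaces, and every class and method name in it is illustrative only. The point it shows is that every leaf source stays open (with its own current entry and reader state) for the lifetime of the merge, so a deep tree of such merges multiplies the number of concurrently open readers and buffers.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Illustrative only: a toy merge of several sorted key-value streams.
// In Accumulo the analogous role is played by the iterator stack sitting
// over RFile readers; every leaf source here stays open for the lifetime
// of the merge, which is the memory pressure being discussed.
public class ToyMergingIterator implements Iterator<Map.Entry<String, byte[]>> {

    private static final class Source {
        final Iterator<Map.Entry<String, byte[]>> it;
        Map.Entry<String, byte[]> current;
        Source(Iterator<Map.Entry<String, byte[]>> it) { this.it = it; }
    }

    // One entry per still-open source, ordered by that source's current key.
    private final PriorityQueue<Source> heap =
            new PriorityQueue<Source>((a, b) -> a.current.getKey().compareTo(b.current.getKey()));

    public ToyMergingIterator(List<Iterator<Map.Entry<String, byte[]>>> sources) {
        for (Iterator<Map.Entry<String, byte[]>> it : sources) {
            Source s = new Source(it);
            if (it.hasNext()) { s.current = it.next(); heap.add(s); }
        }
    }

    @Override public boolean hasNext() { return !heap.isEmpty(); }

    @Override public Map.Entry<String, byte[]> next() {
        Source s = heap.poll();                    // source holding the smallest key
        Map.Entry<String, byte[]> out = s.current; // its current key-value pair
        if (s.it.hasNext()) { s.current = s.it.next(); heap.add(s); }
        return out;
    }

    // Example: three sorted sources merged into one stream. Nesting merges
    // like this one inside each other is what makes the open-reader count grow.
    public static void main(String[] args) {
        List<Iterator<Map.Entry<String, byte[]>>> sources = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            TreeMap<String, byte[]> m = new TreeMap<>();
            m.put("row" + i, new byte[8]);
            m.put("row" + (i + 3), new byte[8]);
            sources.add(m.entrySet().iterator());
        }
        new ToyMergingIterator(sources).forEachRemaining(e -> System.out.println(e.getKey()));
    }
}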

On Sat, Jun 30, 2012 at 7:40 PM, David Medinets <[EMAIL PROTECTED]> wrote:

> How would you define complex iterator stack? Can you outline the elements?
> On Jun 29, 2012 5:19 PM, "Eric Newton (JIRA)" <[EMAIL PROTECTED]> wrote:
>
> > Eric Newton created ACCUMULO-665:
> > ------------------------------------
> >
> >              Summary: large values, complex iterator stacks, and RFile
> > readers can consume a surprising amount of memory
> >                  Key: ACCUMULO-665
> >                  URL: https://issues.apache.org/jira/browse/ACCUMULO-665
> >              Project: Accumulo
> >           Issue Type: Bug
> >           Components: tserver
> >     Affects Versions: 1.5.0, 1.4.0
> >          Environment: large cluster
> >             Reporter: Eric Newton
> >             Assignee: Eric Newton
> >             Priority: Minor
> >
> >
> > On a production cluster, with a complex iterator tree, a large value
> > (~350M) was causing a 4G tserver to fail with out-of-memory.
> >
> > There were several factors contributing to the problem:
> > # a bug: the query should not have been looking at the big data
> > # complex iterator tree, causing many copies of the data to be held at
> > the same time
> > # RFile doubles the buffer it uses to load values, and continues to use
> > that large buffer for future values
> >
> > This ticket is for the last point.  If we know we're not even going to
> > look at the value, we can read past it without storing it in memory.  It is
> > surprising that skipping past a large value would cause the server to run
> > out of memory, especially since it should fit into memory enough times to
> > be returned to the caller.
> >
> >
> > --
> > This message is automatically generated by JIRA.
> > If you think it was sent incorrectly, please contact your JIRA
> > administrators:
> > https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> > For more information on JIRA, see:
> > http://www.atlassian.com/software/jira
> >
> >
> >
>
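On the last point of the ticket, here is a rough sketch of the difference between materializing a value and reading past it. It is written against plain java.io rather than the actual RFile reader code, and the length-prefixed value encoding is an assumption made for illustration, not the real RFile layout.

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Sketch only: assumes values are length-prefixed in the stream, which is
// an assumption for illustration rather than the real RFile format.
public final class ValueSkipper {

    // Read the value into memory only when the caller will actually use it.
    static byte[] readValue(DataInputStream in) throws IOException {
        int len = in.readInt();
        byte[] value = new byte[len];   // allocation grows with the value size
        in.readFully(value);
        return value;
    }

    // Skip the value entirely: nothing proportional to the value is allocated,
    // so no oversized buffer is created or retained for later reads.
    static void skipValue(DataInputStream in) throws IOException {
        int len = in.readInt();
        int remaining = len;
        while (remaining > 0) {
            int skipped = in.skipBytes(remaining); // may skip fewer bytes than asked
            if (skipped <= 0) {
                throw new EOFException("unexpected end of stream while skipping value");
            }
            remaining -= skipped;
        }
    }
}

The skip path never holds the value in memory, which is the behavior the ticket asks for when the iterator stack is known not to need the value at all.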