Accumulo, mail # dev - Re: [jira] [Created] (ACCUMULO-665) large values, complex iterator stacks, and RFile readers can consume a surprising amount of memory


William Slacum 2012-07-01, 00:00
He's referring to something like the BooleanLogic iterator stack in the
Wikipedia example. It's a tree of user iterators that merge streams of
key-value pairs, so you end up with many open readers at once, and possibly
many RFile blocks, spread across many HDFS blocks, held concurrently.
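
To make the "tree of merging iterators" point concrete, here is a minimal sketch of a union-style merge over several sorted streams. It is not the BooleanLogic iterator or any Accumulo API; the class and method names (MergeSketch, Source, merge) are hypothetical, and plain strings stand in for key-value pairs. The point it illustrates is that every open source keeps one entry buffered for as long as the merge runs, so the number of entries held in memory grows with the number of leaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a merging iterator; not Accumulo code. */
class MergeSketch {

    /** One open source plus the single entry it currently has buffered. */
    static final class Source implements Comparable<Source> {
        final Iterator<String> reader;
        String current;

        Source(Iterator<String> reader) {
            this.reader = reader;
            this.current = reader.next(); // caller guarantees a first entry
        }

        @Override
        public int compareTo(Source other) {
            return current.compareTo(other.current);
        }
    }

    /** Merges already-sorted inputs into one sorted list. */
    static List<String> merge(List<List<String>> sortedInputs) {
        // The heap holds one buffered entry per open source at all times.
        PriorityQueue<Source> heap = new PriorityQueue<>();
        for (List<String> input : sortedInputs) {
            if (!input.isEmpty()) {
                heap.add(new Source(input.iterator()));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Source smallest = heap.poll();
            out.add(smallest.current);
            if (smallest.reader.hasNext()) {
                smallest.current = smallest.reader.next();
                heap.add(smallest); // source stays open with a new buffered entry
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> inputs = Arrays.asList(
                Arrays.asList("a", "d"),
                Arrays.asList("b"),
                Arrays.asList("c", "e"));
        System.out.println(merge(inputs));
    }
}
```

If each buffered entry carried a very large value, a merge like this would hold one such value per source, which is the kind of multiplication described in the ticket.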

On Sat, Jun 30, 2012 at 7:40 PM, David Medinets <[EMAIL PROTECTED]> wrote:

> How would you define complex iterator stack? Can you outline the elements?
> On Jun 29, 2012 5:19 PM, "Eric Newton (JIRA)" <[EMAIL PROTECTED]> wrote:
>
> > Eric Newton created ACCUMULO-665:
> > ------------------------------------
> >
> >              Summary: large values, complex iterator stacks, and RFile
> > readers can consume a surprising amount of memory
> >                  Key: ACCUMULO-665
> >                  URL: https://issues.apache.org/jira/browse/ACCUMULO-665
> >              Project: Accumulo
> >           Issue Type: Bug
> >           Components: tserver
> >     Affects Versions: 1.5.0, 1.4.0
> >          Environment: large cluster
> >             Reporter: Eric Newton
> >             Assignee: Eric Newton
> >             Priority: Minor
> >
> >
> > On a production cluster, with a complex iterator tree, a large value
> > (~350M) was causing a 4G tserver to fail with out-of-memory.
> >
> > There were several factors contributing to the problem:
> > # a bug: the query should not have been looking at the big data
> > # complex iterator tree, causing many copies of the data to be held at
> > the same time
> > # RFile doubles the buffer it uses to load values, and continues to use
> > that large buffer for future values
> >
> > This ticket is for the last point.  If we know we're not even going to
> > look at the value, we can read past it without storing it in memory.  It
> > is surprising that skipping past a large value would cause the server to
> > run out of memory, especially since it should fit into memory enough
> > times to be returned to the caller.
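
The difference between loading a value and reading past it can be sketched as follows. This is an illustration only, not the RFile reader: it assumes a made-up format in which each value is a 4-byte length followed by that many bytes, and the names (ValueReaderSketch, loadValue, skipValue, encode) are hypothetical. The load path doubles a shared buffer until the value fits and never shrinks it; the skip path advances the stream without touching the buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical length-prefixed value reader; not Accumulo's RFile code. */
class ValueReaderSketch {
    private byte[] buffer = new byte[16];

    /** Loads the next value, doubling the shared buffer until it fits. */
    int loadValue(DataInputStream in) throws IOException {
        int len = in.readInt();
        int capacity = buffer.length;
        while (capacity < len) {
            capacity *= 2; // grows by doubling and is never shrunk afterwards
        }
        if (capacity != buffer.length) {
            buffer = new byte[capacity];
        }
        in.readFully(buffer, 0, len);
        return len;
    }

    /** Reads past the next value without copying it into the buffer. */
    int skipValue(DataInputStream in) throws IOException {
        int len = in.readInt();
        int remaining = len;
        while (remaining > 0) {
            int skipped = in.skipBytes(remaining);
            if (skipped <= 0) {
                throw new EOFException("stream ended inside a value");
            }
            remaining -= skipped;
        }
        return len;
    }

    int bufferCapacity() {
        return buffer.length;
    }

    /** Encodes values in the assumed length-prefixed format. */
    static byte[] encode(byte[]... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (byte[] value : values) {
            out.writeInt(value.length);
            out.write(value);
        }
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode(new byte[1000], new byte[3]);

        ValueReaderSketch reader = new ValueReaderSketch();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        reader.skipValue(in);  // large value is passed over
        reader.loadValue(in);  // small value is loaded
        System.out.println("capacity after skip + load: " + reader.bufferCapacity());
    }
}
```

In this sketch, a reader that skips the large value keeps its small buffer, while one that loads it keeps the enlarged buffer for every later value, which mirrors the behaviour the ticket describes.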
> >
> >