Accumulo >> mail # user >> Accumulo 1.4 Memory Issues


Matt Parker 2012-08-02, 17:16
Marc Parisi 2012-08-02, 17:25
Matt Parker 2012-08-02, 17:34
John Vines 2012-08-02, 17:44
Matt Parker 2012-08-04, 20:37
William Slacum 2012-08-02, 17:49
Matt Parker 2012-08-03, 15:04
Marc Parisi 2012-08-02, 17:53
Matt Parker 2012-08-02, 17:54
David Medinets 2012-08-03, 00:46
Josh Elser 2012-08-03, 02:02
Re: Accumulo 1.4 Memory Issues
Matt,

It's possible that calling flush on the BatchWriter after every add of a
small mutation could cause memory usage to spike and the Java garbage
collector to fall behind. I'm not sure this is in our standard test cases.
There are two things we should do to move forwards:

1. We should try to get your application working given these constraints.
Is there a reason you're flushing the writer after every mutation? Can you
limit the calls to flush to groups of mutations? Flushing too frequently
is usually an indicator that you're trying to do a read-modify-write
loop, and there is often a better performing alternative. If you do need to
flush frequently (e.g. maybe you're using cells as locks) then we'll
probably have to skip to #2.

2. We should try to build a test that replicates this problem in a
simplified environment. Can you share some source code that we can use to
build that test?
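[Editor's note: a minimal sketch of the batching pattern suggested in #1. The `Writer` interface and string "mutations" here are hypothetical stand-ins for Accumulo's `BatchWriter` and `Mutation`, so the flush-batching logic can be shown without a live Accumulo instance.]

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedFlush {

    // Stand-in for Accumulo's BatchWriter: only the two calls the
    // pattern needs.
    interface Writer {
        void addMutation(String m);
        void flush();
    }

    // Flush once per flushEvery mutations instead of after every add.
    // Returns the number of flushes performed.
    static int writeAll(Writer w, List<String> mutations, int flushEvery) {
        int flushes = 0;
        int pending = 0;
        for (String m : mutations) {
            w.addMutation(m);
            if (++pending >= flushEvery) {
                w.flush();
                flushes++;
                pending = 0;
            }
        }
        if (pending > 0) { // flush any remaining tail of mutations
            w.flush();
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<String> ms = new ArrayList<>();
        for (int i = 0; i < 1000; i++) ms.add("m" + i);
        Writer noop = new Writer() {
            public void addMutation(String m) {}
            public void flush() {}
        };
        // 1000 mutations flushed in groups of 100: 10 flushes
        // instead of 1000.
        System.out.println(writeAll(noop, ms, 100));
    }
}
```

Flushing every mutation (`flushEvery = 1`) degenerates to the pattern Matt describes; grouping the flushes amortizes the per-flush cost and gives the client-side buffers a chance to drain.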

Adam
On Aug 2, 2012 1:54 PM, "Matt Parker" <[EMAIL PROTECTED]> wrote:

> I'm flushing/closing the writer after every small "transaction" (i.e. a
> node gets updated or inserted, or a link record is moved
> (deleted/reinserted)).
>
> The client will throw the Out of Memory error, but I can restart the
> client and rerun the same set of operations again. So I would assume the
> TServer is unaffected.
>
> On Thu, Aug 2, 2012 at 1:49 PM, William Slacum <
> [EMAIL PROTECTED]> wrote:
>
>> Is it the TServer bombing out or your client or both? How often are you
>> flushing your writer?
>>
>>
>> On Thu, Aug 2, 2012 at 10:34 AM, Matt Parker <[EMAIL PROTECTED]>wrote:
>>
>>> for my small test case, I'm storing some basic data in three tables:
>>>
>>> nodes - spatial index (id, list of child nodes, whether it's a leaf node
>>> )
>>> image metadata - (id, bounding box coordinates, a text string of the
>>> bounding box)
>>> link - linking table that tells which images correspond to specific
>>> nodes.
>>>
>>> The image data isn't being stored in Accumulo, yet.
>>>
>>>
>>>
>>> On Thu, Aug 2, 2012 at 1:25 PM, Marc Parisi <[EMAIL PROTECTED]> wrote:
>>>
>>>> are you using native maps? if so, are they being used?
>>>>
>>>>
>>>> On Thu, Aug 2, 2012 at 1:16 PM, Matt Parker <[EMAIL PROTECTED]>wrote:
>>>>
>>>>> I set up a single-instance Accumulo server.
>>>>>
>>>>> I can load 32K rows of image metadata without issue.
>>>>>
>>>>> I have another set of routines that build a dynamic spatial index,
>>>>> where nodes are inserted/updated/deleted over time.
>>>>> These operations are typically done one at a time, and each
>>>>> batchwriter is closed after use.
>>>>>
>>>>> It loads maybe a couple hundred operations, and then it dies with an
>>>>> OutOfMemory error when trying to close a batchwriter.
>>>>>
>>>>> I tried upping the memory settings on my client and on the tserver, but
>>>>> the results were the same.
>>>>>
>>>>> Outside of Accumulo, I can build the whole index in memory without any
>>>>> special JVM memory settings. I was wondering whether anyone else had run
>>>>> into a similar issue?
>>>>>
>>>>
>>>>
>>>
>>
>
Matt Parker 2012-08-02, 17:51