Accumulo, mail # user - OutOfMemoryError: Java heap space after data load


Terry P. 2013-04-29, 19:02
Eric Newton 2013-04-29, 19:32
Terry P. 2013-04-29, 22:30
John Vines 2013-04-29, 19:21
Terry P. 2013-04-29, 21:07
Terry P. 2013-04-29, 22:40
Eric Newton 2013-04-30, 16:01
Terry P. 2013-04-30, 18:43
Eric Newton 2013-04-30, 18:47
Terry P. 2013-04-30, 21:43

Re: OutOfMemoryError: Java heap space after data load
Eric Newton 2013-04-30, 23:07
:-) You're welcome.

The maximum Twitter tweet rate to date is 33,388 tweets/second.

You ingested data at more than twice that rate.  Not bad.

-Eric
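A quick check of Eric's comparison, using Terry's figure of a million rows in 13 seconds from the message quoted below. This is plain arithmetic written in Java. The class name `IngestRate` is only for illustration. It also compares the observed rate with the upper end of the 2-5K records per second Terry expects to ingest.

```java
// Back-of-the-envelope ingest-rate comparison using figures quoted in this thread.
public class IngestRate {
    // Average rows per second for a given row count and elapsed wall-clock time.
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    public static void main(String[] args) {
        double observed = rowsPerSecond(1_000_000L, 13.0); // RandomBatchWriter run
        double tweetPeak = 33_388.0;   // peak tweet rate cited by Eric
        double expectedMax = 5_000.0;  // upper end of the expected 2-5K records/s

        System.out.printf("observed rate:     %.0f rows/s%n", observed);
        System.out.printf("vs. tweet peak:    %.2fx%n", observed / tweetPeak);
        System.out.printf("headroom at 5K/s:  %.1fx%n", observed / expectedMax);
    }
}
```

It works out to roughly 76,923 rows/s. That is about 2.3x the cited tweet peak and about 15x the expected maximum load.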

On Tue, Apr 30, 2013 at 5:43 PM, Terry P. <[EMAIL PROTECTED]> wrote:

> Eric, I'm really disappointed.  Rather than actually writing anything at all,
> I opted to run the RandomBatchWriter example program.
>
> It wasn't 35x faster.
>
> It was 52x faster.
>
> After all the excellent posts I've seen from you, I really expected a more
> precise guesstimate from you.  ;-)
>
> Thanks for the gentle nudge to do better than Python and the Accumulo
> shell.  At a million rows inserted in 13 seconds, I'm certain the Accumulo
> cluster I've set up can handle the 2-5K records per second max we expect
> to throw at it.
>
> Thanks again!
>
>
>
> On Tue, Apr 30, 2013 at 1:47 PM, Eric Newton <[EMAIL PROTECTED]> wrote:
>
>> I've probably written more Python than Java, so I understand. :-)
>>
>> I've used Jython for scripting tests.  In unreleased versions (1.4.4 &
>> 1.5.0) the Proxy will let you use the language of your choice.
>>
>> -Eric
>>
>>
>>
>> On Tue, Apr 30, 2013 at 2:43 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Eric,
>>> Thanks for the info.  You've inspired me to dive into it in Java -- I
>>> had been using the Accumulo shell because I had a Python data generation
>>> script already in place and it was "faster" that way.  But if a small Java
>>> program is going to be 35x "faster" than that, it makes no sense to bother
>>> with the shell!
>>>
>>> Thanks,
>>> Terry
>>>
>>>
>>> On Tue, Apr 30, 2013 at 11:01 AM, Eric Newton <[EMAIL PROTECTED]> wrote:
>>>
>>>> There's no need to flush... the shell is flushing after every single
>>>> line.
>>>>
>>>> The flush you are invoking causes a minor compaction.
>>>>
>>>> If you wrote a quick Java program to ingest the data, the data would
>>>> load about 35x faster.
>>>>
>>>> -Eric
>>>>
>>>>
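Eric's point about flushing is easier to see as a count of round trips. The shell sends each `insert` on its own. A Java client built on Accumulo's BatchWriter instead buffers mutations and sends them in batches, bounded by memory and latency settings.

The model below uses only the standard library and is not the BatchWriter API. `BufferedIngestModel` and its record-count limit are hypothetical, and each simulated flush simply stands in for one client-to-server round trip.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only model of buffered ingest; NOT Accumulo's BatchWriter API.
// Each flush() stands in for one client-to-server round trip.
public class BufferedIngestModel {
    private final int maxBuffered;                 // hypothetical limit, in records
    private final List<String> buffer = new ArrayList<>();
    private int flushes = 0;

    BufferedIngestModel(int maxBuffered) {
        if (maxBuffered < 1) throw new IllegalArgumentException("maxBuffered must be >= 1");
        this.maxBuffered = maxBuffered;
    }

    // Buffer a record; send the whole buffer once the limit is reached.
    void add(String record) {
        buffer.add(record);
        if (buffer.size() >= maxBuffered) {
            flush();
        }
    }

    // One simulated round trip carries everything currently buffered.
    void flush() {
        if (buffer.isEmpty()) return;
        flushes++;
        buffer.clear();
    }

    int flushCount() { return flushes; }

    // Returns how many round trips it takes to ingest `records` records.
    static int simulate(int records, int maxBuffered) {
        BufferedIngestModel writer = new BufferedIngestModel(maxBuffered);
        for (int i = 0; i < records; i++) {
            writer.add("row_" + i);
        }
        writer.flush(); // send the remainder, as closing a real writer would
        return writer.flushCount();
    }

    public static void main(String[] args) {
        int records = 1_000_000;
        System.out.println("flush per record: " + simulate(records, 1));
        System.out.println("buffer of 10,000: " + simulate(records, 10_000));
    }
}
```

With a million records, flushing per record costs 1,000,000 simulated round trips, compared with 100 for a 10,000-record buffer. The model says nothing about how long each round trip takes, so it does not reproduce Eric's 35x estimate. It only shows where the per-line overhead comes from.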
>>>> On Mon, Apr 29, 2013 at 6:40 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Perhaps having a configuration item to limit the size of the
>>>>> shell_history.txt file would help avoid this in the future?
>>>>>
>>>>>
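Until such a setting exists, one workaround is to trim the history file by hand. The sketch below keeps only the last N lines of a file. `HistoryTrimmer` is hypothetical and is not an Accumulo feature.

It makes two assumptions: the history is UTF-8 text, and it holds one command per line. It is probably safest to run it while no shell session is open, since a running shell may write to the same file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: keep only the last maxLines lines of a history file.
public class HistoryTrimmer {
    static int trim(Path history, int maxLines) throws IOException {
        if (maxLines < 0) throw new IllegalArgumentException("maxLines must be >= 0");
        Deque<String> tail = new ArrayDeque<>();
        // Stream the file so at most maxLines + 1 lines are in memory at once.
        try (BufferedReader in = Files.newBufferedReader(history, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                tail.addLast(line);
                if (tail.size() > maxLines) {
                    tail.removeFirst();
                }
            }
        }
        // Write to a temp file in the same directory, then replace the original.
        Path dir = history.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "history", ".tmp");
        Files.write(tmp, tail, StandardCharsets.UTF_8);
        Files.move(tmp, history, StandardCopyOption.REPLACE_EXISTING);
        return tail.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java HistoryTrimmer <history-file> <max-lines>");
            return;
        }
        int kept = trim(Paths.get(args[0]), Integer.parseInt(args[1]));
        System.out.println("kept " + kept + " lines");
    }
}
```

For example, `java HistoryTrimmer ~/.accumulo/shell_history.txt 1000` keeps the most recent 1,000 entries. The 1,000 is an arbitrary illustrative value.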
>>>>> On Mon, Apr 29, 2013 at 5:37 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> You hit it, John -- on the NameNode the shell_history.txt file is
>>>>>> 128MB, and the same is true on the DataNode that received 99% of the data
>>>>>> because of the key structure.  On the other two DataNodes it was tiny, and
>>>>>> I could log in to the shell fine on both (just my luck that the only
>>>>>> DataNode I tried after the load was the fat one).
>>>>>>
>>>>>> So is --disable-tab-completion supposed to skip reading the
>>>>>> shell_history.txt file?  It appears that is not the case with 1.4.2, as it
>>>>>> still dies with an OOM error.
>>>>>>
>>>>>> I now see that a better way to go would probably be to use the
>>>>>> --execute-file switch to read the load file rather than pipe it to the
>>>>>> shell.  Correct?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 29, 2013 at 5:04 PM, John Vines <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Depending on your answer to Eric's question, I wonder if your
>>>>>>> history is enough to blow it up. You may also want to check the size of
>>>>>>> ~/.accumulo/shell_history.txt and see if that is ginormous.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Apr 29, 2013 at 5:07 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> Hi John,
>>>>>>>> I attempted to start the shell with --disable-tab-completion but it
>>>>>>>> still failed in an identical manner.  What is that feature/option?
>>>>>>>>
>>>>>>>> The ACCUMULO_OTHER_OPTS var was set to "-Xmx256m -Xms64m" via the
>>>>>>>> 2GB example config script.  I upped the -Xmx256m to 512m and the shell
>>>>>>>> started successfully, so thanks!
>>>>>>>>
>>>>>>>> What would cause the shell to need more than 256m of memory just to
>>>>>>>> start?  I'd like to understand how to determine an appropriate value to set
>>>>>>>> ACCUMULO_OTHER_OPTS to.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Terry
>>>>>>>>
>>>>>>>>
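Terry's question about sizing ACCUMULO_OTHER_OPTS can be approached with a rough estimate. It rests on an assumption: that the shell reads the whole history into memory as `java.lang.String` objects. The thread only shows that a 128MB history coincided with the OOM and that raising the limit to -Xmx512m let the shell start.

Under that assumption, a JVM of this era stores each character as two bytes of UTF-16. So 128MB of mostly ASCII history becomes roughly 256MB of character data. That alone fills the entire -Xmx256m heap, before any object overhead. The hypothetical `HistoryHeapEstimate` class below prints the file size, that 2x estimate, and the running JVM's `Runtime.maxMemory()`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Rough check of whether the shell history could plausibly fill a small heap.
// Assumption: history lines are held as Strings stored as UTF-16 (2 bytes/char),
// as on the Java 6/7 JVMs of this era.
public class HistoryHeapEstimate {
    // Lower bound for mostly-ASCII text: 2 bytes of char data per byte on disk,
    // ignoring per-String and collection overhead.
    static long estimatedHeapBytes(long fileBytes) {
        return 2L * fileBytes;
    }

    public static void main(String[] args) throws IOException {
        Path history = Paths.get(System.getProperty("user.home"), ".accumulo", "shell_history.txt");
        long fileBytes = Files.exists(history) ? Files.size(history) : 0L;
        long maxHeap = Runtime.getRuntime().maxMemory(); // roughly this JVM's -Xmx
        System.out.printf("history file:        %,d bytes%n", fileBytes);
        System.out.printf("estimated char data: %,d bytes%n", estimatedHeapBytes(fileBytes));
        System.out.printf("this JVM's max heap: %,d bytes%n", maxHeap);
    }
}
```

Treat the result as a lower bound rather than a sizing formula. JVMs from Java 9 onward can store Latin-1 strings at one byte per character (compact strings), so the 2x factor applies only to older runtimes.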