Accumulo, mail # user - OutOfMemoryError: Java heap space after data load


Re: OutOfMemoryError: Java heap space after data load
Eric Newton 2013-04-30, 18:47
I've probably written more Python than Java, so I understand. :-)

I've used Jython for scripting tests.  In unreleased versions (1.4.4 &
1.5.0) the Proxy will let you use the language of your choice.

-Eric
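
A note on the history-file issue discussed further down the thread: the shell's startup OutOfMemoryError was traced to a 128MB ~/.accumulo/shell_history.txt left behind after piping a large load file into the shell. One possible cleanup, sketched in Python since a Python script was already part of the workflow, is to keep only the most recent lines of that file. The function name trim_history and the keep_lines default are hypothetical, and the sketch assumes the history file is plain text with one command per line and that no shell is running while it is rewritten.

```python
import os
from collections import deque


def trim_history(path, keep_lines=1000):
    """Keep only the last keep_lines lines of path.

    Returns (old_size, new_size) in bytes.
    """
    old_size = os.path.getsize(path)
    # A deque with maxlen discards older lines while the file is streamed,
    # so only the newest keep_lines lines are held in memory.
    with open(path, "rb") as f:
        tail = deque(f, maxlen=keep_lines)
    # Write to a temporary file first, then swap it into place.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.writelines(tail)
    os.replace(tmp, path)
    return old_size, os.path.getsize(path)


if __name__ == "__main__":
    # Path taken from the thread; adjust if the shell runs as another user.
    history = os.path.expanduser("~/.accumulo/shell_history.txt")
    if os.path.exists(history):
        before, after = trim_history(history)
        print(f"{history}: {before} -> {after} bytes")
```

This only treats the symptom. Raising -Xmx in ACCUMULO_OTHER_OPTS, as described below, is what actually got the shell to start in this thread.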

On Tue, Apr 30, 2013 at 2:43 PM, Terry P. <[EMAIL PROTECTED]> wrote:

> Hi Eric,
> Thanks for the info.  You've inspired me to dive into it in Java -- I had
> been using the Accumulo shell because I had a Python data generation script
> already in place and it was "faster" that way.  But if a small Java program
> is going to be 35x "faster" than that, it makes no sense to bother with the
> shell!
>
> Thanks,
> Terry
>
>
> On Tue, Apr 30, 2013 at 11:01 AM, Eric Newton <[EMAIL PROTECTED]> wrote:
>
>> There's no need to flush... the shell is flushing after every single line.
>>
>> The flush you are invoking causes a minor compaction.
>>
>> If you wrote a quick Java program to ingest the data, the data would load
>> about 35x faster.
>>
>> -Eric
>>
>>
>> On Mon, Apr 29, 2013 at 6:40 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>
>>> Perhaps having a configuration item to limit the size of the
>>> shell_history.txt file would help avoid this in future?
>>>
>>>
>>> On Mon, Apr 29, 2013 at 5:37 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>>
>>>> You hit it, John -- on the NameNode the shell_history.txt file is 128MB,
>>>> and the same is true on the DataNode that 99% of the data went to due to
>>>> the key structure.  On the other two DataNodes it was tiny, and both could
>>>> log in fine (just my luck that the only DataNode I tried after the load
>>>> was the fat one).
>>>>
>>>> So is --disable-tab-completion supposed to skip reading the
>>>> shell_history.txt file?  It appears that is not the case with 1.4.2, as it
>>>> still dies with an OOM error.
>>>>
>>>> I now see that a better way to go would probably be to use the
>>>> --execute-file switch to read the load file rather than pipe it to the
>>>> shell.  Correct?
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Apr 29, 2013 at 5:04 PM, John Vines <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Depending on your answer to Eric's question, I wonder if your history
>>>>> is enough to blow it up. You may also want to check the size of
>>>>> ~/.accumulo/shell_history.txt and see if that is ginormous.
>>>>>
>>>>>
>>>>> On Mon, Apr 29, 2013 at 5:07 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi John,
>>>>>> I attempted to start the shell with --disable-tab-completion but it
>>>>>> still failed in an identical manner.  What is that feature/option?
>>>>>>
>>>>>> The ACCUMULO_OTHER_OPTS var was set to "-Xmx256m -Xms64m" via the 2GB
>>>>>> example config script.  I upped the -Xmx256m to 512m and the shell started
>>>>>> successfully, so thanks!
>>>>>>
>>>>>> What would cause the shell to need more than 256m of memory just to
>>>>>> start?  I'd like to understand how to determine an appropriate value to set
>>>>>> ACCUMULO_OTHER_OPTS to.
>>>>>>
>>>>>> Thanks,
>>>>>> Terry
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 29, 2013 at 2:21 PM, John Vines <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> The shell gets its memory config from ACCUMULO_OTHER_OPTS in the
>>>>>>> accumulo-env file. If, for some reason, the value was low or there was a
>>>>>>> lot of data being loaded for the tab completion stuff in the shell, it
>>>>>>> could die. You can try upping that value in the file or try running the
>>>>>>> shell with "--disable-tab-completion" to see if that helps.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Apr 29, 2013 at 3:02 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> Greetings folks,
>>>>>>>> I have stood up our 8-node Accumulo 1.4.2 cluster consisting of 3
>>>>>>>> ZooKeepers, 1 NameNode (also runs Accumulo Master, Monitor, and GC), and 3
>>>>>>>> DataNodes / TabletServers (Secondary NameNode with Alternate Accumulo
>>>>>>>> Master process will follow).  The initial config files were copied from the
>>>>>>>> 2GB/native-standalone directory.
>>>>>>>>
>>>>>>>> For a quick test I have a text file I generated to load 500,000
>