Accumulo >> mail # user >> OutOfMemoryError: Java heap space after data load


Re: OutOfMemoryError: Java heap space after data load
Eric, I'm really disappointed.  Rather than actually writing anything at
all, I opted to run the RandomBatchWriter example program.

It wasn't 35x faster.

It was 52x faster.

After all the excellent posts I've seen from you, I really expected a more
precise guesstimate.  ;-)

Thanks for the gentle nudge to do better than python and the accumulo
shell.  At a million rows inserted in 13 seconds, I'm certain the Accumulo
cluster I've set up can handle the 2-5K records per second max we expect
to throw at it.
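
The arithmetic behind that confidence can be sketched as follows. The row count, elapsed time, 52x speedup, and 5K peak come from this message. The implied shell rate is only an estimate, assuming the 52x figure compares the same number of rows loaded through the shell.

```python
def rows_per_second(rows, seconds):
    """Average ingest rate over the whole run."""
    return rows / seconds

# Figures quoted in the message: one million rows in 13 seconds.
batch_rate = rows_per_second(1_000_000, 13)

# Assumption: the 52x speedup is measured against the shell-piped load
# of a comparable data set, so the shell rate is the batch rate / 52.
implied_shell_rate = batch_rate / 52

# Upper end of the expected 2-5K records per second.
peak_required = 5_000

print(f"batch writer rate:  {batch_rate:,.0f} rows/s")
print(f"implied shell rate: {implied_shell_rate:,.0f} rows/s")
print(f"headroom at peak:   {batch_rate / peak_required:.1f}x")
```

Under these assumptions the batch writer rate leaves more than an order of magnitude of headroom over the expected peak.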

Thanks again!

On Tue, Apr 30, 2013 at 1:47 PM, Eric Newton <[EMAIL PROTECTED]> wrote:

> I've probably written more python than Java, so I understand. :-)
>
> I've used Jython for scripting tests.  In unreleased versions (1.4.4 &
> 1.5.0) the Proxy will let you use the language of your choice.
>
> -Eric
>
>
>
> On Tue, Apr 30, 2013 at 2:43 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>
>> Hi Eric,
>> Thanks for the info.  You've inspired me to dive into it in Java -- I had
>> been using the accumulo shell because I had a python data generation script
>> already in place and it was "faster" that way.  But if a small java program
>> is going to be 35x "faster" than that, it makes no sense to bother with the
>> shell!
>>
>> Thanks,
>> Terry
>>
>>
>> On Tue, Apr 30, 2013 at 11:01 AM, Eric Newton <[EMAIL PROTECTED]> wrote:
>>
>>> There's no need to flush... the shell is flushing after every single
>>> line.
>>>
>>> The flush you are invoking causes a minor compaction.
>>>
>>> If you wrote a quick java program to ingest the data, the data would
>>> load about 35x faster.
>>>
>>> -Eric
>>>
>>>
>>> On Mon, Apr 29, 2013 at 6:40 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>>
>>>> Perhaps having a configuration item to limit the size of the
>>>> shell_history.txt file would help avoid this in future?
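
Until such a setting exists, the file can be trimmed by hand. The following is a minimal sketch, not an Accumulo feature: `trim_history` and its `keep_lines` default are hypothetical names chosen for illustration. It streams the file so that a 128MB history does not have to fit in memory, and keeps only the most recent lines.

```python
import os
from collections import deque
from pathlib import Path

def trim_history(path, keep_lines=1000):
    """Rewrite the file at `path` so it holds only its last `keep_lines` lines.

    Returns the number of lines kept (0 if the file does not exist).
    """
    path = Path(path)
    if not path.exists():
        return 0
    # A bounded deque retains only the tail while the file is streamed.
    with path.open("r", encoding="utf-8", errors="replace") as src:
        tail = deque(src, maxlen=keep_lines)
    # Write to a sibling temp file, then swap it in, so an interrupted
    # run never leaves a half-written history file behind.
    tmp = path.with_name(path.name + ".tmp")
    with tmp.open("w", encoding="utf-8") as dst:
        dst.writelines(tail)
    os.replace(tmp, path)
    return len(tail)

# Example call (path taken from the thread; run it while the shell is closed):
# trim_history(Path.home() / ".accumulo" / "shell_history.txt", keep_lines=500)
```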
>>>>
>>>>
>>>> On Mon, Apr 29, 2013 at 5:37 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> You hit it John -- on the NameNode the shell_history.txt file is
>>>>> 128MB, and same thing on the DataNode that 99% of the data went to due to
>>>>> the key structure.  On the other two datanodes it was tiny, and both could
>>>>> log in fine (just my luck that the only datanode I tried after the load
>>>>> was the fat one).
>>>>>
>>>>> So is --disable-tab-completion supposed to skip reading the
>>>>> shell_history.txt file?  That appears not to be the case with 1.4.2, as
>>>>> it still dies with an OOM error.
>>>>>
>>>>> I now see that a better way to go would probably be to use the
>>>>> --execute-file switch to read the load file rather than pipe it to the
>>>>> shell.  Correct?
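
A load file for that switch could be produced by the existing data generation script. The sketch below assumes the shell's `table` and `insert` commands with simple, whitespace-free fields. `write_command_file`, the table name, and the sample records are hypothetical. The thread does not confirm whether loading this way keeps the commands out of shell_history.txt.

```python
def write_command_file(records, table, out_path):
    """Write one shell `insert` command per record and return the count.

    `records` is an iterable of (row, column_family, column_qualifier, value)
    tuples. Fields are assumed to contain no whitespace or quote characters,
    so no escaping is attempted here.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # Select the target table once, then issue the inserts.
        out.write(f"table {table}\n")
        for row, family, qualifier, value in records:
            out.write(f"insert {row} {family} {qualifier} {value}\n")
            count += 1
    return count

# Hypothetical sample data standing in for the generation script's output.
sample = [(f"row{i:06d}", "cf", "cq", f"value{i}") for i in range(3)]
# write_command_file(sample, "mytable", "load.txt")
```

The resulting file would then be passed to the shell with the --execute-file switch instead of being piped to standard input.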
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Apr 29, 2013 at 5:04 PM, John Vines <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Depending on your answer to Eric's question, I wonder if your history
>>>>>> is enough to blow it up. You may also want to check the size of
>>>>>> ~/.accumulo/shell_history.txt and see if that is ginormous.
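
That check can be scripted. This is a minimal sketch: the path is the one named above, while `history_size_mb`, `looks_oversized`, and the 25% threshold are hypothetical choices for illustration rather than anything Accumulo defines.

```python
from pathlib import Path

# Path named in the message above.
HISTORY = Path.home() / ".accumulo" / "shell_history.txt"

def history_size_mb(path=HISTORY):
    """Size of the history file in MiB, or 0.0 if it does not exist."""
    path = Path(path)
    return path.stat().st_size / (1024 * 1024) if path.exists() else 0.0

def looks_oversized(size_mb, heap_mb=256, fraction=0.25):
    """Flag a history file larger than `fraction` of the shell's max heap.

    The 25% threshold is an arbitrary rule of thumb (an assumption);
    heap_mb should match the -Xmx value in ACCUMULO_OTHER_OPTS.
    """
    return size_mb > heap_mb * fraction

size = history_size_mb()
print(f"{HISTORY}: {size:.1f} MiB, oversized={looks_oversized(size)}")
```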
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 29, 2013 at 5:07 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Hi John,
>>>>>>> I attempted to start the shell with --disable-tab-completion but it
>>>>>>> still failed in an identical manner.  What is that feature/option?
>>>>>>>
>>>>>>> The ACCUMULO_OTHER_OPTS var was set to "-Xmx256m -Xms64m" via the
>>>>>>> 2GB example config script.  I upped the -Xmx256m to 512m and the shell
>>>>>>> started successfully, so thanks!
>>>>>>>
>>>>>>> What would cause the shell to need more than 256m of memory just to
>>>>>>> start?  I'd like to understand how to determine an appropriate value to set
>>>>>>> ACCUMULO_OTHER_OPTS to.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Terry
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Apr 29, 2013 at 2:21 PM, John Vines <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> The shell gets its memory config from ACCUMULO_OTHER_OPTS in the
>>>>>>>> accumulo-env file. If, for some reason, the value was low or there was a
>>>>>>>> lot of data being loaded for the tab completion stuff in the shell, it