Re: OutOfMemoryError: Java heap space after data load
There's no need to flush... the shell is flushing after every single line.

The flush you are invoking causes a minor compaction.

If you wrote a quick Java program to ingest the data, the data would load
about 35x faster.
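
A rough sketch of what such a program might look like against the 1.4 client
API (the instance name, ZooKeeper hosts, credentials, table name, and
row/column layout below are all placeholders, not taken from this thread):

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.io.Text;

    public class QuickIngest {
      public static void main(String[] args) throws Exception {
        // Placeholder connection details -- substitute your own instance, zookeepers, and user.
        ZooKeeperInstance instance = new ZooKeeperInstance("myInstance", "zk1,zk2,zk3");
        Connector conn = instance.getConnector("username", "password".getBytes());

        // 1.4-style BatchWriter: buffer up to ~1MB, flush at least every 60s, 4 write threads.
        BatchWriter writer = conn.createBatchWriter("mytable", 1000000L, 60000L, 4);

        for (int i = 0; i < 500000; i++) {
          Mutation m = new Mutation(new Text(String.format("row_%07d", i)));
          m.put(new Text("cf"), new Text("cq"), new Value(("value_" + i).getBytes()));
          writer.addMutation(m);
        }

        writer.close(); // flushes any mutations still buffered client-side
      }
    }

The BatchWriter buffers mutations client-side and ships them to the tablet
servers in batches, which is where most of the speedup over line-at-a-time
shell inserts comes from.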

-Eric
On Mon, Apr 29, 2013 at 6:40 PM, Terry P. <[EMAIL PROTECTED]> wrote:

> Perhaps having a configuration item to limit the size of the
> shell_history.txt file would help avoid this in future?
>
>
> On Mon, Apr 29, 2013 at 5:37 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>
>> You hit it John -- on the NameNode the shell_history.txt file is 128MB,
>> and same thing on the DataNode that 99% of the data went to due to the key
>> structure.  On the other two datanodes it was tiny, and both could login
>> fine (just my luck that the only datanode I tried after the load was the
>> fat one).
>>
>> So is --disable-tab-completion supposed to skip reading the
>> shell_history.txt file?  It appears that is not the case with 1.4.2 as it
>> still dies with an OOM error.
>>
>> I now see that a better way to go would probably be to use the --execute-file
>> switch to read the load file rather than pipe it to the shell.  Correct?
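
For illustration, such an invocation would look something like the line below
(the load-file path is just a placeholder):

    /usr/lib/accumulo/bin/accumulo shell -u $AUSER -p $AUSERPWD --execute-file /path/to/load-file.txt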
>>
>>
>>
>>
>> On Mon, Apr 29, 2013 at 5:04 PM, John Vines <[EMAIL PROTECTED]> wrote:
>>
>>> Depending on your answer to Eric's question, I wonder if your history is
>>> enough to blow it up. You may also want to check the size of
>>> ~/.accumulo/shell_history.txt and see if that is ginormous.
>>>
>>>
>>> On Mon, Apr 29, 2013 at 5:07 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi John,
>>>> I attempted to start the shell with --disable-tab-completion but it
>>>> still failed in an identical manner.  What is that feature/option?
>>>>
>>>> The ACCUMULO_OTHER_OPTS var was set to "-Xmx256m -Xms64m" via the 2GB
>>>> example config script.  I upped the -Xmx256m to 512m and the shell started
>>>> successfully, so thanks!
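
For reference, the resulting setting in the accumulo-env file would be roughly
this (the exact form of the line varies by install):

    export ACCUMULO_OTHER_OPTS="-Xmx512m -Xms64m"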
>>>>
>>>> What would cause the shell to need more than 256m of memory just to
>>>> start?  I'd like to understand how to determine an appropriate value to set
>>>> ACCUMULO_OTHER_OPTS to.
>>>>
>>>> Thanks,
>>>> Terry
>>>>
>>>>
>>>>
>>>> On Mon, Apr 29, 2013 at 2:21 PM, John Vines <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> The shell gets its memory config from ACCUMULO_OTHER_OPTS in the
>>>>> accumulo-env file. If, for some reason, the value was low or there was a
>>>>> lot of data being loaded for the tab-completion stuff in the shell, it
>>>>> could die. You can try upping that value in the file or try running the
>>>>> shell with "--disable-tab-completion" to see if that helps.
>>>>>
>>>>>
>>>>> On Mon, Apr 29, 2013 at 3:02 PM, Terry P. <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Greetings folks,
>>>>>> I have stood up our 8-node Accumulo 1.4.2 cluster consisting of 3
>>>>>> ZooKeepers, 1 NameNode (also runs Accumulo Master, Monitor, and GC), and 3
>>>>>> DataNodes / TabletServers (Secondary NameNode with Alternate Accumulo
>>>>>> Master process will follow).  The initial config files were copied from the
>>>>>> 2GB/native-standalone directory.
>>>>>>
>>>>>> For a quick test I have a text file I generated to load 500,000 rows
>>>>>> of sample data using the Accumulo shell.  For lack of a better place to run
>>>>>> it this first time, I ran it on the NameNode.  The script performs flushes
>>>>>> every 10,000 records (about 30,000 entries).  After the load finished, when
>>>>>> I attempt to log in to the Accumulo Shell on the NameNode, I get the error:
>>>>>>
>>>>>> [root@edib-namenode ~]# /usr/lib/accumulo/bin/accumulo shell -u
>>>>>> $AUSER -p $AUSERPWD
>>>>>> #
>>>>>> # java.lang.OutOfMemoryError: Java heap space
>>>>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>>>>> #   Executing /bin/sh -c "kill -9 24899"...
>>>>>> Killed
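
For context, a load file like the one described above is just a plain list of
shell commands, roughly along these lines (the table, row, and column names
here are made up):

    table testtable
    insert row_0000001 cf cq1 value1
    insert row_0000001 cf cq2 value2
    insert row_0000001 cf cq3 value3
    ...
    flush

with a flush issued after every 10,000 rows.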
>>>>>>
>>>>>> The performance of that test was pretty poor at about 160/second
>>>>>> (somewhat expected, as it was just one thread), so to keep moving I
>>>>>> generated 3 different load files and ran one on each of the 3 DataNodes /
>>>>>> TabletServers.  Performance was much better, sustaining 1,400 per second.