Accumulo user mailing list: RE: EXTERNAL: Re: Failing Tablet Servers


Earlier messages in this thread:
  Cardon, Tejay E (2012-09-20, 21:26)
  Jim Klucar (2012-09-20, 22:44)
  Cardon, Tejay E (2012-09-20, 22:50)
  Jim Klucar (2012-09-20, 22:56)
  Adam Fuchs (2012-09-20, 23:21)
  Cardon, Tejay E (2012-09-21, 14:12)
Re: EXTERNAL: Re: Failing Tablet Servers
John Vines (2012-09-21, 14:25)
memory.maps is what defines the size of the in-memory map. When using
native maps, that space does not come out of the heap; when using
non-native maps, it does.
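For reference, both knobs live in accumulo-site.xml; a minimal sketch
(the 6G value is only an example, echoing Eric's suggestion quoted below):

  <!-- accumulo-site.xml (sketch): size the in-memory map and keep it native -->
  <property>
    <name>tserver.memory.maps.max</name>
    <value>6G</value>      <!-- held off-heap when native maps are enabled -->
  </property>
  <property>
    <name>tserver.memory.maps.native.enabled</name>
    <value>true</value>    <!-- Java (on-heap) maps are used if this is false
                                or the native library cannot be loaded -->
  </property>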

I think the issue Eric is getting at is the fickleness of the Java
garbage collector. When you give a process that much heap, it can hold
that much more data before it needs to garbage collect. However, that
also means that when it does garbage collect, it's collecting a LOT more,
which can result in poor performance.
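One common mitigation is a more modest heap plus collector settings that
start collecting well before the heap fills, trading some throughput for
shorter pauses. A sketch against the stock accumulo-env.sh (the variable
name follows the standard script; the flag values are illustrative, not
tuned recommendations):

  # accumulo-env.sh (sketch): modest heap, and begin CMS collections early
  # so pauses stay short instead of one huge stop-the-world collection.
  export ACCUMULO_TSERVER_OPTS="-Xmx4g -Xms4g \
    -XX:+UseConcMarkSweepGC \
    -XX:CMSInitiatingOccupancyFraction=75 \
    -XX:+UseCMSInitiatingOccupancyOnly"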

John

On Fri, Sep 21, 2012 at 10:12 AM, Cardon, Tejay E
<[EMAIL PROTECTED]> wrote:

>  Jim, Eric, and Adam,
>
> Thanks.  It sounds like you’re all saying the same thing.  Originally I
> was doing each key/value as its own mutation, and it was blowing up much
> faster (probably due to the volume/overhead of the mutation objects
> themselves).  I’ll try refactoring to break them up into something
> in-between.  My keys are small (<25 bytes), and my values are empty, but
> I’ll aim for ~1,000 key/values per mutation and see how that works out for
> me.
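As an illustration of that in-between batching, a minimal client-side
sketch (not the poster's actual job: the table, row, and column names are
hypothetical, the API shown is the 1.5-style BatchWriterConfig client, and
the same chunking idea applies to mutations emitted from a MapReduce job):

  // Sketch: write one wide row as many small mutations of at most
  // chunkSize column updates each, instead of one giant Mutation.
  import org.apache.accumulo.core.client.BatchWriter;
  import org.apache.accumulo.core.client.BatchWriterConfig;
  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.data.Mutation;
  import org.apache.accumulo.core.data.Value;
  import org.apache.hadoop.io.Text;

  public class ChunkedIngest {
    static void writeWideRow(Connector conn, String table, String row,
                             int totalCols, int chunkSize) throws Exception {
      BatchWriter bw = conn.createBatchWriter(table, new BatchWriterConfig());
      Value empty = new Value(new byte[0]);   // values in this thread are empty
      Mutation m = new Mutation(new Text(row));
      int inFlight = 0;
      for (int i = 0; i < totalCols; i++) {
        m.put(new Text("fam" + (i % 160000)), new Text("qual" + i), empty);
        if (++inFlight == chunkSize) {
          bw.addMutation(m);                  // hand this chunk to the writer
          m = new Mutation(new Text(row));    // same row, fresh Mutation object
          inFlight = 0;
        }
      }
      if (inFlight > 0)
        bw.addMutation(m);                    // final partial chunk
      bw.close();                             // flushes anything still buffered
    }
  }

With totalCols = 1,000,000 and chunkSize = 1,000, this would send 1,000
mutations of 1,000 updates each rather than a single million-update object.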
>
> Eric,
>
> I was under the impression that the memory.maps setting was not very
> important when using native maps.  Apparently I’m mistaken there.  What
> does this setting control when in a native map setting?  And, in general,
> what’s the proper balance between TSERVER_OPTS and tserver.memory.maps?
>
> With regards to the “Finished gathering information from 24 servers in
> 27.45 seconds”: do you have any recommendations for how to chase down the
> bottleneck?  I’m pretty sure I’m having GC issues, but I’m not sure what is
> causing them on the server side.  I’m sending a fairly small number of very
> large mutation objects, which I’d expect to be a moderate problem for the
> GC, but not a huge one.
>
> Thanks again to everyone for being so responsive and helpful.
>
> Tejay Cardon
>
> From: Eric Newton [mailto:[EMAIL PROTECTED]]
> Sent: Friday, September 21, 2012 8:03 AM
> To: [EMAIL PROTECTED]
> Subject: EXTERNAL: Re: Failing Tablet Servers
>
> A few items noted from your logs:
>
> tserver.memory.maps.max = 1G
>
> If you are giving your processes 10G, you might want to make the map
> larger, say 6G, and then reduce the JVM heap by 6G.
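In other words, a sketch of the arithmetic (assuming the 10G figure from
the original message and native maps enabled, so the map sits off-heap):

  total per-tserver budget:              10G
  tserver.memory.maps.max (off-heap):     6G
  JVM heap (-Xmx in TSERVER_OPTS):        4G   (10G minus the 6G native map)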
>
> Write-Ahead Log recovery complete for rz<;zw== (8 mutations applied,
> 8000000 entries created)
>
> You are creating rows with 1M columns.  This is ok, but you might want to
> write them out more incrementally.
>
> WARN : Running low on memory
>
> That's pretty self-explanatory.  I'm guessing that the very large
> mutations are causing the tablet servers to run out of memory before they
> are held waiting for minor compactions.
>
> Finished gathering information from 24 servers in 27.45 seconds
>
> Something is running slow, probably due to GC thrashing.
>
> WARN : Lost servers [10.1.24.69:9997[139d46130344b98]]
>
> And there's a server crashing, probably due to an OOM condition.
>
> Send smaller mutations.  Maybe keep it to 200K column updates.  You can
> still have 1M wide rows, just send 5 mutations.
>
> -Eric
>
> On Thu, Sep 20, 2012 at 5:05 PM, Cardon, Tejay E <[EMAIL PROTECTED]>
> wrote:
>
> I’m seeing some strange behavior on a moderate (30 node) cluster.  I’ve
> got 27 tablet servers on large Dell servers with 30GB of memory each.  I’ve
> set TSERVER_OPTS to give them each 10G of memory.  I’m running an
> ingest process that uses AccumuloInputFormat in a MapReduce job to write
> 1,000 rows with each row containing ~1,000,000 columns in 160,000
> families.  The MapReduce initially runs quite quickly and I can see the
> ingest rate peak on the monitor page.  However, after about 30 seconds of
> high ingest, the ingest falls to 0.  It then stalls out and my map tasks are …
Later replies:
  Cardon, Tejay E (2012-09-21, 14:35)
  Jim Klucar (2012-09-21, 14:40)
  Eric Newton (2012-09-21, 14:32)
  Cardon, Tejay E (2012-09-21, 14:50)