Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Accumulo, mail # user - RE: EXTERNAL: Re: Failing Tablet Servers


+
Cardon, Tejay E 2012-09-20, 21:26
+
Jim Klucar 2012-09-20, 22:44
+
Cardon, Tejay E 2012-09-20, 22:50
+
Jim Klucar 2012-09-20, 22:56
+
Adam Fuchs 2012-09-20, 23:21
+
Cardon, Tejay E 2012-09-21, 14:12
+
John Vines 2012-09-21, 14:25
+
Cardon, Tejay E 2012-09-21, 14:35
Copy link to this message
-
Re: EXTERNAL: Re: Failing Tablet Servers
Jim Klucar 2012-09-21, 14:40
tejay,

Here's a good article about Java using native memory.

http://www.ibm.com/developerworks/linux/library/j-nativememory-linux/index.html#how

On Fri, Sep 21, 2012 at 10:35 AM, Cardon, Tejay E
<[EMAIL PROTECTED]> wrote:
> Gotcha.  So if I’m using java maps then my tserver_opts needs to be
> tserver.memory.maps + extra for the rest of the tserver because the memory
> map will be taken from the overall memory allocated to the tserver.  But if
> I’m using native maps, then I need far less tserver memory because the map
> memory is not deducted from the tserver.  Is that correct?
>
>
>
> Thanks,
> tejay
>
>
>
> From: John Vines [mailto:[EMAIL PROTECTED]]
> Sent: Friday, September 21, 2012 8:26 AM
> To: [EMAIL PROTECTED]
> Subject: Re: EXTERNAL: Re: Failing Tablet Servers
>
>
>
> memory.maps is what defines the size of the in memory map. When using native
> maps, that space does not come out of the heap size. But when using
> non-native maps, it comes out of the heap space.
>
> I think the issue Eric is trying to hit at is the fickleness of the java
> garbage collector. When you give a process that much heap, that's so much
> more data you can hold before you need to garbage collect. However, that
> also means when it does garbage collect, it's collecting a LOT more, which
> can result is poor performance.
>
> John
>
>
>
> On Fri, Sep 21, 2012 at 10:12 AM, Cardon, Tejay E <[EMAIL PROTECTED]>
> wrote:
>
> Jim, Eric, and Adam,
>
> Thanks.  It sounds like you’re all saying the same thing.  Originally I was
> doing each key/value as its own mutation, and it was blowing up much faster
> (probably due to the volume/overhead of the mutation objects themselves.
> I’ll try refactoring to break them up into something in-between.  My keys
> are small (<25 Bytes), and my values are empty, but I’ll aim for ~1,000
> key/values per mutation and see how that works out for me.
>
>
>
> Eric,
>
> I was under the impression that the memory.maps setting was not very
> important when using native maps.  Apparently I’m mistaken there.  What does
> this setting control when in a native map setting?  And, in general, what’s
> the proper balance between tserver_opts and tserver.memory.maps?
>
>
>
> With regards to the “Finished gathering information from 24 servers in 27.45
> seconds”  Do you have any recommendations for how to chase down the
> bottleneck?  I’m pretty sure I’m having GC issues, but I’m not sure what is
> causing them on the server side.  I’m sending a fairly small number of very
> large mutation objects, which I’d expect to be a moderate problem for the
> GC, but not a huge one..
>
>
>
> Thanks again to everyone for being so responsive and helpful.
>
>
>
> Tejay Cardon
>
>
>
>
>
> From: Eric Newton [mailto:[EMAIL PROTECTED]]
> Sent: Friday, September 21, 2012 8:03 AM
>
>
> To: [EMAIL PROTECTED]
> Subject: EXTERNAL: Re: Failing Tablet Servers
>
>
>
> A few items noted from your logs:
>
>
>
> tserver.memory.maps.max = 1G
>
>
>
> If you are giving your processes 10G, you might want to make the map larger,
> say 6G, and then reduce the JVM by 6G.
>
>
>
> Write-Ahead Log recovery complete for rz<;zw== (8 mutations applied, 8000000
> entries created)
>
>
>
> You are creating rows with 1M columns.  This is ok, but you might want to
> write them out more incrementally.
>
>
>
> WARN : Running low on memory
>
>
>
> That's pretty self-explanatory.  I'm guessing that the very large mutations
> are causing the tablet servers to run out of memory before they are held
> waiting for minor compactions.
>
>
>
> Finished gathering information from 24 servers in 27.45 seconds
>
>
>
> Something is running slow, probably due to GC thrashing.
>
>
>
> WARN : Lost servers [10.1.24.69:9997[139d46130344b98]]
>
>
>
> And there's a server crashing, probably due to an OOM condition.
>
>
>
> Send smaller mutations.  Maybe keep it to 200K column updates.  You can
> still have 1M wide rows, just send 5 mutations.
>
>
>
> -Eric
>
>
>
> On Thu, Sep 20, 2012 at 5:05 PM, Cardon, Tejay E <[EMAIL PROTECTED]>
+
Eric Newton 2012-09-21, 14:32
+
Cardon, Tejay E 2012-09-21, 14:50