Flume >> mail # user >> File Channel Capacity issue

Camp, Roy 2012-11-23, 19:58
Hari Shreedharan 2012-11-23, 21:14
Camp, Roy 2012-11-25, 23:08
Brock Noland 2012-11-25, 23:13
Camp, Roy 2012-11-26, 23:29
Re: File Channel Capacity issue


On Mon, Nov 26, 2012 at 5:29 PM, Camp, Roy <[EMAIL PROTECTED]> wrote:

>  Brock,
>
> I’m a bit confused by this.  Are you saying that after the FileChannel is
> full the events would be held in heap?

No. We write the events to disk, but we store a pointer to each event in
memory. Currently, in the worst case, you would consume 32 bytes per event,
so the worst-case memory consumption would be 32 bytes * channel capacity.
Note that this is 32 bytes per event regardless of event size. The events
could be 10KB or 100KB, but we would still only consume 32 bytes of memory
per event. Still, this overhead is higher than it needs to be. Currently, in
the worst case, we store the pointer as an Integer and a Long, which
consumes [(8 bytes for the object header + 4 bytes for the int) + (8 bytes
for the object header + 8 bytes for the long) + 4 bytes of fudge factor] =
32 bytes.
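To make the arithmetic above concrete, here is a back-of-the-envelope sketch of the worst-case heap cost of those pointers. The class and method names are hypothetical, used only to illustrate the 32-bytes-per-event figure from this thread; they are not Flume code.

```java
public class ChannelMemoryEstimate {
    // Worst-case per-event pointer cost described above:
    //   boxed Integer: 8-byte object header + 4-byte int   = 12 bytes
    //   boxed Long:    8-byte object header + 8-byte long  = 16 bytes
    //   alignment / fudge factor                           =  4 bytes
    static final long BYTES_PER_EVENT = (8 + 4) + (8 + 8) + 4; // = 32

    // Worst-case heap consumed by event pointers for a channel of the
    // given capacity, independent of event payload size.
    static long worstCaseBytes(long channelCapacity) {
        return BYTES_PER_EVENT * channelCapacity;
    }

    public static void main(String[] args) {
        // A 1,000,000-event channel costs ~32 MB of heap for pointers
        // alone, whether each event is 10KB or 100KB.
        System.out.println(worstCaseBytes(1_000_000)); // prints 32000000
    }
}
```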

I think we could improve on this by (1) using a primitive map as opposed to
a HashMap, (2) writing this data out to a separate file, or (3) keeping two
checkpoint files. The primitive map (1) would give us an immediate savings
of about 16 bytes per event and be simple to implement, while a separate
file (2) would save all 32 bytes but be complex to implement. Two
checkpoints (3) would save us the 32 bytes and give us better durability of
the checkpoint data, requiring fewer deletions of the checkpoints and full
replays of the WAL.
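To illustrate what option (1) buys: a primitive map keeps keys and values in flat int[]/long[] arrays, costing 12 bytes per slot instead of ~32 bytes of boxed Integer/Long per HashMap entry. Below is a minimal open-addressed sketch under stated assumptions (fixed size, non-negative keys, no resizing); it is a hypothetical illustration of the idea, not Flume's actual implementation, and real code would use a library such as a primitive-collections map.

```java
import java.util.Arrays;

// Sketch of an int -> long map backed by primitive arrays with linear
// probing. Each slot costs 4 (int key) + 8 (long value) = 12 bytes,
// with no per-entry object headers. Assumes keys >= 0 and that the map
// is never filled completely (otherwise probing would not terminate).
public class IntLongMap {
    private static final int EMPTY = -1; // sentinel: slot unused
    private final int[] keys;
    private final long[] values;

    public IntLongMap(int slots) {
        keys = new int[slots];
        values = new long[slots];
        Arrays.fill(keys, EMPTY);
    }

    // Linear-probe to the slot holding this key, or the first empty slot.
    private int slot(int key) {
        int i = (key & 0x7fffffff) % keys.length;
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    public void put(int key, long value) {
        int i = slot(key);
        keys[i] = key;
        values[i] = value;
    }

    // Returns the mapped value, or -1 if the key is absent.
    public long get(int key) {
        int i = slot(key);
        return keys[i] == key ? values[i] : -1L;
    }
}
```

Parallel primitive arrays like this are roughly what production primitive-map libraries do internally, which is where the "about 16 bytes per event" savings over boxed HashMap entries comes from.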