NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Accumulo >> mail # user >> WAL - rate limiting factor x4.67


WAL - rate limiting factor x4.67
I've been trying to get the best possible throughput when streaming data into Accumulo 1.5 (on Cloudera CDH4 Hadoop). After trying a number of settings, rewriting client code, etc., I finally switched off the write-ahead log (table.walog.enabled=false) and saw a huge leap in ingest performance.

Ingest with table.walog.enabled=true:   ~6 MB/s
Ingest with table.walog.enabled=false:  ~28 MB/s
That is a speed-up of roughly 4.67x (28 / 6).
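For anyone re-running this comparison, the speed-up is just the ratio of the two measured ingest rates. A trivial sketch (the function name is illustrative only):

```python
def speedup(baseline_mb_s, improved_mb_s):
    """Ratio of two throughput measurements taken in the same units (e.g. MB/s)."""
    if baseline_mb_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return improved_mb_s / baseline_mb_s

# Figures from above: ~6 MB/s with the WAL on, ~28 MB/s with it off.
print(f"{speedup(6, 28):.2f}x")  # prints "4.67x"
```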

My use case could probably live without the WAL, or work around not having one, but I wondered whether this is a known issue (I didn't see anything in JIRA). The WAL seems to be a significant rate limiter; this is either endemic to Accumulo or an HDFS/setup issue. However, given that everything goes through HDFS these days and IO is otherwise fast, the Accumulo WAL looks like the most likely culprit.

I don't believe this is an IO issue on the box: with the WAL off there is significantly more IO (up to 80 MB/s reported by dstat) than with the WAL on (up to 12 MB/s reported by dstat). A sequential-write test of the box with fio gives 160 MB/s.
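To put the dstat figures in context, each can be expressed as a fraction of the fio sequential-write baseline. This is only a rough sketch: peak dstat readings and a single sequential fio run are not strictly comparable workloads, and the names below are illustrative.

```python
# fio sequential-write baseline measured on the box (from the figures above)
FIO_SEQ_WRITE_MB_S = 160

def fraction_of_baseline(observed_mb_s, baseline_mb_s=FIO_SEQ_WRITE_MB_S):
    """Observed disk throughput as a fraction of the fio baseline."""
    return observed_mb_s / baseline_mb_s

# Peak dstat write rates observed during ingest
for label, observed in [("WAL on", 12), ("WAL off", 80)]:
    print(f"{label}: {observed} MB/s = {fraction_of_baseline(observed):.1%} of fio baseline")
# WAL on: 12 MB/s = 7.5% of fio baseline
# WAL off: 80 MB/s = 50.0% of fio baseline
```

Neither case comes close to the measured bandwidth, which is consistent with raw disk bandwidth not being the limiting factor.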

Further info:
Hadoop 2.0.0 (Cloudera CDH4)
Accumulo 1.5.0
ZooKeeper (with Netty; minor improvement of <1 MB/s)
Filesystem (HDFS data on ZFS with compression=on, dedup=on; otherwise ext4)

With large imports from scratch, I now start off CPU-bound, and as more shuffling is needed the import becomes disk-bound later on, as expected. So I know pre-splitting would probably sort that out.
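Pre-splitting needs a set of split points chosen up front. Below is a minimal sketch that assumes a hypothetical row-key design where every row starts with a fixed-width, uniformly distributed lowercase hex prefix; real keys will need split points chosen from their actual distribution. The resulting strings, one per line, could then be added to the table before ingest (e.g. with the Accumulo shell's `addsplits` command; check `help addsplits` for the options in your version).

```python
from typing import List

def hex_split_points(num_tablets: int, width: int = 2) -> List[str]:
    """Return num_tablets - 1 evenly spaced split points over row keys that
    (hypothetically) begin with `width` uniformly distributed hex characters."""
    if num_tablets < 1:
        raise ValueError("num_tablets must be >= 1")
    keyspace = 16 ** width  # number of distinct hex prefixes
    if num_tablets > keyspace:
        raise ValueError("more tablets requested than distinct prefixes")
    # Boundaries at i * keyspace / num_tablets, zero-padded to the prefix width
    return [format(i * keyspace // num_tablets, f"0{width}x")
            for i in range(1, num_tablets)]

# Four tablets over two-character hex prefixes
print("\n".join(hex_split_points(4)))  # prints 40, 80 and c0 on separate lines
```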

Tnx 

P