HBase, mail # user - Scan + Gets are disk bound


Rahul Ravindran 2013-06-04, 18:48
Re: Scan + Gets are disk bound
anil gupta 2013-06-05, 04:31
On Tue, Jun 4, 2013 at 11:48 AM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:

> Hi,
>
> We are relatively new to HBase, and we are hitting a roadblock on our scan
> performance. I searched through the email archives and applied a number of
> the recommendations there, but they did not help much. So I am hoping I
> am missing something you can point us towards. Thanks in advance.
>
> We are writing and reading data almost continuously (a stream of data
> is written into an HBase table, and we then run a time-based MR job on
> top of this table). Our processing had fallen behind, so about 1.5 TB of
> data had been loaded into the table when we began running time-based scan
> MR jobs over 10-minute intervals (the startTime/endTime interval is 10
> minutes). Most of the 10-minute intervals had about 100 GB of data to process.
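The 10-minute time-based scans described above can be sketched as a small helper that splits a longer range into consecutive half-open [start, end) windows, one per MR job. This is a minimal sketch assuming epoch-millisecond timestamps; `TimeWindows` and `split` are hypothetical names. Half-open windows fit HBase's `Scan.setTimeRange(minStamp, maxStamp)`, whose upper bound is exclusive, so a cell stamped exactly on a boundary is scanned by only one job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits [from, to) into fixed-length half-open windows.
class TimeWindows {
    static final long TEN_MINUTES_MS = 10L * 60 * 1000;

    // Returns consecutive {start, end} pairs covering [from, to).
    // The last window is truncated at `to` when the range is not a multiple of windowMs.
    static List<long[]> split(long from, long to, long windowMs) {
        if (windowMs <= 0) throw new IllegalArgumentException("windowMs must be positive");
        List<long[]> windows = new ArrayList<>();
        for (long start = from; start < to; start += windowMs) {
            windows.add(new long[] {start, Math.min(start + windowMs, to)});
        }
        return windows;
    }

    public static void main(String[] args) {
        long from = 1_370_000_000_000L; // arbitrary epoch-millis starting point
        for (long[] w : split(from, from + 3 * TEN_MINUTES_MS, TEN_MINUTES_MS)) {
            // Each pair would become the [minStamp, maxStamp) of one job's Scan.
            System.out.println(w[0] + " -> " + w[1]);
        }
    }
}
```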
>
> Our workflow was primarily to eliminate duplicates from this table. We
> have maxVersions = 5 for the table. We use TableInputFormat to perform the
> time-based scan to ensure data locality. In the mapper, we check whether a
> previous version of the row exists in a time period earlier than the
> timestamp of the input row. If not, we emit that row.
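The mapper's duplicate check can be modeled roughly as follows. This is a sketch under stated assumptions: an in-memory map stands in for the HBase table, "previous version" is read as any version with a timestamp strictly older than the input row's, and `DedupCheck` and its methods are hypothetical names, not an HBase API. Against a real table the lookup would roughly correspond to a `Get` on the row restricted with `Get.setTimeRange(0, ts)`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical in-memory model of the table: row key -> version timestamps.
class DedupCheck {
    private final Map<String, NavigableSet<Long>> versions = new HashMap<>();
    private final int maxVersions;

    DedupCheck(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Records a write and keeps only the newest maxVersions timestamps.
    // Simplification: HBase prunes excess versions at read/compaction time, not on write.
    void put(String row, long ts) {
        NavigableSet<Long> v = versions.computeIfAbsent(row, k -> new TreeSet<>());
        v.add(ts);
        while (v.size() > maxVersions) {
            v.pollFirst(); // drop the oldest version
        }
    }

    // True if the row has some version strictly older than ts.
    boolean hasEarlierVersion(String row, long ts) {
        NavigableSet<Long> v = versions.get(row);
        return v != null && v.lower(ts) != null;
    }

    // Mapper-side decision: emit the row only when no earlier version exists.
    boolean shouldEmit(String row, long ts) {
        return !hasEarlierVersion(row, ts);
    }

    public static void main(String[] args) {
        DedupCheck table = new DedupCheck(5);
        table.put("row-a", 100L);
        table.put("row-a", 200L);
        System.out.println(table.shouldEmit("row-a", 100L)); // first occurrence
        System.out.println(table.shouldEmit("row-a", 200L)); // duplicate of the 100 write
    }
}
```

Since duplicates are expected to be rare, most calls to `shouldEmit` are lookups for versions that do not exist, which is exactly the case a bloom filter can answer without touching disk.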
>
> We looked at https://issues.apache.org/jira/browse/HBASE-4683 and hence
> turned off the block cache for this table, expecting that the block index
> and bloom filter would still be cached in the block cache. We expect
> duplicates to be rare and hence hope that most of these checks will be
> answered by the bloom filter. Unfortunately, performance is very slow
> because we are disk bound. Looking at jstack, we notice that most of the
> time we appear to be hitting disk for the block index. We performed a
> major compaction and retried; performance improved somewhat, but not by
> much. We are processing data at about 2 MB per second.
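The hope that most checks are answered by the bloom filter rests on one property: a negative answer from a Bloom filter is definite, so a lookup for a row that was never written can skip reading data blocks. The saving only materializes if the filter (and the block index used to locate blocks) is itself in memory. Below is a minimal, generic Bloom filter using double hashing; it is illustrative only and does not reproduce HBase's own bloom filter implementation. `RowBloomFilter` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Generic Bloom filter over row keys; a teaching sketch, not HBase's implementation.
class RowBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    RowBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) throw new IllegalArgumentException("sizes must be positive");
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a, used as the second hash.
    private static int fnv1a(byte[] key) {
        int h = 0x811C9DC5;
        for (byte b : key) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: probe i is (h1 + i * h2) mod numBits.
    private int bitIndex(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key) | 1; // force odd so the probes never all collapse onto h1
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            bits.set(bitIndex(key, i));
        }
    }

    // false: row is definitely absent, no disk read needed.
    // true: row may be present, so the real lookup must still happen.
    boolean mightContain(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(bitIndex(key, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        RowBloomFilter filter = new RowBloomFilter(1 << 16, 5);
        filter.add("row-0001");
        System.out.println(filter.mightContain("row-0001")); // always true once added
        System.out.println(filter.mightContain("row-9999")); // usually false
    }
}
```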
>
> We are using CDH 4.2.1 (HBase 0.94.2 and HDFS 2.0.0) running on 8
> datanodes/regionservers (each with 32 cores, 4x1TB disks and 60 GB RAM).

Anil: You don't have the right balance between disk, CPU and RAM. You have
too much CPU and RAM but too few disks. Usually, it's better to have a
disk/CPU-core ratio near 0.6-0.8. Yours is around 0.13 (4 disks / 32 cores).
This seems to be the biggest cause of your problem.
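Anil's rule of thumb can be checked with a line of arithmetic. The sketch below (hypothetical class name) computes the current ratio for one node as described in the thread, 4 disks and 32 cores, and the whole number of disks per node that the suggested 0.6-0.8 range would imply. The target range is Anil's heuristic, not an HBase requirement.

```java
// Hypothetical helper for the disk-to-core ratio heuristic.
class DiskCoreRatio {
    // Disks per CPU core.
    static double ratio(int disks, int cores) {
        return (double) disks / cores;
    }

    // Smallest whole number of disks that reaches the target ratio.
    static int disksFor(double targetRatio, int cores) {
        return (int) Math.ceil(targetRatio * cores);
    }

    public static void main(String[] args) {
        int cores = 32;
        System.out.printf("current ratio: %.3f%n", ratio(4, cores));
        System.out.println("disks per node for 0.6-0.8: "
                + disksFor(0.6, cores) + " to " + disksFor(0.8, cores));
    }
}
```

For the nodes in the thread this comes to roughly 20 to 26 disks per node, against the 4 currently installed.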

> HBase is running with a 30 GB heap, memstore size capped at 3 GB, and
> flush thresholds of 0.15 and 0.2. The block cache is 0.5 of the total
> heap size (15 GB). We are using SNAPPY compression for our tables.
>
>
> A couple of questions:
>         * Is the performance of the time-based scan bad after a major
> compaction?
>
Anil: In general, time-based scans (I am assuming you have built your rowkey
on the timestamp) are not good for HBase because of region hot-spotting.
Have you tried setting scanner caching (Scan.setCaching) to a higher number?
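Scanner caching controls how many rows each scanner `next` RPC returns to the client; in the HBase client it is set with `Scan.setCaching(int)` or the `hbase.client.scanner.caching` property. As a rough model, assuming every RPC returns a full batch and ignoring scanner open/close calls and region boundaries, the number of round trips is the row count divided by the caching value, rounded up. The class name is hypothetical.

```java
// Rough round-trip estimate for a scan; a simplified model, not HBase client code.
class ScannerCachingEstimate {
    // Approximate number of next() RPCs needed to read `rows` rows
    // when each RPC returns up to `caching` rows.
    static long roundTrips(long rows, int caching) {
        if (caching <= 0) throw new IllegalArgumentException("caching must be positive");
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        for (int caching : new int[] {1, 100, 1000}) {
            System.out.println("caching=" + caching + " -> ~" + roundTrips(rows, caching) + " RPCs");
        }
    }
}
```

A higher value trades fewer round trips for more client memory per batch and a longer wait per `next` call, so it is usually tuned rather than maximized.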

>
>         * What can we do to alleviate being disk bound? The typical
> answer of adding more RAM does not seem to have helped, or we are missing
> some other config.
>
Anil: Try adding more disks to your machines.

>
>
>
> Below are some of the metrics from a Regionserver webUI:
>
> requestsPerSecond=5895, numberOfOnlineRegions=60, numberOfStores=60,
> numberOfStorefiles=209, storefileIndexSizeMB=6, rootIndexSizeKB=7131,
> totalStaticIndexSizeKB=415995, totalStaticBloomSizeKB=2514675,
> memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
> readRequestsCount=30589690, writeRequestsCount=0, compactionQueueSize=0,
> flushQueueSize=0, usedHeapMB=2688, maxHeapMB=30672,
> blockCacheSizeMB=1604.86, blockCacheFreeMB=13731.24, blockCacheCount=11817,
> blockCacheHitCount=27592222, blockCacheMissCount=25373411,
> blockCacheEvictedCount=7112, blockCacheHitRatio=52%,
> blockCacheHitCachingRatio=72%, hdfsBlocksLocalityIndex=91,
> slowHLogAppendCount=0, fsReadLatencyHistogramMean=15409428.56,
> fsReadLatencyHistogramCount=1559927, fsReadLatencyHistogramMedian=230609.5,
> fsReadLatencyHistogram75th=280094.75, fsReadLatencyHistogram95th=9574280.4,
> fsReadLatencyHistogram99th=100981301.2,
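Two of these figures can be cross-checked from the raw counters. The sketch below (hypothetical class name) recomputes the block cache hit ratio from blockCacheHitCount and blockCacheMissCount, which gives about 52% as reported, and adds blockCacheSizeMB and blockCacheFreeMB, which comes to about 15,336 MB, i.e. half of maxHeapMB=30672 and consistent with the 0.5 block cache setting. It also shows that only about 10% of that capacity was in use when the metrics were taken.

```java
// Cross-checks derived block cache figures from the raw regionserver counters.
class BlockCacheCheck {
    static double hitRatio(long hits, long misses) {
        return (double) hits / (hits + misses);
    }

    static double capacityMB(double usedMB, double freeMB) {
        return usedMB + freeMB;
    }

    static double usedFraction(double usedMB, double freeMB) {
        return usedMB / capacityMB(usedMB, freeMB);
    }

    public static void main(String[] args) {
        // Values copied from the metrics quoted in the thread.
        long hits = 27_592_222L;
        long misses = 25_373_411L;
        double usedMB = 1604.86;
        double freeMB = 13731.24;
        System.out.printf("hit ratio: %.1f%%%n", 100 * hitRatio(hits, misses));
        System.out.printf("cache capacity: %.2f MB (heap/2 = %.1f MB)%n",
                capacityMB(usedMB, freeMB), 30672 / 2.0);
        System.out.printf("cache in use: %.1f%%%n", 100 * usedFraction(usedMB, freeMB));
    }
}
```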
Thanks & Regards,
Anil Gupta
Rahul Ravindran 2013-06-05, 04:41
Anoop John 2013-06-05, 05:44
Rahul Ravindran 2013-06-05, 05:53
Asaf Mesika 2013-06-05, 05:51
Rahul Ravindran 2013-06-05, 06:15