HBase >> mail # user >> Re: Storing images in Hbase


Michael Segel 2013-01-11, 15:00
Mohammad Tariq 2013-01-11, 15:27
Mohit Anchlia 2013-01-11, 17:40
Jack Levin 2013-01-11, 17:47
Jack Levin 2013-01-11, 17:51
Mohit Anchlia 2013-01-13, 15:47
kavishahuja 2013-01-05, 10:11
谢良 2013-01-06, 03:58
Mohit Anchlia 2013-01-06, 05:45
谢良 2013-01-06, 06:14
Damien Hardy 2013-01-06, 09:35
Yusup Ashrap 2013-01-06, 11:58
Andrew Purtell 2013-01-06, 20:12
Asaf Mesika 2013-01-06, 20:28
Andrew Purtell 2013-01-06, 20:49
Andrew Purtell 2013-01-06, 20:52
Mohit Anchlia 2013-01-06, 21:09
Amandeep Khurana 2013-01-06, 20:33
Marcos Ortiz 2013-01-11, 18:01
Jack Levin 2013-01-13, 16:17
Varun Sharma 2013-01-17, 23:29
Re: Storing images in Hbase
I forgot to mention that I also have this setup:

<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>33554432</value>
  <description>Flush more often. Default: 67108864</description>
</property>

This parameter applies per region, so if any of my 400 (currently)
regions on a regionserver accumulates 32MB+ (33554432 bytes) in its
memstore, HBase will flush that region to disk.
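
To put numbers on it: 400 regions times a 32MB flush threshold could in
theory hold ~12.5GB of memstore at once, far more than the heap, so the
per-region flush threshold alone doesn't bound the write cache; the global
memstore limit does. Our 60% write / 20% read split (mentioned in my
earlier mail quoted below) corresponds roughly to settings like these in
hbase-site.xml (a sketch using the 0.90-era property names, not a verbatim
copy of our config):

<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.6</value>
  <description>Upper bound, as a fraction of heap, on all memstores combined (the write cache).</description>
</property>
<property>
  <name>hfile.block.cache.size</name>
  <value>0.2</value>
  <description>Fraction of heap given to the HFile block cache (the read cache).</description>
</property>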
Here are some metrics from a regionserver:

requests=2, regions=370, stores=370, storefiles=1390,
storefileIndexSize=304, memstoreSize=2233, compactionQueueSize=0,
flushQueueSize=0, usedHeap=3516, maxHeap=4987,
blockCacheSize=790656256, blockCacheFree=255245888,
blockCacheCount=2436, blockCacheHitCount=218015828,
blockCacheMissCount=13514652, blockCacheEvictedCount=2561516,
blockCacheHitRatio=94, blockCacheHitCachingRatio=98

Note that the memstores total only ~2.2GB (memstoreSize=2233, in MB), while this particular regionserver's heap is set to 5GB.
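As a cross-check, those cache counters are consistent with the reported
ratio: 218015828 hits out of 218015828 + 13514652 = 231530480 lookups is
about 94.2%, which matches blockCacheHitRatio=94.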

And last but not least, it's very important to have a good GC setup:

export HBASE_OPTS="$HBASE_OPTS -verbose:gc -Xms5000m \
-XX:CMSInitiatingOccupancyFraction=70 \
-XX:+PrintGCDetails -XX:+PrintGCDateStamps \
-XX:+HeapDumpOnOutOfMemoryError -Xloggc:$HBASE_HOME/logs/gc-hbase.log \
-XX:MaxTenuringThreshold=15 -XX:SurvivorRatio=8 \
-XX:+UseParNewGC \
-XX:NewSize=128m -XX:MaxNewSize=128m \
-XX:-UseAdaptiveSizePolicy \
-XX:+CMSParallelRemarkEnabled \
-XX:-TraceClassUnloading"

-Jack

On Thu, Jan 17, 2013 at 3:29 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> Hey Jack,
>
> Thanks for the useful information. By flush size being 15%, do you mean
> the memstore flush size? 15% would mean close to 1G; have you seen any
> issues with flushes taking too long?
>
> Thanks
> Varun
>
> On Sun, Jan 13, 2013 at 8:17 AM, Jack Levin <[EMAIL PROTECTED]> wrote:
>
>> That's right, memstore size, not flush size, is increased.  Filesize is
>> 10G. Overall write cache is 60% of heap and read cache is 20%.  Flush size
>> is 15%.  64 maxlogs at 128MB. One namenode server, plus one secondary that
>> can be promoted.  On the way to HBase, images are written to a queue, so
>> that we can take HBase down for maintenance and still do inserts later.
>> ImageShack has ‘perma cache’ servers that allow writes and serving of data
>> even when HBase is down for hours; consider it a 4th replica 😉 outside of
>> Hadoop.
>>
>> Jack
>>
>> From: Mohit Anchlia <[EMAIL PROTECTED]>
>> Sent: January 13, 2013, 7:48 AM
>> To: [EMAIL PROTECTED]
>> Subject: Re: Storing images in Hbase
>>
>> Thanks Jack for sharing this information. This definitely makes sense when
>> using that type of caching layer. You mentioned increasing the write cache;
>> I am assuming you had to increase the following parameters in addition to
>> increasing the memstore size:
>>
>> hbase.hregion.max.filesize
>> hbase.hregion.memstore.flush.size
>>
>> On Fri, Jan 11, 2013 at 9:47 AM, Jack Levin <[EMAIL PROTECTED]> wrote:
>>
>> > We buffer all accesses to HBase with a Varnish SSD-based caching layer,
>> > so the impact on reads is negligible.  We have a 70-node cluster with
>> > 8GB of RAM per node, relatively weak nodes (Intel Core 2 Duo), and
>> > 10-12TB of disk per server.  We insert 600,000 images per day.  We have
>> > relatively little compaction activity because we made our write cache
>> > much larger than the read cache, so we don't experience region file
>> > fragmentation as much.
>> >
>> > -Jack
>> >
>> > On Fri, Jan 11, 2013 at 9:40 AM, Mohit Anchlia <[EMAIL PROTECTED]>
>> > wrote:
>> > > I think it really depends on the volume of traffic, data distribution
>> > > per region, how and when file compactions occur, and the number of
>> > > nodes in the cluster. In my experience, when it comes to blob data
>> > > where you are serving tens of thousands of requests/sec of writes and
>> > > reads, it's very difficult to manage HBase without very hard operations
>> > > and maintenance in play. Jack earlier mentioned they have 1 billion
>> > > images; it would be interesting to know what they see in terms of
>> > > compactions and number of requests per sec. I'd
Varun Sharma 2013-01-22, 01:10
Varun Sharma 2013-01-22, 01:12
Jack Levin 2013-01-24, 04:53
S Ahmed 2013-01-24, 22:13
Jack Levin 2013-01-25, 07:41
S Ahmed 2013-01-27, 02:00
Jack Levin 2013-01-27, 02:56
yiyu jia 2013-01-27, 15:37
Jack Levin 2013-01-27, 16:56
yiyu jia 2013-01-27, 21:58
Jack Levin 2013-01-28, 04:06
Jack Levin 2013-01-28, 04:16
Andrew Purtell 2013-01-28, 18:58
yiyu jia 2013-01-28, 20:23
Andrew Purtell 2013-01-28, 21:13
yiyu jia 2013-01-28, 21:44
Andrew Purtell 2013-01-28, 21:49
Adrien Mogenet 2013-01-28, 10:01
Jack Levin 2013-01-28, 18:08