NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HBase >> mail # user >> Blocking Inserts


Re: Blocking Inserts
In your case, you are likely hitting the blocking store files limit
(hbase.hstore.blockingStoreFiles, default: 7) and/or the
hbase.hregion.memstore.block.multiplier limit. Check out
http://hbase.apache.org/book/config.files.html for more details on
these configurations and how they affect your insert performance.
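A minimal sketch of the two limits above, assuming HBase 0.94 semantics: a region blocks updates once its memstore reaches hbase.hregion.memstore.flush.size × hbase.hregion.memstore.block.multiplier, and a store holding more than hbase.hstore.blockingStoreFiles files has its flush held back until a compaction catches up, which lets the memstore keep growing toward that limit. The helper names and the RegionState snapshot are hypothetical; only the property names are real, and any default other than the 7 quoted above should be checked in hbase-default.xml for your version.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class RegionState:
    """Hypothetical snapshot of one region's write path."""
    memstore_bytes: int
    max_store_files: int  # largest store-file count among the region's stores


def blocking_memstore_bytes(flush_size: int, block_multiplier: int) -> int:
    """flush.size * block.multiplier: the memstore size that blocks updates."""
    return flush_size * block_multiplier


def write_block_reasons(state: RegionState, flush_size: int,
                        block_multiplier: int,
                        blocking_store_files: int = 7) -> list:
    """Return the reasons (if any) why inserts into this region would stall."""
    reasons = []
    # The 0.94 log message phrases this check as "is >= than blocking".
    if state.memstore_bytes >= blocking_memstore_bytes(flush_size, block_multiplier):
        reasons.append("memstore at blocking size")
    # Approximation: too many store files delays the flush until a
    # compaction runs, so the memstore keeps filling in the meantime.
    if state.max_store_files > blocking_store_files:
        reasons.append("too many store files; flush delayed")
    return reasons


# Hypothetical settings: 512 MiB flush size x multiplier 2 = 1 GiB limit.
print(write_block_reasons(RegionState(1024 * MIB, 9), 512 * MIB, 2))
```

The 512 MiB × 2 combination is only one hypothetical way to arrive at the 1.0g blocking size that shows up in the RegionServer log further down the thread.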

In Ganglia, also check whether the compaction queue spikes
during these timeouts.
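If the Ganglia data can be exported as (timestamp, value) samples, a few lines of code can line up the RegionServer's compaction queue size against the moments the clients timed out. This is a sketch only: the input format, the 60-second window, and the sample values below are all hypothetical.

```python
def queue_around_timeouts(queue_samples, timeouts, window_s=60):
    """Peak compaction-queue length within +/- window_s of each timeout.

    queue_samples: (unix_seconds, queue_length) pairs, e.g. exported from
    Ganglia (hypothetical format); timeouts: unix_seconds of client errors.
    Returns {timeout: peak, or None if no sample falls in the window}.
    """
    peaks = {}
    for t in timeouts:
        nearby = [q for ts, q in queue_samples if abs(ts - t) <= window_s]
        peaks[t] = max(nearby) if nearby else None
    return peaks


# Made-up samples, one per minute.
queue_samples = [(0, 0), (60, 1), (120, 14), (180, 22), (240, 3)]
print(queue_around_timeouts(queue_samples, [170, 400]))
```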
--Suraj
On Thu, Jun 21, 2012 at 4:27 AM, Martin Alig <[EMAIL PROTECTED]> wrote:
> Thank you for the suggestions.
>
> So I changed the setup and now have:
> 1 Master running Namenode, SecondaryNamenode, ZK and the HMaster
> 7 Slaves running Datanode and Regionserver
> 2 Clients to insert data
>
>
> What I forgot to mention in my first post is that the clients sometimes even get a
> SocketTimeoutException when inserting data (of course, no inserts are done
> during that time).
> Looking at the logs (I also turned on GC logging), I see the following:
>
> Multiple consecutive entries like:
> 2012-06-21 11:42:13,962 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 6 on 60020' on region
> usertable,user600,1340200683555.a45b03dd65a62afa676488921e47dbaa.: memstore
> size 1.0g is >= than blocking 1.0g size
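To see whether the blocking hits a few hot regions or all of them, the "Blocking updates" messages can be counted per region. This sketch assumes each record sits on a single line in the RegionServer log (the mail archive wrapped the one above) and that the wording matches the 0.94 message quoted here; the regex was written against that one sample, and blocked_regions is a hypothetical helper.

```python
import re
from collections import Counter

# Pattern for the 0.94 HRegion "Blocking updates" message quoted above.
BLOCKING_RE = re.compile(
    r"Blocking updates for '(?P<handler>[^']+)' on region (?P<region>\S+): "
    r"memstore size (?P<size>\S+) is >= than blocking (?P<limit>\S+) size"
)


def blocked_regions(log_lines):
    """Count 'Blocking updates' messages per region name."""
    counts = Counter()
    for line in log_lines:
        match = BLOCKING_RE.search(line)
        if match:
            counts[match.group("region")] += 1
    return counts


sample = (
    "2012-06-21 11:42:13,962 INFO org.apache.hadoop.hbase.regionserver.HRegion: "
    "Blocking updates for 'IPC Server handler 6 on 60020' on region "
    "usertable,user600,1340200683555.a45b03dd65a62afa676488921e47dbaa.: "
    "memstore size 1.0g is >= than blocking 1.0g size"
)
print(blocked_regions([sample, sample]).most_common())
```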
>
> Shortly after those entries, many entries like:
> 2012-06-21 12:43:53,028 WARN org.apache.hadoop.ipc.HBaseServer:
> (responseTooSlow):
> {"processingtimems":35046,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2642a14d), rpc version=1, client version=29, methodsFingerPrint=-1508511443","client":"10.110.129.12:54624","starttimems":1340275397981,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
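The responseTooSlow payload is JSON, so its timing fields can be read directly. A sketch, assuming the record appears on one line after the "(responseTooSlow):" marker; slow_call_summary is a hypothetical helper. With queuetimems at 0 and processingtimems at 35046, the call spent about 35 s inside the handler rather than waiting in the call queue, which would be consistent with handlers stuck behind the memstore block (an interpretation, not something the log states).

```python
import json


def slow_call_summary(log_line):
    """Extract method and timings from a '(responseTooSlow):' warning line."""
    _, marker, payload = log_line.partition("(responseTooSlow):")
    if not marker:
        raise ValueError("not a responseTooSlow line")
    record = json.loads(payload.strip())
    return {
        "method": record["method"],
        "processing_s": record["processingtimems"] / 1000,
        "queue_s": record["queuetimems"] / 1000,
    }


line = (
    '2012-06-21 12:43:53,028 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): '
    '{"processingtimems":35046,'
    '"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2642a14d), '
    'rpc version=1, client version=29, methodsFingerPrint=-1508511443",'
    '"client":"10.110.129.12:54624","starttimems":1340275397981,'
    '"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}'
)
print(slow_call_summary(line))
```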
>
> Looking at the GC logs, I see many entries like:
> 2870.329: [GC 2870.330: [ParNew: 108450K->3401K(118016K), 0.0182570 secs]
> 4184711K->4079843K(12569856K), 0.0183510 secs] [Times: user=0.24 sys=0.00,
> real=0.01 secs]
>
> But they always take around 0.01 to 0.04 secs.
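ParNew young collections stop the application threads, so the real= value of each one is a pause the clients actually experience. A sketch that collects those values, assuming one GC record per line as the JVM writes it (the mail wrapped it) and the format shown above; parnew_pauses is a hypothetical name.

```python
import re

# Wall-clock time of a collection, from the trailing "[Times: ...]" section.
REAL_RE = re.compile(r"real=(?P<real>\d+(?:\.\d+)?) secs")


def parnew_pauses(gc_lines):
    """Return the wall-clock duration (s) of each ParNew young collection."""
    pauses = []
    for line in gc_lines:
        if "ParNew" not in line:
            continue
        match = REAL_RE.search(line)
        if match:
            pauses.append(float(match.group("real")))
    return pauses


gc_log = [
    "2870.329: [GC 2870.330: [ParNew: 108450K->3401K(118016K), 0.0182570 secs] "
    "4184711K->4079843K(12569856K), 0.0183510 secs] "
    "[Times: user=0.24 sys=0.00, real=0.01 secs]",
    "2696.013: [CMS-concurrent-sweep: 8.999/10.448 secs] "
    "[Times: user=46.93 sys=2.24, real=10.45 secs]",
]
pauses = parnew_pauses(gc_log)
print(f"{len(pauses)} ParNew pause(s), longest {max(pauses):.2f} s")
```

The concurrent sweep line is skipped on purpose: it is not a ParNew collection.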
>
> And also from the GC log:
> 2696.013: [CMS-concurrent-sweep: 8.999/10.448 secs] [Times: user=46.93
> sys=2.24, real=10.45 secs]
>
> Is 10.45 secs too long?
> Or what exactly should I watch out for in the GC logs?
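With the CMS collector only some phases stop the application: ParNew young collections, CMS-initial-mark and CMS-remark are stop-the-world, while the CMS-concurrent-* phases (including the sweep above) run alongside the application threads and mainly cost CPU. Lines reporting "concurrent mode failure" or "promotion failed" are the ones to worry about, since the JVM then falls back to a stop-the-world full collection. A rough classifier based on those markers (a sketch; the substring checks are heuristics, not a full GC log parser):

```python
def classify_gc_line(line):
    """Roughly classify a HotSpot CMS/ParNew GC log line by its phase markers."""
    # Checked first: these mean CMS fell back to a stop-the-world full GC.
    if "concurrent mode failure" in line or "promotion failed" in line:
        return "warning: stop-the-world fallback"
    # CMS-concurrent-mark/-preclean/-sweep/-reset run with the application.
    if "CMS-concurrent-" in line:
        return "concurrent"
    # Young collections and the CMS initial-mark/remark phases pause the app.
    if any(tag in line for tag in ("ParNew", "CMS-initial-mark", "CMS-remark")):
        return "stop-the-world"
    return "other"


for entry in (
    "2696.013: [CMS-concurrent-sweep: 8.999/10.448 secs] "
    "[Times: user=46.93 sys=2.24, real=10.45 secs]",
    "2870.329: [GC 2870.330: [ParNew: 108450K->3401K(118016K), 0.0182570 secs]",
):
    print(classify_gc_line(entry))
```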
>
>
> I also configured Ganglia to look at some more metrics. Looking at
> io_wait (which should be relevant to my question about the disks), I see
> values between 10% and 25% on the RegionServer.
> Should that be lower?
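On Linux, iowait figures like the one Ganglia reports are typically derived from the aggregate cpu line of /proc/stat: the share of CPU ticks spent idle while I/O was outstanding between two samples. That makes it an indirect signal of disk pressure rather than a direct measure of disk utilization. A sketch computing it from two hypothetical samples:

```python
def cpu_ticks(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into its first eight
    counters: user, nice, system, idle, iowait, irq, softirq, steal."""
    parts = stat_line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # guest/guest_nice (if present) are already included in user/nice.
    return [int(v) for v in parts[1:9]]


def iowait_percent(before, after):
    """Percentage of CPU ticks spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(cpu_ticks(before), cpu_ticks(after))]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


# Hypothetical samples taken some seconds apart.
t0 = "cpu  100 0 50 800 50 0 0 0 0 0"
t1 = "cpu  200 0 100 1400 300 0 0 0 0 0"
print(f"iowait: {iowait_percent(t0, t1):.1f}%")
```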
>
> Btw, I'm using HBase 0.94 and Hadoop 1.0.3.
>
>
> Thank you again.
>
>
> Martin
>
>
>
> On Wed, Jun 20, 2012 at 7:04 PM, Dave Wang <[EMAIL PROTECTED]> wrote:
>
>> I'd also remove the DN and RS from the node running ZK, NN, etc., as you
>> don't want heavyweight processes on that node.
>>
>> - Dave
>>
>> On Wed, Jun 20, 2012 at 9:31 AM, Elliott Clark <[EMAIL PROTECTED]> wrote:
>>
>> > Basically, without metrics on what's going on, it's tough to know for sure.
>> >
>> > I would turn on GC logging and make sure that it is not playing a part, get
>> > metrics on IO while this is going on, and look through the logs to see what
>> > is happening when you notice the pause.
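One way to follow this advice is to turn the client's per-second insert counts into stall windows and then search the GC and RegionServer logs for those time ranges. A sketch, assuming the client records (unix_seconds, inserts_completed) pairs once per second; stall_windows, the 5-second minimum, and the data below are all hypothetical.

```python
def stall_windows(throughput, min_len=5):
    """Return (first_ts, last_ts) for each run of at least `min_len`
    consecutive samples with zero completed inserts.

    throughput: (unix_seconds, inserts_completed) pairs sorted by time,
    one per second (hypothetical client-side measurement).
    """
    windows, run = [], []
    for ts, inserts in throughput:
        if inserts == 0:
            run.append(ts)
            continue
        if len(run) >= min_len:
            windows.append((run[0], run[-1]))
        run = []
    if len(run) >= min_len:  # a stall still in progress at the end
        windows.append((run[0], run[-1]))
    return windows


samples = ([(t, 20_000) for t in range(0, 10)]
           + [(t, 0) for t in range(10, 20)]
           + [(20, 18_000), (21, 0), (22, 0)])
print(stall_windows(samples))
```

The returned timestamps still need converting to the servers' local time before grepping, since the HBase and GC logs above use local wall-clock or JVM-uptime timestamps.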
>> >
>> > On Wed, Jun 20, 2012 at 6:39 AM, Martin Alig <[EMAIL PROTECTED]> wrote:
>> >
>> > > Hi
>> > >
>> > > I'm doing some evaluations with HBase. The workload I'm facing is mainly
>> > > insert-only.
>> > > Currently I'm inserting 1 KB rows, where 100 bytes go into one column.
>> > >
>> > > I have the following cluster machines at disposal:
>> > >
>> > > Intel Xeon L5520 2.26 GHz (Nehalem, with HT enabled)
>> > > 24 GiB memory
>> > > 1 GigE
>> > > 2x 15k RPM SAS 73 GB (RAID1)
>> > >
>> > > I have 10 nodes.
>> > > The first node runs:
>> > >
>> > > Namenode, SecondaryNamenode, Datanode, HMaster, Zookeeper, and a
>> > > RegionServer
>> > >
>> > > The other nodes run:
>> > >
>> > > Datanode and RegionServer
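A back-of-envelope check of the write bandwidth this setup implies, using the 1 KB rows above (taken as 1,024 bytes), the 10 RegionServers, and the 150,000 inserts/sec peak reported below. It assumes each row is written once to the WAL and once again on flush, each copy replicated three times by HDFS (the usual dfs.replication default), and load spread evenly across servers. It ignores compaction rewrites, key and column overhead, compression, and RPC framing, so treat the numbers as a rough order of magnitude; ingest_estimate is a hypothetical helper.

```python
def ingest_estimate(inserts_per_s, row_bytes, servers, replication=3):
    """Rough write bandwidth (MB/s, 1 MB = 10**6 bytes) for insert-only load.

    Assumes each row is written twice to HDFS (WAL append + memstore
    flush), every HDFS write is replicated `replication` times, and load
    is spread evenly over `servers` nodes. Ignores compactions, key and
    column overhead, compression and RPC framing.
    """
    payload = inserts_per_s * row_bytes        # bytes/s sent by clients
    hdfs_bytes = payload * 2 * replication     # WAL + flush, all replicas
    return {
        "client_total_MBps": payload / 1e6,
        "client_per_server_MBps": payload / servers / 1e6,
        "hdfs_writes_per_server_MBps": hdfs_bytes / servers / 1e6,
    }


print(ingest_estimate(150_000, 1024, 10))
```

Under these assumptions the peak works out to about 154 MB/s from the clients, about 15 MB/s per server, and roughly 92 MB/s of HDFS writes landing on each node's RAID1 pair before any compaction; for comparison, 1 GigE carries at most about 125 MB/s per direction.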
>> > >
>> > >
>> > > Now, running my test client and inserting rows, the throughput goes up to
>> > > 150,000 inserts/sec. But then after some time the throughput drops down to