HBase >> mail # dev >> How to collect the real-time transaction request logs from HBase Master/Region Servers?


How to collect the real-time transaction request logs from HBase Master/Region Servers?
Dear All,

I am new to HBase/Hadoop and have recently set up a small-scale cluster in a
research cloud:
------------------------------------------
1 Master Server (Also Hadoop Name Node)
3 Region Servers (Also Hadoop Data Nodes)
1 Ganglia Monitoring Server
1 YCSB Workload Generation Server
------------------------------------------
HBase Version: 0.94.7, r1471806
Hadoop Version: 1.0.4, r1393290
Ganglia Version: gmond/gmetad - 3.6.0, gweb - 3.5.8
YCSB Version: 0.1.4
------------------------------------------

I have only one table in HBase, 'usertable', with a single column family
'cf1' holding 1,000,000 key-value records. The row keys are in
monotonically increasing order, and currently I have 6 regions distributed
across the 3 region servers, with each server holding 2 regions.

*Objective:* create region hotspots for some research experiments

*Observation:*
After running a workload consisting of 10,000,000 operations in total (50%
reads, 50% writes), I observed the statistics below in the Web UI of the
master server. They suggest potential hotspots in the 3rd region (not sure
why!) and the 6th region (possibly because it was receiving a large number
of write requests).

Table Regions

Name | Region Server | Start Key | End Key | Requests
usertable,,1369584948241.3061b90ff519c1bce5b3d867690a2b4a. | hdb1-02:60030 | (none) | user2035146605813492656 | 127946
usertable,user2035146605813492656,1369584948241.00f8a51bab6d98ebd7c4db582579c3e7. | hdb1-03:60030 | user2035146605813492656 | user30679275375621809 | 126700
usertable,user30679275375621809,1369584813037.d704a50802ec39982884e394d4ef05b7. | hdb1-04:60030 | user30679275375621809 | user5136356049533495298 | *284828*
usertable,user5136356049533495298,1369584928780.999b987d646462e21b8916a737619b39. | hdb1-02:60030 | user5136356049533495298 | user617761656465008158 | 133108
usertable,user617761656465008158,1369584928780.9cfe288f48f987de7f93b800dcd4c964. | hdb1-04:60030 | user617761656465008158 | user7218407885253116621 | 119008
usertable,user7218407885253116621,1369584832152.e3a9c4d35c91f06c18ed346886ff3306. | hdb1-03:60030 | user7218407885253116621 | (none) | *363234*
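
To make the imbalance in the request counts above easier to see, here is a minimal Python sketch that computes each region's share of the total and the per-server totals. The region numbers 1 to 6 simply follow the row order of the table, and the 1.25x-of-mean threshold for calling a region "hot" is an arbitrary assumption of mine, not an HBase definition.

```python
from collections import defaultdict

# (region number in table order, region server, request count),
# copied from the Web UI table above.
REGIONS = [
    (1, "hdb1-02", 127946),
    (2, "hdb1-03", 126700),
    (3, "hdb1-04", 284828),
    (4, "hdb1-02", 133108),
    (5, "hdb1-04", 119008),
    (6, "hdb1-03", 363234),
]


def hot_regions(regions, factor=1.25):
    """Return region numbers whose request count exceeds factor * mean.

    The factor is a hypothetical cut-off chosen only for illustration.
    """
    mean = sum(req for _, _, req in regions) / len(regions)
    return [idx for idx, _, req in regions if req > factor * mean]


def requests_per_server(regions):
    """Sum the request counts of all regions hosted on each server."""
    totals = defaultdict(int)
    for _, server, req in regions:
        totals[server] += req
    return dict(totals)


if __name__ == "__main__":
    total = sum(req for _, _, req in REGIONS)
    for idx, server, req in REGIONS:
        print(f"region {idx} on {server}: {req} requests ({req / total:.1%})")
    print("hot regions:", hot_regions(REGIONS))
    print("per server:", requests_per_server(REGIONS))
```

With this cut-off only the 3rd and 6th regions are flagged, which matches what I see by eye in the Web UI.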

*Questions:*

   1. Can the HBase developer community guide me on how to collect the *raw
   logs* (directly from the master/region servers) behind the above table,
   which I retrieved from the master server?
   2. How does the master server get these logs from the region servers?
   As far as I understand the architecture, the client communicates
   directly with the region servers to read/write data, bypassing the
   master server (except the first time, or if the region server is not
   responding).
   3. How frequently does the master collect these logs? Is it real-time
   (within a 1 sec interval)?
   4. Which HBase metrics will be most helpful for spotting region hotspots
   in Ganglia?

I want to know which transaction request (read/write) is going to which
region, from the raw log dumps, in a form such as:

No:12345 ---- Type:Write ---- Query ---- Region06
and so on ...
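
To show what I would do with such a dump once I have it, here is a minimal Python sketch that parses lines in the format above and counts requests per region and type. The line format is only the one I wish for in this mail, not a format I know HBase to produce, and the names `parse_line` and `tally` are my own.

```python
import re
from collections import Counter

# Matches the wished-for format, e.g.
#   No:12345 ---- Type:Write ---- Query ---- Region06
# This is NOT an actual HBase log format; it is the hypothetical one above.
LINE_RE = re.compile(
    r"^No:(?P<no>\d+)\s*-+\s*"
    r"Type:(?P<type>\w+)\s*-+\s*"
    r"(?P<query>.*?)\s*-+\s*"
    r"Region(?P<region>\d+)$"
)


def parse_line(line):
    """Parse one log line into a dict; raise ValueError if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    return {
        "no": int(match.group("no")),
        "type": match.group("type"),
        "query": match.group("query"),
        "region": int(match.group("region")),
    }


def tally(lines):
    """Count requests per (region number, request type)."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        counts[(record["region"], record["type"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "No:12345 ---- Type:Write ---- Query ---- Region06",
        "No:12346 ---- Type:Read ---- Query ---- Region03",
        "No:12347 ---- Type:Write ---- Query ---- Region06",
    ]
    for (region, req_type), count in sorted(tally(sample).items()):
        print(f"Region{region:02d} {req_type}: {count}")
```

The sample lines in the `__main__` block are made up for the demonstration and are not real measurements.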
Many thanks again...
Regards,
Joarder Kamal