HBase >> mail # user >> Disk Seeks and Column families


Praveen Sripati 2012-01-21, 07:08
Re: Disk Seeks and Column families
2012/1/21 Praveen Sripati <[EMAIL PROTECTED]>:
> Hi,
>
> 1) According to this URL (1), HBase performs well for two or three
> column families. Why is it so?

First, each column family is stored in a separate location, so, as stated in
'6.2.1. Cardinality of ColumnFamilies', such a schema design can lead
to many small pieces for a small column family, and aggregation over it
is likely to be slow.
Second, when a region splits, all of its column families split too;
with a large number of them this can be inefficient.
Third, there is the number of memstores. Each column family
has its own memstore, so forced flushes and blocked writes
become more likely.
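
To put rough numbers on the third point, here is a minimal sketch in plain
Python (not HBase code). It assumes a fixed memstore budget per region server
that is shared evenly by all memstores; the function names and every number
are hypothetical and chosen only for illustration.

```python
# Toy arithmetic only. The budget and region counts are illustrative
# assumptions, not HBase defaults.

def memstore_count(regions_per_server: int, column_families: int) -> int:
    """Each region keeps one memstore per column family."""
    return regions_per_server * column_families


def average_share_mb(budget_mb: float, regions_per_server: int,
                     column_families: int) -> float:
    """Average room per memstore if the budget were split evenly (an assumption)."""
    return budget_mb / memstore_count(regions_per_server, column_families)


if __name__ == "__main__":
    # Hypothetical server: 100 regions and a 4000 MB memstore budget.
    for families in (1, 3, 10):
        count = memstore_count(100, families)
        share = average_share_mb(4000, 100, families)
        print(f"{families:>2} families: {count} memstores, about {share:.1f} MB each")
```

Under these assumptions, more families means more memstores competing for the
same budget, so each one has less room before a flush is forced.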

>
> 2) Dump of a HFile, looks like below. The contents of a row stay together
> like a regular row-oriented database. If the column family has 100 column
> family qualifiers and is dense then the data for a particular column family
> qualifier is spread wide. If I want to do an aggregation on a particular
> column identifier, the disk seeks don't seem to be much better than a
> regular row-oriented database.

You don't need a seek for each column; HBase reads whole blocks and filters
out the unneeded data.
Most of the performance is gained from collocated keys and compression.
BTW, HBase is not so good with wide tables; it prefers tall tables.
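
The block-read-and-filter idea can be sketched with a toy model in plain
Python. This is not the HFile reader: the KeyValue layout, the block size in
cells, and all function names are assumptions made up for illustration. Cells
are kept sorted by row, family, qualifier and timestamp, as in the dump quoted
below, and aggregating one qualifier reads every block in order and drops the
cells it does not need.

```python
from typing import List, NamedTuple, Tuple


class KeyValue(NamedTuple):
    row: str
    family: str
    qualifier: str
    timestamp: int
    value: int


def build_cells(rows: int, qualifiers: int) -> List[KeyValue]:
    """Create a dense toy table, sorted the way cells appear in the dump."""
    cells = [
        KeyValue(f"row-{r:03d}", "colfam1", f"{q:03d}", 1, r + q)
        for r in range(rows)
        for q in range(qualifiers)
    ]
    # Newest timestamp first within the same row/family/qualifier.
    return sorted(cells, key=lambda kv: (kv.row, kv.family, kv.qualifier, -kv.timestamp))


def to_blocks(cells: List[KeyValue], cells_per_block: int) -> List[List[KeyValue]]:
    """Group sorted cells into fixed-size blocks (a simplification)."""
    return [cells[i:i + cells_per_block] for i in range(0, len(cells), cells_per_block)]


def sum_qualifier(blocks: List[List[KeyValue]], qualifier: str) -> Tuple[int, int]:
    """Read every block sequentially and keep only the wanted qualifier."""
    total = 0
    blocks_read = 0
    for block in blocks:
        blocks_read += 1  # one sequential block read, no per-cell seek
        for kv in block:
            if kv.qualifier == qualifier:
                total += kv.value
    return total, blocks_read


if __name__ == "__main__":
    blocks = to_blocks(build_cells(rows=10, qualifiers=100), cells_per_block=50)
    total, blocks_read = sum_qualifier(blocks, "007")
    print(f"sum={total}, blocks read={blocks_read}, cells wanted=10")
```

In this model the cost is one sequential pass over the blocks rather than a
seek per cell; the unwanted cells are still read and then discarded, which is
also why a dense, wide row layout does more work per useful cell.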

>
> Please correct me if I am wrong.
>
> K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50
> K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50
> K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51
> K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51
> K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52
>
> (1) - http://hbase.apache.org/book/number.of.cfs.html
>
> Thanks,
> Praveen

--
Andrey.
+
Doug Meil 2012-01-21, 13:52
+
Doug Meil 2012-01-21, 15:16
+
Andrey Stepachev 2012-01-21, 18:58
+
yuzhihong@... 2012-01-21, 15:33
+
Praveen Sripati 2012-01-21, 17:49
+
Doug Meil 2012-01-21, 18:06
+
M. C. Srivas 2012-01-22, 06:32
+
Praveen Sripati 2012-01-24, 06:15
+
Andrey Stepachev 2012-01-24, 06:51
+
Andrey Stepachev 2012-01-24, 06:52
+
Jason Frantz 2012-01-24, 09:30