HBase >> mail # user >> column based or row based storage for HBase?


Re: column based or row based storage for HBase?
As I understand HBase's column-oriented structure, the key point is
what the term "column-oriented" means here: data belonging to the same
column family is stored contiguously on disk. Within each column
family, the data is stored as a row store. If you want to understand
the internal mechanism of HBase, take a look at the contents of an
HFile.
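
The layout described above can be sketched in a few lines of Python.
This is only a toy model, not HBase code: the sample table, the
build_store_files helper, and the tuple layout are all made up for
illustration, and timestamps/versions are left out. It assumes cells
are grouped per column family and then sorted by row key and column
qualifier, which is roughly the order in which an HFile keeps its
KeyValues.

```python
from collections import defaultdict

# Hypothetical sample table: row key -> {(family, qualifier): value}.
# "row2" has no info:email cell at all, so nothing is stored for it.
rows = {
    "row2": {("info", "name"): "bob", ("stats", "visits"): "7"},
    "row1": {
        ("info", "name"): "alice",
        ("info", "email"): "a@example.com",
        ("stats", "visits"): "3",
    },
}


def build_store_files(table):
    """Group cells by column family, then sort each group by (row, qualifier).

    Each returned list stands in for one store file of one column family.
    Timestamps are omitted to keep the sketch small (an assumption).
    """
    stores = defaultdict(list)
    for row_key, cells in table.items():
        for (family, qualifier), value in cells.items():
            stores[family].append((row_key, qualifier, value))
    return {family: sorted(cells) for family, cells in stores.items()}


store_files = build_store_files(rows)

# Print each family's cells in their stored order.
for family, cells in sorted(store_files.items()):
    print(family)
    for row_key, qualifier, value in cells:
        print("  ", row_key, f"{family}:{qualifier}", "=", value)
```

In this model, all cells of one row within a family sit next to each
other (row after row), different families never share a list, and a
missing column costs nothing because no cell is written for it.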

regards!

Yong

On Mon, Aug 6, 2012 at 5:03 AM, Lin Ma <[EMAIL PROTECTED]> wrote:
> Thank you for the informative reply, Mohit!
>
> Some more comments,
>
> 1. Actually, my confusion about column-based storage comes from the book
> "HBase: The Definitive Guide", chapter 1, section "The Dawn of Big Data",
> which has a figure showing HBase storing the same column of all different
> rows contiguously in physical storage. Any comments?
>
> 2. I want to confirm my understanding is correct -- supposing I have only
> one column family with 10 columns, is the physical storage row (with all
> related columns) after row, rather than storing the 1st column of all rows,
> then the 2nd column of all rows, etc.?
>
> 3. It seems that when we say column-based storage, there are two meanings:
> (1) a column-oriented database => en.wikipedia.org/wiki/Column-oriented_DBMS,
> where the same column of different rows is stored together, and (2) a
> column-oriented architecture, e.g. how HBase is designed, which describes
> a pattern for storing a large number of sparse columns (with NULL for
> free). Any comments?
>
> regards,
> Lin
>
> On Mon, Aug 6, 2012 at 12:08 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
>> On Sun, Aug 5, 2012 at 6:04 AM, Lin Ma <[EMAIL PROTECTED]> wrote:
>>
>> > Hi guys,
>> >
>> > I am wondering whether HBase uses column-based storage or row-based
>> > storage?
>> >
>> >    - I read some technical documents which mention that an advantage of
>> >    HBase is that it uses column-based storage to store similar data
>> >    together to foster compression. So it means the same columns of
>> >    different rows are stored together;
>>
>>
>> Probably what you read was in the context of column families. HBase has
>> the concept of a column family, similar to Google's Bigtable, and the
>> store files on disk are per column family. All columns of a given column
>> family are in one store file, and columns of a different column family
>> are in a different file.
>>
>>
>> >    - But I also learned that HBase is a sorted key-value map in the
>> >    underlying HFile. It uses the key to address all related columns for
>> >    that key (row), so it seems to be row-based storage?
>> >
>> HBase stores an entire row together, with each column value represented
>> by a KeyValue. This is also called a cell in HBase.
>>
>>
>> > I would appreciate it if anyone could clarify my confusion. Any related
>> > documents or code for more details are welcome.
>> >
>> > thanks in advance,
>> >
>> > Lin
>> >
>>