HBase, mail # user - column based or row based storage for HBase?


Re: column based or row based storage for HBase?
Mohit Anchlia 2012-08-06, 16:30
On Sun, Aug 5, 2012 at 8:03 PM, Lin Ma <[EMAIL PROTECTED]> wrote:

> Thank you for the informative reply, Mohit!
>
> Some more comments,
>
> 1. Actually, my confusion about column-based storage comes from the book
> "HBase: The Definitive Guide", chapter 1, section "The Dawn of Big Data",
> which draws a picture showing HBase storing the same column of all the
> different rows contiguously in physical storage. Any comments?
>
> 2. I want to confirm my understanding is correct -- supposing I have only
> one column family with 10 columns, the physical storage is row (with all
> related columns) after row, rather than storing the 1st column of all rows,
> then the 2nd column of all rows, etc.?
>
> 3. It seems that when we say column-based storage, there are two meanings:
> (1) a column-oriented database => en.wikipedia.org/wiki/Column-oriented_DBMS,
> where the same column of different rows is stored together, and (2) a
> column-oriented architecture, e.g. how HBase is designed, which describes
> a pattern for storing a sparse, large number of columns (with NULLs for
> free). Any comments?
>
>
In simple terms, HBase is not a column-oriented store. All the data for a
row is stored together, but store files are created per column family, so a
row's cells are contiguous only within each column family.
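
A rough way to picture this layout is the Python sketch below. It is only a
toy model, not HBase code: the sample cells and the build_store_files helper
are made up for illustration, and real HBase keys also carry a timestamp,
which is left out here. It groups cells by column family and sorts each group
by row key and then column qualifier, so the columns of one row sit next to
each other inside a family's store file.

```python
from collections import defaultdict

# Hypothetical cells: (row key, column family, qualifier, value).
# They arrive in arbitrary order, as writes would.
cells = [
    ("row2", "cf1", "b", "v4"),
    ("row1", "cf2", "x", "v3"),
    ("row1", "cf1", "b", "v2"),
    ("row1", "cf1", "a", "v1"),
    ("row2", "cf1", "a", "v5"),
]


def build_store_files(cells):
    """Toy model: one sorted list of key-values per column family."""
    stores = defaultdict(list)
    for row, family, qualifier, value in cells:
        stores[family].append((row, qualifier, value))
    # Within a family, order by row key first, then by qualifier.
    return {
        family: sorted(kvs, key=lambda kv: (kv[0], kv[1]))
        for family, kvs in stores.items()
    }


store_files = build_store_files(cells)
for family in sorted(store_files):
    print(family)
    for row, qualifier, value in store_files[family]:
        print("   ", row, qualifier, value)
```

In the cf1 list, both columns of row1 come before any column of row2. A
column-oriented store would instead keep column "a" of every row together,
followed by column "b" of every row.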
> regards,
> Lin
>
>
> On Mon, Aug 6, 2012 at 12:08 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
>> On Sun, Aug 5, 2012 at 6:04 AM, Lin Ma <[EMAIL PROTECTED]> wrote:
>>
>> > Hi guys,
>> >
>> > I am wondering whether HBase is using column based storage or row based
>> > storage?
>> >
>> >    - I read some technical documents that mentioned an advantage of
>> >    HBase is using column-based storage to store similar data together
>> >    to foster compression. So it means the same columns of different
>> >    rows are stored together;
>>
>>
>> Probably what you read was in the context of column families. HBase has
>> the concept of a column family, similar to Google's Bigtable, and the
>> store files on disk are per column family. All columns of a given column
>> family are stored together in that family's store files, and columns of
>> a different column family are in different files.
>>
>>
>> >    - But I also learned HBase is a sorted key-value map in the
>> >    underlying HFile. It uses the key to address all related columns
>> >    for that key (row), so it seems to be row-based storage?
>> >
>> HBase stores an entire row together, with each column value represented
>> by a KeyValue. This is also called a cell in HBase.
>>
>>
>> > I would appreciate it if anyone could clarify my confusion. Any related
>> > documents or code with more details are welcome.
>> >
>> > thanks in advance,
>> >
>> > Lin
>> >
>>
>
>