HBase >> mail # user >> confused about Data/Disk ratio

Fwd: confused about Data/Disk ratio
Thank you for your reply.


I set the replication factor to 1, so there is no replication. I use this setup for research.
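Replication multiplies raw HDFS usage by the replication factor, because each block is stored that many times (the HDFS default for `dfs.replication` is 3). With a factor of 1 it therefore cannot account for the overhead described below. A minimal sketch of that arithmetic; `raw_hdfs_usage_gb` is a hypothetical helper, and it deliberately ignores HDFS checksum (`.meta`) files and other per-block overhead:

```python
def raw_hdfs_usage_gb(stored_gb: float, replication: int = 3) -> float:
    """Approximate cluster-wide disk used by HDFS files totalling `stored_gb`
    when every block is kept `replication` times.
    Hypothetical helper: ignores checksum (.meta) files and other overhead."""
    if replication < 1:
        raise ValueError("replication must be >= 1")
    return stored_gb * replication


print(raw_hdfs_usage_gb(10, replication=1))  # 10: factor 1 adds nothing
print(raw_hdfs_usage_gb(10))                 # 30: HDFS default factor of 3
```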


And I have made an observation:

When you store a small amount of data in HBase, it uses a huge amount of disk space. For example, when HBase stores 3 million messages, which take 1GB as text on the Linux file system, they use 10GB of disk in HBase.

When you keep adding data, HBase uses more disk, but the increase is proportionally smaller. For example, when HBase goes on to store 200 million messages, which take 60GB as text on the Linux file system, they use 180GB of disk in HBase.

And as you keep adding data, for example when HBase stores 6 billion messages, which take 2TB as text on the Linux file system, they use 3TB of disk in HBase.
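To make the trend concrete, the three reported data points can be turned into disk-to-text ratios. A minimal Python sketch, taking the figures above at face value and assuming 1TB = 1000GB; `observations`, `overall_ratios` and `marginal_ratios` are hypothetical names used only for illustration:

```python
# (size as text on the Linux FS, disk used by HBase), both in GB,
# as reported in this message; 1TB is taken as 1000GB for simplicity.
observations = [
    (1, 10),
    (60, 180),
    (2000, 3000),
]


def overall_ratios(obs):
    """Disk used by HBase divided by text size, for each data point."""
    return [disk / text for text, disk in obs]


def marginal_ratios(obs):
    """Extra disk per extra GB of text between consecutive data points."""
    return [(d2 - d1) / (t2 - t1) for (t1, d1), (t2, d2) in zip(obs, obs[1:])]


print(overall_ratios(observations))                           # [10.0, 3.0, 1.5]
print([round(r, 2) for r in marginal_ratios(observations)])  # [2.88, 1.45]
```

The overall ratio falls from 10x to 1.5x, and the marginal ratio also falls (about 2.9x, then about 1.45x), so by these numbers disk usage grows sublinearly even after the first, very small data point.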


Did I make this clear?


I want to know why HBase uses 10GB for only 3 million messages,

and why disk usage does not grow linearly:

it does not grow to 600GB when HBase stores 200 million messages,

and it does not grow to 36TB when HBase stores 6 billion messages.


I know this is a good property for storing big data in HBase,

but I want to understand why.



Could you help me?


Thank you


Guanhua Tian

From: varun kumar [mailto:[EMAIL PROTECTED]]
Sent: January 21, 2013 16:56
Subject: Re: confused about Data/Disk ratio


Hi Tian,


What replication factor have you set in HDFS?



Varun Kumar.P


On Mon, Jan 21, 2013 at 12:17 PM, tgh <[EMAIL PROTECTED]> wrote:

        I use HBase to store data, and I have an observation:
        when HBase stores 1GB of data, HDFS uses 10GB of disk space; when the data
is 60GB, HDFS uses 180GB of disk; and when the data is about 2TB, HDFS uses 3TB of disk.

        That is, disk usage does not grow linearly with the data size. Why is that?

        Could you help me?
Thank you
Guanhua Tian


