HDFS >> mail # user >> Re: Estimating disk space requirements


Re: Estimating disk space requirements
Thanks for the reply, Ted.

You can find 40 GB disks when you make virtual nodes on a cloud like
Rackspace ;-)

About the OS partitions, I did not exactly understand what you meant.
I have made a server on the cloud, and I just installed and configured
Hadoop and HBase in the /usr/local folder.
And I am pretty sure it does not have a separate partition for root.

Please explain what you meant and what other precautions I should take.

Thanks,

Regards,
Ouch Whisper
01010101010
On Jan 18, 2013 11:11 PM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:

> Where do you find 40 GB disks nowadays?
>
> Normally your performance is going to be better with more space, but your
> network may be your limiting factor for some computations.  That could give
> you some paradoxical scaling.  HBase will rarely show this behavior.
>
> Keep in mind you also want to allow for an OS partition. Current standard
> practice is to reserve as much as 100 GB for that partition, but in your
> case 10 GB is better :-)
>
> Note that if you account for this, the node counts don't scale as simply.
> The overhead of these OS partitions goes up with the number of nodes.
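[A small sketch of the point above. The 10 GB OS partition is an assumption taken from this thread, not a general rule.]

```python
# Per-node reservation for the OS partition (assumed figure from this thread).
OS_PARTITION_GB = 10

def usable_hdfs_gb(nodes: int, disk_gb: int) -> int:
    """Raw disk left for HDFS after reserving an OS partition on each node."""
    return nodes * (disk_gb - OS_PARTITION_GB)

# Both clusters below have 800 GB of raw disk, but the 20-node cluster
# loses twice as much of it to OS partitions:
print(usable_hdfs_gb(20, 40))  # 600
print(usable_hdfs_gb(10, 80))  # 700
```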
>
> On Jan 18, 2013, at 8:55 AM, Panshul Whisper <[EMAIL PROTECTED]>
> wrote:
>
> If we look at it with performance in mind,
> is it better to have 20 nodes with 40 GB HDDs,
> or is it better to have 10 nodes with 80 GB HDDs?
>
> They are connected on a gigabit LAN.
>
> Thanks
>
>
> On Fri, Jan 18, 2013 at 2:26 PM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
>> 20 nodes with 40 GB each will do the job.
>>
>> After that you will have to consider performance based on your access
>> pattern. But that's another story.
>>
>> JM
>>
>> 2013/1/18, Panshul Whisper <[EMAIL PROTECTED]>:
>> > Thank you for the replies.
>> >
>> > So I take it that I should have at least 800 GB of total free space on
>> > HDFS (the combined free space of all the nodes connected to the
>> > cluster). So I can connect 20 nodes, each having a 40 GB HDD, to my
>> > cluster. Will this be enough for the storage?
>> > Please confirm.
>> >
>> > Thanking You,
>> > Regards,
>> > Panshul.
>> >
>> >
>> > On Fri, Jan 18, 2013 at 1:36 PM, Jean-Marc Spaggiari <
>> > [EMAIL PROTECTED]> wrote:
>> >
>> >> Hi Panshul,
>> >>
>> >> If you have 20 GB with a replication factor set to 3, you have only
>> >> 6.6 GB available, not 11 GB. You need to divide the total space by the
>> >> replication factor.
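[The division above, as a one-line helper; replication factor 3 is the value used in this thread.]

```python
def effective_capacity_gb(raw_gb: float, replication: int = 3) -> float:
    """Usable HDFS capacity: raw space divided by the replication factor."""
    return raw_gb / replication

# 20 GB of raw space at replication 3 leaves roughly 6.7 GB, not 11 GB:
print(round(effective_capacity_gb(20), 1))  # 6.7
```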
>> >>
>> >> Also, if you store your JSON into HBase, you need to add the key size
>> >> to it. If your key is 4 bytes or 1024 bytes, it makes a difference.
>> >>
>> >> So roughly, 24 000 000 * 5 * 1024 bytes ≈ 114 GB. You don't have the
>> >> space to store it, and that's without including the key size. Even with
>> >> a replication factor set to 5 you don't have the space.
>> >>
>> >> Now, you can add some compression, but even with a lucky factor of 50%
>> >> you still don't have the space. You will need something like a 90%
>> >> compression factor to be able to store this data in your cluster.
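[A worked version of the estimate above, using the figures from this thread: 24 million JSON documents of about 5 KB each, HDFS replication 3, and roughly 11 GB of effective space; key size is excluded, as in the original calculation.]

```python
DOCS = 24_000_000   # JSON documents
DOC_KB = 5          # approximate size of each document
REPLICATION = 3     # HDFS replication factor

raw_gb = DOCS * DOC_KB * 1024 / 1024**3   # bytes -> GB; key size excluded
on_disk_gb = raw_gb * REPLICATION         # what HDFS actually writes

def required_compression(data_gb: float, free_gb: float) -> float:
    """Fraction of the data that compression must remove to fit in free_gb."""
    return 1 - free_gb / data_gb

print(round(raw_gb))      # 114
print(round(on_disk_gb))  # 343
# Squeezing ~114 GB of raw data into ~11 GB of effective space needs
# roughly 90% compression, matching the figure quoted above:
print(round(required_compression(raw_gb, 11), 2))  # 0.9
```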
>> >>
>> >> A 1 TB drive is now less than $100... So you might think about
>> >> replacing your 20 GB drives with something bigger.
>> >> To reply to your last question: for your data here, you will need AT
>> >> LEAST 350 GB of overall storage. But that's a bare minimum. Don't go
>> >> under 500 GB.
>> >>
>> >> IMHO
>> >>
>> >> JM
>> >>
>> >> 2013/1/18, Panshul Whisper <[EMAIL PROTECTED]>:
>> >> > Hello,
>> >> >
>> >> > I was estimating how much disk space I need for my cluster.
>> >> >
>> >> > I have 24 million JSON documents, approximately 5 KB each.
>> >> > The JSON is to be stored in HBase with some identifying data in
>> >> > columns, and I also want to store the JSON for later retrieval based
>> >> > on the ID data as keys in HBase.
>> >> > I have my HDFS replication set to 3.
>> >> > Each node has Hadoop, HBase and Ubuntu installed on it, so
>> >> > approximately 11 GB is available for use on each 20 GB node.
>> >> >
>> >> > I have no idea, if I have not enabled HBase replication, is the HDFS