Thread:
  Jean-Marc Spaggiari 2013-01-18, 14:12
  Panshul Whisper 2013-01-18, 14:24
  Panshul Whisper 2013-01-18, 22:30
  Ted Dunning 2013-01-18, 22:59

Re: Estimating disk space requirements
Ah, now I understand what you mean.
I will be creating 20 individual servers on the cloud, rather than creating
one big server and making several virtual nodes inside it.
I will be paying for 20 different nodes, all configured with Hadoop and
connected to the cluster.

Thanks for the intel :)
On Fri, Jan 18, 2013 at 11:59 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> It is usually better not to subdivide nodes into virtual nodes.  You will
> generally get better performance from the original node because you only
> pay for the OS once and because your disk I/O will be scheduled better.
>
> If you look at EC2 pricing, however, the spot market often has arbitrage
> opportunities where one node size is absurdly cheap relative to others.  In
> that case, it pays to scale the individual nodes up or down.
>
> The only good reason to split nodes into very small pieces is for
> testing and training.
>
>
> On Fri, Jan 18, 2013 at 2:30 PM, Panshul Whisper <[EMAIL PROTECTED]>wrote:
>
>> Thanks for the reply, Ted.
>>
>> You can find 40 GB disks when you make virtual nodes on a cloud like
>> Rackspace ;-)
>>
>> About the OS partitions, I did not exactly understand what you meant.
>> I have set up a server on the cloud, and I just installed and configured
>> Hadoop and HBase in the /usr/local folder.
>> And I am pretty sure it does not have a separate partition for root.
>>
>> Please explain what you meant and what other precautions I should
>> take.
>>
>> Thanks,
>>
>> Regards,
>> Ouch Whisper
>> 01010101010
>> On Jan 18, 2013 11:11 PM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:
>>
>>> Where do you find 40 GB disks nowadays?
>>>
>>> Normally your performance is going to be better with more space, but your
>>> network may be your limiting factor for some computations.  That could give
>>> you some paradoxical scaling.  HBase will rarely show this behavior.
>>>
>>> Keep in mind you also want to allow for an OS partition. Current
>>> standard practice is to reserve as much as 100 GB for that partition, but in
>>> your case 10 GB is better :-)
>>>
>>> Note that if you account for this, the node counts don't scale as
>>> simply.  The overhead of these OS partitions goes up with the number of nodes.
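
To make that scaling caveat concrete, here is a minimal back-of-the-envelope
sketch in Python (the flat 10 GB OS reservation per node is an assumption
taken from Ted's suggestion above, not a fixed rule):

    # Two cluster shapes with identical raw disk, minus a per-node OS partition.
    OS_GB = 10  # assumed OS reservation per node
    for nodes, disk_gb in [(20, 40), (10, 80)]:
        overhead = OS_GB * nodes            # OS overhead grows with node count
        data = nodes * disk_gb - overhead   # space left for HDFS data
        print(f"{nodes} x {disk_gb} GB: {data} GB for data, {overhead} GB for OS")
    # 20 x 40 GB: 600 GB for data, 200 GB for OS
    # 10 x 80 GB: 700 GB for data, 100 GB for OS

With the same 800 GB of raw disk, the 20-node layout gives up twice as much
space to OS partitions, which is why the node counts don't scale as simply.
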
>>>
>>> On Jan 18, 2013, at 8:55 AM, Panshul Whisper <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>> If we look at it with performance in mind,
>>> is it better to have 20 nodes with 40 GB HDDs,
>>> or is it better to have 10 nodes with 80 GB HDDs?
>>>
>>> They are connected on a gigabit LAN.
>>>
>>> Thanks
>>>
>>>
>>> On Fri, Jan 18, 2013 at 2:26 PM, Jean-Marc Spaggiari <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> 20 nodes with 40 GB each will do the work.
>>>>
>>>> After that, you will have to consider performance based on your access
>>>> pattern. But that's another story.
>>>>
>>>> JM
>>>>
>>>> 2013/1/18, Panshul Whisper <[EMAIL PROTECTED]>:
>>>> > Thank you for the replies,
>>>> >
>>>> > So I take it that I should have at least 800 GB of total free space
>>>> > on HDFS (combined free space of all the nodes connected to the
>>>> > cluster). So I can connect 20 nodes having 40 GB of HDD each to my
>>>> > cluster. Will this be enough for the storage?
>>>> > Please confirm.
>>>> >
>>>> > Thanking You,
>>>> > Regards,
>>>> > Panshul.
>>>> >
>>>> >
>>>> > On Fri, Jan 18, 2013 at 1:36 PM, Jean-Marc Spaggiari <
>>>> > [EMAIL PROTECTED]> wrote:
>>>> >
>>>> >> Hi Panshul,
>>>> >>
>>>> >> If you have 20 GB with a replication factor set to 3, you have only
>>>> >> 6.6 GB available, not 11 GB. You need to divide the total space by
>>>> >> the replication factor.
>>>> >>
>>>> >> Also, if you store your JSON in HBase, you need to add the key size
>>>> >> to it. If your key is 4 bytes or 1024 bytes, it makes a difference.
>>>> >>
>>>> >> So roughly, 24 000 000 * 5 * 1024 bytes = 114 GB. You don't have the
>>>> >> space to store it, and that is without including the key size. Even
>>>> >> with a replication factor set to 5 you don't have the space.
>>>> >>
>>>> >> Now, you can add some compression, but even with a lucky factor of
Regards,
Ouch Whisper
010101010101
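
For reference, Jean-Marc's sizing arithmetic above can be checked with a
short Python sketch (assumptions taken from the thread: 24,000,000 JSON
records of roughly 5 KiB each, HDFS replication factor 3, key overhead
ignored):

    # Back-of-the-envelope HDFS sizing, following the thread's numbers.
    records = 24_000_000
    record_bytes = 5 * 1024        # ~5 KiB per JSON document, key size excluded
    replication = 3

    logical_gib = records * record_bytes / 2**30
    print(f"logical data:  {logical_gib:.0f} GiB")                 # ~114 GiB
    print(f"raw HDFS need: {logical_gib * replication:.0f} GiB")   # ~343 GiB

    # Usable capacity is total raw disk divided by the replication factor.
    nodes, disk_gb = 20, 40
    print(f"usable space:  {nodes * disk_gb / replication:.0f} GB")  # ~267 GB

On those assumptions, the proposed cluster of 20 nodes with 40 GB each offers
about 800 GB of raw disk, above the ~343 GiB the replicated data needs, which
is consistent with Jean-Marc's conclusion that 20 nodes with 40 GB will do
the work (before compression, key overhead, and OS partitions are counted).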