Estimating disk space requirements (MapReduce user mailing list)


Thread:
- Panshul Whisper 2013-01-18, 12:11
- Mirko Kämpf 2013-01-18, 12:44
- Jean-Marc Spaggiari 2013-01-18, 12:36
- Panshul Whisper 2013-01-18, 13:20
- Jean-Marc Spaggiari 2013-01-18, 13:26
- Panshul Whisper 2013-01-18, 13:55
- Mohammad Tariq 2013-01-18, 14:37
- Mohammad Tariq 2013-01-18, 22:35
Re: Estimating disk space requirements
If you make 20 individual small servers, that isn't much different from carving
20 virtual nodes out of one big server.  The only difference would be if the
neighbors of the separate VMs use fewer resources.

On Fri, Jan 18, 2013 at 3:34 PM, Panshul Whisper <[EMAIL PROTECTED]> wrote:

> Ah, now I understand what you mean.
> I will be creating 20 individual servers on the cloud, rather than creating
> one big server with several virtual nodes inside it.
> I will be paying for 20 different nodes, all configured with Hadoop and
> connected to the cluster.
>
> Thanks for the info :)
>
>
> On Fri, Jan 18, 2013 at 11:59 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>> It is usually better not to subdivide nodes into virtual nodes.  You will
>> generally get better performance from the original node because you only
>> pay for the OS once and because your disk I/O will be scheduled better.
>>
>> If you look at EC2 pricing, however, the spot market often has arbitrage
>> opportunities where one node size is absurdly cheap relative to others.  In
>> that case, it pays to scale the individual nodes up or down.
>>
>> The only good reason to split nodes down to very small sizes is for
>> testing and training.
>>
>>
>> On Fri, Jan 18, 2013 at 2:30 PM, Panshul Whisper <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks for the reply, Ted.
>>>
>>> You can find 40 GB disks when you make virtual nodes on a cloud like
>>> Rackspace ;-)
>>>
>>> About the OS partitions, I did not exactly understand what you meant.
>>> I have made a server on the cloud, and I just installed and configured
>>> Hadoop and HBase in the /usr/local folder.
>>> And I am pretty sure it does not have a separate partition for root.
>>>
>>> Please explain what you meant and what other precautions I should
>>> take.
>>>
>>> Thanks,
>>>
>>> Regards,
>>> Ouch Whisper
>>> 01010101010
>>> On Jan 18, 2013 11:11 PM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:
>>>
>>>> Where do you find 40 GB disks nowadays?
>>>>
>>>> Normally your performance is going to be better with more space, but
>>>> your network may be your limiting factor for some computations.  That could
>>>> give you some paradoxical scaling.  HBase will rarely show this behavior.
>>>>
>>>> Keep in mind you also want to allow for an OS partition. Current
>>>> standard practice is to reserve as much as 100 GB for that partition, but in
>>>> your case 10 GB is better :-)
>>>>
>>>> Note that if you account for this, the node counts don't scale as
>>>> simply.  The overhead of these OS partitions goes up with the number of nodes.
>>>>
>>>> On Jan 18, 2013, at 8:55 AM, Panshul Whisper <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>> If we look at it with performance in mind,
>>>> is it better to have 20 nodes with 40 GB HDDs,
>>>> or is it better to have 10 nodes with 80 GB HDDs?
>>>>
>>>> They are connected on a gigabit LAN.
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Fri, Jan 18, 2013 at 2:26 PM, Jean-Marc Spaggiari <
>>>> [EMAIL PROTECTED]> wrote:
>>>>
>>>>> 20 nodes with 40 GB each will do the job.
>>>>>
>>>>> After that you will have to consider performance based on your access
>>>>> pattern. But that's another story.
>>>>>
>>>>> JM
>>>>>
>>>>> 2013/1/18, Panshul Whisper <[EMAIL PROTECTED]>:
>>>>> > Thank you for the replies.
>>>>> >
>>>>> > So I take it that I should have at least 800 GB of total free space on
>>>>> > HDFS (combined free space of all the nodes connected to the cluster).
>>>>> > So I can connect 20 nodes, each having a 40 GB HDD, to my cluster.
>>>>> > Will this be enough for the storage?
>>>>> > Please confirm.
>>>>> >
>>>>> > Thanking You,
>>>>> > Regards,
>>>>> > Panshul.
>>>>> >
>>>>> >
>>>>> > On Fri, Jan 18, 2013 at 1:36 PM, Jean-Marc Spaggiari <
>>>>> > [EMAIL PROTECTED]> wrote:
>>>>> >
>>>>> >> Hi Panshul,
>>>>> >>
>>>>> >> If you have 20 GB with a replication factor set to 3, you have only
>>>>> >> 6.6 GB available, not 11 GB. You need to divide the total space by the
>>>>> >> replication factor.
>>>>> >>
>>>>> >> Also, if you store your JSON into HBase, you need to add the key
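
A minimal sketch of the capacity arithmetic Jean-Marc describes above, assuming the
HDFS replication factor of 3 and the 20-node, 40 GB figures quoted in the thread;
the function name is illustrative and not anything from Hadoop itself:

# Usable HDFS capacity is roughly raw capacity divided by the replication factor.

REPLICATION_FACTOR = 3  # HDFS default, as used in this thread


def usable_space_gb(raw_space_gb, replication=REPLICATION_FACTOR):
    """Divide the total raw space by the replication factor."""
    return raw_space_gb / replication


# Jean-Marc's example: 20 GB raw with replication 3 leaves only ~6.6 GB.
print(usable_space_gb(20))        # ~6.67

# Panshul's cluster: 20 nodes x 40 GB = 800 GB raw, ~266 GB usable after replication.
print(usable_space_gb(20 * 40))   # ~266.67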
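
And a sketch of Ted's point that per-node OS partitions make the node counts scale
less simply, using the 20 x 40 GB vs 10 x 80 GB comparison from Panshul's question
and the roughly 10 GB OS reservation Ted suggests for small nodes; again the helper
is purely illustrative:

REPLICATION_FACTOR = 3


def hdfs_usable_gb(nodes, disk_per_node_gb, os_partition_gb=10,
                   replication=REPLICATION_FACTOR):
    # Each node loses its OS partition first; the remainder is divided
    # by the replication factor to get usable HDFS space.
    raw = nodes * (disk_per_node_gb - os_partition_gb)
    return raw / replication


# The OS overhead is paid once per node, so 20 small nodes lose more to it...
print(hdfs_usable_gb(20, 40))   # 20 * 30 / 3 = 200.0 GB usable
# ...than 10 larger nodes with the same total raw disk would.
print(hdfs_usable_gb(10, 80))   # 10 * 70 / 3 = ~233.3 GB usable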