Re: Estimating disk space requirements
I have been using AWS for quite some time now and I have
never faced any issues. Personally speaking, I have found AWS
really flexible: you get a great deal of freedom in choosing
services depending upon your requirements.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Fri, Jan 18, 2013 at 7:54 PM, Panshul Whisper <[EMAIL PROTECTED]> wrote:

> Thank you for the reply.
>
> It would be great if someone could suggest whether it is better to set up
> my cluster on Rackspace or on Amazon EC2 servers,
> keeping in mind that Amazon services have been having a lot of downtime...
> My main points of concern are performance and availability.
> My cluster has to be highly available.
>
> Thanks.
>
>
> On Fri, Jan 18, 2013 at 3:12 PM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
>> It all depends on what you want to do with this data and on the power of
>> each single node. There is no one-size-fits-all rule.
>>
>> The more nodes you have, the more CPU power you will have to process
>> the data... But if your 80 GB boxes' CPUs are faster than your 40 GB
>> boxes' CPUs, then maybe you should take the 80 GB ones.
>>
>> If you want to get better advice from the list, you will need to
>> better define your needs and the nodes you can have.
>>
>> JM
>>
>> 2013/1/18, Panshul Whisper <[EMAIL PROTECTED]>:
>> > If we look at it with performance in mind,
>> > is it better to have 20 nodes with 40 GB HDDs,
>> > or 10 nodes with 80 GB HDDs?
>> >
>> > They are connected over a gigabit LAN.
>> >
>> > Thnx
>> >
>> >
>> > On Fri, Jan 18, 2013 at 2:26 PM, Jean-Marc Spaggiari <
>> > [EMAIL PROTECTED]> wrote:
>> >
>> >> 20 nodes with 40 GB will do the job.
>> >>
>> >> After that you will have to consider performance based on your access
>> >> pattern. But that's another story.
>> >>
>> >> JM
>> >>
>> >> 2013/1/18, Panshul Whisper <[EMAIL PROTECTED]>:
>> >> > Thank you for the replies.
>> >> >
>> >> > So I take it that I should have at least 800 GB of total free space on
>> >> > HDFS (the combined free space of all the nodes connected to the
>> >> > cluster). So I can connect 20 nodes, each having a 40 GB HDD, to my
>> >> > cluster. Will this be enough for the storage?
>> >> > Please confirm.
>> >> >
>> >> > Thanking You,
>> >> > Regards,
>> >> > Panshul.
>> >> >
>> >> >
>> >> > On Fri, Jan 18, 2013 at 1:36 PM, Jean-Marc Spaggiari <
>> >> > [EMAIL PROTECTED]> wrote:
>> >> >
>> >> >> Hi Panshul,
>> >> >>
>> >> >> If you have 20 GB with a replication factor set to 3, you have only
>> >> >> 6.6 GB available, not 11 GB. You need to divide the total space by
>> >> >> the replication factor.
>> >> >>
>> >> >> Also, if you store your JSON into HBase, you need to add the key
>> >> >> size to it. If your key is 4 bytes, or 1024 bytes, it makes a
>> >> >> difference.
>> >> >>
>> >> >> So roughly, 24,000,000 * 5 KB = 114 GB. You don't have the space to
>> >> >> store it, and that's without including the key size. Even with a
>> >> >> replication factor set to 5 you don't have the space.
>> >> >>
>> >> >> Now, you can add some compression, but even with a lucky factor of
>> >> >> 50% you still don't have the space. You will need something like a
>> >> >> 90% compression factor to be able to store this data in your cluster.
>> >> >>
>> >> >> A 1 TB drive is now less than $100... so you might think about
>> >> >> replacing your 20 GB drives with something bigger.
>> >> >> To reply to your last question: for your data here, you will need AT
>> >> >> LEAST 350 GB of overall storage. But that's a bare minimum. Don't go
>> >> >> under 500 GB.
>> >> >>
>> >> >> IMHO
>> >> >>
>> >> >> JM
>> >> >>
>> >> >> 2013/1/18, Panshul Whisper <[EMAIL PROTECTED]>:
>> >> >> > Hello,
>> >> >> >
>> >> >> > I was estimating how much disk space I need for my cluster.
>> >> >> >
>> >> >> > I have 24 million JSON documents, approx. 5 KB each.
>> >> >> > The JSON is to be stored into HBase, with some identifying data in
>> >> >> > columns, and I also want to store the JSON for later retrieval based
>> >> >> > on the
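
A quick way to sanity-check the usable-space rule discussed in this thread
(raw disk divided by the replication factor) is a back-of-envelope script.
This is a minimal sketch in Python, assuming the figures quoted above: 20
nodes with 40 GB disks and an HDFS replication factor of 3.

    # Usable HDFS capacity from raw disk, assuming the thread's figures.
    NODES = 20
    DISK_PER_NODE_GB = 40
    REPLICATION = 3  # common HDFS default, as used in the thread

    raw_gb = NODES * DISK_PER_NODE_GB        # 800 GB of raw disk
    usable_gb = raw_gb / REPLICATION         # ~267 GB of actual data
    print(f"raw: {raw_gb} GB, usable: {usable_gb:.0f} GB")

    # The same rule behind JM's example: 20 GB raw / 3 = ~6.6 GB usable.
    print(f"20 GB raw -> {20 / REPLICATION:.1f} GB usable")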
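JM's size estimate can be reproduced the same way. This sketch assumes 24
million documents of 5 KB each and treats the 1024-byte key he mentions as
the worst case; the replicated payload lands on his ~350 GB floor.

    # Dataset size estimate from the thread: 24M JSON docs x 5 KB each.
    DOCS = 24_000_000
    DOC_KB = 5
    KEY_KB = 1  # assumption: the worst-case 1024-byte HBase key above
    REPLICATION = 3

    payload_gb = DOCS * DOC_KB / 1024**2                  # ~114 GB
    print(f"payload only: {payload_gb:.0f} GB")

    replicated_gb = payload_gb * REPLICATION              # ~343 GB
    print(f"payload x3 replication: {replicated_gb:.0f} GB")

    with_keys_gb = DOCS * (DOC_KB + KEY_KB) / 1024**2 * REPLICATION
    print(f"with keys, replicated: {with_keys_gb:.0f} GB")  # ~412 GB

    # Even an optimistic 50% compression leaves too much data for a
    # small cluster, hence the advice not to go under 500 GB overall.
    print(f"at 50% compression: {with_keys_gb * 0.5:.0f} GB")  # ~206 GB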