MapReduce, mail # user - Re: HDFS using SAN


Collapsed replies:
- Tom Deutsch 2012-10-17, 13:31
- Pamecha, Abhishek 2012-10-18, 00:21
- Luca Pireddu 2012-10-18, 12:32
- Tom Deutsch 2012-10-18, 14:37
- Pamecha, Abhishek 2012-10-18, 15:18
- Jitendra Kumar Singh 2012-10-18, 13:48
- Michael Segel 2012-10-18, 13:58
Re: HDFS using SAN
Pamecha, Abhishek 2012-10-18, 15:08
Yes, I had similar views from the NetApp paper. My use case is I/O-heavy, and that's why (at least IMO), as the data set grows, a shared SAN begins to make less sense than DAS for MR-type jobs.

As Luca pointed out, sharing the same data with other apps is a great advantage of a SAN.

Thanks
Abhishek
Sent from my iPad with iMstakes

On Oct 18, 2012, at 6:59, "Michael Segel" <[EMAIL PROTECTED]> wrote:

I haven't played with a NetApp box, but the way it has been explained to me is that your SAN appears as if it's direct-attached storage.
It's possible, based on drives and other hardware, plus it looks like they are focusing on read times only.

I'd contact a NetApp rep for a better answer.

Actually, if you are looking for higher density in terms of storage, going with a storage/compute cluster makes sense.

On Oct 18, 2012, at 8:48 AM, Jitendra Kumar Singh <[EMAIL PROTECTED]> wrote:

Hi,

The NetApp whitepaper on the SAN solution (link given by Kevin) makes the following statement. Can someone please elaborate (or give a link that explains) how 12 disks in a SAN can deliver 2,000 IOPS while the same disks used as JBOD would give only 600 IOPS?

"The E2660 can deliver up to 2,000 IOPS from a 12-disk stripe (the bottleneck being the 12 disks). This headroom translates into better read times for those 64KB blocks. Twelve copies of 12 MapReduce jobs reading from 12 SATA disks can at best never exceed 12 x 50 IOPS, or 600 IOPS. The E2660 volume has five times the IOPS headroom, which translates into faster read times and high MapReduce throughput."
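For what it's worth, the JBOD bound in that quote is simple arithmetic. A quick sketch (the 50 IOPS/disk figure is the whitepaper's own assumption for a SATA drive, and the 2,000 IOPS is the vendor's claim, not an independent measurement):

```python
# Back-of-envelope check of the IOPS figures in the NetApp quote.
DISKS = 12
SATA_RANDOM_IOPS = 50  # the paper's per-disk assumption for 7.2k SATA

# JBOD: each MapReduce task reads from a single disk, so the aggregate
# can never exceed the sum of the individual disks' random IOPS.
jbod_iops = DISKS * SATA_RANDOM_IOPS
print(jbod_iops)  # 600

# Vendor figure for the same 12 disks striped behind the E2660 controller,
# which can coalesce, prefetch, and cache reads across the whole stripe.
e2660_iops = 2000
print(round(e2660_iops / jbod_iops, 1))  # 3.3
```

Note that 2,000 / 600 is about 3.3x, not the "five times" the paper states, so the paper's 5x presumably uses a different baseline.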

Thanks and Regards,
--
Jitendra Kumar Singh

On Thu, Oct 18, 2012 at 6:02 PM, Luca Pireddu <[EMAIL PROTECTED]> wrote:
On 10/18/2012 02:21 AM, Pamecha, Abhishek wrote:
Tom

Do you mean you are using GPFS instead of HDFS? Also, if you can share,
are you deploying it as DAS set up or a SAN?

Thanks,

Abhishek

Though I don't think I'd buy a SAN for a new Hadoop cluster, we have a SAN and are using it *instead of HDFS* with a small/medium Hadoop MapReduce cluster (up to 100 nodes or so, depending on our need).  We still use the local node disks for intermediate data (mapred local storage).  Although this setup does limit our ability to scale to a large number of nodes, that's not a concern for us.  On the plus side, we gain the flexibility of being able to share our cluster with non-Hadoop users at our centre.
--
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
09010 Pula (CA), Italy
Tel: +39 0709250452
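A minimal sketch of the split Luca describes, for the Hadoop 1.x config names of that era (the mount paths are illustrative, not from his setup): point the default filesystem at the shared POSIX mount instead of HDFS, and keep intermediate map output on node-local disks.

```xml
<!-- core-site.xml: shared SAN mount via the local filesystem, instead of HDFS -->
<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>

<!-- mapred-site.xml: intermediate map output stays on the node-local disks -->
<property>
  <name>mapred.local.dir</name>
  <value>/mnt/local1/mapred,/mnt/local2/mapred</value>
</property>
```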
Collapsed messages:
- seth 2012-10-18, 15:15
- Zhani Pellumbi 2012-10-18, 15:46
- Steve Loughran 2012-10-19, 08:06
- Pamecha, Abhishek 2012-10-19, 00:29
- Pamecha, Abhishek 2012-10-16, 18:28
- Jeffrey Buell 2012-10-16, 21:24
- lohit 2012-10-16, 22:26
- Pamecha, Abhishek 2012-10-16, 23:28
- Kevin Odell 2012-10-17, 13:25
- Mohamed Riadh Trad 2012-10-17, 13:37
- Pamecha, Abhishek 2012-10-18, 00:26