MapReduce user mailing list: Re: HDFS using SAN
Pamecha, Abhishek 2012-10-19, 00:29
Hi

I have read scattered documentation on the web, most of which says HDFS does not work well when a SAN is used to store its data, while some sources call it an emerging trend. I would love to know whether any tests have been performed that show in which respects direct-attached storage excels over, or falls behind, a SAN.

We are investigating whether direct-attached storage is a better option than SAN storage for a modest cluster holding on the order of 100 TB of data in steady state. The SAN can, of course, support an order of magnitude more IOPS than we care about for now, but since it is shared infrastructure and our data size may grow, that may not remain an advantage in the future.
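For a rough sense of scale, a minimal sizing sketch, assuming HDFS's default replication factor of 3 (dfs.replication) and a hypothetical 25% free-space headroom, estimates how much raw disk the 100 TB steady state implies. The function name and the headroom figure are illustrative assumptions, not measured values; the lower replication value in the usage line is only a hypothetical comparison point, not a recommendation.

```python
# Hypothetical sizing sketch: raw disk needed to hold a given amount of
# logical HDFS data. Replication factor 3 is the HDFS default
# (dfs.replication); the 25% free-space headroom is an assumption.


def raw_capacity_tb(logical_tb: float, replication: int = 3,
                    headroom: float = 0.25) -> float:
    """Return the raw TB needed so that `logical_tb` of data, stored
    `replication` times, fills at most (1 - headroom) of the disks."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    return logical_tb * replication / (1 - headroom)


if __name__ == "__main__":
    # 100 TB steady state, as described above, with default replication
    print(raw_capacity_tb(100))
    # Hypothetical comparison: a lower replication factor
    print(raw_capacity_tb(100, replication=2))
```

The same function makes it easy to see how raw capacity scales as the data set grows, which is the concern raised about the shared SAN.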

Another thing I am interested in: for MR jobs, where data locality is the key driver, how does that pan out when using a SAN instead of direct-attached storage?

And, of course, I would love to hear your views on the more subjective topics of availability and reliability when using a SAN for HDFS data storage.

Thanks,
Abhishek