Re: HDFS using SAN
On 18 October 2012 16:46, Zhani Pellumbi <[EMAIL PROTECTED]> wrote:

> Yes, Isilon NAS runs HDFS natively, thus your nodes become "compute"
> nodes, running only task tracker processes.
> I read the NetApp paper, and this is a fundamentally different
> architecture though.
> There are some obvious benefits: being able to scale out your storage
> layer independently from your compute layer. Also, since Isilon contains a
> large number of our datasets, it allows us to analyze that data in place
> without ingesting it into a different location.
> Also, because of Isilon's OneFS filesystem, your name node is distributed
> across the entire Isilon cluster. However, Isilon's documentation is lacking
> on this :(
> We are currently in the early stages of testing this architecture, and
> cannot accurately speak on the performance of one vs the other yet.
> I wonder if anyone else is using Isilon to run HDFS and can add some more
> details :)
>
>
That's an interesting article, though I came out confused.

Where it talks about "HDFS protocol", what I think it means is that you can
plug the EMC filestore into Hadoop as a new filesystem, with a new URI
scheme (as there is with hdfs://, webhdfs://, s3n:// and others). I think
so, though sometimes the drawings seem to blur things.
The Hadoop Filesystem API is sub-POSIX, so it is very easy to implement a
bridge for. The basic file:// scheme works well with any distributed
filesystem where you don't care about locality; presumably the SAN is there
to handle that.
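
To make the "new filesystem behind a new URI scheme" idea concrete, here is a
rough sketch in Python rather than the real Hadoop Java API. Everything named
in it (SketchFileSystem, LocalFS, REGISTRY, get_filesystem, demo) is
hypothetical and only illustrates the shape of the mechanism: a small
filesystem contract, plus a lookup from URI scheme to implementation, loosely
analogous to the way Hadoop picks a class from its fs.<scheme>.impl settings.
How the EMC product actually plugs in is not something I can confirm from the
paper.

```python
import os
import tempfile
from urllib.parse import urlparse


class SketchFileSystem:
    """Hypothetical stand-in for a small, sub-POSIX filesystem contract."""

    def read(self, path):
        raise NotImplementedError

    def write(self, path, data):
        raise NotImplementedError

    def list(self, path):
        raise NotImplementedError


class LocalFS(SketchFileSystem):
    """Backs file:// with whatever is mounted locally (e.g. a shared store)."""

    def read(self, path):
        with open(path, "rb") as f:
            return f.read()

    def write(self, path, data):
        with open(path, "wb") as f:
            f.write(data)

    def list(self, path):
        return sorted(os.listdir(path))


# scheme -> implementation class (assumed registry, not Hadoop's own code)
REGISTRY = {"file": LocalFS}


def get_filesystem(uri):
    """Choose an implementation by URI scheme; return it with the path part."""
    parsed = urlparse(uri)
    try:
        impl = REGISTRY[parsed.scheme]
    except KeyError:
        raise ValueError("no filesystem registered for scheme %r" % parsed.scheme)
    return impl(), parsed.path


def demo():
    """Round-trip a small file through the file:// scheme in a temp directory."""
    workdir = tempfile.mkdtemp()
    fs, path = get_filesystem("file://" + os.path.join(workdir, "part-00000"))
    fs.write(path, b"hello")
    return fs.read(path), fs.list(workdir)
```

A vendor bridge would, under these assumptions, be one more subclass plus one
more registry entry; callers keep using URIs and never see the difference.
Note that nothing in this contract exposes block locations, which is why the
file:// route only suits stores where locality doesn't matter.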

I'm not going to criticise any of the paper because I don't have any
experience of Isilon and don't want to fault it. What I will say is this:
I fear SAN failures. When it is up, it is up. And when it is down you may
as well go home for the day as you won't see your files until the SAN
vendor's support team comes round.

I do not have any data on how often SAN failures happen in the field. I
will merely point people at MSR-TR-2004-67 *TerraServer SAN-Cluster
Architecture and Operations Experience* [Gray 2004], which looks at the
architecture, availability and failure modes of a multi-PB SAN at Microsoft
(from a different vendor, eight years ago, ...etc).

see also: http://wiki.apache.org/hadoop/SPOF

-steve