Hadoop >> mail # user >> Alternative distributed filesystem.


Re: Alternative distributed filesystem.
Hey Dmitry,

As Mike states, I think HDFS is a great fit for your use case. I have never
deployed any of the below systems into production, but I have seen some
complaints about the stability of GlusterFS (e.g.
http://gluster.org/pipermail/gluster-users/2009-October/003193.html), and
Lustre can be complex to set up and maintain. If you already have HDFS
expertise in house, you'll probably be fine with FUSE and HDFS.
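
For reference, mounting via the fuse-dfs contrib module usually looks something like the sketch below. This is only an illustration: the wrapper script name comes from the Hadoop contrib build, and the namenode host/port and mount point are placeholders for whatever your cluster uses.

```shell
# Build the fuse-dfs contrib module (under src/contrib/fuse-dfs in the
# Hadoop source tree), then mount HDFS at a local mount point.
# Host, port, and paths below are examples only.
mkdir -p /mnt/hdfs
fuse_dfs_wrapper.sh dfs://namenode.example.com:9000 /mnt/hdfs

# Once mounted, ordinary POSIX tools work against the mount point:
ls /mnt/hdfs
cp /tmp/bigfile /mnt/hdfs/data/
```

Write performance through the mount is generally worse than through the native client, so it's worth benchmarking with your actual data-push pattern before committing.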

Regards,
Jeff

On Fri, Nov 13, 2009 at 2:12 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:

> If you are looking for large distributed file system with posix locking
> look at:
>
> glusterfs
> lustre
> ocfs2
> redhat GFS
>
> Edward
> On Fri, Nov 13, 2009 at 5:07 PM, Michael Thomas <[EMAIL PROTECTED]>
> wrote:
> > Hi Dmitry,
> >
> > I still stand by my original statement.  We do use fuse_dfs for reading
> data
> > on all of the worker nodes.  We don't use it much for writing data, but
> only
> > because our project's data model was never designed to use a posix
> > filesystem for writing data, only reading.
> >
> > --Mike
> >
> > On 11/13/2009 02:04 PM, Dmitry Pushkarev wrote:
> >>
> >> Mike,
> >>
> >> I guess what I said referred to the use of fuse_dfs as a general
> >> solution. If we were to use native APIs, that'd be perfect. But we
> >> basically need to mount it as a place where programs can simultaneously
> >> dump large amounts of data.
> >>
> >> -----Original Message-----
> >> From: Michael Thomas [mailto:[EMAIL PROTECTED]]
> >> Sent: Friday, November 13, 2009 2:00 PM
> >> To: [EMAIL PROTECTED]
> >> Subject: Re: Alternative distributed filesystem.
> >>
> >> On 11/13/2009 01:56 PM, Dmitry Pushkarev wrote:
> >>>
> >>> Dear Hadoop users,
> >>>
> >>>
> >>>
> >>> One of our hadoop clusters is being converted to SGE to run a very
> >>> specific application, and we're thinking about how to utilize the huge
> >>> hard drives in those nodes. Since there will be no hadoop installed on
> >>> these nodes, we're looking for an alternative distributed filesystem
> >>> with decent concurrent read/write performance (compared to HDFS) for
> >>> large amounts of data. Using a single file store, like a NAS RAID
> >>> array, proved to be very ineffective when someone is pushing gigabytes
> >>> of data onto it.
> >>>
> >>>
> >>>
> >>> What other systems can we look at? We would like a filesystem that can
> >>> be mounted on every node and is open source; ideally it would also have
> >>> POSIX compliance and decent random-access performance (though that
> >>> isn't critical).
> >>>
> >>> HDFS doesn't fit the bill because mounting it via fuse_dfs and using
> >>> it without any mapred jobs (i.e. data will typically be pushed from
> >>> 1-2 nodes at most, at different times) seems slightly "ass-backward"
> >>> to say the least.
> >>
> >> I would hardly call it ass-backwards.  I know of at least 3 HPC clusters
> >> that use only the HDFS component of Hadoop to serve 500TB+ of data to
> >> 100+ worker nodes.
> >>
> >> As a cluster filesystem, HDFS works pretty darn well.
> >>
> >> --Mike
> >>