NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Hadoop >> mail # user >> Suitability of HDFS for live file store


Re: Suitability of HDFS for live file store
Seems like a heavyweight solution unless you are actually processing the
images?

Wow: no MapReduce, no streaming writes, and relatively small files. I'm
surprised that you are considering Hadoop at all.
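To put the figures from the original question (100 TB of 50-100 MB TIFFs) into HDFS terms, here is a rough back-of-envelope sketch in Python. It assumes the HDFS default replication factor of 3, a 64 MB block size (the Hadoop 1.x default; later releases default to 128 MB) and binary units; none of these settings are stated in the thread.

```python
# Back-of-envelope HDFS sizing for a 100 TB image archive.
# ASSUMPTIONS (not from the thread): replication factor 3 (the HDFS default),
# 64 MB blocks (the Hadoop 1.x default), binary units.
import math

MB = 1024 ** 2
TB = 1024 ** 4

def hdfs_sizing(archive_bytes, file_bytes, block_bytes=64 * MB, replication=3):
    """Return (files, blocks per file, total blocks, raw bytes incl. replicas)."""
    files = archive_bytes // file_bytes
    blocks_per_file = math.ceil(file_bytes / block_bytes)
    total_blocks = files * blocks_per_file
    raw_bytes = archive_bytes * replication
    return files, blocks_per_file, total_blocks, raw_bytes

# The question quotes TIFFs of roughly 50-100 MB each.
for size_mb in (50, 100):
    files, per_file, blocks, raw = hdfs_sizing(100 * TB, size_mb * MB)
    print(f"{size_mb} MB files: {files:,} files, {per_file} block(s) each, "
          f"{blocks:,} blocks, {raw // TB} TB raw")
```

Under these assumptions both ends of the range come to about 2.1 million blocks and 300 TB of raw disk. Each file fills only one or two blocks, so by the commonly cited rough estimate of about 150 bytes of NameNode heap per file or block object, the namespace would need on the order of hundreds of megabytes.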

I'm surprised there isn't a simpler solution that uses redundancy without
all the daemons and NameNodes and TaskTrackers and stuff.

HDFS might also be kind of awkward to use as a normal file system.
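On the "normal file system" point: HDFS is not a POSIX file system, but Hadoop releases from 1.0 onward include WebHDFS, a REST API that serves file contents over plain HTTP with no MapReduce involved. Below is a minimal Python sketch of reading an image that way. It assumes WebHDFS is enabled on the cluster (`dfs.webhdfs.enabled`), the NameNode's HTTP port is 50070 (the 1.x/2.x default) and simple, non-Kerberos authentication; the host name, user and path are hypothetical.

```python
# Read a file from HDFS over HTTP via the WebHDFS REST API (no MapReduce needed).
# ASSUMPTIONS: WebHDFS is enabled, the NameNode HTTP port is 50070 (Hadoop 1.x/2.x
# default; 3.x uses 9870), and simple authentication via the user.name parameter.
from urllib.parse import quote, urlencode
from urllib.request import urlopen

def webhdfs_open_url(namenode, path, user=None, port=50070, offset=None, length=None):
    """Build an op=OPEN URL for an absolute HDFS path."""
    params = {"op": "OPEN"}
    if user is not None:
        params["user.name"] = user
    if offset is not None:
        params["offset"] = offset   # byte position to start reading from
    if length is not None:
        params["length"] = length   # number of bytes to return
    # quote() keeps '/' so the HDFS path maps directly onto the URL path.
    return f"http://{namenode}:{port}/webhdfs/v1{quote(path)}?{urlencode(params)}"

def read_file(namenode, path, **kwargs):
    """Fetch file bytes; the NameNode redirects OPEN to a DataNode, which urlopen follows."""
    with urlopen(webhdfs_open_url(namenode, path, **kwargs)) as resp:
        return resp.read()

# Hypothetical host and path:
print(webhdfs_open_url("namenode.example.com", "/archive/img 001.tif", user="matt"))
# data = read_file("namenode.example.com", "/archive/img 001.tif", user="matt")
```

The optional `offset` and `length` parameters request a byte range, which could help when a web front end needs only part of a large TIFF.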

On Mon, Oct 15, 2012 at 4:08 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Hey Matt,
>
> What do you mean by 'real-time' though? While HDFS has pretty good
> contiguous data read speeds (and you get N replicas to read from),
> if you're looking to "cache" frequently accessed files into memory,
> then HDFS does not natively support that. Otherwise, I agree
> with Brock: it seems like you could make it work with HDFS (sans
> MapReduce; there's no need to run it if you don't need it).
>
> NameNode audit logging will help with your file access
> analysis requirement.
>
> On Tue, Oct 16, 2012 at 1:17 AM, Matt Painter <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I am a new Hadoop user, and would really appreciate your opinions on
> > whether Hadoop is the right tool for what I'm thinking of using it for.
> >
> > I am investigating options for scaling an archive of around 100 TB of
> > image data. These images are typically TIFF files of around 50-100 MB
> > each and need to be made available online in real time. Access to the
> > files will be sporadic and occasional, but writing the files will be a
> > daily activity. Write speed is not particularly important.
> >
> > Our previous solution was a monolithic, expensive, and very full SAN,
> > so I am excited by Hadoop's distributed, extensible, redundant
> > architecture.
> >
> > My concern is that a lot of the discussion of, and use cases for, Hadoop
> > is about data processing with MapReduce and, from what I understand,
> > using HDFS as the input for MapReduce jobs. My other concern is the
> > vague indication that it's not a 'real-time' system. We may use
> > MapReduce in small components of the application, but most likely for
> > file access analysis rather than any processing of the files
> > themselves.
> >
> > In other words, what I really want is a distributed, resilient, scalable
> > filesystem.
> >
> > Is Hadoop suitable if we just use this facility, or would I be misusing
> > it and inviting grief?
> >
> > M
>
>
>
> --
> Harsh J
>

--
Jay Vyas
MMSB/UCHC
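Harsh's suggestion to use NameNode audit logging for file access analysis could start as simply as counting read (`open`) operations per path. The Python sketch below assumes the audit log's tab-separated `key=value` layout (`allowed`, `ugi`, `ip`, `cmd`, `src`, `dst`, `perm`) after a log4j prefix; the exact format varies between Hadoop versions, audit logging may need to be enabled in the NameNode's log4j configuration, and the sample lines are invented for illustration.

```python
# Count reads per file from HDFS NameNode audit log lines.
# ASSUMPTION: fields are tab-separated key=value pairs (allowed, ugi, ip, cmd,
# src, dst, perm) following a log4j prefix; the layout varies by Hadoop version.
import re
from collections import Counter

# A key is a run of word characters; a value runs up to the next tab.
FIELD = re.compile(r"(\w+)=([^\t]*)")

def parse_audit_line(line):
    """Return the key=value fields of one audit log line as a dict."""
    return dict(FIELD.findall(line.rstrip("\n")))

def open_counts(lines):
    """Count permitted 'open' (read) operations per source path."""
    counts = Counter()
    for line in lines:
        fields = parse_audit_line(line)
        # Some releases may omit 'allowed'; treat a missing value as permitted.
        if fields.get("cmd") == "open" and fields.get("allowed", "true") == "true":
            counts[fields.get("src")] += 1
    return counts

# Invented sample lines, for illustration only.
SAMPLE = [
    "2012-10-15 16:08:00,123 INFO FSNamesystem.audit: allowed=true\tugi=matt"
    "\tip=/10.0.0.5\tcmd=open\tsrc=/archive/a.tif\tdst=null\tperm=null",
    "2012-10-15 16:09:10,456 INFO FSNamesystem.audit: allowed=true\tugi=matt"
    "\tip=/10.0.0.5\tcmd=open\tsrc=/archive/a.tif\tdst=null\tperm=null",
    "2012-10-15 16:10:20,789 INFO FSNamesystem.audit: allowed=true\tugi=matt"
    "\tip=/10.0.0.6\tcmd=create\tsrc=/archive/b.tif\tdst=null"
    "\tperm=matt:supergroup:rw-r--r--",
    "2012-10-15 16:11:30,012 INFO FSNamesystem.audit: allowed=false\tugi=guest"
    "\tip=/10.0.0.7\tcmd=open\tsrc=/archive/b.tif\tdst=null\tperm=null",
]

print(open_counts(SAMPLE).most_common())
```

Pointed at the real audit log file (its location depends on the log4j setup), `open_counts(f).most_common(10)` would list the ten most-read images. A regular expression keeps the sketch short, but a path containing a tab would break it.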