Hadoop, mail # user - Suitability of HDFS for live file store


Re: Suitability of HDFS for live file store
Brock Noland 2012-10-15, 20:18
Hi,

Harsh makes a good point: there is no explicit way to say "these files
should remain in memory". However, I would note that, given available
RAM on the datanodes, the operating system's page cache will keep
recently accessed blocks in memory.

Brock

On Mon, Oct 15, 2012 at 3:08 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> Hey Matt,
>
> What do you mean by 'real-time' though? While HDFS has pretty good
> contiguous data read speeds (and you get N x replicas to read from),
> if you're looking to "cache" frequently accessed files into memory
> then HDFS does not natively have support for that. Otherwise, I agree
> with Brock, seems like you could make it work with HDFS (sans
> MapReduce - no need to run it if you don't need it).
>
> The presence of NameNode audit logging will help your file access
> analysis requirement.
>
> On Tue, Oct 16, 2012 at 1:17 AM, Matt Painter <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> I am a new Hadoop user, and would really appreciate your opinions on whether
>> Hadoop is the right tool for what I'm thinking of using it for.
>>
>> I am investigating options for scaling an archive of around 100 TB of image
>> data. These images are typically TIFF files of around 50-100 MB each and need
>> to be made available online in realtime. Access to the files will be
>> sporadic and occasional, but writing the files will be a daily activity.
>> Speed of write is not particularly important.
>>
>> Our previous solution was a monolithic, expensive - and very full - SAN so I
>> am excited by Hadoop's distributed, extensible, redundant architecture.
>>
>> My concern is that much of the discussion of, and many of the use cases
>> for, Hadoop concern data processing with MapReduce and - from what I
>> understand - using HDFS as input for MapReduce jobs. My other concern is
>> the vague indication that it's not a 'real-time' system. We may be using
>> MapReduce in small components of the application, but it will most likely be
>> in file access analysis rather than any processing on the files themselves.
>>
>> In other words, what I really want is a distributed, resilient, scalable
>> filesystem.
>>
>> Is Hadoop suitable if we just use this facility, or would I be misusing it
>> and inviting grief?
>>
>> M
>
>
>
> --
> Harsh J

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
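
Harsh's remark that NameNode audit logging can support file access
analysis could be sketched roughly as below. This is only an illustration
and not something from the thread: the sample lines, the tab-separated
key=value layout after "FSNamesystem.audit:", and the helper names
parse_audit_line and count_opens are all assumptions. The real audit log
format depends on the Hadoop version and log4j configuration, so check it
against your own NameNode logs before relying on it.

```python
from collections import Counter

# Hypothetical sample lines. The layout (tab-separated key=value pairs after
# the "FSNamesystem.audit:" marker) is an assumption, not taken from the thread.
SAMPLE_LINES = [
    "2012-10-15 20:18:01,123 INFO FSNamesystem.audit: "
    "ugi=matt\tip=/10.0.0.5\tcmd=open\tsrc=/images/a.tif\tdst=null\tperm=null",
    "2012-10-15 20:18:07,456 INFO FSNamesystem.audit: "
    "ugi=web\tip=/10.0.0.9\tcmd=open\tsrc=/images/a.tif\tdst=null\tperm=null",
    "2012-10-15 20:19:30,001 INFO FSNamesystem.audit: "
    "ugi=matt\tip=/10.0.0.5\tcmd=create\tsrc=/images/b.tif\tdst=null\tperm=matt:users:rw-r--r--",
    "2012-10-15 20:21:12,789 INFO FSNamesystem.audit: "
    "ugi=web\tip=/10.0.0.9\tcmd=open\tsrc=/images/b.tif\tdst=null\tperm=null",
]

AUDIT_MARKER = "FSNamesystem.audit:"


def parse_audit_line(line):
    """Return the key=value fields of one audit line as a dict.

    Returns None when the line does not contain the audit marker.
    """
    _, marker, payload = line.partition(AUDIT_MARKER)
    if not marker:
        return None
    fields = {}
    for item in payload.strip().split("\t"):
        key, sep, value = item.partition("=")
        if sep:  # skip anything that is not key=value
            fields[key.strip()] = value.strip()
    return fields


def count_opens(lines):
    """Count read accesses (cmd=open) per source path."""
    counts = Counter()
    for line in lines:
        fields = parse_audit_line(line)
        if fields and fields.get("cmd") == "open" and "src" in fields:
            counts[fields["src"]] += 1
    return counts


if __name__ == "__main__":
    # Print paths from most to least frequently opened.
    for path, hits in count_opens(SAMPLE_LINES).most_common():
        print(hits, path)
```

For an archive where reads are sporadic, a per-path count like this is
probably enough to start with. A MapReduce job over the same log lines
would only become worthwhile once the audit logs themselves grow large.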