Hadoop user mailing list
Re: is HDFS RAID "data locality" efficient?
Ok...

So under Apache Hadoop, how do you specify when and where a directory will be created on HDFS?

As an example, if I want to create a /coldData directory in HDFS as a place to store my older data sets, how does that get assigned specifically to a RAIDed HDFS?
(Or even to specific machines?)

I know I can do this in MapR's distribution, but I am not aware of this feature being available in the Apache-based releases.

Is this part of the latest feature set?

Thx

-Mike

On Aug 8, 2012, at 12:31 PM, Steve Loughran <[EMAIL PROTECTED]> wrote:

>
>
> On 8 August 2012 09:46, Sourygna Luangsay <[EMAIL PROTECTED]> wrote:
> Hi folks!
>
> One scenario I can think of for taking advantage of HDFS RAID without suffering this penalty is:
>
> - Using normal HDFS with the default replication=3 for my “fresh data”
> - Using HDFS RAID for my historical data (which is rarely used by M/R)
>
> Exactly: less space used for cold data, with the penalty that access performance can be worse. As the majority of data on a Hadoop cluster is usually "cold", it's a space- and power-efficient story for the archive data.
>
> --
> Steve Loughran
> Hortonworks Inc
>
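To put rough numbers on the "less space on cold data" argument, here is a minimal Python sketch that counts raw blocks stored per logical block. The stripe lengths, parity-block counts, replication values and the 90-day "cold" cutoff are hypothetical illustration values, not HDFS RAID defaults, and `choose_tier` is a made-up helper mirroring the fresh/historical split described above, not a Hadoop API.

```python
"""Back-of-the-envelope storage overhead: plain replication vs. a RAID-style
parity layout. All parameter values used below are hypothetical examples."""


def replication_overhead(replication: int) -> float:
    """Raw bytes stored per logical byte with plain n-way replication."""
    return float(replication)


def parity_overhead(stripe_length: int, parity_blocks: int,
                    source_replication: int, parity_replication: int) -> float:
    """Raw bytes stored per logical byte when every stripe of `stripe_length`
    data blocks keeps `source_replication` copies of each data block and adds
    `parity_blocks` parity blocks, each stored with `parity_replication` copies."""
    raw_blocks = (stripe_length * source_replication
                  + parity_blocks * parity_replication)
    return raw_blocks / stripe_length


def choose_tier(age_days: float, cold_after_days: float = 90.0) -> str:
    """Hypothetical tiering rule: fresh data stays replicated, old data is RAIDed."""
    return "replicated" if age_days < cold_after_days else "raid"


if __name__ == "__main__":
    hot = replication_overhead(3)  # default replication=3 for fresh data
    # Hypothetical XOR-style layout: 10-block stripes, 1 parity block,
    # data and parity each kept at replication 2.
    cold_xor = parity_overhead(10, 1, 2, 2)
    # Hypothetical Reed-Solomon-style layout: 10 data + 4 parity, replication 1.
    cold_rs = parity_overhead(10, 4, 1, 1)
    for name, overhead in [("replication=3", hot),
                           ("XOR 10+1, rep 2", cold_xor),
                           ("RS 10+4, rep 1", cold_rs)]:
        print(f"{name:>16}: {overhead:.1f}x raw storage, "
              f"{1 - overhead / hot:.0%} saved vs. replication=3")
    print(choose_tier(age_days=7), choose_tier(age_days=365))
```

With these made-up parameters the cold layouts need 2.2x and 1.4x raw storage instead of 3.0x. The flip side is the penalty mentioned in the thread: fewer replicas means fewer nodes hold a local copy of each block, so fewer map tasks can run data-local against the RAIDed files.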