Hadoop >> mail # user >> Re: is HDFS RAID "data locality" efficient?


Re: is HDFS RAID "data locality" efficient?
Ok...

So under Apache Hadoop, how do you specify where a directory gets placed when it is created on HDFS?

As an example, if I want to create a /coldData directory in HDFS as a place to store my older data sets, how does that get assigned specifically to a RAIDed HDFS?
(Or even specific machines?)

I know I can do this in MapR's distribution, but I am not aware of this feature being available in the Apache-based releases.

Is this part of the latest feature set?

Thx

-Mike

On Aug 8, 2012, at 12:31 PM, Steve Loughran <[EMAIL PROTECTED]> wrote:

>
>
> On 8 August 2012 09:46, Sourygna Luangsay <[EMAIL PROTECTED]> wrote:
> Hi folks!
>
> One of the scenarios I can think of to take advantage of HDFS RAID without suffering this penalty is:
>
> - Using normal HDFS with default replication=3 for my “fresh data”
> - Using HDFS RAID for my historical data (that is barely used by M/R)
>
> Exactly: less space used on cold data, with the penalty that access performance can be worse. As the majority of data on a Hadoop cluster is usually "cold", it's a space- and power-efficient story for the archive data.
>
> --
> Steve Loughran
> Hortonworks Inc
>
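
To put rough numbers on the space trade-off discussed in the quoted message, here is a minimal Python sketch of the arithmetic. It is only an illustration: the function name, the stripe length, the number of parity blocks, and the replica counts are all hypothetical values chosen for the example, not settings taken from this thread or from any particular HDFS RAID release.

```python
def raw_bytes_per_logical_byte(data_replicas, stripe_len=0,
                               parity_blocks=0, parity_replicas=1):
    """Raw storage consumed for each byte of user data.

    data_replicas   -- copies kept of every data block
    stripe_len      -- data blocks per parity stripe (0 means no parity)
    parity_blocks   -- parity blocks generated per stripe
    parity_replicas -- copies kept of every parity block
    """
    if stripe_len <= 0:
        # Plain replication: overhead is simply the replica count.
        return float(data_replicas)
    # Parity cost is spread across the data blocks of one stripe.
    return data_replicas + (parity_blocks * parity_replicas) / stripe_len


# "Fresh" data with the default replication of 3 mentioned in the thread.
fresh = raw_bytes_per_logical_byte(data_replicas=3)

# Hypothetical cold-data layout: 2 data replicas plus one parity block
# (kept twice) for every 10 data blocks. These numbers are assumptions.
cold = raw_bytes_per_logical_byte(data_replicas=2, stripe_len=10,
                                  parity_blocks=1, parity_replicas=2)

print(f"fresh: {fresh:.2f}x raw, cold: {cold:.2f}x raw")
```

Under these assumed parameters the cold layout needs less raw storage per logical byte than triple replication, which is the saving being described. The read-side cost (fewer replicas to schedule tasks next to, and reconstruction work when a block is lost) is not modelled here.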
