Hadoop >> mail # general >> Which proposed distro of Hadoop, 0.20.206 or 0.22, will be better for HBase?


Jagane Sundar 2011-10-02, 23:57
Milind.Bhandarkar@... 2011-10-05, 22:55
Jagane Sundar 2011-10-05, 23:20
Milind.Bhandarkar@... 2011-10-05, 23:55
Jagane Sundar 2011-10-06, 02:00
Roman Shaposhnik 2011-10-06, 02:38
Konstantin Boudnik 2011-10-06, 05:09
Jagane Sundar 2011-10-06, 05:40
Konstantin Boudnik 2011-10-06, 05:58
Steve Loughran 2011-10-06, 09:54
Steve Loughran 2011-10-06, 09:48
Milind.Bhandarkar@... 2011-10-06, 16:49
Steve Loughran 2011-10-07, 09:17
Milind.Bhandarkar@... 2011-10-07, 16:23
Re: Which proposed distro of Hadoop, 0.20.206 or 0.22, will be better for HBase?
On Fri, Oct 07, 2011 at 10:17AM, Steve Loughran wrote:
> On 06/10/2011 17:49, [EMAIL PROTECTED] wrote:
>> Steve,
>>
>>> Summary: I'm not sure that HDFS is the right FS in this world, as it
>>> contains a lot of assumptions about system stability and HDD persistence
>>> that aren't valid any more. With the ability to plug in new placers you
>>> could do tricks like ensure 1 replica lives in a persistent blockstore
>>> (and rely on it always being there), and add other replicas in transient
>>> storage if the data is about to be needed in jobs.
>>
>> Can you please shed more light on the statement "... as it
>> contains a lot of assumptions about system stability and HDD persistence
>> that aren't valid any more..." ?
>>
>> I know that you were doing some analysis of disk failure modes sometime
>> ago. Is this the result of that research ? I am very interested.
>
> no, it's unrelated -experience in hosting virtual hadoop  
> infrastructures. Which is how my short-lived clusters exist today
>
> -you don't know the hostname of the master nodes until allocated, so you  
> need to allocate them and dynamically push out configs to the workers

This is, of course, a big win for non-autodiscoverable architecture ;)

> -the Datanodes spin when the namenode goes down, forever, rather than
> checking somewhere to see if it's changed. HDFS HA may fix that.
..
> -again, the TaskTrackers spin when the JT goes down, rather than look to
> see if it's moved.
..
> -Blacklisting isn't the right way to deal with task tracker failures:  
> termination of VM is.

See my above comment.

Auto-discovery would solve a lot of these issues and many others, such as
shared distributed memory suitable for config management, etc.
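
To make the auto-discovery point concrete, here is a minimal sketch of a worker that looks its master up again before every connection attempt instead of spinning on a stale address. Nothing here is Hadoop code: `Registry`, `connect_with_rediscovery`, the role names and the `try_connect` callback are all hypothetical, and the in-memory registry only stands in for whatever discovery service would really be used.

```python
from typing import Callable, Dict, Optional


class Registry:
    """Hypothetical in-memory stand-in for a discovery service."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def publish(self, role: str, address: str) -> None:
        # A master (or whoever allocates it) announces where it lives now.
        self._entries[role] = address

    def lookup(self, role: str) -> Optional[str]:
        # Workers ask for the current address; None if nothing is published.
        return self._entries.get(role)


def connect_with_rediscovery(
    registry: Registry,
    role: str,
    try_connect: Callable[[str], bool],
    max_attempts: int = 5,
) -> Optional[str]:
    """Return the address that accepted the connection, or None.

    The address is looked up again before every attempt, so a master
    that has moved is picked up on the next try rather than the worker
    retrying the old address forever.
    """
    for _ in range(max_attempts):
        address = registry.lookup(role)
        if address is not None and try_connect(address):
            return address
    return None
```

For example, if "namenode" is first published at one address, the connection attempt to it fails, and a new address is published in the meantime, the second attempt goes to the new address. A real worker would also sleep or back off between attempts; that is left out to keep the sketch short.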

Cos
Roman Shaposhnik 2011-10-06, 02:33
Jagane Sundar 2011-10-02, 23:15