Hadoop, mail # general - Which proposed distro of Hadoop, 0.20.206 or 0.22, will be better for HBase?


Re: Which proposed distro of Hadoop, 0.20.206 or 0.22, will be better for HBase?
Konstantin Boudnik 2011-10-07, 19:05
On Fri, Oct 07, 2011 at 10:17AM, Steve Loughran wrote:
> On 06/10/2011 17:49, [EMAIL PROTECTED] wrote:
>> Steve,
>>
>>> Summary: I'm not sure that HDFS is the right FS in this world, as it
>>> contains a lot of assumptions about system stability and HDD persistence
>>> that aren't valid any more. With the ability to plug in new placers you
>>> could do tricks like ensure 1 replica lives in a persistent blockstore
>>> (and rely on it always being there), and add other replicas in transient
>>> storage if the data is about to be needed in jobs.
>>
>> Can you please shed more light on the statement "... as it
>> contains a lot of assumptions about system stability and HDD persistence
>> that aren't valid any more..." ?
>>
>> I know that you were doing some analysis of disk failure modes sometime
>> ago. Is this the result of that research ? I am very interested.
>
> no, it's unrelated -experience in hosting virtual hadoop  
> infrastructures. Which is how my short-lived clusters exist today
>
> -you don't know the hostname of the master nodes until allocated, so you  
> need to allocate them and dynamically push out configs to the workers

This is of course a big win for non-autodiscoverable architecture ;)

> -the Datanodes spin when the namenode goes down, forever, rather than  
> checking somewhere to see if it's changed. HDFS HA may fix that.
..
> -again, the TaskTrackers spin when the JT goes down, rather than look to  
> see if it's moved.
..
> -Blacklisting isn't the right way to deal with task tracker failures:  
> termination of VM is.

See my above comment.

Auto-discovery would solve a lot of these issues and many others such as
shared distributed memory suitable for config management etc.

Cos
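
To make the "spin forever vs. check whether it has moved" point above concrete, here is a minimal sketch of a worker that re-resolves the master address on every retry instead of hammering the address it was configured with. Nothing here is Hadoop API: lookup_master, try_connect, MasterUnavailable and the host names are hypothetical stand-ins, and the registry is simulated with a plain iterator where a real setup would query some discovery or coordination service and back off between attempts.

```python
from typing import Callable, Optional


class MasterUnavailable(Exception):
    """Raised when no master could be reached within the retry budget."""


def connect_with_rediscovery(
    lookup_master: Callable[[], Optional[str]],
    try_connect: Callable[[str], bool],
    max_attempts: int = 5,
) -> str:
    """Return the address of the master we managed to connect to.

    lookup_master: hypothetical discovery call; returns the currently
        registered master address, or None if nothing is registered.
    try_connect: hypothetical connection attempt; True on success.
    """
    for _ in range(max_attempts):
        # Re-resolve on every attempt, so a relocated master is picked up
        # instead of retrying a stale address indefinitely.
        address = lookup_master()
        if address is not None and try_connect(address):
            return address
    raise MasterUnavailable(f"no master reachable after {max_attempts} attempts")


# Simulated failover: the registry points at a dead master twice,
# then at its replacement. Host names are made up for illustration.
registry = iter(["nn-old:8020", "nn-old:8020", "nn-new:8020"])
alive = {"nn-new:8020"}

found = connect_with_rediscovery(
    lookup_master=lambda: next(registry, None),
    try_connect=lambda addr: addr in alive,
)
print(found)
```

The bounded retry budget is a design choice of this sketch: when it runs out, the worker gives up with an error rather than spinning, which leaves it to whatever manages the VM to decide what to do with it.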