Re: Shared HDFS for HBase and MapReduce
If your workload is only batch processing (MR), you don't need to separate the clusters in the first place. So, you don't have the problem of moving large amounts of data between clusters.
Having a common HDFS cluster and running HBase RegionServers on part of the nodes and Hadoop TaskTrackers on the rest doesn't solve the problem of moving data from the RegionServers to the tasks you'll run as part of your MR jobs if HBase is your source/sink. You will still be reading/writing over the network.
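
To make that concrete, here is a rough sketch (not anyone's actual job) of an MR job that uses HBase as its source via TableMapReduceUtil; the "metrics" table name and the row-counting mapper are made up for the example. TableInputFormat hands each map task its rows over RegionServer RPC, which is exactly the network read path I mean:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class HBaseSourceJob {

      // Trivial mapper: emits one count per row scanned (example logic only).
      public static class RowCountMapper extends TableMapper<Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
          context.write(new Text("rows"), ONE);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hbase-source-example");
        job.setJarByClass(HBaseSourceJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // bigger RPC batches for a full-table scan
        scan.setCacheBlocks(false);  // don't churn the RegionServer block cache from MR

        // Wire the table in as the job's input. Even with a shared HDFS, each map
        // task pulls its rows from the RegionServers over RPC, not from local blocks.
        TableMapReduceUtil.initTableMapperJob(
            "metrics",              // hypothetical source table
            scan,
            RowCountMapper.class,
            Text.class,
            LongWritable.class,
            job);

        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }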

On the other hand, if your workload is 'realtime' random reads/writes, the amount of data you are likely going to be accessing is small and therefore cheap to move. Moreover, that data is going to be accessed from a client application of some sort that is not an MR job.
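
For contrast, the 'realtime' pattern is just a client application doing point Gets against the RegionServers, roughly like this (table, row key and column names are hypothetical; this uses the plain HTable client API):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RandomReadClient {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
        HTable table = new HTable(conf, "users");            // hypothetical table
        try {
          // Single-row point read: one short RPC to the RegionServer hosting this key.
          Get get = new Get(Bytes.toBytes("user#42"));
          get.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"));
          Result result = table.get(get);
          byte[] email = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
          System.out.println(email == null ? "(no value)" : Bytes.toString(email));
        } finally {
          table.close();
        }
      }
    }
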
On Wednesday, June 6, 2012 at 12:23 PM, Atif Khan wrote:

> This is beginning to sound like a catch-22 problem. I think I personally
> would lean towards a single, high-performing HDFS cluster that can be
> shared between various types of applications (realtime vs. analytics), and
> then control/balance the resource requirements for each application. This
> would work for scenarios where I can predict the different types of
> applications/workloads beforehand. However, if for some reason the nature
> of the workload were to shift, that could potentially throw off the whole
> resource equilibrium.
>
> Are there any additional Hadoop-specific monitoring tools that can be
> deployed to predict resource/performance bottlenecks in advance (in
> addition to regular BMC-type tools)?
>
> --
> View this message in context: http://apache-hbase.679495.n3.nabble.com/Shared-HDFS-for-HBase-and-MapReduce-tp4018856p4018881.html
> Sent from the HBase - Developer mailing list archive at Nabble.com.