HBase >> mail # dev >> Shared HDFS for HBase and MapReduce


Re: Shared HDFS for HBase and MapReduce

Regarding locality, it's not just Lars' stuff, it's in the RefGuide (see
section 9.7.3):

http://hbase.apache.org/book.html#regions.arch

re:  "You will still be reading/writing over the network"

This is definitely true for writes because of the replicas (see the
RefGuide for why), but I disagree on the read portion unless there is
an exceptional case (which is typically the result of an RS going
down).
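
To make the read/write distinction concrete, here is a toy model in Python.
It is only a sketch under stated assumptions, not HBase or Hadoop code: the
names (Region, schedule_map_tasks, local_read_fraction,
remote_copies_per_write) and the node names are hypothetical, it assumes one
map task per region, and it assumes each region's files are already local to
the node serving it. It compares a layout where every node runs both an RS
and a TT with one where the two roles live on separate nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    host: str  # node whose RegionServer serves this region (assumed to hold its data)


def schedule_map_tasks(regions, task_tracker_nodes):
    """Assign one map task per region, preferring the region's own host.

    Returns {region name: (node running the task, True if the read is local)}.
    """
    trackers = sorted(task_tracker_nodes)
    assignments = {}
    for i, region in enumerate(regions):
        if region.host in task_tracker_nodes:
            # A TaskTracker runs on the same node as the RegionServer: local read.
            assignments[region.name] = (region.host, True)
        else:
            # No TaskTracker on that node: fall back to some other node,
            # so the region's data has to cross the network.
            assignments[region.name] = (trackers[i % len(trackers)], False)
    return assignments


def local_read_fraction(assignments):
    """Fraction of map tasks that read their region locally."""
    local = sum(1 for _, is_local in assignments.values() if is_local)
    return local / len(assignments)


def remote_copies_per_write(replication_factor):
    """Replicas that must travel over the network for each written block.

    Assumes the first replica lands on the writer's own node.
    """
    return max(replication_factor - 1, 0)


regions = [Region("r1", "node1"), Region("r2", "node2"), Region("r3", "node3")]

# Every node runs both an RS and a TT.
colocated = schedule_map_tasks(regions, {"node1", "node2", "node3"})
# RS on node1-3, TT only on node4-6.
segregated = schedule_map_tasks(regions, {"node4", "node5", "node6"})

print(local_read_fraction(colocated))    # 1.0
print(local_read_fraction(segregated))   # 0.0
print(remote_copies_per_write(3))        # 2
```

In this model the co-located layout reads everything locally, the segregated
layout reads nothing locally, and a write with replication factor 3 sends two
copies over the network in either layout. The exceptional case mentioned
above (an RS going down) corresponds to a Region whose host no longer holds
its data, which this sketch does not model.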

On 6/6/12 4:27 PM, "Atif Khan" <[EMAIL PROTECTED]> wrote:

>Thanks Amandeep!
>
>I think what I was saying is that we are trying to support both types of
>workloads: realtime transactional workloads and batch processing for
>data analysis.  The big question is whether a single HDFS cluster should
>be shared between the two workloads.
>
>The point that you are trying to make (if I am understanding you
>correctly) is about data "locality".
>
>/Amandeep Khurana - "Having a common HDFS cluster and using part of the
>nodes as HBase RS and part as the Hadoop TTs doesn't solve the problem of
>moving data from the HBase RS to the tasks you'll run as a part of your MR
>jobs if HBase is your source/sink. You will still be reading/writing over
>the network."
>/
>
>When running MR jobs over HBase, data locality is provided by HBase
>(please see
>http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html,
>and also "HBase: The Definitive Guide" by Lars George, page 298,
>"MapReduce Locality").
>In other words, the computation is moved to where the data is, which
>limits the need to transfer data over the network.  Proper data
>locality has a big impact on overall performance.
>
>So I believe that a common HDFS cluster does not imply logical segregation
>between HBase RS and Hadoop TTs.  Therefore, your point seems to
>contradict Lars George's statement.
>
>Thoughts?
>
>
>--
>View this message in context:
>http://apache-hbase.679495.n3.nabble.com/Shared-HDFS-for-HBase-and-MapReduce-tp4018856p4018884.html
>Sent from the HBase - Developer mailing list archive at Nabble.com.
>