Re: performance consideration when writing to HBase from MR job
>
> a) collecting all the Puts in the Reduce (or the Map, if there is no reduce) and
> doing a single batch write
> b) writing out each <K, V> pair using context.write(k, v)
>
> If a) is used instead of b), wouldn't that violate the semantics of
> KEYOUT, VALUEOUT (because no <K, V> is being output)? Is this OK?
>

1. If you can write from the mapper, you avoid the overhead of shuffling and
sorting between the map and reduce phases.
2. It does not make much difference whether you use the HBase API directly in
the mapper/reducer to write to the table, or write out to the context and use
one of the output formats that writes to the table (see the first sketch
below). However, if you use the bulk-load utility (the HBASE-48 JIRA; see the
second sketch below), you will get much better performance than with the
HBase API directly.
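
For illustration, here is a minimal sketch of a map-only driver wired to
TableOutputFormat via TableMapReduceUtil, i.e. option b) with no reduce phase.
The class names, the table name "mytable" and the text input are all made up,
and MyTableMapper is sketched further down.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class WriteToHBaseDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "write-to-hbase");
    job.setJarByClass(WriteToHBaseDriver.class);
    job.setMapperClass(MyTableMapper.class);   // emits <ImmutableBytesWritable, Put>
    FileInputFormat.addInputPath(job, new Path(args[0]));

    // Wires TableOutputFormat to "mytable" and sets the output key/value
    // classes. Passing null as the reducer keeps this a map-only job, so
    // there is no shuffle/sort overhead (point 1 above).
    TableMapReduceUtil.initTableReducerJob("mytable", null, job);
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

And this is roughly what the bulk-load route (HBASE-48) looks like, again only
a sketch, assuming the target table already exists, the job's mappers emit
<ImmutableBytesWritable, Put>, and the HFile output directory is arbitrary:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadSetup {
  // Configure the job to write HFiles instead of live Puts, then load them.
  public static void runBulkLoad(Configuration conf, Job job, String tableName,
                                 Path hfileDir) throws Exception {
    HTable table = new HTable(conf, tableName);
    // Sets the reducer, partitioner and HFileOutputFormat so the HFiles come
    // out sorted and partitioned to match the table's current regions.
    HFileOutputFormat.configureIncrementalLoad(job, table);
    FileOutputFormat.setOutputPath(job, hfileDir);

    if (!job.waitForCompletion(true)) {
      throw new RuntimeException("bulk load job failed");
    }
    // Move the finished HFiles into the regions (what the
    // "completebulkload" command-line tool does).
    new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
  }
}

The speedup comes from writing the store files directly and skipping the
normal write path for every individual Put.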
Regarding the semantics: no, there would not be a problem as long as you
construct your Puts properly (see the mapper sketch below).
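
To make that concrete, a mapper for the driver above could look like this
sketch (the column family "cf", qualifier "q" and the tab-separated input are
just placeholders): KEYOUT and VALUEOUT are simply ImmutableBytesWritable and
Put, so context.write() is not violating anything.

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: one "rowkey<TAB>value" line per Put.
public class MyTableMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String[] fields = line.toString().split("\t", 2);
    if (fields.length < 2) {
      return;   // skip malformed lines
    }
    byte[] row = Bytes.toBytes(fields[0]);

    // "Construct your Puts properly": the Put's row should be the same key
    // you emit, so the write lands where you expect.
    Put put = new Put(row);
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(fields[1]));

    context.write(new ImmutableBytesWritable(row), put);
  }
}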

-Amandeep