MapReduce >> mail # user >> reference architecture


Re: reference architecture
On 25 October 2012 23:17, Daniel Käfer <[EMAIL PROTECTED]> wrote:

> On Thursday, 25.10.2012 at 22:10 +0100, Steve Loughran wrote:
> > Regarding storing DB data, HBase-on-HDFS is where people keep it; Pig
> > and Hive can work with that as well as with rawer data kept in HDFS
> > directly.
>
> But is that the best idea? HBase is great for random reads and small
> range scans, but Hive (SQL) queries over HBase are 4-5x slower than
> over plain HDFS. [0]
>
>

> I guess keeping the first (raw) data in HDFS and the final data in HBase
> is a good idea. But how should the data between individual MapReduce
> jobs be stored?
>

It depends on the amount of data and its expected use. If it's transient
food for the next MR jobs: HDFS.
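As a rough illustration of the layout discussed in this thread (raw input in HDFS, transient intermediate output in HDFS between chained jobs, final results in a random-read store), here is a minimal, single-process Python sketch. It is not Hadoop code and makes several loud assumptions: a local temporary directory stands in for HDFS, a plain dict stands in for an HBase table, and `run_job`, `pipeline`, and the word-count jobs are hypothetical names chosen only for illustration.

```python
import json
import os
import tempfile
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

KV = Tuple[str, int]


def run_job(input_path: str, output_path: str,
            mapper: Callable[[str], Iterable[KV]],
            reducer: Callable[[str, List[int]], KV]) -> None:
    """Toy MapReduce job (hypothetical): map each input line, group by key,
    reduce each group, and write the results as JSON lines."""
    groups: Dict[str, List[int]] = defaultdict(list)
    with open(input_path, encoding="utf-8") as src:
        for line in src:
            for key, value in mapper(line.rstrip("\n")):
                groups[key].append(value)
    with open(output_path, "w", encoding="utf-8") as dst:
        for key in sorted(groups):
            k, v = reducer(key, groups[key])
            dst.write(json.dumps([k, v]) + "\n")


def wc_map(line: str) -> Iterable[KV]:
    # Job 1 mapper: emit (word, 1) for every word in a raw text line.
    for word in line.split():
        yield word.lower(), 1


def freq_map(line: str) -> Iterable[KV]:
    # Job 2 mapper: read job 1's intermediate record and emit (count, 1).
    _word, count = json.loads(line)
    yield str(count), 1


def sum_reduce(key: str, values: List[int]) -> KV:
    return key, sum(values)


def pipeline(raw_lines: List[str]) -> Dict[str, int]:
    # Assumption: the temp dir plays the role of an HDFS working directory.
    with tempfile.TemporaryDirectory() as work:
        raw = os.path.join(work, "raw.txt")               # raw data "in HDFS"
        inter = os.path.join(work, "tmp-wordcount.jsonl")  # transient job output
        final = os.path.join(work, "histogram.jsonl")
        with open(raw, "w", encoding="utf-8") as f:
            f.write("\n".join(raw_lines) + "\n")
        run_job(raw, inter, wc_map, sum_reduce)    # job 1: word counts
        run_job(inter, final, freq_map, sum_reduce)  # job 2 reads job 1's files
        # Load the final result into a random-access store
        # (assumption: a dict stands in for an HBase table keyed by row).
        table: Dict[str, int] = {}
        with open(final, encoding="utf-8") as f:
            for line in f:
                key, value = json.loads(line)
                table[key] = value
    # Leaving the with-block deletes the intermediate files.
    return table


if __name__ == "__main__":
    print(pipeline(["the cat", "the dog", "the end"]))
```

The intermediate file exists only inside the temporary directory and is deleted when the pipeline finishes, mirroring the "transient" data between MR jobs described above; on a real cluster that would be an HDFS path that the next job reads and a cleanup step removes, while only the final, randomly accessed result would be loaded into HBase.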