
Re: reference architecture
On 25 October 2012 23:17, Daniel Käfer <[EMAIL PROTECTED]> wrote:

> On Thursday, 25.10.2012, at 22:10 +0100, Steve Loughran wrote:
> > Regarding storing DB data, HBase-on-HDFS is where people keep it; Pig
> > and Hive can work with that as well as rawer data kept in HDFS
> > directly
> But is that the best idea? HBase is great for random reads and small
> range scans. But Hive (SQL) performance on HBase is 4-5x slower than on
> plain HDFS. [0]

> I guess keeping the first data (raw data) in HDFS and the final data in
> HBase is a good idea. But how should the data be stored between
> individual MapReduce jobs?

It depends on the amount of data and the expected use. If it's transient
food for the next MR jobs: HDFS.
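
To make the "transient food for the next job" pattern concrete, here is a rough sketch in Python. It is only an illustration: a local temporary directory stands in for HDFS, and the names job_one, job_two, and run_pipeline are made up for this example and are not Hadoop APIs. The first job writes its output to an intermediate directory, the second job reads it, and the intermediate data is deleted once it has been consumed.

```python
import os
import shutil
import tempfile
from collections import Counter


def job_one(raw_dir, intermediate_dir):
    """First job (hypothetical): count words in the raw files and write
    tab-separated 'word<TAB>count' records to the intermediate directory."""
    counts = Counter()
    for name in sorted(os.listdir(raw_dir)):
        with open(os.path.join(raw_dir, name)) as f:
            for line in f:
                counts.update(line.split())
    os.makedirs(intermediate_dir, exist_ok=True)
    # Mimic the part-file naming that MapReduce jobs use for their output.
    with open(os.path.join(intermediate_dir, "part-00000"), "w") as f:
        for word, n in sorted(counts.items()):
            f.write(f"{word}\t{n}\n")


def job_two(intermediate_dir, top_n):
    """Second job (hypothetical): read the intermediate records and return
    the top_n words, ordered by descending count and then alphabetically."""
    rows = []
    for name in sorted(os.listdir(intermediate_dir)):
        with open(os.path.join(intermediate_dir, name)) as f:
            for line in f:
                word, n = line.rstrip("\n").split("\t")
                rows.append((word, int(n)))
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows[:top_n]


def run_pipeline(lines, top_n=2):
    """Chain the two jobs through a transient intermediate directory.
    The local temp directory is an assumption standing in for HDFS."""
    workdir = tempfile.mkdtemp(prefix="mr-pipeline-")
    try:
        raw_dir = os.path.join(workdir, "raw")
        tmp_dir = os.path.join(workdir, "tmp", "job1-out")
        os.makedirs(raw_dir)
        with open(os.path.join(raw_dir, "input.txt"), "w") as f:
            f.write("\n".join(lines) + "\n")
        job_one(raw_dir, tmp_dir)
        result = job_two(tmp_dir, top_n)
        # The intermediate data is transient: drop it once consumed.
        shutil.rmtree(tmp_dir)
        return result
    finally:
        shutil.rmtree(workdir, ignore_errors=True)


if __name__ == "__main__":
    print(run_pipeline(["a b a", "b a c"], top_n=2))
```

On a real cluster the intermediate directory would be an HDFS path passed as the output of the first job and the input of the second, with cleanup done by whatever drives the workflow.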