MapReduce, mail # user - reference architecture


Re: reference architecture
Steve Loughran 2012-10-26, 17:25
On 25 October 2012 23:17, Daniel Käfer <[EMAIL PROTECTED]> wrote:

> On Thursday, 25.10.2012, at 22:10 +0100, Steve Loughran wrote:
> > Regarding storing DB data, HBase-on-HDFS is where people keep it; Pig
> > and Hive can work with that as well as rawer data kept in HDFS
> > directly
>
> But is that the best idea? HBase is great for random reads and small
> range scans. But Hive (SQL) queries over HBase are 4-5x slower than over
> plain HDFS. [0]
>
>

> I guess keeping the first (raw) data in HDFS and the final data in HBase
> is a good idea. But how should the data be stored between individual
> MapReduce jobs?
>

Depends on the amount of data and expected use. If it's transient food for
the next MR jobs: HDFS
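
As a rough illustration of the "transient data between jobs" pattern, here is a minimal sketch in plain Python. It is not Hadoop code: a local temporary directory stands in for an HDFS scratch path, and the names `job_one`, `job_two`, and `run_pipeline` are hypothetical. The first job writes its output as a part file, the second job reads it, and the scratch directory is deleted once the chain finishes.

```python
import os
import shutil
import tempfile
from collections import Counter


def job_one(lines, out_dir):
    """First job: count words and write 'word<TAB>count' lines to a part file."""
    counts = Counter(word for line in lines for word in line.split())
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "part-00000"), "w", encoding="utf-8") as f:
        for word, n in sorted(counts.items()):
            f.write(f"{word}\t{n}\n")


def job_two(in_dir, min_count):
    """Second job: read the intermediate files, keep words seen at least min_count times."""
    kept = {}
    for name in sorted(os.listdir(in_dir)):
        with open(os.path.join(in_dir, name), encoding="utf-8") as f:
            for line in f:
                word, n = line.rstrip("\n").split("\t")
                if int(n) >= min_count:
                    kept[word] = int(n)
    return kept


def run_pipeline(lines, min_count=2):
    """Chain the two jobs through a scratch directory (assumed stand-in for HDFS)."""
    scratch = tempfile.mkdtemp(prefix="mr-intermediate-")
    try:
        job_one(lines, scratch)
        return job_two(scratch, min_count)
    finally:
        # The intermediate data is transient, so remove it after the last job.
        shutil.rmtree(scratch)
```

Calling `run_pipeline(["a b a", "b c"])` runs both stages and leaves no intermediate files behind. In a real cluster the scratch location would be an HDFS directory that the driver removes after the final job.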