Re: reference architecture
On 25 October 2012 20:24, Daniel Käfer <[EMAIL PROTECTED]> wrote:

> Hello all,
>
> I'm looking for a reference architecture for hadoop. The only result I
> found is Lambda architecture from Nathan Marz[0].
>

I quite like the new Hadoop in Practice for a lot of that, especially the
answer to #2, "how to store the data", where the author looks at all the
options. Joining is the other big issue.

http://steveloughran.blogspot.co.uk/2012/10/hadoop-in-practice-applied-hadoop.html
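
To give a feel for why joining counts as a big issue, here is a minimal
sketch of a reduce-side (repartition) join in plain Python, with no Hadoop
involved. It is an illustration only, not something taken from the book: the
"users" and "orders" tables, their layouts, and all function names are
made up for the example, and the shuffle step just imitates what the
framework does between map and reduce.

```python
from collections import defaultdict


def map_phase(users, orders):
    """Tag each record with its source table and emit it under the join key."""
    for user_id, name in users:
        yield user_id, ("user", name)
    for user_id, item in orders:
        yield user_id, ("order", item)


def shuffle(pairs):
    """Group values by key, imitating the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Per key, pair every user record with every order record (inner join)."""
    for key in sorted(groups):
        names = [value for tag, value in groups[key] if tag == "user"]
        items = [value for tag, value in groups[key] if tag == "order"]
        for name in names:
            for item in items:
                yield key, name, item


def reduce_side_join(users, orders):
    """Run the three steps in sequence and collect the joined rows."""
    return list(reduce_phase(shuffle(map_phase(users, orders))))


# Hypothetical sample data: (user_id, name) and (user_id, item).
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "book"), (2, "pen"), (1, "lamp")]
joined = reduce_side_join(users, orders)
```

Every record from both inputs has to pass through the shuffle to meet its
partner, which is the cost that makes people denormalize up front or look
for map-side alternatives when one side is small.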

Regarding storing DB data, HBase-on-HDFS is where people keep it; Pig and
Hive can work with that, as well as with rawer data kept directly in HDFS.

> By architecture I mean answers to questions like:
> - How should I store the data? CSV, Thrift, ProtoBuf
> - How should I model the data? ER-Model, Starschema, something new?
> - normalized or denormalized or both (master data normalized, then
> transformation to denormalized, like ETL)
> - How should I combine database and HDFS files?
>
> Are there any other documented architectures for hadoop?
>
> Regards
> Daniel Käfer
>
>
> [0] http://www.manning.com/marz/ (only a preprint so far, not yet complete)
>
>