MapReduce >> mail # user >> reference architecture

Re: reference architecture
On 25 October 2012 20:24, Daniel Käfer <[EMAIL PROTECTED]> wrote:

> Hello all,
> I'm looking for a reference architecture for hadoop. The only result I
> found is Lambda architecture from Nathan Marz[0].

I quite like the new Hadoop in Practice for a lot of that, especially the
answer to #2, "how to store the data", where he looks at all the options.
Joining is the other big issue.
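Because joining comes up so often, here is a minimal sketch of one common approach, a reduce-side (repartition) join. It is simulated in plain Python rather than written against the Hadoop API. The record layouts, the sample data and the `C`/`O` source tags are all made up for illustration. In a real job the framework does the shuffle between the map and reduce phases. The result is also a small example of turning normalized records into a denormalized view, the ETL-style step asked about below.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample data: normalized customer records plus order records.
customers = [("c1", "Alice"), ("c2", "Bob")]
orders = [("o1", "c1", 30), ("o2", "c1", 12), ("o3", "c2", 7), ("o4", "c3", 5)]

def map_phase():
    """Emit (join_key, (source_tag, payload)) pairs, as the mappers would."""
    for cust_id, name in customers:
        yield cust_id, ("C", name)
    for order_id, cust_id, amount in orders:
        yield cust_id, ("O", (order_id, amount))

def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle/sort."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    """Inner join: pair each customer record with each order sharing its key."""
    for cust_id, values in groups.items():
        names = [payload for tag, payload in values if tag == "C"]
        cust_orders = [payload for tag, payload in values if tag == "O"]
        for name, (order_id, amount) in product(names, cust_orders):
            yield (order_id, cust_id, name, amount)

joined = list(reduce_phase(shuffle(map_phase())))
for row in joined:
    print(row)
```

Order `o4` has no matching customer and is dropped, as an inner join would do. Because every value for a key goes to one reducer, a heavily skewed key can make that reducer the bottleneck.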


Regarding storing DB data: HBase on HDFS is where people keep it, and Pig
and Hive can work with that as well as with rawer data kept directly in HDFS.

> With architecture I mean answers to question like:
> - How should I store the data? CSV, Thrift, ProtoBuf?
> - How should I model the data? ER model, star schema, something new?
> - Normalized, denormalized, or both (master data normalized, then
> transformed into a denormalized form, as in ETL)?
> - How should I combine a database and HDFS files?
> Are there any other documented architectures for hadoop?
> Regards
> Daniel Käfer
> [0] http://www.manning.com/marz/ (only a preprint so far, not yet complete)