MapReduce >> mail # user >> reference architecture


Re: reference architecture
On 25 October 2012 20:24, Daniel Käfer <[EMAIL PROTECTED]> wrote:

> Hello all,
>
> I'm looking for a reference architecture for Hadoop. The only result I
> found is the Lambda architecture from Nathan Marz[0].
>

I quite like the new Hadoop in Practice for a lot of that, especially the
answer to #2, "how to store the data", where he looks at all the options.
Joining is the other big issue.

http://steveloughran.blogspot.co.uk/2012/10/hadoop-in-practice-applied-hadoop.html
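
On the joining point above, here is a minimal sketch of a reduce-side join, written in plain Python rather than against any Hadoop API. The customer and order record layouts, and every function name, are made up for illustration; it only mimics the map, shuffle and reduce steps in memory.

```python
from collections import defaultdict


def map_phase(customers, orders):
    """Tag each record with its source and emit it under the join key."""
    for cust_id, name in customers:
        yield cust_id, ("C", name)
    for order_id, cust_id, amount in orders:
        yield cust_id, ("O", (order_id, amount))


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each key, pair every customer record with every order record."""
    for key in sorted(groups):
        names = [v for tag, v in groups[key] if tag == "C"]
        orders = [v for tag, v in groups[key] if tag == "O"]
        for name in names:
            for order_id, amount in orders:
                yield (order_id, key, name, amount)


def reduce_side_join(customers, orders):
    """Inner join of orders to customers on the customer id."""
    return list(reduce_phase(shuffle(map_phase(customers, orders))))


if __name__ == "__main__":
    customers = [(1, "Ada"), (2, "Bob")]
    orders = [(100, 1, 9.5), (101, 2, 3.0), (102, 1, 1.25)]
    for row in reduce_side_join(customers, orders):
        print(row)
```

The output rows are the denormalized form: each order carries its customer's name. Orders whose customer id has no matching customer record are dropped, since this sketch is an inner join.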

Regarding storing DB data, HBase-on-HDFS is where people keep it; Pig and
Hive can work with that as well as rawer data kept in HDFS directly.

> By architecture I mean answers to questions like:
> - How should I store the data? CSV, Thrift, ProtoBuf
> - How should I model the data? ER model, star schema, something new?
> - normalized or denormalized or both (master data normalized, then
> transformation to denormalized, like ETL)
> - How should I combine database and HDFS files?
>
> Are there any other documented architectures for Hadoop?
>
> Regards
> Daniel Käfer
>
>
> [0] http://www.manning.com/marz/ only a preprint so far, not yet complete
>
>