On 25 October 2012 20:24, Daniel Käfer <[EMAIL PROTECTED]> wrote:
> Hello all,
> I'm looking for a reference architecture for Hadoop. The only result I
> found is the Lambda Architecture from Nathan Marz.
I quite like the new book Hadoop in Practice for a lot of that, especially its
answer to #2, "how to store the data", where the author works through all the options.
Joining is the other big issue.
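To show why joins are costly in plain MapReduce: the usual approach is a reduce-side join. Mappers tag each record with its source and emit it under the join key, the shuffle groups both datasets by that key, and reducers pair up the records. The following is a minimal in-memory Python sketch, assuming two hypothetical datasets (`users`, `orders`). It simulates map, shuffle and reduce with ordinary functions and does not use any Hadoop API.

```python
from collections import defaultdict
from itertools import product


def map_phase(users, orders):
    """Emit (join_key, (source_tag, record)) pairs, as a mapper would."""
    for user_id, name in users:
        yield user_id, ("U", name)
    for order_id, user_id, amount in orders:
        yield user_id, ("O", (order_id, amount))


def shuffle(pairs):
    """Group values by key; stands in for Hadoop's sort-and-shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Inner join: pair every user record with every order record per key."""
    for key in sorted(groups):
        values = groups[key]
        left = [rec for tag, rec in values if tag == "U"]
        right = [rec for tag, rec in values if tag == "O"]
        for name, (order_id, amount) in product(left, right):
            yield key, name, order_id, amount


def reduce_side_join(users, orders):
    return list(reduce_phase(shuffle(map_phase(users, orders))))


# Hypothetical sample data: (user_id, name) and (order_id, user_id, amount).
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(100, 1, 9.5), (101, 1, 3.0), (102, 2, 7.25), (103, 4, 1.0)]
print(reduce_side_join(users, orders))
# → [(1, 'alice', 100, 9.5), (1, 'alice', 101, 3.0), (2, 'bob', 102, 7.25)]
```

The cost is that every record of both inputs crosses the network in the shuffle. When one side is small, a map-side join (for example Pig's `replicated` join or Hive's map join) avoids this by loading the small side into memory in each mapper.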
Regarding storing DB data, HBase-on-HDFS is where people keep it; Pig and
Hive can work with HBase tables as well as with rawer data kept directly in HDFS.
> By architecture I mean answers to questions like:
> - How should I store the data? CSV, Thrift, ProtoBuf?
> - How should I model the data? ER model, star schema, something new?
> - Normalized or denormalized or both (master data normalized, then
> transformed to denormalized, like ETL)?
> - How should I combine the database and HDFS files?
> Are there any other documented architectures for hadoop?
> Daniel Käfer
>  http://www.manning.com/marz/ (only a preprint so far, not yet complete)