If you need to run fast queries on your 'Account/User data', then you will
need a NoSQL store such as HBase or Cassandra. If your only constraint is
frequent updates, you may still manage to keep the data in HDFS: just
rewrite it every time there is a change. So the key consideration is
whether you want to run fast queries or are fine with slow, offline
queries of the HDFS data.
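To make the "rewrite it" option a bit more concrete, here is a rough sketch
in plain Python. It does not touch HDFS at all; the record layout and the
name rewrite_snapshot are made up for illustration. The idea is only that
you take the old full copy of the user data plus the batch of changes and
produce a new full copy, which would then replace the old file.

```python
def rewrite_snapshot(snapshot, updates):
    """Return a new full snapshot with the updates applied.

    snapshot: list of dict records, one per user, each with a "user_id" key
              (hypothetical layout, for illustration only).
    updates:  list of dict records in the order the changes arrived; a later
              record for the same user_id replaces an earlier one.
    """
    # Index the existing records by key so each user appears exactly once.
    latest = {rec["user_id"]: rec for rec in snapshot}
    # Apply changes in arrival order: last write wins, new users are added.
    for rec in updates:
        latest[rec["user_id"]] = rec
    # Sort so the rewritten copy has a stable, reproducible order.
    return [latest[uid] for uid in sorted(latest)]


old = [
    {"user_id": 1, "email": "a@example.com"},
    {"user_id": 2, "email": "b@example.com"},
]
changes = [
    {"user_id": 2, "email": "b.new@example.com"},
    {"user_id": 3, "email": "c@example.com"},
]
new_snapshot = rewrite_snapshot(old, changes)
```

The cost is that every batch of changes rewrites the whole data set, so
this probably only makes sense if the user data is small enough, or the
rewrite infrequent enough, for that to be acceptable.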
On Sat, Nov 24, 2012 at 12:56 PM, jeff l <[EMAIL PROTECTED]> wrote:
> Hi All,
> I'm coming from the RDBMS world and am looking at hdfs for long term data
> storage and analysis.
> I've done some research and set up some smallish hdfs clusters with hive
> for testing but I'm having a little trouble understanding how everything
> fits together and was hoping someone could point me in the right direction.
> I'm looking at storing two types of data:
> 1. Append-only data - e.g. weblogs or user logins
> 2. Account/User data
> HDFS seems to be perfect for append-only data like #1, but I'm having
> trouble figuring out what to do with data that may change frequently.
> A simple example would be user data where various bits of information:
> email, etc may change from day to day. Would hbase or cassandra be the
> better way to go for this type of data, and can I overlay hive over all (
> hdfs, hbase, cassandra ) so that I can query the data through a single
> Thanks in advance for any help.