HBase is very good at this kind of thing.
Depending on your aggregation needs, OpenTSDB might be interesting, since it
stores and queries large amounts of time-ordered data, similar to what you
want to do.
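To make the OpenTSDB idea concrete, here is a minimal Python sketch of a time-bucketed row layout and a range-scan aggregation. It is loosely modeled on OpenTSDB's approach of putting a metric id and an hour-aligned base timestamp at the front of the row key, but it is not OpenTSDB's actual format. The key layout, the SortedTable class (an in-memory stand-in for a sorted HBase table) and the helper names are all hypothetical. The point is that because HBase keeps rows sorted by key, "sum of x on pivot y over a day or a week" becomes one contiguous scan.

```python
import bisect
import struct
from collections import defaultdict

HOUR = 3600  # seconds per row bucket (assumed, echoing OpenTSDB's hourly rows)


def row_key(metric_id: int, ts: int, pivot: str) -> bytes:
    """Hypothetical key: metric id, hour-aligned base timestamp, pivot tag.

    Big-endian unsigned integers compare byte-wise in numeric order, so all
    rows for one metric are contiguous and ordered by time.
    """
    base = ts - ts % HOUR
    return struct.pack(">II", metric_id, base) + pivot.encode("utf-8")


class SortedTable:
    """In-memory stand-in for an HBase table: rows kept sorted by key bytes."""

    def __init__(self):
        self.keys = []  # sorted row keys
        self.rows = {}  # row key -> {column qualifier: value}

    def put(self, key: bytes, qualifier: int, value: float) -> None:
        if key not in self.rows:
            bisect.insort(self.keys, key)
            self.rows[key] = {}
        self.rows[key][qualifier] = value

    def scan(self, start: bytes, stop: bytes):
        """Yield (key, columns) for start <= key < stop, like a range scan."""
        i = bisect.bisect_left(self.keys, start)
        while i < len(self.keys) and self.keys[i] < stop:
            yield self.keys[i], self.rows[self.keys[i]]
            i += 1


def record(table: SortedTable, metric_id: int, ts: int, pivot: str, x: float) -> None:
    # The qualifier is the offset in seconds within the hour bucket.
    table.put(row_key(metric_id, ts, pivot), ts % HOUR, x)


def sum_by_pivot(table: SortedTable, metric_id: int, start_ts: int, end_ts: int) -> dict:
    """Sum x per pivot over start_ts <= ts < end_ts with a single range scan."""
    first_bucket = start_ts - start_ts % HOUR
    last_bucket_end = -(-end_ts // HOUR) * HOUR  # end_ts rounded up to an hour
    start = struct.pack(">II", metric_id, first_bucket)
    stop = struct.pack(">II", metric_id, last_bucket_end)
    totals = defaultdict(float)
    for key, columns in table.scan(start, stop):
        (base,) = struct.unpack(">I", key[4:8])
        pivot = key[8:].decode("utf-8")
        for offset, x in columns.items():
            if start_ts <= base + offset < end_ts:  # trim partial buckets
                totals[pivot] += x
    return dict(totals)
```

For example, after record(t, 1, 0, "us", 2.0) and record(t, 1, 7200, "eu", 5.0), sum_by_pivot(t, 1, 0, 86400) returns {"us": 2.0, "eu": 5.0}. A week or a month is just a wider scan; if those get slow, keeping precomputed daily rollups is one common option.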
It isn't clear to me whether your data is primarily about current state or
about time-embedded state transitions. You can easily store both in HBase,
but the arrangements will be a bit different.
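The difference between the two arrangements can be sketched in a few lines of Python. This is an illustration under assumptions, not a prescribed schema: plain dicts stand in for two HBase tables, and state_key, transition_key, apply_update and history are hypothetical names. Current state gets one row per entity that each update overwrites. Transitions get one row per change, keyed by entity id plus a reversed timestamp (Long.MAX_VALUE minus the timestamp, a common HBase idiom) so that an entity's newest change sorts first. Entity ids are assumed to contain no NUL bytes, since a NUL separates the id from the timestamp, and timestamps are assumed to be non-negative.

```python
import struct

MAX_TS = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def state_key(entity_id: str) -> bytes:
    # Current-state layout: one row per entity; each update overwrites it.
    return entity_id.encode("utf-8")


def transition_key(entity_id: str, ts_millis: int) -> bytes:
    # Transition layout: one row per change. Storing MAX_TS - ts as a
    # big-endian 8-byte integer makes newer changes sort before older ones.
    return entity_id.encode("utf-8") + b"\x00" + struct.pack(">q", MAX_TS - ts_millis)


def apply_update(current: dict, transitions: dict, entity_id: str,
                 ts_millis: int, state: str) -> None:
    """Write one state change to both hypothetical tables."""
    current[state_key(entity_id)] = state
    transitions[transition_key(entity_id, ts_millis)] = state


def history(transitions: dict, entity_id: str) -> list:
    """Return an entity's states newest-first.

    Sorting the keys mimics HBase's sorted row order; a real table would use
    a prefix scan instead of filtering every key.
    """
    prefix = entity_id.encode("utf-8") + b"\x00"
    return [transitions[k] for k in sorted(transitions) if k.startswith(prefix)]
```

HBase can also retain several timestamped versions of a single cell, but explicit transition rows like these keep the full history in one sorted key range, which makes time-based scans and aggregation straightforward.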
On Wed, Apr 13, 2011 at 6:12 PM, Sam Seigal <[EMAIL PROTECTED]> wrote:
> I have a requirement where large sets of data flow into a
> system I own.
> A single unit of data in this set has a set of immutable attributes
> plus state attached to it. The state is dynamic and can change at any time.
> What is the best way to run analytical queries on data of this nature?
> One way is to maintain this data in a separate store, take a snapshot
> at a point in time, and then import it into HDFS for
> analysis using Hadoop MapReduce. I do not see this approach scaling,
> since moving data is obviously expensive.
> If I were to maintain this data directly as SequenceFiles in HDFS, how
> would updates work?
> I am new to Hadoop/HDFS, so any suggestions or critique are welcome. I
> know that HBase works around this problem through multiversion
> concurrency control techniques. Is that the only option? Are there
> any alternatives?
> Also note that all the aggregation and analysis I want to do is time-based,
> i.e. sum of x on pivot y over a day, 2 days, a week, a month, etc. For such
> use cases, is it advisable to use HDFS directly or to use systems built
> on top of Hadoop like Hive or HBase?