I am very interested in this as well. I asked the same question somewhere a couple of years ago and never heard anything back. We decided to go with HBase to store a "working set" of the data: the data we want to view with low latency and relatively random access. Everything else goes to HDFS for later processing.
We are working with medical/physiological sensor data.
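
A minimal sketch of that split, in case it helps picture it. This is a toy model, not our actual code: the class name, the field names, and the "recent data only" rule for picking the working set are all assumptions made for illustration. A sorted in-memory list stands in for HBase and a plain append-only list stands in for files on HDFS; no real HBase or HDFS API is used.

```python
from bisect import bisect_left, insort


class TieredSeriesStore:
    """Toy model: a small sorted 'working set' plus a full archive.

    The working set stands in for HBase (sorted keys, cheap range scans).
    The archive stands in for append-only files on HDFS.
    All names here are hypothetical.
    """

    def __init__(self, working_set_seconds):
        self.working_set_seconds = working_set_seconds
        self.working_set = []  # sorted (sensor_id, timestamp, value) tuples
        self.archive = []      # append-only, in arrival order

    def write(self, sensor_id, timestamp, value, now):
        record = (sensor_id, timestamp, value)
        self.archive.append(record)  # every sample is archived
        # Assumption: only samples inside the recency window stay "hot".
        if now - timestamp <= self.working_set_seconds:
            insort(self.working_set, record)

    def scan(self, sensor_id, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        # A 2-tuple sorts before any 3-tuple sharing its first two fields,
        # so these bounds bracket exactly the requested time range.
        lo = bisect_left(self.working_set, (sensor_id, start))
        hi = bisect_left(self.working_set, (sensor_id, end))
        return [(t, v) for _, t, v in self.working_set[lo:hi]]


store = TieredSeriesStore(working_set_seconds=3600)
now = 10_000
store.write("ecg-1", 9_000, 0.42, now)
store.write("ecg-1", 9_500, 0.40, now)
store.write("ecg-1", 1_000, 0.10, now)   # too old: archive only
store.write("spo2-1", 9_200, 97.0, now)
print(store.scan("ecg-1", 9_000, 9_500))
```

Keying on (sensor, timestamp) keeps each sensor's samples contiguous and in time order, which is why the range scan only has to find two boundaries. The archive is never scanned on the low-latency path; it is left for batch jobs.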
--
Andrew Nguyen
On Tuesday, May 29, 2012 at 10:13 AM, Josh Patterson wrote:
> Unless you need low-latency access to all of this time series data, it
> might be more cost-efficient to store large archives of the
> data in plain HDFS.
>
> The scanning can be done more efficiently in a lot of cases in MapReduce + HDFS.
>
> Some links:
>
> OSCON-data presentation (good TVA story here):
>
> http://www.slideshare.net/jpatanooga/oscon-data-2011-lumberyard
>
> http://www.slideshare.net/cloudera/hadoop-as-the-platform-for-the-smartgrid-at-tva
>
> Engineering Literature:
>
> http://openpdc.codeplex.com/
>
> Josh
>
> On Thu, May 17, 2012 at 7:23 PM, Rita <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > I am currently using hbase to store sensor data -- basically large time series
> > data hitting close to 2 billion rows for one type of sensor. I was wondering
> > how hbase differs from HDF (http://www.hdfgroup.org/HDF5/) file format.
> > Most of my operations are scanning a range and getting its values, but it
> > seems I can achieve this using HDF. Does anyone have experience with this
> > file container format who could shed some light?
> >
> > --
> > --- Get your facts first, then you can distort them as you please.--
> >
>
> --
> Twitter: @jpatanooga
> Solution Architect @ Cloudera
> hadoop: http://www.cloudera.com
>