Hadoop user mailing list

Re: Best practice for storage of data that changes
Hi Jeff,

Please look at [1]. You can store your data in HBase tables and query them
normally, just by mapping them to Hive tables. Regarding Cassandra support,
please follow the JIRA issue [2]; I suppose it's not in trunk yet.

[1] https://cwiki.apache.org/Hive/hbaseintegration.html
[2] https://issues.apache.org/jira/browse/HIVE-1434
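For illustration, the Hive-to-HBase mapping described in [1] is a CREATE EXTERNAL TABLE statement that uses the HBase storage handler and an "hbase.columns.mapping" property. The small Python helper below just assembles that DDL string; it is a sketch, not part of Hive. The table name "users", the column family "info", and the column names are hypothetical, and the row key is assumed to be a STRING.

```python
def hbase_backed_hive_ddl(hive_table, hbase_table, key_col, columns):
    """Build a HiveQL statement that maps an existing HBase table into Hive.

    columns: list of (hive_column, hive_type, "family:qualifier") tuples.
    The row key is mapped to key_col and (by assumption) typed as STRING.
    """
    col_defs = [f"{key_col} STRING"] + [f"{name} {typ}" for name, typ, _ in columns]
    # ":key" maps the HBase row key; the rest map HBase columns in order.
    mapping = ",".join([":key"] + [fq for _, _, fq in columns])
    lines = [f"CREATE EXTERNAL TABLE {hive_table} ("]
    lines.append(",\n".join(f"  {c}" for c in col_defs))
    lines.append(")")
    lines.append("STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'")
    lines.append(f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping}")')
    lines.append(f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}");')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical user table: row key = user id, column family "info".
    print(hbase_backed_hive_ddl(
        "users", "users", "user_id",
        [("email", "STRING", "info:email"),
         ("last_login", "STRING", "info:last_login")],
    ))
```

EXTERNAL is used because the HBase table is assumed to exist already, so dropping the Hive table leaves the HBase data alone. Once the table is mapped, ordinary HiveQL SELECTs and joins against HDFS-backed Hive tables go through the same interface.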

Thanks,

On Sun, Nov 25, 2012 at 2:26 AM, jeff l <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I'm coming from the RDBMS world and am looking at HDFS for long-term data
> storage and analysis.
>
> I've done some research and set up some smallish HDFS clusters with Hive
> for testing, but I'm having a little trouble understanding how everything
> fits together and was hoping someone could point me in the right direction.
>
> I'm looking at storing two types of data:
>
> 1. Append-only data - e.g. weblogs or user logins
> 2. Account/User data
>
> HDFS seems to be perfect for append-only data like #1, but I'm having
> trouble figuring out what to do with data that may change frequently.
>
> A simple example would be user data, where various bits of information
> (email, etc.) may change from day to day. Would HBase or Cassandra be the
> better way to go for this type of data, and can I overlay Hive over all of
> them (HDFS, HBase, Cassandra) so that I can query the data through a single
> interface?
>
> Thanks in advance for any help.
>
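To make the distinction in the question concrete: HDFS files cannot be updated in place, whereas HBase stores each (row, column) cell as timestamped versions and a read returns the newest one, so changing a user's email is simply another write. The class below is a toy in-memory model of that idea only; it is not the HBase client API. The max_versions parameter is a hypothetical stand-in loosely mirroring the VERSIONS setting of an HBase column family.

```python
from collections import defaultdict


class VersionedTable:
    """Toy model of versioned cells: every put is kept as a (timestamp, value)
    version of a (row, column) cell, and get() returns the newest version."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (row, column) -> list of (ts, value), kept newest first
        self._cells = defaultdict(list)

    def put(self, row, column, value, ts):
        versions = self._cells[(row, column)]
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)
        del versions[self.max_versions:]  # keep only the newest max_versions

    def get(self, row, column):
        versions = self._cells.get((row, column))
        return versions[0][1] if versions else None

    def history(self, row, column):
        return list(self._cells.get((row, column), []))


if __name__ == "__main__":
    t = VersionedTable()
    t.put("jeff", "info:email", "old@example.com", ts=1)
    t.put("jeff", "info:email", "new@example.com", ts=2)
    print(t.get("jeff", "info:email"))
    print(t.history("jeff", "info:email"))
```

Append-only data such as weblogs never needs the "newest version" lookup, which is why plain HDFS files fit that case well, while frequently changing account data benefits from keyed, versioned storage of this kind.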

--
Regards,
Bharath .V
w:http://researchweb.iiit.ac.in/~bharath.v

Other replies in this thread:
Mahesh Balija 2012-11-25, 12:52
anil gupta 2012-11-25, 21:11
Peyman Mohajerian 2012-11-24, 22:32