Hadoop >> mail # user >> Re: Best practice for storage of data that changes

Re: Best practice for storage of data that changes
Hi Jeff,

Please take a look at [1]. You can store your data in HBase tables and, by
mapping them to Hive tables, query them like any other Hive table. As for
Cassandra support, please follow JIRA [2]; as far as I know, it is not in
trunk yet.

[1] https://cwiki.apache.org/Hive/hbaseintegration.html
[2] https://issues.apache.org/jira/browse/HIVE-1434
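
To make the mapping in [1] concrete, here is a minimal sketch that builds the
Hive DDL for an HBase-backed table. The storage handler class and the
`hbase.columns.mapping` / `hbase.table.name` properties come from the Hive
HBase integration; the table name `users`, the column family `info`, and the
helper function itself are hypothetical and only for illustration.

```python
def hbase_backed_table_ddl(hive_table, hbase_table, key_column, columns):
    """Build Hive DDL that maps an existing HBase table to a Hive table.

    key_column: (hive_column, hive_type) for the HBase row key.
    columns:    list of (hive_column, hive_type, "family:qualifier") tuples.
    """
    key_name, key_type = key_column
    # Hive column list: the row key first, then one column per HBase cell.
    hive_cols = [f"{key_name} {key_type}"]
    hive_cols += [f"{name} {hive_type}" for name, hive_type, _ in columns]
    col_list = ", ".join(hive_cols)
    # ":key" stands for the HBase row key; entries must be in the same
    # order as the Hive columns above.
    mapping = ",".join([":key"] + [cell for _, _, cell in columns])
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({col_list})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}');"
    )


# Hypothetical user table: row key = user id, one column family "info".
ddl = hbase_backed_table_ddl(
    hive_table="users",
    hbase_table="users",
    key_column=("user_id", "string"),
    columns=[("email", "string", "info:email"), ("name", "string", "info:name")],
)
print(ddl)
```

Running the generated statement in the Hive CLI (with the HBase handler jars
on the classpath) should let you query the mutable user data with ordinary
HiveQL, including joins against your append-only tables on HDFS.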


On Sun, Nov 25, 2012 at 2:26 AM, jeff l <[EMAIL PROTECTED]> wrote:

> Hi All,
> I'm coming from the RDBMS world and am looking at HDFS for long-term data
> storage and analysis.
> I've done some research and set up some smallish HDFS clusters with Hive
> for testing, but I'm having a little trouble understanding how everything
> fits together and was hoping someone could point me in the right direction.
> I'm looking at storing two types of data:
> 1. Append-only data - e.g. weblogs or user logins
> 2. Account/User data
> HDFS seems to be perfect for append-only data like #1, but I'm having
> trouble figuring out what to do with data that may change frequently.
> A simple example would be user data, where various bits of information
> (email, etc.) may change from day to day. Would HBase or Cassandra be the
> better way to go for this type of data, and can I overlay Hive over all of
> them (HDFS, HBase, Cassandra) so that I can query the data through a single
> interface?
> Thanks in advance for any help.

Bharath .V