MapReduce >> mail # user >> Re: Best practice for storage of data that changes


Re: Best practice for storage of data that changes
Here's the simple thing to consider...

If you are running M/R jobs against the data... HBase hands down is the winner.

If you are looking at a standalone cluster... Cassandra wins. HBase is still a fickle beast.

Of course I just bottom lined it.  :-)
On Nov 29, 2012, at 10:51 PM, Lance Norskog <[EMAIL PROTECTED]> wrote:

> Please! There are lots of blogs etc. about the two, but very few head-to-head for a real use case.
>
> From: "anil gupta" <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Wednesday, November 28, 2012 11:01:55 AM
> Subject: Re: Best practice for storage of data that changes
>
> Hi Jeff,
>
> At my workplace, Intuit, we did a detailed study to evaluate HBase and Cassandra for our use case. I will see if I can post the comparative study on my public blog or on this mailing list.
>
> BTW, what is your use case? What bottlenecks are you hitting with your current solutions? If you can share some details, the HBase community will try to help you out.
>
> Thanks,
> Anil Gupta
>
>
> On Wed, Nov 28, 2012 at 9:55 AM, jeff l <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have quite a bit of experience with RDBMSs (Oracle, Postgres, MySQL) and MongoDB but don't feel any are quite right for this problem.  The amount of data being stored and the access requirements just don't match up well.
>
> I was hoping to keep the stack as simple as possible and just use HDFS, but everything I was seeing kept pointing to the need for some other datastore.  I'll check out both HBase and Cassandra.
>
> Thanks for the feedback.
>
>
> On Sun, Nov 25, 2012 at 1:11 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> Hi Jeff,
>
> My two cents below:
>
> 1st use case: Append-only data - e.g. weblogs or user logins
> As others have already mentioned, Hadoop is well suited to storing append-only data. If you want to analyze weblogs or user logins, Hadoop is a suitable solution.
>
>
> 2nd use case: Account/User data
> First of all, I would suggest looking closely at your use case and deciding whether it really needs a NoSQL solution.
> You mentioned maintaining user data in NoSQL. Why NoSQL instead of an RDBMS? What is the size of the data? Which NoSQL features are the selling points for you?
>
> For real-time reads and writes, you can look at Cassandra or HBase. I would suggest examining both closely, because each has its own advantages, so the choice will depend on your use case.
>
> One added advantage of HBase is its deeper integration with the Hadoop ecosystem, so you can do a lot with HBase data using Hadoop tools. HBase can also be queried through Hive, but AFAIK that integration has some limitations.
>
> HTH,
> Anil Gupta
>
>
> On Sun, Nov 25, 2012 at 4:52 AM, Mahesh Balija<[EMAIL PROTECTED]> wrote:
> Hi Jeff,
>
>         Since the HDFS paradigm is "write once, read many", you cannot update files in place on HDFS.
>         For your problem, you can instead keep the logs/user data in HDFS as timestamped records.
>         Then run MapReduce jobs at regular intervals to extract the required data from those logs and load it into HBase/Cassandra/MongoDB.
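
The pattern described above (append timestamped records, then periodically reduce them to the current state per key) can be sketched in plain Python. This is only an illustration of the idea, not a real MapReduce job: the record layout, the `map_phase` and `reduce_phase` names, and the sample data are all assumptions made for the example, and the final dictionary stands in for whatever table (HBase, Cassandra, MongoDB) the job would load.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record layout: (user_id, timestamp, fields).
Record = Tuple[str, int, dict]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, Tuple[int, dict]]]:
    """Emit each append-only record keyed by user id."""
    for user_id, ts, fields in records:
        yield user_id, (ts, fields)


def reduce_phase(pairs: Iterable[Tuple[str, Tuple[int, dict]]]) -> Dict[str, Tuple[int, dict]]:
    """Keep only the most recent (timestamp, fields) pair for each key."""
    latest: Dict[str, Tuple[int, dict]] = {}
    for key, (ts, fields) in pairs:
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, fields)
    return latest


# Append-only log: a change is written as a new record, never as an update.
log = [
    ("alice", 100, {"email": "a@old.example"}),
    ("bob", 110, {"email": "b@example.com"}),
    ("alice", 120, {"email": "a@new.example"}),
]

# The periodic job collapses the log into the current state per user.
current = reduce_phase(map_phase(log))
print(current["alice"])  # (120, {'email': 'a@new.example'})
```

The raw log is never modified, so the full history stays available for batch analysis, while the reduced view is what a real-time store would serve.
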
>
>         MongoDB's read performance is quite fast, and it supports ad-hoc querying. You can also use the Hadoop-MongoDB connector to read and write MongoDB data through Hadoop MapReduce.
>      
>         If you specifically need to update the HDFS files directly, then you would have to use a commercial Hadoop package such as MapR, which supports updating the files.
>
> Best,
> Mahesh Balija,
> Calsoft Labs.
>
>
>
> On Sun, Nov 25, 2012 at 9:40 AM, bharath vissapragada<[EMAIL PROTECTED]> wrote:
> Hi Jeff,
>
> Please look at [1]. You can store your data in HBase tables and query them normally just by mapping them to Hive tables. Regarding Cassandra support, please follow JIRA [2]; it's not in trunk yet, I suppose!