Re: Best practice for storage of data that changes
Here's the simple thing to consider...

If you are running M/R jobs against the data... HBase hands down is the winner.

If you are looking at a standalone cluster ... Cassandra wins. HBase is still a fickle beast.

Of course I just bottom lined it.  :-)
On Nov 29, 2012, at 10:51 PM, Lance Norskog <[EMAIL PROTECTED]> wrote:

> Please! There are lots of blogs etc. about the two, but very few head-to-head for a real use case.
>
> From: "anil gupta" <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Wednesday, November 28, 2012 11:01:55 AM
> Subject: Re: Best practice for storage of data that changes
>
> Hi Jeff,
>
> At my workplace, Intuit, we did a detailed study to evaluate HBase and Cassandra for our use case. I will see if I can post the comparative study on my public blog or on this mailing list.
>
> BTW, what is your use case? What bottlenecks are you hitting with your current solutions? If you can share some details, the HBase community will try to help you out.
>
> Thanks,
> Anil Gupta
>
>
> On Wed, Nov 28, 2012 at 9:55 AM, jeff l <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have quite a bit of experience with RDBMSs (Oracle, Postgres, MySQL) and MongoDB but don't feel any are quite right for this problem.  The amount of data being stored and the access requirements just don't match up well.
>
> I was hoping to keep the stack as simple as possible and just use HDFS, but everything I was seeing kept pointing to the need for some other datastore.  I'll check out both HBase and Cassandra.
>
> Thanks for the feedback.
>
>
> On Sun, Nov 25, 2012 at 1:11 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> Hi Jeff,
>
> My two cents below:
>
> 1st use case: Append-only data - e.g. weblogs or user logins
> As others have already mentioned, Hadoop is well suited to storing append-only data. If you want to analyze weblogs or user logins, Hadoop is a suitable solution.
>
>
> 2nd use case: Account/User data
> First of all, I would suggest you look at your use case and analyze whether it really needs a NoSQL solution.
> You were talking about maintaining user data in NoSQL. Why NoSQL instead of an RDBMS? What is the size of the data? Which NoSQL features are the selling points for you?
>
> For real-time reads and writes you can have a look at Cassandra or HBase. But I would suggest you take a very close look at both of them, because each has its own advantages. The choice will depend on your use case.
>
> One added advantage of HBase is that it has deeper integration with the Hadoop ecosystem, so you can do a lot with HBase data using Hadoop tools. HBase also integrates with Hive for querying, but AFAIK that integration has some limitations.
>
> HTH,
> Anil Gupta
>
>
> On Sun, Nov 25, 2012 at 4:52 AM, Mahesh Balija<[EMAIL PROTECTED]> wrote:
> Hi Jeff,
>
>         Since the HDFS paradigm is "write once, read many", you cannot update existing files on HDFS.
>         But for your problem, what you can do is keep the logs/user data in HDFS with different timestamps.
>         Then run MapReduce jobs at regular intervals to extract the required data from those logs and put it into HBase/Cassandra/MongoDB.
>
>         MongoDB read performance is quite fast, and it also supports ad-hoc querying. You can also use the Hadoop-MongoDB connector to read and write data to MongoDB through Hadoop MapReduce.
>      
>         If you specifically need to update the HDFS files directly, then you will have to use a commercial Hadoop package such as MapR, which supports updating the files.
>
> Best,
> Mahesh Balija,
> Calsoft Labs.
>
>
>
> On Sun, Nov 25, 2012 at 9:40 AM, bharath vissapragada<[EMAIL PROTECTED]> wrote:
> Hi Jeff,
>
> Please look at [1]. You can store your data in HBase tables and query them normally just by mapping them to Hive tables. Regarding Cassandra support, please follow JIRA [2]; it's not yet in trunk, I suppose!