|
|
Laurent Hatier 2011-08-10, 17:02
Hi all,
I would like to know why MongoDB is faster than HBase to select items. I explain my case : I've inserted 4'000'000 lines into HBase and MongoDB and i must calculate the geolocation with the IP. I calculate a Long number with the IP and i go to find it into the 4'000'000 lines. it's take 5 ms to select the right row with Mongo instead of HBase takes 5 seconds. I think that the reason is the method : cur.limit(1) with MongoDB but is there no function like this with HBase ?
-- Laurent HATIER Étudiant en 2e année du Cycle Ingénieur à l'EISTI
Chris Tarnas 2011-08-10, 17:09
Hi Laurent,
Without more details on your schema and how you are finding that number in your table it is impossible to fully answer the question. I suspect what you are seeing is mongo's native support for secondary indexes. If you were to add secondary indexes in HBase then retrieving that row should be on the order of 3-30ms. If that is you main query method then you could reorganize your table to make that long number your row key, then you would get even faster reads.
-chris On Aug 10, 2011, at 10:02 AM, Laurent Hatier wrote:
> Hi all, > > I would like to know why MongoDB is faster than HBase to select items. > I explain my case : > I've inserted 4'000'000 lines into HBase and MongoDB and i must calculate > the geolocation with the IP. I calculate a Long number with the IP and i go > to find it into the 4'000'000 lines. > it's take 5 ms to select the right row with Mongo instead of HBase takes 5 > seconds. > I think that the reason is the method : cur.limit(1) with MongoDB but is > there no function like this with HBase ? > > -- > Laurent HATIER > Étudiant en 2e année du Cycle Ingénieur à l'EISTI
Laurent Hatier 2011-08-10, 20:15
Yes, i have heard this index but is it available on hbase 0.90.3 ?
2011/8/10 Chris Tarnas <[EMAIL PROTECTED]>
> Hi Laurent, > > Without more details on your schema and how you are finding that number in > your table it is impossible to fully answer the question. I suspect what you > are seeing is mongo's native support for secondary indexes. If you were to > add secondary indexes in HBase then retrieving that row should be on the > order of 3-30ms. If that is you main query method then you could reorganize > your table to make that long number your row key, then you would get even > faster reads. > > -chris > > > On Aug 10, 2011, at 10:02 AM, Laurent Hatier wrote: > > > Hi all, > > > > I would like to know why MongoDB is faster than HBase to select items. > > I explain my case : > > I've inserted 4'000'000 lines into HBase and MongoDB and i must calculate > > the geolocation with the IP. I calculate a Long number with the IP and i > go > > to find it into the 4'000'000 lines. > > it's take 5 ms to select the right row with Mongo instead of HBase takes > 5 > > seconds. > > I think that the reason is the method : cur.limit(1) with MongoDB but is > > there no function like this with HBase ? > > > > -- > > Laurent HATIER > > Étudiant en 2e année du Cycle Ingénieur à l'EISTI > > -- Laurent HATIER Étudiant en 2e année du Cycle Ingénieur à l'EISTI
You'll have to build your own secondary indexes for now.
On Wed, Aug 10, 2011 at 1:15 PM, Laurent Hatier <[EMAIL PROTECTED]>wrote:
> Yes, i have heard this index but is it available on hbase 0.90.3 ? > > 2011/8/10 Chris Tarnas <[EMAIL PROTECTED]> > > > Hi Laurent, > > > > Without more details on your schema and how you are finding that number > in > > your table it is impossible to fully answer the question. I suspect what > you > > are seeing is mongo's native support for secondary indexes. If you were > to > > add secondary indexes in HBase then retrieving that row should be on the > > order of 3-30ms. If that is you main query method then you could > reorganize > > your table to make that long number your row key, then you would get even > > faster reads. > > > > -chris > > > > > > On Aug 10, 2011, at 10:02 AM, Laurent Hatier wrote: > > > > > Hi all, > > > > > > I would like to know why MongoDB is faster than HBase to select items. > > > I explain my case : > > > I've inserted 4'000'000 lines into HBase and MongoDB and i must > calculate > > > the geolocation with the IP. I calculate a Long number with the IP and > i > > go > > > to find it into the 4'000'000 lines. > > > it's take 5 ms to select the right row with Mongo instead of HBase > takes > > 5 > > > seconds. > > > I think that the reason is the method : cur.limit(1) with MongoDB but > is > > > there no function like this with HBase ? > > > > > > -- > > > Laurent HATIER > > > Étudiant en 2e année du Cycle Ingénieur à l'EISTI > > > > > > > -- > Laurent HATIER > Étudiant en 2e année du Cycle Ingénieur à l'EISTI >
Edward Capriolo 2011-08-11, 00:44
On Wed, Aug 10, 2011 at 4:26 PM, Li Pi <[EMAIL PROTECTED]> wrote: > You'll have to build your own secondary indexes for now. > > On Wed, Aug 10, 2011 at 1:15 PM, Laurent Hatier <[EMAIL PROTECTED] > >wrote: > > > Yes, i have heard this index but is it available on hbase 0.90.3 ? > > > > 2011/8/10 Chris Tarnas <[EMAIL PROTECTED]> > > > > > Hi Laurent, > > > > > > Without more details on your schema and how you are finding that number > > in > > > your table it is impossible to fully answer the question. I suspect > what > > you > > > are seeing is mongo's native support for secondary indexes. If you were > > to > > > add secondary indexes in HBase then retrieving that row should be on > the > > > order of 3-30ms. If that is you main query method then you could > > reorganize > > > your table to make that long number your row key, then you would get > even > > > faster reads. > > > > > > -chris > > > > > > > > > On Aug 10, 2011, at 10:02 AM, Laurent Hatier wrote: > > > > > > > Hi all, > > > > > > > > I would like to know why MongoDB is faster than HBase to select > items. > > > > I explain my case : > > > > I've inserted 4'000'000 lines into HBase and MongoDB and i must > > calculate > > > > the geolocation with the IP. I calculate a Long number with the IP > and > > i > > > go > > > > to find it into the 4'000'000 lines. > > > > it's take 5 ms to select the right row with Mongo instead of HBase > > takes > > > 5 > > > > seconds. > > > > I think that the reason is the method : cur.limit(1) with MongoDB but > > is > > > > there no function like this with HBase ? > > > > > > > > -- > > > > Laurent HATIER > > > > Étudiant en 2e année du Cycle Ingénieur à l'EISTI > > > > > > > > > > > > -- > > Laurent HATIER > > Étudiant en 2e année du Cycle Ingénieur à l'EISTI > > > http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale
Ryan Rawson 2011-08-11, 00:57
Mongodb does an excellent job at single node scalability - they use mmap and many smart things and really kick ass ... ON A SINGLE NODE. That single node must have raid (raid it going out of fashion btw), and you wont be able to scale without resorting to: - replication (complex setup!) - sharding mongo claims to help on the last item, but it is still a risk point. For really large data that must span multiple machines, there is no "clustered sql" type solution that isnt (a) borked in various ways (Oracle RAC I'm looking at you) or (b) stupid expensive (Oracle RAC, STILL looking at you) Tools like HBase give you scalability at the cost of features (no automated secondary indexing, no query language). Welcome... to... big... data. -ryan On Thu, Aug 11, 2011 at 12:44 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote: > On Wed, Aug 10, 2011 at 4:26 PM, Li Pi <[EMAIL PROTECTED]> wrote: > >> You'll have to build your own secondary indexes for now. >> >> On Wed, Aug 10, 2011 at 1:15 PM, Laurent Hatier <[EMAIL PROTECTED] >> >wrote: >> >> > Yes, i have heard this index but is it available on hbase 0.90.3 ? >> > >> > 2011/8/10 Chris Tarnas <[EMAIL PROTECTED]> >> > >> > > Hi Laurent, >> > > >> > > Without more details on your schema and how you are finding that number >> > in >> > > your table it is impossible to fully answer the question. I suspect >> what >> > you >> > > are seeing is mongo's native support for secondary indexes. If you were >> > to >> > > add secondary indexes in HBase then retrieving that row should be on >> the >> > > order of 3-30ms. If that is you main query method then you could >> > reorganize >> > > your table to make that long number your row key, then you would get >> even >> > > faster reads. >> > > >> > > -chris >> > > >> > > >> > > On Aug 10, 2011, at 10:02 AM, Laurent Hatier wrote: >> > > >> > > > Hi all, >> > > > >> > > > I would like to know why MongoDB is faster than HBase to select >> items. >> > > > I explain my case : >> > > > I've inserted 4'000'000 lines into HBase and MongoDB and i must >> > calculate >> > > > the geolocation with the IP. I calculate a Long number with the IP >> and >> > i >> > > go >> > > > to find it into the 4'000'000 lines. >> > > > it's take 5 ms to select the right row with Mongo instead of HBase >> > takes >> > > 5 >> > > > seconds. >> > > > I think that the reason is the method : cur.limit(1) with MongoDB but >> > is >> > > > there no function like this with HBase ? >> > > > >> > > > -- >> > > > Laurent HATIER >> > > > Étudiant en 2e année du Cycle Ingénieur à l'EISTI >> > > >> > > >> > >> > >> > -- >> > Laurent HATIER >> > Étudiant en 2e année du Cycle Ingénieur à l'EISTI >> > >> > > http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale>
Blake Lemoine 2011-08-11, 03:07
I'm just curious here. I'm working on a google summer of code project currently that utilizes HBase and several times now I've made secondary indices based on what I think are standard practices. Is there any principled reason that this process couldn't be automated or is it just that no one has implemented it yet? On Wed, Aug 10, 2011 at 7:57 PM, Ryan Rawson <[EMAIL PROTECTED]> wrote: > Mongodb does an excellent job at single node scalability - they use > mmap and many smart things and really kick ass ... ON A SINGLE NODE. > > That single node must have raid (raid it going out of fashion btw), > and you wont be able to scale without resorting to: > - replication (complex setup!) > - sharding > > mongo claims to help on the last item, but it is still a risk point. > > For really large data that must span multiple machines, there is no > "clustered sql" type solution that isnt (a) borked in various ways > (Oracle RAC I'm looking at you) or (b) stupid expensive (Oracle RAC, > STILL looking at you) > > Tools like HBase give you scalability at the cost of features (no > automated secondary indexing, no query language). > > Welcome... to... big... data. > > -ryan > > On Thu, Aug 11, 2011 at 12:44 AM, Edward Capriolo <[EMAIL PROTECTED]> > wrote: > > On Wed, Aug 10, 2011 at 4:26 PM, Li Pi <[EMAIL PROTECTED]> wrote: > > > >> You'll have to build your own secondary indexes for now. > >> > >> On Wed, Aug 10, 2011 at 1:15 PM, Laurent Hatier < > [EMAIL PROTECTED] > >> >wrote: > >> > >> > Yes, i have heard this index but is it available on hbase 0.90.3 ? > >> > > >> > 2011/8/10 Chris Tarnas <[EMAIL PROTECTED]> > >> > > >> > > Hi Laurent, > >> > > > >> > > Without more details on your schema and how you are finding that > number > >> > in > >> > > your table it is impossible to fully answer the question. I suspect > >> what > >> > you > >> > > are seeing is mongo's native support for secondary indexes. If you > were > >> > to > >> > > add secondary indexes in HBase then retrieving that row should be on > >> the > >> > > order of 3-30ms. If that is you main query method then you could > >> > reorganize > >> > > your table to make that long number your row key, then you would get > >> even > >> > > faster reads. > >> > > > >> > > -chris > >> > > > >> > > > >> > > On Aug 10, 2011, at 10:02 AM, Laurent Hatier wrote: > >> > > > >> > > > Hi all, > >> > > > > >> > > > I would like to know why MongoDB is faster than HBase to select > >> items. > >> > > > I explain my case : > >> > > > I've inserted 4'000'000 lines into HBase and MongoDB and i must > >> > calculate > >> > > > the geolocation with the IP. I calculate a Long number with the IP > >> and > >> > i > >> > > go > >> > > > to find it into the 4'000'000 lines. > >> > > > it's take 5 ms to select the right row with Mongo instead of HBase > >> > takes > >> > > 5 > >> > > > seconds. > >> > > > I think that the reason is the method : cur.limit(1) with MongoDB > but > >> > is > >> > > > there no function like this with HBase ? > >> > > > > >> > > > -- > >> > > > Laurent HATIER > >> > > > Étudiant en 2e année du Cycle Ingénieur à l'EISTI > >> > > > >> > > > >> > > >> > > >> > -- > >> > Laurent HATIER > >> > Étudiant en 2e année du Cycle Ingénieur à l'EISTI > >> > > >> > > > > http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale> > >
Fuad Efendi 2011-08-11, 03:12
And I LOVE JavaScript-based single-process (and of course single thread) MapReduce Mongo-way :( Sent on the TELUS Mobility network with BlackBerry -----Original Message----- From: Ryan Rawson <[EMAIL PROTECTED]> Date: Thu, 11 Aug 2011 00:57:21 To: <[EMAIL PROTECTED]> Reply-To: [EMAIL PROTECTED] Subject: Re: Mongo vs HBase Mongodb does an excellent job at single node scalability - they use mmap and many smart things and really kick ass ... ON A SINGLE NODE. That single node must have raid (raid it going out of fashion btw), and you wont be able to scale without resorting to: - replication (complex setup!) - sharding mongo claims to help on the last item, but it is still a risk point. For really large data that must span multiple machines, there is no "clustered sql" type solution that isnt (a) borked in various ways (Oracle RAC I'm looking at you) or (b) stupid expensive (Oracle RAC, STILL looking at you) Tools like HBase give you scalability at the cost of features (no automated secondary indexing, no query language). Welcome... to... big... data. -ryan On Thu, Aug 11, 2011 at 12:44 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote: > On Wed, Aug 10, 2011 at 4:26 PM, Li Pi <[EMAIL PROTECTED]> wrote: > >> You'll have to build your own secondary indexes for now. >> >> On Wed, Aug 10, 2011 at 1:15 PM, Laurent Hatier <[EMAIL PROTECTED] >> >wrote: >> >> > Yes, i have heard this index but is it available on hbase 0.90.3 ? >> > >> > 2011/8/10 Chris Tarnas <[EMAIL PROTECTED]> >> > >> > > Hi Laurent, >> > > >> > > Without more details on your schema and how you are finding that number >> > in >> > > your table it is impossible to fully answer the question. I suspect >> what >> > you >> > > are seeing is mongo's native support for secondary indexes. If you were >> > to >> > > add secondary indexes in HBase then retrieving that row should be on >> the >> > > order of 3-30ms. If that is you main query method then you could >> > reorganize >> > > your table to make that long number your row key, then you would get >> even >> > > faster reads. >> > > >> > > -chris >> > > >> > > >> > > On Aug 10, 2011, at 10:02 AM, Laurent Hatier wrote: >> > > >> > > > Hi all, >> > > > >> > > > I would like to know why MongoDB is faster than HBase to select >> items. >> > > > I explain my case : >> > > > I've inserted 4'000'000 lines into HBase and MongoDB and i must >> > calculate >> > > > the geolocation with the IP. I calculate a Long number with the IP >> and >> > i >> > > go >> > > > to find it into the 4'000'000 lines. >> > > > it's take 5 ms to select the right row with Mongo instead of HBase >> > takes >> > > 5 >> > > > seconds. >> > > > I think that the reason is the method : cur.limit(1) with MongoDB but >> > is >> > > > there no function like this with HBase ? >> > > > >> > > > -- >> > > > Laurent HATIER >> > > > Étudiant en 2e année du Cycle Ingénieur à l'EISTI >> > > >> > > >> > >> > >> > -- >> > Laurent HATIER >> > Étudiant en 2e année du Cycle Ingénieur à l'EISTI >> > >> > > http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale>
There have been a few attempts, some of them up to date, others deprecated. See IHBase, IHTBase, and Lily. Also see https://issues.apache.org/jira/browse/HBASE-3340 Lily is the only one which is up to date. On Wed, Aug 10, 2011 at 8:07 PM, Blake Lemoine <[EMAIL PROTECTED]> wrote: > I'm just curious here. I'm working on a google summer of code project > currently that utilizes HBase and several times now I've made secondary > indices based on what I think are standard practices. Is there any > principled reason that this process couldn't be automated or is it just > that > no one has implemented it yet? > > On Wed, Aug 10, 2011 at 7:57 PM, Ryan Rawson <[EMAIL PROTECTED]> wrote: > > > Mongodb does an excellent job at single node scalability - they use > > mmap and many smart things and really kick ass ... ON A SINGLE NODE. > > > > That single node must have raid (raid it going out of fashion btw), > > and you wont be able to scale without resorting to: > > - replication (complex setup!) > > - sharding > > > > mongo claims to help on the last item, but it is still a risk point. > > > > For really large data that must span multiple machines, there is no > > "clustered sql" type solution that isnt (a) borked in various ways > > (Oracle RAC I'm looking at you) or (b) stupid expensive (Oracle RAC, > > STILL looking at you) > > > > Tools like HBase give you scalability at the cost of features (no > > automated secondary indexing, no query language). > > > > Welcome... to... big... data. > > > > -ryan > > > > On Thu, Aug 11, 2011 at 12:44 AM, Edward Capriolo <[EMAIL PROTECTED] > > > > wrote: > > > On Wed, Aug 10, 2011 at 4:26 PM, Li Pi <[EMAIL PROTECTED]> wrote: > > > > > >> You'll have to build your own secondary indexes for now. > > >> > > >> On Wed, Aug 10, 2011 at 1:15 PM, Laurent Hatier < > > [EMAIL PROTECTED] > > >> >wrote: > > >> > > >> > Yes, i have heard this index but is it available on hbase 0.90.3 ? > > >> > > > >> > 2011/8/10 Chris Tarnas <[EMAIL PROTECTED]> > > >> > > > >> > > Hi Laurent, > > >> > > > > >> > > Without more details on your schema and how you are finding that > > number > > >> > in > > >> > > your table it is impossible to fully answer the question. I > suspect > > >> what > > >> > you > > >> > > are seeing is mongo's native support for secondary indexes. If you > > were > > >> > to > > >> > > add secondary indexes in HBase then retrieving that row should be > on > > >> the > > >> > > order of 3-30ms. If that is you main query method then you could > > >> > reorganize > > >> > > your table to make that long number your row key, then you would > get > > >> even > > >> > > faster reads. > > >> > > > > >> > > -chris > > >> > > > > >> > > > > >> > > On Aug 10, 2011, at 10:02 AM, Laurent Hatier wrote: > > >> > > > > >> > > > Hi all, > > >> > > > > > >> > > > I would like to know why MongoDB is faster than HBase to select > > >> items. > > >> > > > I explain my case : > > >> > > > I've inserted 4'000'000 lines into HBase and MongoDB and i must > > >> > calculate > > >> > > > the geolocation with the IP. I calculate a Long number with the > IP > > >> and > > >> > i > > >> > > go > > >> > > > to find it into the 4'000'000 lines. > > >> > > > it's take 5 ms to select the right row with Mongo instead of > HBase > > >> > takes > > >> > > 5 > > >> > > > seconds. > > >> > > > I think that the reason is the method : cur.limit(1) with > MongoDB > > but > > >> > is > > >> > > > there no function like this with HBase ? > > >> > > > > > >> > > > -- > > >> > > > Laurent HATIER > > >> > > > Étudiant en 2e année du Cycle Ingénieur à l'EISTI > > >> > > > > >> > > > > >> > > > >> > > > >> > -- > > >> > Laurent HATIER > > >> > Étudiant en 2e année du Cycle Ingénieur à l'EISTI > > >> > > > >> > > > > > > http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale> > > > > >
Jason Rutherglen 2011-08-11, 05:48
Laurent,
This could be implemented with Lucene, eg, HBASE-3529. Contact me offline if you are interested in pursuing that angle.
Cheers.
On Wed, Aug 10, 2011 at 10:02 AM, Laurent Hatier <[EMAIL PROTECTED]> wrote: > Hi all, > > I would like to know why MongoDB is faster than HBase to select items. > I explain my case : > I've inserted 4'000'000 lines into HBase and MongoDB and i must calculate > the geolocation with the IP. I calculate a Long number with the IP and i go > to find it into the 4'000'000 lines. > it's take 5 ms to select the right row with Mongo instead of HBase takes 5 > seconds. > I think that the reason is the method : cur.limit(1) with MongoDB but is > there no function like this with HBase ? > > -- > Laurent HATIER > Étudiant en 2e année du Cycle Ingénieur à l'EISTI >
Laurent Hatier 2011-08-11, 08:13
Thanks all.
i've seen that there is no limit with HBase. I mean the following statement : "SELECT ... FROM ... LIMIT 1". (Because there is this method with Mongo^^) Is it implemented ?
2011/8/11 Jason Rutherglen <[EMAIL PROTECTED]>
> Laurent, > > This could be implemented with Lucene, eg, HBASE-3529. Contact me > offline if you are interested in pursuing that angle. > > Cheers. > > On Wed, Aug 10, 2011 at 10:02 AM, Laurent Hatier > <[EMAIL PROTECTED]> wrote: > > Hi all, > > > > I would like to know why MongoDB is faster than HBase to select items. > > I explain my case : > > I've inserted 4'000'000 lines into HBase and MongoDB and i must calculate > > the geolocation with the IP. I calculate a Long number with the IP and i > go > > to find it into the 4'000'000 lines. > > it's take 5 ms to select the right row with Mongo instead of HBase takes > 5 > > seconds. > > I think that the reason is the method : cur.limit(1) with MongoDB but is > > there no function like this with HBase ? > > > > -- > > Laurent HATIER > > Étudiant en 2e année du Cycle Ingénieur à l'EISTI > > >
-- Laurent HATIER Étudiant en 2e année du Cycle Ingénieur à l'EISTI
Fuad Efendi 2011-08-11, 13:53
Sorry for off topic, but just as a sample to understand fundamental difference: 1. "SELECT COUNT" will take few hours on MySQL InnoDB in most typical cases, and _it_is_ implemented. 2. Same with HBase: full table scan. However, with MapReduce it might take less time. Or, we can query Solr (Lily-way) to get number of records, but data won't be absolutely correct. Just as a sample Of course, we can "transactionally" store number of records somewhere and _kill_performance_. Another solution is to use fixed-width records (similar to MyISAM) - but data will be sparse etc. Lily provides Hbase -based "Write Ahead Log", Hbase-based "Message Queue", and Hbase -based "Secondary Index" (separate library); and it also provides framework support to subscribe to a queue of messages. -- Fuad Efendi http://www.tokenizer.caOn 11-08-11 4:13 AM, "Laurent Hatier" <[EMAIL PROTECTED]> wrote: >Thanks all. > >i've seen that there is no limit with HBase. I mean the following >statement >: "SELECT ... FROM ... LIMIT 1". (Because there is this method with >Mongo^^) >Is it implemented ? > >2011/8/11 Jason Rutherglen <[EMAIL PROTECTED]> > >> Laurent, >> >> This could be implemented with Lucene, eg, HBASE-3529. Contact me >> offline if you are interested in pursuing that angle. >> >> Cheers. >> >> On Wed, Aug 10, 2011 at 10:02 AM, Laurent Hatier >> <[EMAIL PROTECTED]> wrote: >> > Hi all, >> > >> > I would like to know why MongoDB is faster than HBase to select items. >> > I explain my case : >> > I've inserted 4'000'000 lines into HBase and MongoDB and i must >>calculate >> > the geolocation with the IP. I calculate a Long number with the IP >>and i >> go >> > to find it into the 4'000'000 lines. >> > it's take 5 ms to select the right row with Mongo instead of HBase >>takes >> 5 >> > seconds. >> > I think that the reason is the method : cur.limit(1) with MongoDB but >>is >> > there no function like this with HBase ? >> > >> > -- >> > Laurent HATIER >> > Étudiant en 2e année du Cycle Ingénieur à l'EISTI >> > >> > > > >-- >Laurent HATIER >Étudiant en 2e année du Cycle Ingénieur à l'EISTI
|
|