Jack Levin
2010-12-11, 05:59
Stack
2010-12-11, 18:55
Jack Levin
2010-12-17, 19:01
Jonathan Gray
2010-12-17, 19:15
Jack Levin
2010-12-17, 19:27
Jonathan Gray
2010-12-17, 19:57
Jack Levin
2010-12-17, 20:43
Jonathan Gray
2010-12-17, 21:21
Jack Levin
2010-12-17, 21:28
Jonathan Gray
2010-12-17, 21:32
Jack Levin
2010-12-17, 21:45
Jonathan Gray
2010-12-17, 22:26
Andrew Purtell
2010-12-17, 23:56
Jack Levin
2010-12-18, 00:53
Ryan Rawson
2010-12-18, 01:06
Jack Levin
2010-12-18, 01:14
Ryan Rawson
2010-12-18, 01:18
Andrew Purtell
2010-12-18, 01:27
Jack Levin
2010-12-18, 19:28
-
question about multi-transaction queries (Jack Levin, 2010-12-11, 05:59)
Hello. We plan to run a set of queries on tables with multiple columns. What is the most efficient method to, say, insert 1000 rows and/or read 1000 rows?

We are considering just using REST. But what about Jython? Will it be faster? Another option is to have our apps talk to nginx and some sort of app tier running via FastCGI.

Any ideas?

-Jack
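On the insert side, the usual approach is to batch writes on the client so that each row does not cost its own network call. This is similar in spirit to the client-side write buffer in the HBase Java client. Below is a minimal Python sketch. It assumes a hypothetical `send_batch` callable that stands in for whatever multi-put call the chosen interface actually offers. `BufferedWriter` and the batch size of 100 are illustrative and are not part of any HBase API:

```python
from typing import Callable, List, Tuple

Row = Tuple[bytes, dict]  # (row key, {column: value})


class BufferedWriter:
    """Hypothetical client-side write buffer: collects puts and hands them
    to send_batch in groups instead of making one call per row."""

    def __init__(self, send_batch: Callable[[List[Row]], None], batch_size: int = 100):
        self.send_batch = send_batch  # stand-in for a real multi-put call
        self.batch_size = batch_size
        self.pending: List[Row] = []

    def put(self, row_key: bytes, columns: dict) -> None:
        self.pending.append((row_key, columns))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered, then start a fresh buffer.
        if self.pending:
            self.send_batch(self.pending)
            self.pending = []


batches = []
writer = BufferedWriter(batches.append, batch_size=100)
for i in range(1000):
    writer.put(b"row-%04d" % i, {b"cf:a": b"x"})
writer.flush()  # send whatever is left over
print(len(batches))  # 10 batched calls instead of 1000 single-row calls
```

The batch size trades buffer memory and per-call latency against the number of round trips. The final `flush()` matters because any rows still in the buffer are otherwise never sent.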
-
Re: question about multi-transaction queries (Stack, 2010-12-11, 18:55)
How many columns? It's columns, right, and not column families?

Are the 1k rows contiguous? Can you Scan? For insert of 1k rows, you know how to do that now, right? Will they be substantial rows (10s to 100s of KB) or just small? Do you have multiput available in the REST interface? I don't recall.

Try REST since you know that interface. Jython might be faster, though a test done more than a year ago had Jython as slow (http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html). A bunch has changed since then, HBase-wise, and Jython has probably gotten a lot better. If you go the Jython route, make sure you keep the interpreter afloat rather than launch it per request (so yes, FastCGI would make sense).

St.Ack
-
Re: question about multi-transaction queries (Jack Levin, 2010-12-17, 19:01)
Let's just say it's one row key with two columns. Non-contiguous records. We want to read as fast as possible. We did some tests: with MongoDB, random reads of 1000 records take about 80 ms, while HBase with Jython takes 400 ms or so.

The question is, as we develop our applications, what is the best method to retrieve many rows as fast as possible? We are talking about 1 client here, not many clients. For many clients REST seems appropriate, but here we have a frontend server rendering content quickly, and we need to reduce the query overhead for HBase and get data fast.

-Jack
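Here is what the reported numbers work out to per read. The round-trip model at the end is an assumption and was not measured in the thread. If each of the 1000 gets goes out as its own request, each read costs at least one network round trip. That is why batching reads is the first thing to try.

```python
# Figures as reported in the thread (1000 random reads, single client).
N_READS = 1000
MONGO_MS = 80
HBASE_JYTHON_MS = 400

per_read_mongo_ms = MONGO_MS / N_READS          # 0.08 ms per read
per_read_hbase_ms = HBASE_JYTHON_MS / N_READS   # 0.4 ms per read
slowdown = HBASE_JYTHON_MS / MONGO_MS           # 5.0x


def round_trips(n_keys: int, batched: bool, n_servers: int = 1) -> int:
    """Assumed model: one network round trip per key when gets are issued one
    at a time, versus one per region server touched when they are batched."""
    if not batched:
        return n_keys
    return min(n_keys, n_servers)


print(per_read_mongo_ms, per_read_hbase_ms, slowdown)
print(round_trips(N_READS, batched=False), round_trips(N_READS, batched=True, n_servers=10))
```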
-
RE: question about multi-transaction queries (Jonathan Gray, 2010-12-17, 19:15)
All of my experience doing something like this was with straight Java.
There are MultiGet and MultiPut capabilities in the Java client that will help you out significantly.

I played with Jython and HBase a couple of years ago and back then the performance was horrible. I never looked back, but I have no idea if it's gotten better in the meantime.

JG
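MultiGet's advantage comes from batching. The client works out which server hosts each row and then sends one request per server instead of one per row. Below is a simplified sketch of that grouping step. It assumes a hypothetical four-region table and made-up server names `rs1` to `rs3`. It illustrates the idea only and is not the HBase client's actual code:

```python
import bisect
from collections import defaultdict
from typing import Dict, List

# Hypothetical table layout: sorted region start keys and the server hosting
# each region. A real client learns this from HBase's catalog (.META.) table.
REGION_STARTS = [b"", b"g", b"n", b"t"]
REGION_SERVERS = ["rs1", "rs2", "rs1", "rs3"]


def server_for(row_key: bytes) -> str:
    # The region holding row_key is the last one whose start key <= row_key.
    idx = bisect.bisect_right(REGION_STARTS, row_key) - 1
    return REGION_SERVERS[idx]


def group_gets(row_keys: List[bytes]) -> Dict[str, List[bytes]]:
    """Group row keys by hosting server so each server receives one batched
    request instead of one request per key."""
    batches: Dict[str, List[bytes]] = defaultdict(list)
    for key in row_keys:
        batches[server_for(key)].append(key)
    return dict(batches)


print(group_gets([b"apple", b"zebra", b"kiwi", b"orange", b"banana"]))
```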
-
Re: question about multi-transaction queries (Jack Levin, 2010-12-17, 19:27)
OK, but does that mean we would incur the Java startup cost? Or do you propose we write some sort of Java server that keeps the JVM running and can serve multi-get queries?

Thanks.

-Jack
-
RE: question about multi-transaction queries (Jonathan Gray, 2010-12-17, 19:57)
Yes, some kind of running JVM. I would not recommend starting a JVM for each query :)
-
Re: question about multi-transaction queries (Jack Levin, 2010-12-17, 20:43)
Do you happen to know if anyone has written, or is using, something like that as open source? I imagine it would be super useful. There is also the question of the interface; I assume it would be TCP. Is there some sort of Jetty plugin available? Now I somewhat realize that I am just describing the existing REST interface, but AFAIK it does not support multi-get.

-Jack
-
RE: question about multi-transaction queries (Jonathan Gray, 2010-12-17, 21:21)
I'm not sure exactly what your requirements are, but what exactly is your client interface? Is there no persistent process anywhere serving client requests?
-
Re: question about multi-transaction queries (Jack Levin, 2010-12-17, 21:28)
The client is a TCP framework, similar to the MySQL client, that should be able to send 1000 gets in one transaction, e.g. as a JSON object containing all the keys.

-Jack
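Jack describes a long-running service that receives all 1000 keys in one JSON request over TCP. The server side of such a protocol could look like the sketch below. The protocol itself is hypothetical: the newline-delimited JSON framing, the `keys` field, the port, and the in-memory `STORE` standing in for HBase are all assumptions. It is written in Python for illustration only. Following the thread, a real service would more likely run in a persistent JVM, hold the Java client, and issue one MultiGet per request.

```python
import json
import socketserver

# Hypothetical in-memory stand-in for HBase; a real service would hold one
# long-lived HBase client here and issue a batched get for all keys.
STORE = {"row1": {"cf:a": "1", "cf:b": "2"}, "row2": {"cf:a": "3"}}


def handle_batch(request_line: bytes) -> bytes:
    """Parse one request like {"keys": ["row1", "row2"]} and answer with a
    single JSON object mapping each key to its columns (null if missing)."""
    request = json.loads(request_line)
    result = {key: STORE.get(key) for key in request["keys"]}
    return (json.dumps(result) + "\n").encode("utf-8")


class BatchHandler(socketserver.StreamRequestHandler):
    # One JSON request per line; a client can keep the connection open.
    def handle(self):
        for line in self.rfile:
            if line.strip():
                self.wfile.write(handle_batch(line))


def serve(host: str = "127.0.0.1", port: int = 9099) -> None:
    # Hypothetical address; the process stays up, so startup cost is paid once.
    with socketserver.ThreadingTCPServer((host, port), BatchHandler) as server:
        server.serve_forever()
```

Call `serve()` to start it. The PHP side would then open a TCP connection, write one JSON line, and read one JSON line back.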
-
RE: question about multi-transaction queries (Jonathan Gray, 2010-12-17, 21:32)
I'm not sure I understand.
Are you trying to build a client? Or do you want something that behaves like the MySQL client?
-
Re: question about multi-transaction queries (Jack Levin, 2010-12-17, 21:45)
We will have PHP querying HBase over TCP, and we need a connector on the HBase end to return content as fast as possible.

-Jack
-
RE: question about multi-transaction queries (Jonathan Gray, 2010-12-17, 22:26)
Have you looked at the thrift support? Plenty of people are using HBase from PHP via Thrift.
I don't think it has MultiPut or MultiGet support, but work is currently underway to update the Thrift API. I imagine those two could be added.
-
Re: question about multi-transaction queries (Andrew Purtell, 2010-12-17, 23:56)
> We will have php querying hbase over tcp, and we need a
> connector on the hbase end to return content the fastest
> way possible

Typically the Thrift connector is used for this.

- Andy
-
Re: question about multi-transaction queries (Jack Levin, 2010-12-18, 00:53)
So the client language is not the question; the connector to HBase is. The end goal is to be able to send, say, only 5 GETs to fetch 1000 records quickly, rather than sending 1000 GETs to fetch 1000 records slowly. So, besides the raw API functionality via Java, I assume there is no multi-get in REST?

So the design might have to look like this: create a connector to HBase that's loaded by Jetty, and have it act as a client-facing API that takes a string of keys to run GETs for. Example:

"GET /connector/table'{1,11,23,17,180,533,N,..}" (where N is a key)

is a query that will run a multi-get via the connector and return all values rapidly. Another question: has anyone built this sort of connector before?

-Jack
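A minimal sketch of the request-handling half of the proposed connector, assuming the key list arrives in the `{1,11,23,...}` form from the example above: parse the keys and split them into fixed-size batches, so each batch can go to the backend as one bulk read. `KeyBatcher`, `parseKeys`, and `batches` are hypothetical names, not part of HBase or Jetty, and the batch size is an arbitrary illustrative parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a Jetty-hosted connector (not an HBase or Jetty API):
// parses a key list such as "{1,11,23,17,180,533}" and splits it into batches,
// so that each batch can be sent to the backend as one bulk read.
final class KeyBatcher {
    private KeyBatcher() {}

    // Strips optional surrounding braces, splits on commas, and drops blank entries.
    static List<String> parseKeys(String spec) {
        String body = spec.trim();
        if (body.startsWith("{")) body = body.substring(1);
        if (body.endsWith("}")) body = body.substring(0, body.length() - 1);
        List<String> keys = new ArrayList<>();
        for (String part : body.split(",")) {
            String key = part.trim();
            if (!key.isEmpty()) keys.add(key);
        }
        return keys;
    }

    // Splits keys into consecutive batches of at most batchSize keys each.
    static List<List<String>> batches(List<String> keys, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            out.add(new ArrayList<>(keys.subList(i, Math.min(i + batchSize, keys.size()))));
        }
        return out;
    }
}
```

With a batch size of 200, 1000 keys become 5 requests, which matches the "5 GETs for 1000 records" goal; what each request does on the HBase side depends on whether the client library offers a multi-get.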
-
Re: question about multi-transaction queries (Ryan Rawson, 2010-12-18, 01:06)
The multi interface in 0.90 will minimize RPC calls from the client to the region server. This isn't exposed in the Thrift API, but it would be trivial to expose it.
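In HBase, a table is split into regions, each covering a contiguous range of row keys beginning at its start key, and each region is served by one region server. The sketch below illustrates why batching cuts RPCs: group the requested keys by the server that hosts them, so the number of round trips equals the number of servers touched rather than the number of keys. It assumes a static, hypothetical region layout (a real client looks region locations up in the catalog tables and caches them), and `RegionGrouper` is an illustrative name, not an HBase class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of request grouping: each region is identified by its
// start key and mapped to the server hosting it. Keys are grouped by server
// so that one request per server replaces one request per key.
final class RegionGrouper {
    private final TreeMap<String, String> regionStartToServer = new TreeMap<>();

    // Registers a region starting at startKey ("" for the table's first region).
    void addRegion(String startKey, String server) {
        regionStartToServer.put(startKey, server);
    }

    // The region containing a row is the one with the greatest start key <= row.
    String serverFor(String row) {
        Map.Entry<String, String> e = regionStartToServer.floorEntry(row);
        if (e == null) throw new IllegalStateException("no region covers " + row);
        return e.getValue();
    }

    // Groups row keys by hosting server, preserving first-seen server order.
    Map<String, List<String>> groupByServer(List<String> rows) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String row : rows) {
            out.computeIfAbsent(serverFor(row), s -> new ArrayList<>()).add(row);
        }
        return out;
    }
}
```

With two regions split at "m", four keys spread across both halves collapse into two grouped requests instead of four single-row ones.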
-
Re: question about multi-transaction queries (Jack Levin, 2010-12-18, 01:14)
So, is a scanner a worthwhile way to get a bunch of rows that might be random?

-Jack
-
Re: question about multi-transaction queries (Ryan Rawson, 2010-12-18, 01:18)
Only if they are clustered.
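A scan walks rows in sorted key order from a start row to a stop row, so its cost depends on how many rows lie between the keys you want, not on how many keys you want. The sketch below makes that concrete with an in-memory `TreeSet` standing in for a table's row keys (an assumption; no HBase classes are involved). `ScanCost` is a hypothetical name, and `efficiency` assumes every wanted key actually exists in the table.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.NavigableSet;

// Hypothetical cost model: one scan covering all wanted keys must read every
// stored row in [min(wanted), max(wanted)], in key order.
final class ScanCost {
    private ScanCost() {}

    // Number of stored rows a single scan spanning all wanted keys would read.
    static int rowsScanned(NavigableSet<String> allRows, Collection<String> wanted) {
        if (wanted.isEmpty()) return 0;
        String first = Collections.min(wanted);
        String last = Collections.max(wanted);
        return allRows.subSet(first, true, last, true).size();
    }

    // Fraction of scanned rows that were wanted (1.0 means perfectly clustered).
    // Assumes every wanted key is present in allRows.
    static double efficiency(NavigableSet<String> allRows, Collection<String> wanted) {
        int scanned = rowsScanned(allRows, wanted);
        return scanned == 0 ? 1.0 : (double) wanted.size() / scanned;
    }
}
```

In a 1000-row table, five adjacent keys cost a five-row scan, while three keys near the start, middle, and end force the scan through nearly the whole table; the latter case is where individual or multi-Gets are the better fit.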
-
Re: question about multi-transaction queries (Andrew Purtell, 2010-12-18, 01:27)
I'm sorry, I'm having trouble following what seems like two XY turns in this conversation. Or it could be that I'm just suffering from sleep debt accumulated over the week.
We suggest the Thrift interface not because of language/interoperability considerations, but because the operations supported by the Thrift interface are, by design, pretty close to the Java API. Also, you connect to it via a persistent TCP connection, and requests and responses are streamed across, unlike how REST is typically used.

When Ryan talks about MultiGet, this is not scanning. It may be that on the regionservers Gets are internally little scans, but on the client this is not the same thing as using a Scanner. MultiGet is like a scatter/gather of Gets. The Java API supports it, therefore the Thrift API could too.

> I assume there is no multi-get in REST?

Correct, see https://issues.apache.org/jira/browse/HBASE-2390

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back.
- Piet Hein (via Tom White)
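The scatter/gather shape described above can be sketched with plain `java.util.concurrent`: send every group of keys out concurrently, wait for all of them, then reassemble the answers in the caller's order. This illustrates the pattern only, not the HBase client's actual implementation; `ScatterGather` and the `fetchGroup` function (standing in for one batched request, e.g. one per region server) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical scatter/gather multi-get: each group of keys (e.g. the keys
// owned by one server) is fetched concurrently, then the partial results are
// merged and returned in the caller's original key order.
final class ScatterGather {
    private ScatterGather() {}

    static List<String> multiGet(List<List<String>> groups,
                                 Function<List<String>, Map<String, String>> fetchGroup,
                                 List<String> requestOrder) {
        // One thread per group keeps the sketch simple; a real client would bound this.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, groups.size()));
        try {
            // Scatter: submit one fetch task per group.
            List<Future<Map<String, String>>> futures = new ArrayList<>();
            for (List<String> group : groups) {
                futures.add(pool.submit(() -> fetchGroup.apply(group)));
            }
            // Gather: wait for every task and merge the partial results.
            Map<String, String> merged = new HashMap<>();
            for (Future<Map<String, String>> f : futures) {
                merged.putAll(f.get());
            }
            // Reassemble in request order; keys with no row map to null.
            List<String> out = new ArrayList<>(requestOrder.size());
            for (String key : requestOrder) {
                out.add(merged.get(key));
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during multi-get", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a group fetch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Total latency is roughly that of the slowest group rather than the sum of all single-row round trips, which is the point of batching Gets instead of issuing them one by one.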
-
Re: question about multi-transaction queries (Jack Levin, 2010-12-18, 19:28)
I guess what I really want is to make sure that HBase API calls are as efficient as possible. The issue is not front-end TCP persistence. Ryan says:

>> > The multi interface in 0.90 will minimize rpc calls
>> > from the client to the region server. This isn't exposed
>> > in the thrift api but would be trivial to do so.

So this is what we want. It's the equivalent of a MySQL query like:

select * from table where id in (1,4,5,11,23,N,..);

It looks like it's not available in 0.89, so we may end up either waiting until we upgrade to 0.90, or structuring our HBase tables so that multi-get is not required, or caching the data in memcached or in MySQL temporary RAM tables, which support multi-get.

-Jack
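The caching fallback mentioned above is the cache-aside pattern with a multi-get on both sides: answer what you can from the cache, send all misses to the store in a single batched call, then populate the cache. A minimal sketch, assuming an in-memory `HashMap` stands in for memcached and a hypothetical `batchFetch` function stands in for one batched read from the store; `MultiGetCache` is an illustrative name, and the sketch ignores expiry and invalidation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache-aside multi-get: serve hits from the cache, fetch all
// misses from the backing store in ONE batched call, then fill the cache.
final class MultiGetCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<List<String>, Map<String, String>> batchFetch;
    private int backendCalls = 0;

    MultiGetCache(Function<List<String>, Map<String, String>> batchFetch) {
        this.batchFetch = batchFetch;
    }

    // Returns the values found for the requested keys (absent keys are omitted).
    Map<String, String> getAll(List<String> keys) {
        Map<String, String> result = new LinkedHashMap<>();
        List<String> misses = new ArrayList<>();
        for (String k : keys) {
            String v = cache.get(k);
            if (v != null) result.put(k, v); else misses.add(k);
        }
        if (!misses.isEmpty()) {
            backendCalls++; // one round trip for all misses, however many there are
            Map<String, String> fetched = batchFetch.apply(misses);
            cache.putAll(fetched);
            for (String k : misses) {
                String v = fetched.get(k);
                if (v != null) result.put(k, v);
            }
        }
        // Keys missing from the store are not cached (no negative caching),
        // so they will be looked up again on the next request.
        return result;
    }

    int backendCalls() { return backendCalls; }
}
```

Once the working set is warm, repeated requests never reach the store, and a cold request costs one batched call rather than one call per key.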