Bertrand Dechoux
2012-08-27, 16:00
Raghunath, Ranjith
2012-08-27, 16:03
Carl Steinbach
2012-08-27, 19:15
Bertrand Dechoux
2012-08-27, 21:24
Carl Steinbach
2012-08-27, 22:04
Bertrand Dechoux
2012-08-27, 22:27
Ranjith
2012-08-28, 01:41
-
HiveServer can not handle concurrent requests from more than one client? Bertrand Dechoux 2012-08-27, 16:00
Hi,
I would like more information about this specific sentence from the documentation:

"HiveServer can not handle concurrent requests from more than one client."
https://cwiki.apache.org/Hive/hiveserver.html

Does it mean that this server cannot provide JDBC access to an 'almost closed' environment for multiple users?

Regards,
Bertrand
-
RE: HiveServer can not handle concurrent requests from more than one client? Raghunath, Ranjith 2012-08-27, 16:03
Bertrand,
The Hive Server is a Thrift service that provides an interface for Hive. You can connect to it using JDBC. It is not secure out of the box, as there are no user ID and password restrictions. On the concurrency part, it is single-threaded: one query gets executed after the other.

Thanks,
Ranjith
-
Re: HiveServer can not handle concurrent requests from more than one client? Carl Steinbach 2012-08-27, 19:15
HiveServer is multi-threaded, but there is a defect in the current HiveServer Thrift API that prevents it from robustly handling concurrent connections. This problem is described in more detail here:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API

Thanks.
Carl
-
Re: HiveServer can not handle concurrent requests from more than one client? Bertrand Dechoux 2012-08-27, 21:24
Thanks for the answers.
I had already read it, but both pages (and the JIRA) are not very explicit about the problem.

According to the proposal for HiveServer2, the current Hive server provides no assurance about "session state in between calls". If that were all, it is something that can be lived with. It only means that for a JDBC client, all requests should be conceived as isolated.

The page of the Hive Server (1) says "HiveServer can not handle concurrent requests from more than one client." According to the JIRA, one may run into issues when multiple users are running it. Is that true regardless of the configuration? Should it not be interpreted as "queries will be executed one after the other", as Ranjith said?

E.g., what would be the impact of hive.exec.parallel or hive.support.concurrency?

What would be the recommended way of providing Hive access for multiple users to a production environment which is tightly firewalled? SSH is not a viable solution in my context, and the Hive web interface does not seem mature enough.

Bertrand
-
Re: HiveServer can not handle concurrent requests from more than one client? Carl Steinbach 2012-08-27, 22:04
Hi Bertrand,
> According to the proposal for HiveServer2, the current hive server provides no assurance about "session state in between calls".
> If that was all, it is something that can be lived with. It only means that for a JDBC client, all requests should be conceived as isolated.

In the HiveServer Thrift API, Execute() and Fetch() are two separate calls and require two separate RPCs. Between these calls, HiveServer has to maintain session state so that when the Fetch() call is made it knows which result set to look at. The current HiveServer Thrift API assumes that Thrift will consistently map the same physical connection to the same Thrift worker thread, and consequently it stores the session state in a thread-local variable. Unfortunately, this assumption is false. It's possible to live with this limitation if you're OK with sometimes fetching other people's result sets instead of your own.

> The page of the Hive Server (1) says "HiveServer can not handle concurrent requests from more than one client."
> According to the jira, one may run into issues when multiple users are running it. Is that true regardless of the configuration?
> It should not be interpreted as "query will be executed one after the other", like Ranjith said?

Yes, this is true regardless of the configuration. Ranjith's statement is incorrect.

> Eg what would be the impact of hive.exec.parallel or hive.support.concurrency?

These two configuration properties are actually completely orthogonal to the HiveServer multi-client issue, though it's hard to know that since the configuration property names were very poorly chosen. hive.exec.parallel controls whether or not the MR jobs in the query plan DAG are executed in parallel on the cluster (https://issues.apache.org/jira/browse/HIVE-549). hive.support.concurrency controls whether or not Hive supports coarse-grained locks on tables and partitions (see https://cwiki.apache.org/confluence/display/Hive/Locking).
> What would be the recommended way for providing a hive access to multiple users to a production environment which is tightly firewalled? Ssh is not a viable solution in my context and the hive web interface does not seem mature enough.

I recommend taking a look at the Beeswax web interface for Hive. More details (including screenshots) are available here:
https://ccp.cloudera.com/display/CDHDOC/Beeswax

Thanks.
Carl
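
The thread-local problem described in this message can be illustrated with a small sketch. This is not HiveServer's actual code: the names ThreadLocalSessionDemo, execute, fetch and fetchOnWrongWorker are hypothetical, and the two single-thread executors merely stand in for two Thrift worker threads. It assumes that client 1's Fetch() happens to be dispatched to a different worker than its Execute().

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalSessionDemo {
    // Hypothetical stand-in for session state kept in a thread-local variable.
    private static final ThreadLocal<String> RESULT_SET = new ThreadLocal<>();

    // Stores the "result set" on whichever worker thread runs the call.
    static void execute(String query) {
        RESULT_SET.set("results of: " + query);
    }

    // Reads whatever the current worker thread last stored.
    static String fetch() {
        return RESULT_SET.get();
    }

    // Simulates client 1's fetch being served by the worker that handled client 2.
    static String fetchOnWrongWorker() {
        ExecutorService workerA = Executors.newSingleThreadExecutor();
        ExecutorService workerB = Executors.newSingleThreadExecutor();
        try {
            // Client 1 executes on worker A, client 2 executes on worker B.
            CompletableFuture.runAsync(() -> execute("client 1 query"), workerA).join();
            CompletableFuture.runAsync(() -> execute("client 2 query"), workerB).join();
            // Client 1's fetch is routed to worker B and sees client 2's state.
            return CompletableFuture.supplyAsync(ThreadLocalSessionDemo::fetch, workerB).join();
        } finally {
            workerA.shutdown();
            workerB.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchOnWrongWorker()); // prints "results of: client 2 query"
    }
}
```

Because the state is keyed by thread rather than by connection, the outcome depends only on which worker serves the second call, which is why no Hive configuration setting changes it.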
-
Re: HiveServer can not handle concurrent requests from more than one client? Bertrand Dechoux 2012-08-27, 22:27
Thanks a lot.
> It's possible to live with this limitation if you're ok with sometimes fetching other people's result sets instead of your own.

I hadn't thought about that, only about the state of variables. That consequence isn't nice. It won't really be a security issue in my context, but it can be very inconvenient.

> Yes, this is true regardless of the configuration. Ranjith's statement is incorrect.

OK, so the only true solution, as proposed in the JIRA, is to 'serialize' the calls with a kind of proxy, like a queue. But that would go against the multi-user goal and the relatively low latency that Hive could provide.

> These two configuration properties are actually completely orthogonal to the HiveServer multi-client issue

I thought so but wasn't sure. Thank you for the full explanation and for making the difference clear.

> I recommend taking a look at the Beeswax web interface for Hive. More details (including screenshots) are available here: https://ccp.cloudera.com/display/CDHDOC/Beeswax

I know about that, but I am afraid it would mean changing the distribution currently in use, which is not a small thing. But I will consider that solution more seriously. I take it from your answer that the backend is different? I could not find much information about it and wasn't sure whether the same issues apply to Beeswax.

Thanks a lot, again.
Bertrand
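
The "serialize the calls with a queue" idea mentioned in this message could look roughly like the sketch below. It is only an illustration under stated assumptions: SerializingProxy and its backend function are hypothetical and do not correspond to any Hive or JIRA code, and the backend is assumed to perform a whole execute-then-fetch round trip for one query.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

class SerializingProxy implements AutoCloseable {
    // A single worker thread: submitted requests queue up and run one at a time.
    private final ExecutorService queue = Executors.newSingleThreadExecutor();
    // Hypothetical backend call: runs one query and returns its rows.
    private final Function<String, List<String>> backend;

    SerializingProxy(Function<String, List<String>> backend) {
        this.backend = backend;
    }

    // Enqueues the query; the future completes once its turn has been served.
    CompletableFuture<List<String>> submit(String query) {
        return CompletableFuture.supplyAsync(() -> backend.apply(query), queue);
    }

    @Override
    public void close() {
        queue.shutdown();
    }

    public static void main(String[] args) {
        // The lambda is a placeholder backend used only for this demonstration.
        try (SerializingProxy proxy = new SerializingProxy(q -> Arrays.asList("row for " + q))) {
            CompletableFuture<List<String>> a = proxy.submit("query A");
            CompletableFuture<List<String>> b = proxy.submit("query B");
            System.out.println(a.join());
            System.out.println(b.join());
        }
    }
}
```

The trade-off is the one raised above: every client waits behind whatever query is currently running, so correctness is bought at the cost of multi-user latency.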
-
Re: HiveServer can not handle concurrent requests from more than one client? Ranjith 2012-08-28, 01:41
Thanks, guys, for the clarification. What about multiple queries run through a single session? Do they get queued and executed one after the other?
Thanks,
Ranjith