Something Something
2012-09-17, 05:38
Balaraman, Anand
2012-09-17, 05:48
Tim Robertson
2012-09-17, 05:51
Something Something
2012-09-17, 06:07
Tim Robertson
2012-09-17, 07:15
Hamilton, Robert
2012-09-17, 14:42
MiaoMiao
2012-09-18, 01:28
Ricky Saltzer
2012-09-18, 04:36
-
Questions about Hive | Something Something | 2012-09-17, 05:38
Note: I am a newbie to Hive.
Can someone please answer the following questions?

1) Does Hive provide APIs (like HBase does) that can be used to retrieve data from Hive tables from a Java program? I heard somewhere that the data can be accessed with JDBC-style APIs. True?

2) I don't see how I can add indexes on the tables, so does that mean a query such as the following will trigger a MapReduce (MR) job that searches files on HDFS sequentially?

hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';

3) Has anyone compared the performance of Hive against other NoSQL databases such as HBase and MongoDB? I understand it's not exactly an apples-to-apples comparison, but still...

Thanks.
-
RE: Questions about Hive | Balaraman, Anand | 2012-09-17, 05:48
Regarding using APIs to work with Hive, here is a tip:

Try using a JDBC connector (like 'hive-jdbc-0.7.1-cdh3u1.jar') as a plugin in any query tool such as DbVisualizer. I am connecting to Hive using the above setup, as well as using the SQL Explorer plugin in Eclipse.

Regards,
Anand B
-
Re: Questions about Hive | Tim Robertson | 2012-09-17, 05:51
> Note: I am a newbie to Hive.
>
> Can someone please answer the following questions?
>
> 1) Does Hive provide APIs (like HBase does) that can be used to retrieve
> data from the tables in Hive from a Java program? I heard somewhere that
> the data can be accessed with JDBC (style) APIs. True?

True. https://cwiki.apache.org/Hive/hiveclient.html#HiveClient-JDBC

> 2) I don't see how I can add indexes on the tables, so does that mean a
> query such as the following will trigger a MR job that will search files on
> HDFS sequentially?
>
> hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';

There are some index implementations in Hive, but it is not as simple as in a traditional DB. For example, search JIRA to see some of the work: https://issues.apache.org/jira/browse/HIVE-417

You are correct that the above would do a full table scan.

> 3) Has anyone compared performance of Hive against other NOSQL databases
> such as HBase, MongoDB. I understand it's not exactly apples to apples
> comparison, but still...

I think you misunderstand what Hive is. It is basically a SQL-to-MapReduce translation engine, which has adapters for the input source. By default it uses plain files on HDFS, but there are adapters for (e.g.) HBase, so you can use it to run SQL on HBase tables, for example (which works great). Regarding performance: an HBase scan run through Hive is the same operation as a normal HBase MapReduce scan, so the performance is the same.

> Thanks.
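Tim's answers touch three pieces of HiveQL: inspecting how a query will run, the index work tracked in HIVE-417, and the HBase adapter. The sketch below only assembles example statements as strings; it does not connect to Hive. The printed statements could be pasted into the Hive CLI or passed to a JDBC client such as the RunSQL program in this thread. Assumptions, loudly: the class name HiveStatementSketch and the names invites_foo_idx, hbase_table_1, cf1:val and xyz are hypothetical placeholders; the compact-index syntax follows the Hive indexing docs of the 0.7/0.8 era (index support was later removed in Hive 3.0); the HBase DDL follows the Hive/HBase integration wiki. EXPLAIN is a quick way to check how Hive will execute the query from question 2, e.g. whether it reads the whole table or, if invites is partitioned by ds (as the table of that name in the Hive Getting Started guide is), only the matching partition.

```java
// Hypothetical helper class: it only assembles example HiveQL strings; it does not talk to Hive.
public class HiveStatementSketch {

    // EXPLAIN asks Hive to print the execution plan (stages, table scans) instead of running the query.
    public static String explain(String query) {
        return "EXPLAIN " + query;
    }

    // Compact index built by Hive's CompactIndexHandler (0.7/0.8-era index support;
    // indexes were removed in Hive 3.0). WITH DEFERRED REBUILD creates the index empty.
    public static String createCompactIndex(String index, String table, String column) {
        return "CREATE INDEX " + index + " ON TABLE " + table + " (" + column + ")"
                + " AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'"
                + " WITH DEFERRED REBUILD";
    }

    // Populates (or refreshes) a deferred-rebuild index.
    public static String rebuildIndex(String index, String table) {
        return "ALTER INDEX " + index + " ON " + table + " REBUILD";
    }

    // Hive table stored in HBase via the HBase storage handler: the HBase row key maps to
    // column 'key', and the HBase column cf1:val maps to column 'value'.
    public static String hbaseBackedTable(String hiveTable, String hbaseTable) {
        return "CREATE TABLE " + hiveTable + " (key INT, value STRING)"
                + " STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'"
                + " WITH SERDEPROPERTIES (\"hbase.columns.mapping\" = \":key,cf1:val\")"
                + " TBLPROPERTIES (\"hbase.table.name\" = \"" + hbaseTable + "\")";
    }

    public static void main(String[] args) {
        String query = "SELECT a.foo FROM invites a WHERE a.ds='2008-08-15'";
        // Placeholder names: invites_foo_idx, hbase_table_1 and xyz are illustrative only.
        String[] statements = {
            explain(query),
            createCompactIndex("invites_foo_idx", "invites", "foo"),
            rebuildIndex("invites_foo_idx", "invites"),
            hbaseBackedTable("hbase_table_1", "xyz"),
        };
        for (String s : statements) {
            System.out.println(s + ";");
        }
    }
}
```

Keeping the statement building separate from execution means the same strings work from the CLI, a GUI tool over JDBC, or a Java program.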
-
Re: Questions about Hive | Something Something | 2012-09-17, 06:07
Thank you both for the answers. We are trying to find out if Hive can be used as a replacement for Netezza, but if there are no indexes, I don't see how it will beat Netezza in terms of performance. It sounds like it certainly can't be used to do a quick lookup from a web app, as Netezza can.

If performance isn't a concern, then I guess it could be a useful tool. Will try it out and see how it works out. Thanks.
-
Re: Questions about Hive | Tim Robertson | 2012-09-17, 07:15
I don't think Hive is intended for web-request-scoped operations... that would be a rather unusual use case from my understanding. HBase sounds more like the Hadoop equivalent you might be looking for, but you need to look at your search patterns to see if HBase is a good fit (you need to manage your own indexes again).

Cheers,
Tim
-
RE: Questions about Hive | Hamilton, Robert | 2012-09-17, 14:42
Hello, something :)
Regarding the JDBC style: I understand this approach has some limitations, but here is an example.

You will need to make sure the Hive service is running: https://cwiki.apache.org/Hive/hiveserver.html

Here is sample code that I've used for testing. It is not the best Java in the world, but it gets the job done. You will need to make sure the Hive and Hadoop jars are on the classpath. Note that you will have to edit the connectionString.

    import java.sql.*;

    public class RunSQL {
        // HiveServer (v1) JDBC driver; edit connectionString to point at your server.
        private static final String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
        private static final String connectionString = "jdbc:hive://myserver.hp.com:10000/default";

        public static void main(String[] args) throws SQLException {
            if (args.length < 1) {
                System.err.println("Usage: java RunSQL \"<query>\"");
                System.exit(1);
            }
            String sqlToRun = args[0];

            // Load the Hive JDBC driver so DriverManager can find it.
            try {
                Class.forName(driverName);
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
                System.exit(1);
            }

            // try-with-resources closes the connection, statement and result set.
            try (Connection con = DriverManager.getConnection(connectionString);
                 Statement stmt = con.createStatement()) {
                System.out.println("Connected.");
                System.out.println("Running: " + sqlToRun);
                try (ResultSet res = stmt.executeQuery(sqlToRun)) {
                    ResultSetMetaData meta = res.getMetaData();
                    int numberOfColumns = meta.getColumnCount();
                    System.out.println("Result:");
                    while (res.next()) {
                        // JDBC column indexes start at 1.
                        for (int i = 1; i <= numberOfColumns; i++) {
                            System.out.print("\t" + res.getString(i));
                        }
                        System.out.println();
                    }
                }
            }
        }
    }
-
Re: Questions about Hive | MiaoMiao | 2012-09-18, 01:28
I believe Hive is not for web users, since a single query can take several minutes or even hours. But I managed to provide a web service via Thrift and PHP: http://nousefor.net/55/2011/12/php/hbase-and-hive-thrift-php-client/
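Because a single Hive query can run for minutes or hours, a web front end generally should not hold an HTTP request open while it runs. A common pattern (a different approach from the Thrift/PHP setup described above, shown only as an assumption-laden sketch) is to submit the query in the background, hand the client a job id, and let later requests poll for status and fetch the rows. In the sketch below the class AsyncQueryService and its methods are hypothetical, and the Callable is a stand-in for the real Hive call, for example a JDBC executeQuery like the one in RunSQL that collects rows into a list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical job registry: submit() returns an id immediately, and later requests
// poll status(id) instead of waiting on one long HTTP request.
public class AsyncQueryService {
    public enum Status { RUNNING, DONE, FAILED, UNKNOWN }

    private final ExecutorService pool;
    private final Map<String, Future<List<String>>> jobs = new ConcurrentHashMap<>();

    public AsyncQueryService(int workers) {
        this.pool = Executors.newFixedThreadPool(workers);
    }

    // 'query' stands in for the real Hive call that returns result rows.
    public String submit(Callable<List<String>> query) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, pool.submit(query));
        return id;
    }

    public Status status(String id) {
        Future<List<String>> job = jobs.get(id);
        if (job == null) {
            return Status.UNKNOWN;
        }
        if (!job.isDone()) {
            return Status.RUNNING;
        }
        try {
            job.get(); // already finished, so this does not block
            return Status.DONE;
        } catch (ExecutionException | CancellationException e) {
            return Status.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Status.UNKNOWN;
        }
    }

    // Blocks until the job finishes; a failed query surfaces as ExecutionException.
    public List<String> result(String id) throws InterruptedException, ExecutionException {
        Future<List<String>> job = jobs.get(id);
        if (job == null) {
            throw new IllegalArgumentException("unknown job id: " + id);
        }
        return job.get();
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AsyncQueryService service = new AsyncQueryService(2);
        // Fake "query" that sleeps to mimic a slow Hive job.
        String id = service.submit(() -> {
            Thread.sleep(500);
            return Arrays.asList("row1", "row2");
        });
        System.out.println("Submitted job " + id + ", status " + service.status(id));
        System.out.println(service.result(id));
        System.out.println(service.status(id));
        service.shutdown();
    }
}
```

A real service would also need to expire old entries in the jobs map and cap the pool size to what the Hive server can handle.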
-
Re: Questions about Hive | Ricky Saltzer | 2012-09-18, 04:36
Yes, Hive is meant for batch processing on very large data sets. Its latency is very high compared to other "databases" such as MySQL, but it excels where other databases falter. For example, running analysis on several terabytes of data is not unusual in Hive.

Since HBase was suggested, be sure to understand that it is a "NoSQL" database, so you will need to rethink a lot of application logic if it relied on SQL beforehand.

Ricky