I am working on setting up Sqoop2 in my CDH4.6 pseudo-distributed environment. Per earlier conversations with Jarek, I'm working toward migrating our usage from Sqoop1 to Sqoop2 for a few reasons.
I have two questions. The first is: is Sqoop2 considered "production-ready"? I only ask because the Sqoop User Guide still seems to be for Sqoop 1. Is the CLI syntax essentially identical, so that the guides are interchangeable, and will Sqoop2 provide all the functionality of Sqoop1, just in a server-oriented fashion?
The second is: why does my Sqoop2 server seem uncooperative?
I've installed sqoop2-server and the client with yum, and uninstalled Sqoop.
If I start the Sqoop2 server after boot, it tells me a PID file is already there and aborts. Nothing odd there. Stopping it and starting it again yields this output:
[root@haddev1 ~]# /sbin/service sqoop2-server start
Starting Sqoop Server: [ OK ]
Sqoop home directory: /usr/lib/sqoop2
Setting SQOOP_HTTP_PORT: 12000
Setting SQOOP_ADMIN_PORT: 12001
Using CATALINA_OPTS: -Xmx1024m
Adding to CATALINA_OPTS: -Dsqoop.http.port=12000 -Dsqoop.admin.port=12001
Using CATALINA_BASE: /usr/lib/sqoop2/sqoop-server-0.20
Using CATALINA_HOME: /usr/lib/bigtop-tomcat
Using CATALINA_TMPDIR: /var/tmp/sqoop2
Using JRE_HOME: /usr/java/jdk1.6.0_31
Using CLASSPATH: /usr/lib/bigtop-tomcat/bin/bootstrap.jar
Using CATALINA_PID: /var/run/sqoop2/sqoop-server-sqoop2.pid
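Regarding the "PID file is already there" message after boot: one way to tell whether that file is stale is to check whether the process it names is still alive. This is a minimal Python 3 sketch; the path is taken from the CATALINA_PID line in the startup output, and the script only reads the file, it does not delete anything.

```python
import os
from typing import Optional

# Path taken from the CATALINA_PID line in the startup output.
PID_FILE = "/var/run/sqoop2/sqoop-server-sqoop2.pid"


def read_pid(path: str) -> Optional[int]:
    """Return the PID stored in the file, or None if missing or unparseable."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None
    # PIDs <= 0 have special meanings for kill(), so treat them as unusable.
    return pid if pid > 0 else None


def pid_is_alive(pid: int) -> bool:
    """Signal 0 only performs error checking; nothing is delivered."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user (e.g. sqoop2).
        return True
    return True


def pid_file_status(path: str = PID_FILE) -> str:
    pid = read_pid(path)
    if pid is None:
        return "no usable PID file"
    if pid_is_alive(pid):
        return f"process {pid} is running"
    return f"stale PID file: process {pid} is not running"


if __name__ == "__main__":
    print(pid_file_status())
```

A stale result after a reboot would be consistent with the server not having been shut down cleanly; a running result means a server instance really is up.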
From there, I attempt to test the server response per the Cloudera guide with the command:
wget -qO - localhost:12000/sqoop/version
but I don't get the JSON token I'm supposed to get back.
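The wget check only shows whether a body came back. A slightly more informative probe also reports the HTTP status, which separates "nothing is listening on 12000" (connection refused) from "an HTTP server answered but /sqoop/version was not served" (a 404, as in the client output). This is a minimal Python 3 sketch using only the standard library; the URL is the one from the Cloudera guide, and it does not assume anything about which fields a healthy server's JSON contains.

```python
import json
import urllib.error
import urllib.request
from typing import Optional, Tuple

VERSION_URL = "http://localhost:12000/sqoop/version"


def probe(url: str = VERSION_URL, timeout: float = 5.0) -> Tuple[Optional[int], str]:
    """Return (HTTP status, body); status is None if no HTTP response came back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code, err.read().decode("utf-8", "replace")
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, timeout, DNS failure: nothing answered over HTTP.
        return None, str(getattr(err, "reason", err))


def diagnose(status: Optional[int], body: str) -> str:
    if status is None:
        return "no HTTP server answered: check that the server process is up"
    if status == 404:
        return ("an HTTP server answered, but /sqoop/version was not found: "
                "the /sqoop web application may have failed to deploy")
    if status == 200:
        try:
            json.loads(body)
        except ValueError:
            return "HTTP 200 but the body is not JSON"
        return "server answered with JSON"
    return f"unexpected HTTP status {status}"


if __name__ == "__main__":
    status, body = probe()
    print(status, diagnose(status, body))
```

A 404 here would suggest that Tomcat itself is up on the port and that the problem lies in the Sqoop web application rather than in the port binding.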
If I enter the client and request version info, I get this:
sqoop:000> show version --all
client version:
  Sqoop 1.99.2-cdh4.6.0 revision
  Compiled by jenkins on Wed Feb 26 02:44:40 PST 2014
Exception has occurred during processing command
Exception: com.sun.jersey.api.client.UniformInterfaceException Message: GET http://localhost:12000/sqoop/version returned a response status of 404 Not Found
So I'm kind of stumped at that point. Is there some contraindication to running the Sqoop2 client on the same box as the server that would keep it from working in pseudo-distributed mode? Or does something stand out as a problem with my config, or is there some debug output I can pull to get more insight into why the service isn't listening on that port, even though it appears to be?
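On the debug-output question: the server runs inside Tomcat (CATALINA_HOME is /usr/lib/bigtop-tomcat in the startup output), and Tomcat's logging typically marks web application startup failures as SEVERE, while log4j-based logs use ERROR. A minimal Python 3 sketch that pulls such lines, plus a little trailing context, out of a log file; the log path is a hypothetical placeholder, so pass the real location of catalina.out or the Sqoop server log as the first argument.

```python
import re
import sys
from typing import Iterable, List

# Hypothetical path: adjust to wherever your install writes its server logs.
LOG_PATH = "/var/log/sqoop2/catalina.out"

INTERESTING = re.compile(r"SEVERE|ERROR|Exception")


def interesting_lines(lines: Iterable[str], context: int = 0) -> List[str]:
    """Return lines that look like errors, plus `context` lines after each hit."""
    lines = [line.rstrip("\n") for line in lines]
    keep = set()
    for i, line in enumerate(lines):
        if INTERESTING.search(line):
            # Keep the hit and up to `context` following lines (e.g. stack frames).
            keep.update(range(i, min(i + context + 1, len(lines))))
    return [lines[i] for i in sorted(keep)]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else LOG_PATH
    try:
        with open(path, errors="replace") as f:
            for line in interesting_lines(f, context=2):
                print(line)
    except OSError as err:
        print(f"could not read {path}: {err}")
```

Keeping a couple of lines after each hit is meant to capture the first frames of a Java stack trace, which usually name the class that failed.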
Thanks for any assistance you can give. I'd like to avoid spinning up a fully-distributed VM cluster to test Sqoop2 jobs on, if possible, and my real cluster is indisposed at the moment...
*Devin Suiter* Jr. Data Solutions Software Engineer 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212 Google Voice: 412-256-8556 | www.rdx.com
If you find that it's hardcoded to a default port number instead, then that is your problem: the port is not configurable. Change that line to the above so it recognizes the -D option given when starting the sqoop2 server process.
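If the line in question is a port attribute in Tomcat's server.xml (an assumption; the snippet being referred to is not shown here), the check can be scripted. Tomcat substitutes ${name} placeholders in server.xml with Java system properties, which is how -D options such as -Dsqoop.http.port=12000 take effect, whereas a literal number is used as-is. This Python 3 sketch assumes the file lives at conf/server.xml under the CATALINA_BASE from the startup output (the standard Tomcat layout, which your packaging may change) and reuses the CATALINA_OPTS values printed there.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# Assumed location: standard Tomcat layout under the CATALINA_BASE shown at startup.
SERVER_XML = "/usr/lib/sqoop2/sqoop-server-0.20/conf/server.xml"

PLACEHOLDER = re.compile(r"^\$\{([^}]+)\}$")


def parse_d_options(opts: str) -> Dict[str, str]:
    """Collect -Dkey=value pairs from a CATALINA_OPTS-style string."""
    props = {}
    for token in opts.split():
        if token.startswith("-D") and "=" in token:
            key, value = token[2:].split("=", 1)
            props[key] = value
    return props


def port_settings(xml_text: str, props: Dict[str, str]) -> List[Tuple[str, str, Optional[str], bool]]:
    """For the root and each <Connector>: (tag, raw port, effective port, from_property)."""
    root = ET.fromstring(xml_text)
    results = []
    for elem in [root] + list(root.iter("Connector")):
        raw = elem.get("port")
        if raw is None:
            continue
        match = PLACEHOLDER.match(raw)
        if match:
            # Placeholder: value comes from a -D system property (None if unset).
            results.append((elem.tag, raw, props.get(match.group(1)), True))
        else:
            # Literal number: the -D option has no effect on this element.
            results.append((elem.tag, raw, raw, False))
    return results


if __name__ == "__main__":
    # Values copied from the CATALINA_OPTS lines in the startup output.
    opts = "-Xmx1024m -Dsqoop.http.port=12000 -Dsqoop.admin.port=12001"
    try:
        with open(SERVER_XML) as f:
            xml_text = f.read()
    except OSError as err:
        print(f"could not read {SERVER_XML}: {err}")
    else:
        for tag, raw, effective, from_prop in port_settings(xml_text, parse_d_options(opts)):
            kind = "from -D property" if from_prop else "HARDCODED"
            print(f"{tag}: port={raw!r} -> {effective} ({kind})")
```

An element reported as HARDCODED, or a placeholder whose effective value prints as None, would be the kind of mismatch described above.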
You can run Sqoop 1 and the Sqoop2 server simultaneously on different nodes in the cluster, but not on the same box.
That will help you a lot with the migration, since the Sqoop2 server still has many limitations compared with Sqoop, as listed in the CDH documentation under "Feature Differences - Sqoop and Sqoop 2":

*Note*: *Moving from Apache Sqoop to Sqoop 2:* Sqoop 2 is essentially the future of the Apache Sqoop project. However, since Sqoop 2 currently lacks some of the features of Sqoop, Cloudera recommends you use Sqoop 2 only if it contains all the features required for your use case; otherwise, continue to use Sqoop.

Feature | Sqoop | Sqoop 2
Connectors for all major RDBMS | Supported. | *Workaround*: Use the generic JDBC Connector, which has been tested on the following databases: Microsoft SQL Server, PostgreSQL, MySQL and Oracle. This connector should work on any other JDBC-compliant database. However, performance might not be comparable to that of specialized connectors in Sqoop.
Kerberos Security Integration | Supported. |
Transfer to and from Hive/HBase (through HCatalog) is supported as of Sqoop 1.4.4. See HCatalog Integration.
Venkat
On Fri, Apr 4, 2014 at 10:51 AM, Suhas Satish <[EMAIL PROTECTED]> wrote: