Re: Best practice to setup Sqoop,Pig and Hive for a hadoop cluster ?


On 03/15/2012 09:22 AM, Manu S wrote:
> Thanks a lot Bijoy, that makes sense :)
>
> Suppose I have a MySQL database on some other node (not in the Hadoop
> cluster); can I import the tables into my HDFS using Sqoop?
Yes, this is the main purpose of Sqoop.
The Cloudera site has complete documentation for it:

Sqoop User Guide
http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html

Sqoop installation
https://ccp.cloudera.com/display/CDHDOC/Sqoop+Installation

Sqoop for MySQL
http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_mysql

Sqoop site on GitHub
http://github.com/cloudera/sqoop

Cloudera blog posts related to Sqoop
http://www.cloudera.com/blog/category/sqoop/
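
As a rough illustration of the import asked about above, here is a minimal Python sketch that only assembles the `sqoop import` command line for a MySQL server living outside the cluster. The host name, database, table, user and target directory are all hypothetical placeholders; the flags themselves (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--num-mappers`) are the ones described in the Sqoop User Guide. It assumes the MySQL JDBC driver is already available to Sqoop.

```python
import shlex


def build_sqoop_import(db_host, database, table, username, target_dir, num_mappers=4):
    """Return the argument list for a `sqoop import` from a remote MySQL server.

    Every value passed in is supplied by the caller; nothing here is specific
    to the cluster discussed in this thread.
    """
    # JDBC URL pointing at the MySQL node outside the Hadoop cluster.
    connect = "jdbc:mysql://{}/{}".format(db_host, database)
    return [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "-P",                              # prompt for the password on the console
        "--table", table,
        "--target-dir", target_dir,        # HDFS directory that receives the rows
        "--num-mappers", str(num_mappers),
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    args = build_sqoop_import(
        "dbhost.example.com", "sales", "orders", "sqoop_user", "/user/manu/orders"
    )
    print(" ".join(shlex.quote(a) for a in args))
```

The printed command would be run from the client node. Since the map tasks open their own JDBC connections, the MySQL server has to accept connections from the cluster nodes that run them, not only from the node where `sqoop` is launched.
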
Best wishes

>
>
> On Thu, Mar 15, 2012 at 6:27 PM, Bejoy Ks <[EMAIL PROTECTED]
> <mailto:[EMAIL PROTECTED]>> wrote:
>
>     Hi Manu
>          Please find my responses inline
>
>     >I had read that we can install Pig, Hive & Sqoop on the client node,
>     >with no need to install them in the cluster. What is the client node
>     >actually? Can I use my management-node as a client?
>
>     On larger clusters there is a separate node that sits outside the
>     Hadoop cluster, and these tools are installed there, so user programs
>     are triggered from this node. This is the node referred to as the
>     client node, edge node, etc. For your cluster, the management node and
>     the client node can be the same.
>
>     >What is the best practice to install Pig, Hive, & Sqoop?
>
>     On a client node
>
>     >For the fully distributed cluster, do we need to install Pig, Hive &
>     >Sqoop on each node?
>
>     No, they can be on a client node or on any one of the nodes.
>
>     >MySQL is needed for Hive as a metastore, and Sqoop can import a MySQL
>     >database to HDFS, Hive or Pig, so can we make use of MySQL DBs
>     >residing on another node?
>
>     Regarding your first point, a Sqoop import serves a different purpose:
>     getting data from an RDBMS into HDFS. The metastore, on the other hand,
>     is used by Hive when framing the MapReduce jobs that correspond to your
>     Hive query. Sqoop can't help you much there.
>     I recommend keeping Hive's metastore DB on the same node where Hive is
>     installed, because executing Hive queries requires frequent metadata
>     lookups, especially when a table has a large number of partitions.
>
>     Regards
>     Bejoy.K.S
>
>     On Thu, Mar 15, 2012 at 5:34 PM, Manu S <[EMAIL PROTECTED]
>     <mailto:[EMAIL PROTECTED]>> wrote:
>
>     > Greetings All !!!
>     >
>     > I am using Cloudera CDH3 for Hadoop deployment. We have 7 nodes, of
>     > which 5 are used for a fully distributed cluster, 1 for
>     > pseudo-distributed & 1 as management-node.
>     >
>     > Fully distributed cluster: HDFS, MapReduce & HBase cluster
>     > Pseudo distributed mode: All
>     >
>     > I had read that we can install Pig, Hive & Sqoop on the client node,
>     > with no need to install them in the cluster. What is the client node
>     > actually? Can I use my management-node as a client?
>     >
>     > What is the best practice to install Pig, Hive, & Sqoop?
>     > For the fully distributed cluster, do we need to install Pig, Hive &
>     > Sqoop on each node?
>     >
>     > MySQL is needed for Hive as a metastore, and Sqoop can import a MySQL
>     > database to HDFS, Hive or Pig, so can we make use of MySQL DBs
>     > residing on another node?
>     >
>     > --
>     > Thanks & Regards
>     > ----
>     > Manu S
>     > SI Engineer - OpenSource & HPC
>     > Wipro Infotech
>     > Mob: +91 8861302855                Skype: manuspkd
>     > www.opensourcetalk.co.in <http://www.opensourcetalk.co.in>
>     >
>
>
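
On the metastore point in the quoted reply: the sketch below is only an assumption about how the MySQL-backed metastore could be wired up, not a configuration taken from this thread. It generates the four `javax.jdo.option.*` connection properties that Hive reads from `hive-site.xml`. The database name, user and password are hypothetical, and `localhost` simply follows the recommendation to keep the metastore DB on the same node as Hive; a MySQL instance on another node would differ only in the host part of the URL.

```python
from xml.sax.saxutils import escape


def metastore_properties(db_host, database, username, password):
    """Return the Hive metastore JDBC settings as (name, value) pairs."""
    # createDatabaseIfNotExist is a MySQL Connector/J URL option.
    url = "jdbc:mysql://{}/{}?createDatabaseIfNotExist=true".format(db_host, database)
    return [
        ("javax.jdo.option.ConnectionURL", url),
        ("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver"),
        ("javax.jdo.option.ConnectionUserName", username),
        ("javax.jdo.option.ConnectionPassword", password),
    ]


def to_hive_site_xml(props):
    """Render (name, value) pairs as a hive-site.xml <configuration> block."""
    lines = ["<configuration>"]
    for name, value in props:
        lines.append("  <property>")
        lines.append("    <name>{}</name>".format(escape(name)))
        lines.append("    <value>{}</value>".format(escape(value)))
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(to_hive_site_xml(metastore_properties("localhost", "metastore", "hive", "secret")))
```

The MySQL JDBC driver jar also has to be on Hive's classpath for these settings to take effect.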

--
Marcos Luis Ortíz Valmaseda
  Sr. Software Engineer (UCI)
  http://marcosluis2186.posterous.com
  http://postgresql.uci.cu/blog/38
