Re: Best practice to setup Sqoop, Pig and Hive for a hadoop cluster?
Marcos Ortiz 2012-03-15, 13:33
On 03/15/2012 09:22 AM, Manu S wrote:
> Thanks a lot Bejoy, that makes sense :)
>
> Suppose I have a MySQL database on some other node (not in the Hadoop
> cluster), can I import the tables using Sqoop to my HDFS?

Yes, this is the main purpose of Sqoop.
The Cloudera site has the complete documentation for it:

Sqoop User Guide
http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html

Sqoop installation
https://ccp.cloudera.com/display/CDHDOC/Sqoop+Installation

Sqoop for MySQL
http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_mysql

Sqoop site on GitHub
http://github.com/cloudera/sqoop

Cloudera blog posts related to Sqoop
http://www.cloudera.com/blog/category/sqoop/

Best wishes

> On Thu, Mar 15, 2012 at 6:27 PM, Bejoy Ks <[EMAIL PROTECTED]> wrote:
>
> > Hi Manu
> > Please find my responses inline
> >
> > > I had read that we can install Pig, Hive & Sqoop on the client
> > > node, with no need to install them in the cluster. What is the
> > > client node actually? Can I use my management-node as a client?
> >
> > On larger clusters there is a separate node outside the Hadoop
> > cluster where these tools are installed, and user programs are
> > triggered from that node. This is the node referred to as the
> > client node, edge node, etc. For your cluster, the management node
> > and the client node can be the same.
> >
> > > What is the best practice to install Pig, Hive, & Sqoop?
> >
> > On a client node.
> >
> > > For the fully distributed cluster do we need to install Pig,
> > > Hive, & Sqoop on each node?
> >
> > No, they can be on a client node or on any one of the nodes.
> >
> > > MySQL is needed for Hive as a metastore, and Sqoop can import a
> > > MySQL database to HDFS or Hive or Pig, so can we make use of
> > > MySQL DBs residing on another node?
> >
> > Regarding your first point, a Sqoop import serves a different
> > purpose: getting data from an RDBMS into HDFS. The metastore, on
> > the other hand, is used by Hive in framing the MapReduce jobs
> > corresponding to your Hive query, so Sqoop can't help you much
> > there.
> > I recommend keeping Hive's metastore DB on the same node where Hive
> > is installed, because executing Hive queries requires frequent
> > metadata lookups, especially when your table has a large number of
> > partitions.
> >
> > Regards
> > Bejoy.K.S
> >
> > On Thu, Mar 15, 2012 at 5:34 PM, Manu S <[EMAIL PROTECTED]> wrote:
> >
> > > Greetings All !!!
> > >
> > > I am using Cloudera CDH3 for Hadoop deployment. We have 7 nodes,
> > > of which 5 are used for a fully distributed cluster, 1 for
> > > pseudo-distributed & 1 as management-node.
> > >
> > > Fully distributed cluster: HDFS, Mapreduce & Hbase cluster
> > > Pseudo distributed mode: All
> > >
> > > I had read that we can install Pig, Hive & Sqoop on the client
> > > node, with no need to install them in the cluster. What is the
> > > client node actually? Can I use my management-node as a client?
> > >
> > > What is the best practice to install Pig, Hive, & Sqoop?
> > > For the fully distributed cluster do we need to install Pig,
> > > Hive, & Sqoop on each node?
> > >
> > > MySQL is needed for Hive as a metastore, and Sqoop can import a
> > > MySQL database to HDFS or Hive or Pig, so can we make use of
> > > MySQL DBs residing on another node?
> > >
> > > --
> > > Thanks & Regards
> > > ----
> > > Manu S
> > > SI Engineer - OpenSource & HPC
> > > Wipro Infotech
> > > Mob: +91 8861302855 Skype: manuspkd
> > > www.opensourcetalk.co.in <http://www.opensourcetalk.co.in>

--
Marcos Luis Ortíz Valmaseda
Sr. Software Engineer (UCI)
http://marcosluis2186.posterous.com
http://postgresql.uci.cu/blog/38
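As a rough illustration of the import discussed in this thread (a MySQL database on a node outside the Hadoop cluster, pulled into HDFS from the client node), here is a minimal Python sketch that only assembles a `sqoop import` command line. The host, database, table, user, and target directory are made-up placeholders, and the helper name `build_sqoop_import` is hypothetical; the flags themselves (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--num-mappers`) are the ones described in the Sqoop User Guide linked above.

```python
import shlex


def build_sqoop_import(host, database, table, username, target_dir, num_mappers=1):
    """Return the argument list for a `sqoop import` of one MySQL table into HDFS.

    All values passed in are placeholders chosen by the caller; nothing here
    talks to a database or to Hadoop.
    """
    return [
        "sqoop", "import",
        # JDBC URL of the MySQL server, which may live outside the cluster.
        "--connect", f"jdbc:mysql://{host}/{database}",
        "--username", username,
        # -P makes Sqoop prompt for the password instead of taking it inline.
        "-P",
        "--table", table,
        # HDFS directory that will receive the imported files.
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]


if __name__ == "__main__":
    argv = build_sqoop_import(
        host="db.example.com",        # assumed hostname of the external MySQL node
        database="sales",             # assumed database name
        table="orders",               # assumed table name
        username="sqoop_user",        # assumed MySQL account
        target_dir="/user/manu/orders",
    )
    # Print a shell-safe command to run on the client (edge) node.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The printed command would be run on the client node where Sqoop is installed. Because the map tasks themselves open the JDBC connection, the MySQL host has to be reachable from the cluster's worker nodes, and the connect string should name a real hostname rather than `localhost`.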