
Hadoop, mail # user - running sqoop on hadoop cluster


Re: running sqoop on hadoop cluster
Bejoy KS 2011-10-21, 10:58
Hi Firantika
HDFS is the underlying file system: its metadata is stored on the NameNode
and the actual data blocks live on the DataNodes. You can run a NameNode and
a DataNode on the same physical machine, in which case the metadata and some
data blocks sit on the same box, but production clusters generally avoid
this. It is better to have a somewhat larger cluster so that HDFS replication
can keep your data reliable. AFAIK HDFS is not a single process running on
one node; there are five basic processes (daemons) in Hadoop:

   1. Name Node
   2. Secondary Name Node
   3. Job Tracker
   4. Data Node
   5. Task Tracker
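To make the answer to "where does HDFS run?" concrete, here is a toy model in Python (not Hadoop code) of a two-node layout and of the block map the NameNode keeps. The host names `master` and `slave`, the choice to run a DataNode on the master as well, and the round-robin placement are assumptions for illustration only; real HDFS uses a rack-aware placement policy. The one rule it does share with HDFS is that two replicas of the same block never sit on the same DataNode, so a two-node cluster can hold at most two copies of each block.

```python
# Hypothetical two-node layout. Running a DataNode/TaskTracker on the master
# as well is an assumption (fine for development, as noted above).
LAYOUT = {
    "master": ["NameNode", "SecondaryNameNode", "JobTracker", "DataNode", "TaskTracker"],
    "slave": ["DataNode", "TaskTracker"],
}


def datanodes(layout):
    """Hosts that store HDFS blocks, i.e. the ones running a DataNode."""
    return [host for host, daemons in layout.items() if "DataNode" in daemons]


def place_blocks(num_blocks, nodes, replication):
    """Toy NameNode metadata: block id -> set of DataNodes holding a replica.

    Replicas of one block go to distinct nodes, so a block never has more
    copies than there are DataNodes. Round-robin placement is a simplification
    of HDFS's rack-aware policy.
    """
    copies = min(replication, len(nodes))
    metadata = {}
    for block in range(num_blocks):
        start = block % len(nodes)
        metadata[block] = {nodes[(start + i) % len(nodes)] for i in range(copies)}
    return metadata


def lost_blocks(metadata, failed_hosts):
    """Blocks whose replicas all lived on failed hosts."""
    failed = set(failed_hosts)
    return sorted(block for block, hosts in metadata.items() if hosts <= failed)


if __name__ == "__main__":
    nodes = datanodes(LAYOUT)
    print(nodes)                                              # ['master', 'slave']
    print(lost_blocks(place_blocks(4, nodes, 3), ["slave"]))  # []
    print(lost_blocks(place_blocks(4, nodes, 1), ["slave"]))  # [1, 3]
```

With replication 1, losing the slave loses every block stored only there; with replication 2 or more (capped at two copies on two DataNodes), each block survives on the master.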

Sqoop uses MapReduce under the hood for import/export. If you have more
nodes, or rather more TaskTracker slots (map task slots) with adequate memory
for each, you can spawn more parallel tasks for a single Sqoop import. But
parallelism with Sqoop also depends on your source DB and how many parallel
connections it can handle.
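A rough sketch of the two limits mentioned above, plus the way Sqoop divides one table among its mappers. It assumes an integer `--split-by` column and treats map slots and DB connections as simple caps; the helper names, the 2 map slots per TaskTracker (the usual Hadoop 0.20 default) and the 10-connection DB limit are hypothetical. The range split only mirrors the idea behind Sqoop's `--split-by`/`--num-mappers` options, not Sqoop's actual code.

```python
import math


def effective_parallelism(num_mappers, map_slots, db_max_connections):
    """Map tasks that can usefully run at once (simplified): bounded by the
    free map slots on the TaskTrackers and by the DB's connection limit,
    assuming each running mapper holds one connection."""
    return min(num_mappers, map_slots, db_max_connections)


def waves(num_mappers, concurrent):
    """Rounds needed to run every map task when only `concurrent` fit at once."""
    return math.ceil(num_mappers / concurrent)


def split_ranges(min_id, max_id, num_mappers, column="id"):
    """Split [min_id, max_id] of an integer split-by column into contiguous,
    roughly equal ranges, one WHERE clause per mapper (simplified)."""
    if max_id < min_id:
        raise ValueError("empty key range")
    span = max_id - min_id + 1
    n = min(num_mappers, span)  # never more ranges than distinct key values
    bounds = [min_id + (i * span) // n for i in range(n)] + [max_id + 1]
    return [f"{column} >= {lo} AND {column} < {hi}" for lo, hi in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical: 8 mappers (-m 8), 2 map slots per TaskTracker,
    # and a source DB that accepts 10 concurrent connections.
    for nodes in (1, 2):
        running = effective_parallelism(8, map_slots=2 * nodes, db_max_connections=10)
        print(nodes, running, waves(8, running))  # 1 2 4, then 2 4 2
    for clause in split_ranges(1, 1000, 4):
        print(clause)  # id >= 1 AND id < 251, ...
```

With the same 8 mappers, doubling the TaskTracker slots halves the number of waves, which is the speed-up a multi-node cluster buys you, as long as the source DB can take that many concurrent connections. In practice it is safer to pick `--num-mappers` within the DB's limit rather than rely on a cap like the one modeled here.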
Bottom line: you need more nodes in your cluster to use it in production.
For development purposes this configuration would be fine. There are a good
number of tutorials on the Cloudera and Yahoo blogs that will give you better
insight into your questions.

Hope it helps!...

Thank You

Regards
Bejoy.K.S

On Fri, Oct 21, 2011 at 12:42 PM, Alexander C.H. Lorenz <[EMAIL PROTECTED]> wrote:

> Hi,
>
> first set up a valid cluster:
> NameNode, SecondaryNameNode, JobTracker + DataNodes with TaskTrackers.
>
> After that, install Sqoop on a DataNode and play with it ;)
>
> Here is a how-to for RedHat (CentOS):
> http://mapredit.blogspot.com/p/get-hadoop-cluster-running-in-20.html
>
> and for Ubuntu:
>
> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>
> regards,
>  Alex
>
> On Fri, Oct 21, 2011 at 2:03 AM, firantika <[EMAIL PROTECTED]> wrote:
>
> >
> > Hi all,
> > I'm a newbie to Hadoop.
> >
> > If I install Hadoop on 2 nodes, where does HDFS run: on the master or the
> > slave node?
> >
> > And if I run Sqoop to export data from a DBMS into Hive, will a multi-node
> > Hadoop setup speed things up compared with Hadoop running on a single node?
> >
> > Please explain.
> >
> >
> > Tks
> >
> >
> > --
> > View this message in context:
> > http://old.nabble.com/running-sqoop-on-hadoop-cluster-tp32693398p32693398.html
> > Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >
> >
>
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
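Alex's first step, a valid cluster with NameNode, SecondaryNameNode and JobTracker plus DataNodes running TaskTrackers, comes down to a handful of files in a Hadoop 0.20-era `conf/` directory. The sketch below only renders their contents; the host names `master` and `slave`, the ports 54310/54311 and the helper names are assumptions, and both hosts are listed in `slaves` so the master also stores blocks, which Bejoy notes is fine for development.

```python
import xml.etree.ElementTree as ET


def site_xml(properties):
    """Render a Hadoop *-site.xml document from a {name: value} dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


def cluster_config(master, slaves, replication=3):
    """Contents of conf/ files for a Hadoop 0.20-style cluster (hypothetical
    helper): master daemons on `master`, DataNode + TaskTracker on `slaves`."""
    return {
        # Where clients and DataNodes find the NameNode (example port).
        "core-site.xml": site_xml({"fs.default.name": f"hdfs://{master}:54310"}),
        # Where TaskTrackers and job clients find the JobTracker (example port).
        "mapred-site.xml": site_xml({"mapred.job.tracker": f"{master}:54311"}),
        # Replicas must land on distinct DataNodes, so cap at their number.
        "hdfs-site.xml": site_xml({"dfs.replication": min(replication, len(slaves))}),
        # In 0.20, conf/masters lists the SecondaryNameNode host(s).
        "masters": master + "\n",
        # Hosts where start-all.sh launches a DataNode and a TaskTracker.
        "slaves": "\n".join(slaves) + "\n",
    }


if __name__ == "__main__":
    for name, body in cluster_config("master", ["master", "slave"]).items():
        print(f"== conf/{name} ==")
        print(body)
```

Capping `dfs.replication` at the number of DataNodes is a choice made in this sketch: HDFS accepts a higher value, but the extra replicas cannot be placed, so the blocks would simply be reported as under-replicated.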