Transferring to user list (hdfs-dev bcc'd).
The datanodes are definitely more disposable than the namenodes. If a
Sqoop command unexpectedly consumes a lot of resources, then starving
the namenode of those resources could degrade performance of the whole
cluster. Starving a datanode would only impact performance of tasks
scheduled to that node, and if things got really bad, then the datanode
would get blacklisted and tasks would get scheduled elsewhere anyway.
If you have hardware to spare, then you may want to consider reserving a
machine (often called an "edge node") for data staging and ad-hoc
commands. This would be similar to the other nodes in terms of Hadoop
software installation and configuration, but it wouldn't run any of the
daemons. The benefit is that you can isolate
some of these data staging and ad-hoc commands so that there is no risk of
harming nodes participating in the cluster. The drawback is that you end
up with idle capacity when you're not running these commands, and that
capacity could have been used for map and reduce tasks.
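A minimal sketch of what verifying and using such a staging machine might look like (the JDBC URL, table name, and target directory below are hypothetical, and this assumes Sqoop 1-style CLI usage):

```shell
# On the candidate staging machine: confirm it can act as a Hadoop
# client without running any Hadoop daemons itself.
hadoop version        # client binaries and config are installed
hadoop fs -ls /       # the cluster is reachable as a client
jps                   # should list no NameNode/DataNode processes here

# Then run resource-heavy Sqoop work from this machine, keeping the
# load off the namenode and datanodes. Connection details are made up
# for illustration.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --table orders \
  --target-dir /staging/orders
```

The map tasks Sqoop launches still run on the cluster; what this isolates is the client-side work (JDBC connections, splitting, job submission) that would otherwise compete with a daemon for CPU and memory.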
There are some trade-offs to consider, but my biggest recommendation is
don't use the namenode. :-)
Hope this helps,
On Tue, Apr 23, 2013 at 8:21 AM, Kevin Burton <[EMAIL PROTECTED]> wrote:
> The Apache documentation on installing a Sqoop server indicates:
> "Copy Sqoop artifact on machine where you want to run Sqoop server. This
> machine must have installed and configured Hadoop. You don't need to run
> Hadoop related services there, however the machine must be able to act as
> Hadoop client."
> I have a NameNode server and a bunch of DataNodes as well as a backup
> namenode server. Any one of these servers could function as a Sqoop server,
> but based on some hearsay it is fair to say that some of these machines may
> have more compute cycles available in a production environment than others.
> Any recommendations as to which machine in a Hadoop cluster would best be
> able to meet the needs of a Sqoop server?
> Thank you.
> Kevin Burton