|
Arindam Choudhury
2012-04-20, 12:07
Harsh J
2012-04-20, 12:20
Arindam Choudhury
2012-04-20, 13:01
Harsh J
2012-04-20, 13:08
Arindam Choudhury
2012-04-20, 13:45
Robert Evans
2012-04-20, 15:17
Amith D K
2012-04-20, 16:19
JAX
2012-04-21, 00:06
Harsh J
2012-04-21, 06:14
JAX
2012-04-21, 15:03
Harsh J
2012-04-21, 15:22
|
-
remote job submission
Arindam Choudhury 2012-04-20, 12:07
Hi,
Does Hadoop have any web service or other interface so that I can submit jobs from a remote machine?
Thanks,
Arindam
-
Re: remote job submission
Harsh J 2012-04-20, 12:20
If you are allowed a remote connection to the cluster's service ports, then you can directly submit your jobs from your local CLI. Just make sure your local configuration points to the right locations.

Otherwise, you could use Apache Oozie (Incubating) (http://incubator.apache.org/oozie/). It provides a REST interface that launches jobs for you on the clusters you supply, but it is more oriented towards workflow management. Or perhaps HUE: https://github.com/cloudera/hue
-- Harsh J
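To make "local configuration points to the right locations" concrete, here is a minimal sketch in Python that renders the two client-side settings a Hadoop 1.x client typically needs: `fs.default.name` (the NameNode) and `mapred.job.tracker` (the JobTracker). The host names and ports are placeholders and the helper `hadoop_site_xml` is hypothetical; none of them come from the thread.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties):
    """Render a Hadoop-style <configuration> document from a dict of properties."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Assumed, hypothetical endpoints: replace with your cluster's real values.
core_site = hadoop_site_xml({"fs.default.name": "hdfs://namenode.example.com:8020"})
mapred_site = hadoop_site_xml({"mapred.job.tracker": "jobtracker.example.com:8021"})
```

If the two strings are saved as `core-site.xml` and `mapred-site.xml` in the local Hadoop configuration directory, a locally run `hadoop jar` command should target the remote cluster rather than a local one, provided the ports are reachable.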
-
Re: remote job submission
Arindam Choudhury 2012-04-20, 13:01
"If you are allowed a remote connection to the cluster's service ports, then you can directly submit your jobs from your local CLI. Just make sure your local configuration points to the right locations."

Can you elaborate in detail, please?
-
Re: remote job submission
Harsh J 2012-04-20, 13:08
Arindam,
If your machine can access the cluster's NN/JT/DN (NameNode/JobTracker/DataNode) ports, then you can simply run your job from the machine itself.
-- Harsh J
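One way to check whether a machine can reach the NN/JT/DN ports is a plain TCP connection test. The sketch below uses only the Python standard library; the host names and port numbers in the example are assumptions (common defaults, not values given in the thread), and a successful TCP connect only shows the port is open, not that the service will accept the job.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(endpoints, probe=can_connect):
    """Map each service name to the result of probing its (host, port) pair."""
    return {name: probe(host, port) for name, (host, port) in endpoints.items()}


if __name__ == "__main__":
    # Hypothetical endpoints: substitute your cluster's hosts and ports.
    print(check_services({
        "NameNode": ("namenode.example.com", 8020),
        "JobTracker": ("jobtracker.example.com", 8021),
        "DataNode": ("datanode1.example.com", 50010),
    }))
```

Every DataNode that may receive blocks has to be reachable, so in practice the DataNode entry would be repeated for each node.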
-
Re: remote job submission
Arindam Choudhury 2012-04-20, 13:45
Sorry, but can you give me an example?
-
Re: remote job submission
Robert Evans 2012-04-20, 15:17
You can use Oozie to do it.
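As a rough illustration of submitting through Oozie's REST interface, the sketch below builds (but does not send) an HTTP request for the Oozie web services endpoint `/v1/jobs`. The server URL, user name, and workflow path are hypothetical, and the helper `build_oozie_submit_request` is not part of any Oozie client library; it assumes a workflow application has already been deployed to HDFS.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_oozie_submit_request(oozie_url, props):
    """Build a POST request that submits and starts a workflow job.

    The body is a Hadoop-style <configuration> XML document.
    """
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    body = ET.tostring(root, encoding="utf-8")
    return urllib.request.Request(
        url=oozie_url.rstrip("/") + "/v1/jobs?action=start",
        data=body,
        headers={"Content-Type": "application/xml;charset=UTF-8"},
        method="POST",
    )


# Hypothetical values: replace with your Oozie server, user and workflow path.
request = build_oozie_submit_request(
    "http://oozie.example.com:11000/oozie",
    {
        "user.name": "arindam",
        "oozie.wf.application.path": "hdfs://namenode.example.com:8020/user/arindam/wf",
    },
)
```

Passing `request` to `urllib.request.urlopen` would send it. Only the Oozie server's port has to be reachable from the remote machine, which is the main difference from submitting with a local CLI.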
-
RE: remote job submission
Amith D K 2012-04-20, 16:19
I don't know your use case. If it's for testing and SSH between the machines is not disabled, then you can write a script that connects over SSH and runs your jobs using the CLI. You can check the ssh usage.
Or else use Oozie.
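The SSH approach can be sketched as a small wrapper that runs `hadoop jar` on a machine that already has cluster access. The user, gateway host, jar, and class names below are made up for illustration, and the sketch assumes key-based SSH login and that `hadoop` is on the remote `PATH`.

```python
import shlex
import subprocess


def build_remote_job_command(user, gateway, jar, main_class, args):
    """Return the argv list for running `hadoop jar` on the gateway over SSH."""
    remote = " ".join(
        shlex.quote(part) for part in ["hadoop", "jar", jar, main_class, *args]
    )
    return ["ssh", f"{user}@{gateway}", remote]


def run_remote_job(user, gateway, jar, main_class, args):
    """Run the job remotely and return the exit status of the ssh process."""
    command = build_remote_job_command(user, gateway, jar, main_class, args)
    return subprocess.run(command, check=False).returncode


# Hypothetical example values.
command = build_remote_job_command(
    "alice", "gateway.example.com", "wc.jar", "WordCount", ["/in", "/out"]
)
```

The job jar must already exist on the gateway (for example, copied there with `scp`), since only the command line travels over SSH here.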
-
Re: remote job submission
JAX 2012-04-21, 00:06
Re: Arindam's question on "how to submit a job remotely".
Here are my follow-up questions; I hope they help to guide the discussion:
1) Normally, what is the "job client"? Do you typically use the NameNode as the client?
2) In the case where the client != NameNode, how does the client know how to start up the TaskTrackers?
UCHC
-
Re: remote job submission
Harsh J 2012-04-21, 06:14
Hi,
A JobClient validates your job configuration, ships the job's necessities to the cluster, and notifies the JobTracker of the new job. Afterwards, its only responsibility may be to monitor progress via reports from the JobTracker (MR1) or ApplicationMaster (MR2).

A client need not concern itself with, or even be aware of, TaskTrackers (or NodeManagers). These are non-permanent members of a cluster and do not carry (critical) persistent state. The scheduling of a job and its tasks is taken care of by the JobTracker in MR1 (or the MR application's ApplicationMaster in MR2).

The only thing a user running a JobClient needs to ensure is access to the NameNode (for creating staging files: job jar, job xml, etc.), the DataNodes (for actually writing the previous files to DFS for the JobTracker to pick up) and the JobTracker/Scheduler (for the protocol communication required to notify the cluster of a job and that its resources are now ready to launch, and also for monitoring progress).
-- Harsh J
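The submission sequence described above (validate, stage files, notify) can be sketched as a toy in-memory model. This is not the Hadoop API: `ToyCluster`, `submit_job`, and the `/staging/...` path layout are invented for illustration, with a dict standing in for HDFS and a list standing in for the JobTracker's job queue.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    staged_files: dict = field(default_factory=dict)  # path -> bytes, stands in for HDFS
    submitted: list = field(default_factory=list)     # job ids known to the "JobTracker"


def submit_job(cluster, job_id, job_jar, job_xml):
    """Mimic the client's three steps: validate, stage files, notify."""
    # 1. Validate the job before touching the cluster.
    if not job_jar or not job_xml:
        raise ValueError("job jar and job xml are both required")
    # 2. Stage the job files (NameNode creates them, DataNodes store the bytes).
    staging_dir = f"/staging/{job_id}"
    cluster.staged_files[f"{staging_dir}/job.jar"] = job_jar
    cluster.staged_files[f"{staging_dir}/job.xml"] = job_xml
    # 3. Tell the JobTracker the job's resources are ready to launch.
    cluster.submitted.append(job_id)
    return staging_dir
```

Nothing in the model refers to TaskTrackers, which mirrors the point that the client never talks to them; task scheduling happens on the cluster side after step 3.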
-
Re: remote job submission
JAX 2012-04-21, 15:03
Thanks, Harsh J.
I have another question, though. You mentioned that the client needs access to "the DataNodes (for actually writing the previous files to DFS for the JobTracker to pick up)".

What do you mean by "previous files"? It seems to me that, if I were designing Hadoop from scratch, I wouldn't want to force the client to communicate with DataNodes at all, since those can be added and removed during a job.

Jay Vyas
MMSB
UCHC
-
Re: remote job submission
Harsh J 2012-04-21, 15:22
By "previous files" I meant the job-related files mentioned there. DataNodes are persistent members of HDFS. Removing a DN results in the loss of its blocks. Usually replication handles DN failures flawlessly, but consider a cluster with a replication factor of 1: DN downtime is not acceptable in that case. Writes to HDFS are done by writing blocks directly to the DNs, so a JobClient does need access to them to write its job-related files to HDFS.
-- Harsh J
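The two points in this reply, that the client writes blocks directly to DataNodes and that replication is what makes losing a DataNode tolerable, can be illustrated with a toy model. The classes and the round-robin placement below are invented for the sketch and do not reflect HDFS's real placement policy or API.

```python
class ToyNameNode:
    """Tracks which DataNodes hold each block; it stores no block data itself."""

    def __init__(self, datanodes, replication):
        self.datanodes = list(datanodes)
        self.replication = replication
        self.locations = {}  # block id -> list of DataNode names

    def allocate(self, block_id):
        # Round-robin placement, purely for illustration.
        n = len(self.datanodes)
        targets = [self.datanodes[(block_id + i) % n] for i in range(self.replication)]
        self.locations[block_id] = targets
        return targets


def write_file(namenode, storage, data, block_size):
    """Client-side write: ask the NameNode where each block goes, then send
    the bytes straight to those DataNodes (modelled as dict entries)."""
    for block_id, start in enumerate(range(0, len(data), block_size)):
        for dn in namenode.allocate(block_id):
            storage.setdefault(dn, {})[block_id] = data[start:start + block_size]


def readable_without(namenode, lost_datanode):
    """True if every block still has a replica outside the lost DataNode."""
    return all(
        any(dn != lost_datanode for dn in targets)
        for targets in namenode.locations.values()
    )
```

With a replication factor of 1, losing any DataNode that holds a block makes the file unreadable in this model; with a factor of 2 or more, a single loss is survivable.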
|