-
Oozie apparent concurrency deadlocking
Kartashov, Andy 2012-11-15, 14:44
Guys,
I have struggled with this for the last four days and still cannot find an answer, even after hours of searching the web.

I tried an Oozie workflow to execute my consecutive Sqoop jobs in parallel. I use a fork that executes 9 sqoop action nodes.

I had no problem executing the job on a pseudo-distributed cluster, but with an added DN/TT node I ran into what seems like deadlocking. The Oozie web interface displays those jobs as "Running" indefinitely until I eventually kill the workflow.

What I did notice was that if I reduce the number of sqoop action nodes to 3, all works fine.

I read somewhere that the oozie.service.CallableQueueService.callable.concurrency property is set to 3 by default, which hinted to me that this must be the cause. I tried to override this property by increasing it to 5 in oozie-site.xml, restarted the Oozie server, and then ran 4 sqoop action nodes in a fork, but the result is the same: 2 out of 4 nodes execute successfully (not in the same order every time), but the other 2 hang in "Running" indefinitely.

There were some suggestions about changing the queue name from default, but nothing was clear as to what to change it to and where.

In case someone has found a solution to this, please do share. I will greatly appreciate it.

Thanks,
AK47

NOTICE: This e-mail message and any attachments are confidential, subject to copyright and may be privileged. Any unauthorized use, copying or disclosure is prohibited. If you are not the intended recipient, please delete and contact the sender immediately. Please consider the environment before printing this e-mail.
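For reference, the override described in this message would look roughly like the fragment below in oozie-site.xml. This is only a sketch of the setting that was tried, not a fix: as far as I know, this property limits how many callables of one type Oozie's internal queue runs at once, and it does not control how many launcher jobs the Hadoop cluster itself can run.

```xml
<!-- oozie-site.xml fragment (sketch of the override described above) -->
<property>
    <!-- Maximum concurrency for a given callable type in Oozie's internal command queue -->
    <name>oozie.service.CallableQueueService.callable.concurrency</name>
    <!-- Raised from the default of 3 mentioned in the post -->
    <value>5</value>
</property>
```

The Oozie server has to be restarted for a change in oozie-site.xml to take effect, which matches what the poster did.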
-
RE: Oozie apparent concurrency deadlocking
Kartashov, Andy 2012-11-15, 15:29
Guys,
Further to my post below, while searching the web I believe I found hints to a solution for my problem, but I have no idea how to implement it:

Quote:

> ... - are you using FairScheduler? If so, and since you mention that the Sqoop import command is successful, you could be hitting your per-user job limit.
>
> Whenever Oozie launches a job, it requires two job submissions (if not more): one being the monitor+launcher, and the subsequent ones being the ones that do the real logic work. The launcher job is something that will launch the remaining jobs, and hence sticks around until they have all ended, taking up one running job slot for the whole lifetime of the Oozie job.
>
> For example, with a per-user job limit of 3, if you were to run 3 Oozie jobs, the 3 slots would be filled with launchers first. These would submit their real jobs next, and those would end up being in a queue, thereby forming a resource deadlock.
>
> The solution is to channel Oozie launcher Hadoop jobs into a dedicated launcher pool. This pool can have a running job limit too, but won't cause a deadlock because the pools are now separated. To do this, you need to pass the config property "oozie.launcher.<property that specifies your pool>" via WF <configuration> elements or <job-xml> files to point to the separate pool.

Unquote.

And also:

Quote:

Harsh J <[EMAIL PROTECTED]> wrote:

> In a FairScheduler environment, especially where max-running-job limits are configured, it is recommended to override the Oozie launcher job's pool to be different than the actual required working pool (for actions that launch other MR jobs).
>
> If your scheduler is configured to pick ${user.name} up automatically, then your Oozie launcher config must use the super-override pool name config:
>
> oozie.launcher.mapred.fairscheduler.pool=launcherpoolname
>
> Your target pool for launchers can still carry limitations, but it should no longer deadlock your actual MR execution (after which the launcher dies away anyway).

Unquote.

Please help.

Thanks,
Ak-47
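As an illustration of the "dedicated launcher pool" idea quoted in this message, here is a minimal sketch of an MR1 Fair Scheduler allocation file. It assumes the Fair Scheduler is actually enabled on the JobTracker. The pool names `oozie-launchers` and `sqoop-imports` and the limits are hypothetical values picked for a small cluster, not settings taken from this thread.

```xml
<?xml version="1.0"?>
<!-- Fair Scheduler allocation file (the file named by mapred.fairscheduler.allocation.file) -->
<allocations>
    <!-- Hypothetical pool that holds only Oozie launcher jobs.
         Each launcher occupies one map slot, so keep this limit below the
         cluster's total map slots to leave room for the real work. -->
    <pool name="oozie-launchers">
        <maxRunningJobs>2</maxRunningJobs>
    </pool>
    <!-- Hypothetical pool for the MR jobs that the launchers submit -->
    <pool name="sqoop-imports">
        <maxRunningJobs>4</maxRunningJobs>
    </pool>
</allocations>
```

With a file like this in place, setting oozie.launcher.mapred.fairscheduler.pool to `oozie-launchers` in each action's configuration would send the launchers to the first pool, so they could no longer occupy every slot the import jobs need.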
-
Re: Oozie apparent concurrency deadlocking
Matt Goeke 2012-11-15, 15:31
Andy,
Are you using the FairScheduler or the default FIFO scheduler? This problem can be partially alleviated by routing the MR actions and the launcher jobs to separate queues/pools. The reason is that if they are both competing for the same resources, you can run into a situation where all of the available slots are taken up by the launcher actions, and thus a permanent deadlock. I am guessing, based on the numbers you threw out there, that your overall slot capacity is small (fewer than 10 mappers total?), but if this isn't the case then something else is probably going on as well.

The way to specify it, if you are looking to do it in a sqoop node, is below:

<action name="sqoop-node">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${JOB_TRACKER}</job-tracker>
        <name-node>${NAME_NODE}</name-node>
        <prepare>
            <delete path="${NAME_NODE}/tmp/blah"/>
        </prepare>
        <configuration>
            <property>
                <name>oozie.launcher.mapred.fairscheduler.pool</name>
                <value>${LAUNCHER_POOL}</value>
            </property>
        </configuration>
        ...
</action>

I have seen the oozie.service.CallableQueueService.callable.concurrency property fix mentioned before as well, but I thought that was only for internal Oozie nodes (e.g. forks, decisions, etc.).

Hope this helps

--
Matt
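To show where such an action sits in a complete workflow, here is a sketch of a fork/join workflow with two sqoop actions, each sending its launcher job to a separate pool. Everything beyond the launcher property is an assumption made for illustration: the node names, the two-table layout, the `${DB_URL}` parameter, the table names and the target directories are hypothetical, and `${LAUNCHER_POOL}` is expected to be resolved from job.properties or a similar source.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="sqoop-fork-wf">
    <start to="fork-imports"/>

    <!-- Run both imports in parallel -->
    <fork name="fork-imports">
        <path start="sqoop-table-a"/>
        <path start="sqoop-table-b"/>
    </fork>

    <action name="sqoop-table-a">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${JOB_TRACKER}</job-tracker>
            <name-node>${NAME_NODE}</name-node>
            <configuration>
                <!-- Only the launcher job goes to this pool -->
                <property>
                    <name>oozie.launcher.mapred.fairscheduler.pool</name>
                    <value>${LAUNCHER_POOL}</value>
                </property>
            </configuration>
            <command>import --connect ${DB_URL} --table table_a --target-dir /tmp/table_a</command>
        </sqoop>
        <ok to="join-imports"/>
        <error to="fail"/>
    </action>

    <action name="sqoop-table-b">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${JOB_TRACKER}</job-tracker>
            <name-node>${NAME_NODE}</name-node>
            <configuration>
                <property>
                    <name>oozie.launcher.mapred.fairscheduler.pool</name>
                    <value>${LAUNCHER_POOL}</value>
                </property>
            </configuration>
            <command>import --connect ${DB_URL} --table table_b --target-dir /tmp/table_b</command>
        </sqoop>
        <ok to="join-imports"/>
        <error to="fail"/>
    </action>

    <!-- Wait for both imports before finishing -->
    <join name="join-imports" to="end"/>

    <kill name="fail">
        <message>Sqoop import failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The launcher property has to be repeated in every forked action (or supplied through a shared job-xml file), because each action starts its own launcher job.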
-
RE: Oozie apparent concurrency deadlocking
Kartashov, Andy 2012-11-15, 15:57
Matt,
Thank you for the prompt response.

To answer your question "Are you using the fairscheduler or default FIFO?": I frankly have no idea. I suppose that since I have no idea, it must be the default? How can I find out?

Do you set the ${LAUNCHER_POOL} parameter inside job.properties, similar to ${JT} and ${NN}? What do you set it to?

I have one more question I cannot find an answer to. It is about the "uri:oozie:sqoop-action:0.2" structure. I found no source about this. I take it 0.2 is a schema version? I have seen examples with 0.1, 0.2 and 0.3. I have also seen examples where a plain <sqoop>..</sqoop> is used without "uri:...". Could you explain this part as well, or point me in the right direction where I can learn how to sensibly use "uri:..." in <workflow-app xmlns="uri:...">, <map-reduce xmlns="uri:..."> and <sqoop xmlns="uri:...">, etc.?

P.S. My Sqoop input totals around 800 MB of data coming from 9 tables; at the 64 MB default split size I end up with about, what, 13 mappers total? I run this test on two EC2 medium instances, with one node running NN, JT, DN, TT and the other just DN, TT. With 2 cores per node, do I have two M/R slots each?

Rgds,
AK-47
-
Re: Oozie apparent concurrency deadlocking
Matt Goeke 2012-11-15, 18:32
Inline
On Thu, Nov 15, 2012 at 9:57 AM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:

> Matt,
>
> Thank you for the prompt response.
>
> To answer your question "Are you using the fairscheduler or default FIFO?" I frankly have no idea. I suppose that since I have no idea, it must be the default? How can I find out??

If you are not explicitly specifying a different scheduler then yes, you are using the default FIFO. In this case you will have to make the differentiation based on queue and not pool. Take a look at http://downright-amazed.blogspot.com/2012/02/configure-oozies-launcher-job.html for specifics.

> Do you set ${LAUNCHER_POOL} parameter inside job.properties, similar to ${JT} & ${NN}? What do you set it to?

You will have to either set it statically in the workflow, feed it in through the job.properties, or specify it in the coordinator (if you are using one). Remember that you can have multiple layers of inheritance, but in the end the workflow.xml needs to be able to resolve it from one of the layers.

> I have one more question I cannot find an answer to. It is about "uri:oozie:sqoop-action:0.2" structure. I found no source about this. I take it 0.2 is a schema version? I have seen examples with 0.1, 0.2 and 0.3. I have seen examples where simple <sqoop>..</sqoop> used without "uri:...". Could you explain this part as well or point into the right direction where I can learn how to sensibly use "uri:..." in <workflow-app xmlns="uri:..." <map-reduce xmlns="uri:.." and <sqoop xmlns="uri:..", etc?

This is all based on which URIs are included in the version that you downloaded. I actually haven't dug too deep into the differences between the versions for sqoop, but there could be some minor differences in the action API based on which URI you specify. I wouldn't worry too hard about it unless you run into errors when actually trying to run the action. If you take a look at the numerous examples on how to add your own custom actions for Oozie, you will get a much better grasp of how everything is registered.

> p.s. My sqoop input totals around 800Mb of data coming from 9 tables, at 64Mb default split size I end up with about what, 13 mappers total? I run this test on two EC2 medium type instances with one node running as NN,JT,DN,TT and another just DN,TT. With 2 cores per node I have two M|R slots each?

So my initial guess based on this is that you are hitting the issue where all of the available slots are being held by launcher actions. If you start to scale out more and still run into this, then you will want to start being mindful of any setting that could limit the maximum number of concurrent jobs for your user.
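For the queue-based variant mentioned in this reply, a sketch of the action configuration might look like the fragment below. It assumes the cluster has more than one queue defined (for example through mapred.queue.names) and a scheduler that actually divides slots between queues. The parameter names `${LAUNCHER_QUEUE}` and `${WORK_QUEUE}` are hypothetical and would need to be resolved from job.properties, the coordinator, or a static value.

```xml
<!-- Fragment for the <configuration> element of an action in workflow.xml -->
<configuration>
    <!-- Queue for the Oozie launcher job; the oozie.launcher. prefix applies the setting to the launcher only -->
    <property>
        <name>oozie.launcher.mapred.job.queue.name</name>
        <value>${LAUNCHER_QUEUE}</value>
    </property>
    <!-- Queue for the MR job that the action itself submits -->
    <property>
        <name>mapred.job.queue.name</name>
        <value>${WORK_QUEUE}</value>
    </property>
</configuration>
```

Whether this removes the deadlock depends on the scheduler reserving capacity for the work queue. Naming two queues under a plain FIFO setup may not be enough on its own, so treat this as a starting point to test rather than a confirmed fix.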