-
RE: a question on NameNode
Kartashov, Andy 2012-11-19, 14:43
Awesome, thanks.
So, what if DN2 is down, i.e. it is not sending any block report? Then the NN (I guess) will figure out that it has 2 blocks (3,4) that have no home and that (without replication) it has no way of reconstructing the file A.txt. It must report an error then.
Thanks, AK
From: Kai Voigt [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 19, 2012 9:31 AM
To: [EMAIL PROTECTED]
Subject: Re: a question on NameNode
Hi,
On 19.11.2012 at 15:27, "Kartashov, Andy" <[EMAIL PROTECTED]> wrote:
> I am learning that the NN doesn't persistently store block locations, only file names and their permissions as well as which blocks make up each file. It is said that the locations come from the DataNodes when the NN starts.
> So, how does it work?
> Say we only have one file A.txt in our HDFS that is split into 4 blocks 1,2,3,4 (no replication), with blocks 1,2 residing on DN1 and blocks 3,4 on DN2.
> When we start the NN, it reads its metastore and tries to locate and map the locations of the 4 blocks of file A.txt?
When a NameNode starts, it starts in safe mode. Like you said, it doesn't know where the blocks are. The DataNodes send a list of all of their local block IDs (so-called block reports). Once the NameNode knows about the locations of most blocks (99.9%, a configurable number), it will leave safe mode and HDFS is back.
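A toy sketch of the start-up sequence described above, not actual Hadoop code. The class `ToyNameNode` and its method names are made up for illustration; the only things taken from the thread are the A.txt layout (blocks 1,2 on DN1, blocks 3,4 on DN2), the block reports, and the 99.9% threshold.

```python
from typing import Dict, Iterable, List, Set


class ToyNameNode:
    """Toy model of a NameNode start-up; all names here are hypothetical."""

    def __init__(self, file_blocks: Dict[str, List[int]], threshold: float = 0.999):
        # Persisted metadata: which block IDs make up each file (no locations).
        self.file_blocks = file_blocks
        # Fraction of blocks that must be located before leaving safe mode.
        self.threshold = threshold
        # Block locations live in memory only and are rebuilt from block reports.
        self.locations: Dict[int, Set[str]] = {}

    def block_report(self, datanode: str, block_ids: Iterable[int]) -> None:
        """Record that `datanode` holds the given blocks."""
        for block_id in block_ids:
            self.locations.setdefault(block_id, set()).add(datanode)

    def reported_fraction(self) -> float:
        """Fraction of all known blocks that have at least one reported location."""
        all_blocks = [b for blocks in self.file_blocks.values() for b in blocks]
        if not all_blocks:
            return 1.0
        found = sum(1 for b in all_blocks if self.locations.get(b))
        return found / len(all_blocks)

    def in_safe_mode(self) -> bool:
        return self.reported_fraction() < self.threshold

    def missing_blocks(self, path: str) -> List[int]:
        """Blocks of `path` that no DataNode has reported yet."""
        return [b for b in self.file_blocks[path] if not self.locations.get(b)]


nn = ToyNameNode({"A.txt": [1, 2, 3, 4]})
nn.block_report("DN1", [1, 2])
print(nn.in_safe_mode(), nn.missing_blocks("A.txt"))  # DN2 has not reported yet
nn.block_report("DN2", [3, 4])
print(nn.in_safe_mode(), nn.missing_blocks("A.txt"))
```

In this toy model, as long as DN2 stays silent, blocks 3 and 4 have no known location and the NameNode remains in safe mode.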
Kai
-- Kai Voigt [EMAIL PROTECTED]
NOTICE: This e-mail message and any attachments are confidential, subject to copyright and may be privileged. Any unauthorized use, copying or disclosure is prohibited. If you are not the intended recipient, please delete and contact the sender immediately. Please consider the environment before printing this e-mail.
AVIS : le présent courriel et toute pièce jointe qui l'accompagne sont confidentiels, protégés par le droit d'auteur et peuvent être couverts par le secret professionnel. Toute utilisation, copie ou divulgation non autorisée est interdite. Si vous n'êtes pas le destinataire prévu de ce courriel, supprimez-le et contactez immédiatement l'expéditeur. Veuillez penser à l'environnement avant d'imprimer le présent courriel
-
Re: a question on NameNode
Kai Voigt 2012-11-19, 15:01
On 19.11.2012 at 15:43, "Kartashov, Andy" <[EMAIL PROTECTED]> wrote:
> So, what if DN2 is down, i.e. it is not sending any block report? Then the NN (I guess) will figure out that it has 2 blocks (3,4) that have no home and that (without replication) it has no way of reconstructing the file A.txt. It must report an error then.
One major feature of HDFS is its redundancy. Blocks are stored more than once (three times by default), so chances are good that another DataNode will have that block and report it during the safe mode phase. So the file will be accessible.
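To make the redundancy point concrete, here is a small sketch, again not Hadoop code. The helper names and the replica placement across DN1 to DN4 are assumptions invented for the example; only the default of three copies and the unreplicated A.txt layout come from the thread.

```python
from typing import Dict, List, Set


def readable_blocks(
    replicas: Dict[int, Set[str]], live_datanodes: Set[str]
) -> Dict[int, List[str]]:
    """Map each block ID to the live DataNodes that still hold a replica."""
    return {
        block_id: sorted(holders & live_datanodes)
        for block_id, holders in replicas.items()
    }


def file_is_readable(replicas: Dict[int, Set[str]], live_datanodes: Set[str]) -> bool:
    """A file is readable only if every block has at least one live replica."""
    return all(readable_blocks(replicas, live_datanodes).values())


# Layout from the question: no replication, DN2 holds the only copy of blocks 3 and 4.
unreplicated = {1: {"DN1"}, 2: {"DN1"}, 3: {"DN2"}, 4: {"DN2"}}

# Hypothetical placement with three copies of each block on four DataNodes.
replicated = {
    1: {"DN1", "DN2", "DN3"},
    2: {"DN1", "DN3", "DN4"},
    3: {"DN2", "DN3", "DN4"},
    4: {"DN1", "DN2", "DN4"},
}

live = {"DN1", "DN3", "DN4"}  # DN2 is down
print(file_is_readable(unreplicated, live))
print(file_is_readable(replicated, live))
print(readable_blocks(replicated, live))
```

With DN2 down, the unreplicated file loses blocks 3 and 4, while every block of the replicated file is still held by at least two live nodes.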
Kai
-- Kai Voigt [EMAIL PROTECTED]
-
RE: a question on NameNode
Kartashov, Andy 2012-11-19, 15:14
Thank you, Kai. One more question, please.
Does MapReduce run tasks on redundant blocks?
Say you have only 1 block of data replicated 3 times, one copy on each of three DataNodes: block 1 - DN1 / block 1 (replica #1) - DN2 / block 1 (replica #2) - DN3
Will MR attempt:
a. to start 3 Map tasks (one per replicated block) and execute them all,
b. to start 3 Map tasks (one per replicated block) and drop the other two as soon as one of the three has executed successfully, or
c. to start only 1 Map task (for just one block, avoiding all replicated ones) and start another one (on one of the replicated blocks) when and only when the initial task (say, on DN1) has failed?
Thanks,
-
Re: a question on NameNode
Ted Dunning 2012-11-19, 16:37
It sounds like you could benefit from reading the basic papers on map-reduce in general. Hadoop is a reasonable facsimile of the original Google systems. Try looking at this: http://research.google.com/archive/mapreduce.html
-
Re: a question on NameNode
Mohammad Tariq 2012-11-19, 15:20
Hello Andy,
If you have not disabled speculative execution, then your second assumption is correct.
Regards, Mohammad Tariq
-
RE: a question on NameNode
Kartashov, Andy 2012-11-19, 15:44
Thank you Kai and Tariq.
-
Re: a question on NameNode
Kai Voigt 2012-11-19, 15:19
Hi,
On 19.11.2012 at 16:14, "Kartashov, Andy" <[EMAIL PROTECTED]> wrote:
> Does MapReduce run tasks on redundant blocks?
> Say you have only 1 block of data replicated 3 times, one copy on each of three DataNodes: block 1 - DN1 / block 1 (replica #1) - DN2 / block 1 (replica #2) - DN3
> Will MR attempt:
> a. to start 3 Map tasks (one per replicated block) and execute them all,
> b. to start 3 Map tasks (one per replicated block) and drop the other two as soon as one of the three has executed successfully, or
> c. to start only 1 Map task (for just one block, avoiding all replicated ones) and start another one (on one of the replicated blocks) when and only when the initial task (say, on DN1) has failed?
The JobTracker will initially schedule the map task on one node only. There's no need to launch the task on all nodes that have a local copy of the block.
If a task fails during its execution (e.g. because of a node failure), the JobTracker will launch the task again on another node with that block.
There's another advanced feature called Speculative Execution. If a task is progressing slowly through a phase (maybe due to flaky hardware), the JobTracker will launch the same task in parallel on another node. The output of whichever attempt finishes first is used, and the other attempt is killed.
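The three behaviours above (one initial attempt, retry on failure, backup attempt when slow) can be sketched as a small simulation. This is only an illustration under simplifying assumptions: `run_map_task`, the `durations` table and the fixed `slow_after` threshold are invented here and are not how the JobTracker actually measures progress.

```python
from typing import Dict, List, Optional


def run_map_task(
    replica_nodes: List[str],
    durations: Dict[str, Optional[float]],
    speculative: bool = True,
    slow_after: float = 10.0,
) -> Dict[str, object]:
    """Simulate attempts for ONE map task over nodes that hold the block.

    durations[node] is the run time of an attempt on that node, or None if
    the attempt fails there. slow_after is a hypothetical cut-off after
    which a running attempt counts as slow.
    """
    launched: List[str] = []
    killed: List[str] = []
    candidates = list(replica_nodes)

    while candidates:
        node = candidates.pop(0)
        launched.append(node)  # only one attempt is started at first
        duration = durations[node]
        if duration is None:
            # The attempt failed (e.g. node failure): retry on the next node.
            continue
        winner = node
        if speculative and duration > slow_after and candidates:
            # The attempt is slow: start a backup attempt on another node.
            backup = candidates.pop(0)
            launched.append(backup)
            backup_duration = durations[backup]
            if backup_duration is None:
                pass  # the backup failed; the original attempt keeps running
            elif slow_after + backup_duration < duration:
                winner = backup
                killed.append(node)  # the slow original is killed
            else:
                killed.append(backup)  # the original finished first
        return {"winner": winner, "launched": launched, "killed": killed}

    return {"winner": None, "launched": launched, "killed": killed}


nodes = ["DN1", "DN2", "DN3"]
print(run_map_task(nodes, {"DN1": 5.0, "DN2": 5.0, "DN3": 5.0}))   # normal case
print(run_map_task(nodes, {"DN1": None, "DN2": 5.0, "DN3": 5.0}))  # DN1 fails
print(run_map_task(nodes, {"DN1": 60.0, "DN2": 5.0, "DN3": 5.0}))  # DN1 is slow
```

In the normal case only DN1 is ever used, which corresponds to option (c) in the question; a second attempt appears only after a failure or when the first attempt is slow.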
Kai
-- Kai Voigt [EMAIL PROTECTED]