Gokulakannan M
2011-02-10, 15:11
Ted Dunning
2011-02-10, 15:59
Konstantin Boudnik
2011-02-10, 22:38
Gokulakannan M
2011-02-11, 04:38
Ted Dunning
2011-02-11, 05:33
Gokulakannan M
2011-02-11, 08:31
Ted Dunning
2011-02-11, 08:43
Marcos M Rubinelli
2011-02-11, 10:55
Gokulakannan M
2011-02-14, 15:21
Ted Dunning
2011-02-14, 16:47
Gokulakannan M
2011-02-15, 04:21
M. C. Srivas
2011-02-15, 06:29
-
hadoop 0.20 append - some clarifications
Gokulakannan M, 2011-02-10, 15:11
Hi All,

I have run the hadoop 0.20 append branch. Can someone please clarify the following behavior?

A writer is writing a file but has neither flushed the data nor closed the file. Can a parallel reader read this partial file?

For example:
1. A writer is writing a 10 MB file (block size 2 MB).
2. It has written the file up to 5 MB (2 finalized blocks + 1 blockBeingWritten). Note that the writer is not calling FSDataOutputStream sync() at all.
3. Now a reader tries to read the partially written file.

I can see that the reader is able to read the partially written 5 MB of data, but I feel the reader should be able to see the data only after the writer calls the sync() API.

Is this the correct behavior, or is my understanding wrong?

Thanks,
Gokul
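To make the numbers in this example concrete, here is a small sketch of the block arithmetic: how many blocks are completely filled and how much data sits in the still-open block for a given number of bytes written. The class and method names are made up for illustration and are not part of Hadoop.

```java
// Hypothetical helper, not part of Hadoop: block arithmetic for the example above.
final class BlockLayout {
    static final long MB = 1024L * 1024L;

    /** Number of completely filled blocks for the given bytes written. */
    static long fullBlocks(long bytesWritten, long blockSize) {
        return bytesWritten / blockSize;
    }

    /** Bytes in the last, still-open block (0 if the write ended on a block boundary). */
    static long bytesInOpenBlock(long bytesWritten, long blockSize) {
        return bytesWritten % blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 2 * MB;
        long written = 5 * MB;
        System.out.println(fullBlocks(written, blockSize) + " full blocks, "
                + bytesInOpenBlock(written, blockSize) / MB + " MB in the open block");
    }
}
```

With 5 MB written and a 2 MB block size this gives 2 full blocks plus 1 MB in the open block, which matches the "2 finalized blocks + 1 blockBeingWritten" description.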
-
Re: hadoop 0.20 append - some clarifications
Ted Dunning, 2011-02-10, 15:59
Correct is a strong word here.

There is actually an HDFS unit test that checks to see if partially written and unflushed data is visible. The basic rule of thumb is that you need to synchronize readers and writers outside of HDFS. There is no guarantee that data is visible or invisible after writing, but there is a guarantee that it will become visible after sync or close.

On Thu, Feb 10, 2011 at 7:11 AM, Gokulakannan M <[EMAIL PROTECTED]> wrote:
> Is this the correct behavior or my understanding is wrong?
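One way to picture "synchronize readers and writers outside of HDFS" is for the writer to publish, through some separate channel, how many bytes it has made durable, and for the reader to read no further than that. The sketch below is only an illustration on a local file using java.nio: the AtomicLong stands in for whatever out-of-band channel a real deployment would use, `force(true)` stands in for sync, and every name in it is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicLong;

// Local-file illustration only; none of this is Hadoop API.
final class PublishedLength {
    // Stand-in for the out-of-band channel between writer and reader.
    private final AtomicLong safeLength = new AtomicLong(0);

    /** Writer side: append, force to storage, then publish the new safe length. */
    void appendAndPublish(Path file, String record) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);              // plays the role of sync in this sketch
            safeLength.set(ch.size());   // publish only after the data is forced
        }
    }

    /** Reader side: read only the bytes the writer has published. */
    String readPublished(Path file) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate((int) safeLength.get());
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (buf.hasRemaining() && ch.read(buf) >= 0) {
                // keep reading until the published range is filled or EOF is hit
            }
        }
        return new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
    }

    /** Writes one published and one unpublished record, then reads as a reader would. */
    static String demo() {
        try {
            Path file = Files.createTempFile("published-length", ".log");
            try {
                PublishedLength p = new PublishedLength();
                p.appendAndPublish(file, "synced;");
                // Extra bytes that were never published to the reader.
                Files.write(file, "unsynced;".getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.APPEND);
                return p.readPublished(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The reader in this sketch never depends on whether unflushed bytes happen to be visible, which is the point of the rule of thumb: it only touches the range the writer has declared safe.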
-
Re: hadoop 0.20 append - some clarifications
Konstantin Boudnik, 2011-02-10, 22:38
You might also want to check the append design doc published at HDFS-265.

--
Take care,
Konstantin (Cos) Boudnik
-
RE: hadoop 0.20 append - some clarifications
Gokulakannan M, 2011-02-11, 04:38
Thanks Ted for clarifying.

So sync just flushes the current buffers to the datanode and persists the block info in the namenode once per block, doesn't it?

Regarding the reader being able to see unflushed data, I faced an issue in the following scenario:
1. A writer is writing a 10 MB file (block size 2 MB).
2. It has written the file up to 4 MB (2 finalized blocks in current and nothing in the blocksBeingWritten directory on the DN). So 2 blocks are written.
3. The client calls addBlock for the 3rd block on the namenode but has not yet created the output stream to the DN (or written anything to the DN). At this point, the namenode knows about the 3rd block but the datanode doesn't.
4. While the writer is at step 3, a reader tries to read the file and gets an exception. It cannot read the file because the datanode's getBlockInfo returns null to the client (of course, the DN doesn't know about the 3rd block yet).

In this situation the reader cannot see the file. But when the block write is in progress, the read is successful.

Is this a bug that needs to be handled in the append branch?

>> From: Konstantin Boudnik
>> You might also want to check append design doc published at HDFS-265

I was asking about the hadoop 0.20 append branch. I suppose HDFS-265's design doc won't apply to it.
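The window described in steps 3 and 4 of this message can be pictured with a toy model: one list of blocks the namenode has allocated, one set of blocks the datanode has heard about, and a reader that fails whenever the two disagree. This is only a sketch of the reported behavior under that simplifying assumption. None of these classes or methods correspond to real Hadoop code, and the model says nothing about whether the behavior counts as a bug.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only; it does not mirror real namenode or datanode code.
final class BlockVisibilityModel {
    private final List<Long> namenodeBlocks = new ArrayList<>(); // blocks the "namenode" has allocated
    private final Set<Long> datanodeBlocks = new HashSet<>();    // blocks the "datanode" knows about

    /** Models the client asking the namenode for a new block (the addBlock step). */
    void allocateBlock(long blockId) {
        namenodeBlocks.add(blockId);
    }

    /** Models the client opening the output stream to the datanode for that block. */
    void openPipeline(long blockId) {
        datanodeBlocks.add(blockId);
    }

    /** A reader that needs the datanode to know every block the namenode lists. */
    boolean readerCanOpen() {
        return datanodeBlocks.containsAll(namenodeBlocks);
    }

    public static void main(String[] args) {
        BlockVisibilityModel m = new BlockVisibilityModel();
        m.allocateBlock(1); m.openPipeline(1);
        m.allocateBlock(2); m.openPipeline(2);
        m.allocateBlock(3);                       // namenode knows block 3, datanode does not
        System.out.println("during the window: " + m.readerCanOpen());
        m.openPipeline(3);                        // the block write is now in progress
        System.out.println("after the pipeline opens: " + m.readerCanOpen());
    }
}
```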
-
Re: hadoop 0.20 append - some clarifications
Ted Dunning, 2011-02-11, 05:33
It is a bit confusing.

SequenceFile.Writer#sync isn't really sync.

There is SequenceFile.Writer#syncFs which is more what you might expect to be sync.

Then there is HADOOP-6313 which specifies hflush and hsync. Generally, if you want portable code, you have to reflect a bit to figure out what can be done.

On Thu, Feb 10, 2011 at 8:38 PM, Gokulakannan M <[EMAIL PROTECTED]> wrote:
> So the *sync* is to just flush the current buffers to datanode and persist the block info in namenode once per block, isn't it?
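One reading of "reflect a bit" is to look up, at runtime, whichever flush method the stream object actually has. The sketch below is a minimal illustration of that idea using plain java.lang.reflect. It assumes no-argument methods named hflush and sync, the names mentioned in this thread, and the two stream classes are stand-ins invented for the example, not Hadoop classes.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Sketch only: the stream classes below are stand-ins, not Hadoop classes.
final class PortableFlush {

    /** Tries each candidate no-arg method in order and returns the name of the one it invoked. */
    static String flushWithBestAvailable(Object stream, String... candidates) {
        for (String name : candidates) {
            try {
                Method m = stream.getClass().getMethod(name);
                m.invoke(stream);
                return name;
            } catch (NoSuchMethodException e) {
                // This method is not available on the stream; try the next candidate.
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("calling " + name + " failed", e);
            }
        }
        throw new UnsupportedOperationException("no known flush method on " + stream.getClass());
    }

    /** Stand-in for a stream that only offers sync(). */
    public static final class OldStyleStream {
        public int syncCalls;
        public void sync() { syncCalls++; }
    }

    /** Stand-in for a stream that offers both hflush() and sync(). */
    public static final class NewStyleStream {
        public int hflushCalls;
        public int syncCalls;
        public void hflush() { hflushCalls++; }
        public void sync() { syncCalls++; }
    }

    public static void main(String[] args) {
        System.out.println(flushWithBestAvailable(new OldStyleStream(), "hflush", "sync"));
        System.out.println(flushWithBestAvailable(new NewStyleStream(), "hflush", "sync"));
    }
}
```

Listing the preferred method first means the same calling code picks hflush where it exists and falls back to sync where it does not.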
-
RE: hadoop 0.20 append - some clarifications
Gokulakannan M, 2011-02-11, 08:31
I am not concerned about the sync behavior.

My concern is the reader reading non-flushed (non-synced) data from HDFS, as you explained in your previous post (in the hadoop 0.20 append branch).

I identified one specific scenario where that statement does not hold true.

Here is how you can reproduce the problem:
1. Set a breakpoint at the createBlockOutputStream() method in DFSClient and run your HDFS write client in debug mode.
2. Allow the client to write 1 block to HDFS.
3. For the 2nd block, execution will stop at the breakpoint from step 1. Hold here; do not execute the createBlockOutputStream() method.
4. In parallel, try to read the file from another client.

Now you will get an error saying that the file cannot be read.
-
Re: hadoop 0.20 append - some clarifications
Ted Dunning, 2011-02-11, 08:43
I think that in general, the behavior of any program reading data from an HDFS file before hsync or close is called is pretty much undefined.

If you don't wait until some point where part of the file is defined, you can't expect any particular behavior.
-
Re: hadoop 0.20 append - some clarifications
Marcos M Rubinelli, 2011-02-11, 10:55
In the past, I've seen two approaches (besides external synchronization) used to solve this kind of problem in local file systems:
1. Create the file in a temporary location, then move it. My understanding is that moves are atomic in HDFS (since data is only updated in the namenode), but if you are appending to an existing file, this probably won't help you.
2. Create a "flag" file: an empty file that signals the operation has finished and it is now safe to read the data. I'm not sure if that would work in HDFS. Can we guarantee the order of the operations?

I suppose you could also play with file permissions: if you have one user updating the file and another reading it, you can temporarily revoke read permission for all but the file owner. This may create race conditions, though.

Regards,
Marcos
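The two approaches from this message can be sketched on a local file system with java.nio.file. This is a minimal illustration, not HDFS code: the `.tmp` and `.done` suffixes and all method names are made up, and whether the same ordering holds on HDFS is exactly the open question raised in the message.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Local-filesystem sketch only; all names here are made up for illustration.
final class PublishPatterns {

    /** Approach 1: write under a temporary name, then rename into place. */
    static void writeThenMove(Path target, String content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        // ATOMIC_MOVE throws if the file system cannot perform the move atomically.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Approach 2: write the data, then create an empty flag file next to it. */
    static void writeThenFlag(Path data, String content) throws IOException {
        Files.write(data, content.getBytes(StandardCharsets.UTF_8));
        Files.createFile(flagFor(data));
    }

    static Path flagFor(Path data) {
        return data.resolveSibling(data.getFileName() + ".done");
    }

    /** Readers following approach 2 check the flag before touching the data. */
    static boolean isReady(Path data) {
        return Files.exists(flagFor(data));
    }

    /** Runs both approaches in a temp directory and reports what a reader would observe. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("publish-demo");
            Path moved = dir.resolve("a.txt");
            Path flagged = dir.resolve("b.txt");

            boolean existsBeforeMove = Files.exists(moved);
            writeThenMove(moved, "hello");
            boolean readyBeforeFlag = isReady(flagged);
            writeThenFlag(flagged, "world");

            String result = existsBeforeMove + ","
                    + new String(Files.readAllBytes(moved), StandardCharsets.UTF_8) + ","
                    + readyBeforeFlag + "," + isReady(flagged);

            // Clean up everything the demo created.
            for (Path p : new Path[] {moved, flagged, flagFor(flagged), dir}) {
                Files.deleteIfExists(p);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In both cases the reader never opens a file that is still being written: with the move it only ever sees the final name, and with the flag it waits for the marker. Neither helps when appending to a file readers already have open, as the message notes for the first approach.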
-
RE: hadoop 0.20 append - some clarificationsGokulakannan M 2011-02-14, 15:21
>> I think that in general, the behavior of any program reading data from an HDFS file before hsync or close is called is pretty much undefined. In Unix, users can parallelly read a file when another user is writing a file. And I suppose the sync feature design is based on that. So at any point of time during the file write, parallel users should be able to read the file. https://issues.apache.org/jira/browse/HDFS-142?focusedCommentId=12663958&pag e=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1 2663958 _____ From: Ted Dunning [mailto:[EMAIL PROTECTED]] Sent: Friday, February 11, 2011 2:14 PM To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: hadoop 0.20 append - some clarifications I think that in general, the behavior of any program reading data from an HDFS file before hsync or close is called is pretty much undefined. If you don't wait until some point were part of the file is defined, you can't expect any particular behavior. On Fri, Feb 11, 2011 at 12:31 AM, Gokulakannan M <[EMAIL PROTECTED]> wrote: I am not concerned about the sync behavior. The thing is the reader reading non-flushed(non-synced) data from HDFS as you have explained in previous post.(in hadoop 0.20 append branch) I identified one specific scenario where the above statement is not holding true. Following is how you can reproduce the problem. 1. add debug point at createBlockOutputStream() method in DFSClient and run your HDFS write client in debug mode 2. allow client to write 1 block to HDFS 3. for the 2nd block, the flow will come to the debug point mentioned in 1(do not execute the createBlockOutputStream() method). hold here. 4. parallely, try to read the file from another client Now you will get an error saying that file cannot be read. 
_____ From: Ted Dunning [mailto:[EMAIL PROTECTED]] Sent: Friday, February 11, 2011 11:04 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: hadoop 0.20 append - some clarifications It is a bit confusing. SequenceFile.Writer#sync isn't really sync. There is SequenceFile.Writer#syncFs which is more what you might expect to be sync. Then there is HADOOP-6313 which specifies hflush and hsync. Generally, if you want portable code, you have to reflect a bit to figure out what can be done. On Thu, Feb 10, 2011 at 8:38 PM, Gokulakannan M <[EMAIL PROTECTED]> wrote: Thanks Ted for clarifying. So the sync is to just flush the current buffers to datanode and persist the block info in namenode once per block, isn't it? Regarding reader able to see the unflushed data, I faced an issue in the following scneario: 1. a writer is writing a 10MB file(block size 2 MB) 2. wrote the file upto 4MB (2 finalized blocks in current and nothing in blocksBeingWritten directory in DN) . So 2 blocks are written 3. client calls addBlock for the 3rd block on namenode and not yet created outputstream to DN(or written anything to DN). At this point of time, the namenode knows about the 3rd block but the datanode doesn't. 4. at point 3, a reader is trying to read the file and he is getting exception and not able to read the file as the datanode's getBlockInfo returns null to the client(of course DN doesn't know about the 3rd block yet) In this situation the reader cannot see the file. But when the block writing is in progress , the read is successful. Is this a bug that needs to be handled in append branch? >> -----Original Message----- >> From: Konstantin Boudnik [mailto:[EMAIL PROTECTED]] >> Sent: Friday, February 11, 2011 4:09 AM >>To: [EMAIL PROTECTED] >> Subject: Re: hadoop 0.20 append - some clarifications >> You might also want to check append design doc published at HDFS-265 I was asking about the hadoop 0.20 append branch. 
I suppose HDFS-265's design doc won't apply to it. _____ From: Ted Dunning [mailto:[EMAIL PROTECTED]] Sent: Thursday, February 10, 2011 9:29 PM To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: hadoop 0.20 append - some clarifications Correct is a strong word here. There is actually an HDFS unit test that checks to see if partially written and unflushed data is visible. The basic rule of thumb is that you need to synchronize readers and writers outside of HDFS. There is no guarantee that data is visible or invisible after writing, but there is a guarantee that it will become visible after sync or close. On Thu, Feb 10, 2011 at 7:11 AM, Gokulakannan M <[EMAIL PROTECTED]> wrote: Is this the correct behavior or my understanding is wrong?
-
Re: hadoop 0.20 append - some clarifications
Ted Dunning, 2011-02-14, 16:47
HDFS definitely doesn't follow anything like POSIX file semantics.

They may be a vague inspiration for what HDFS does, but generally the behavior of HDFS is not tightly specified. Even the unit tests have some real surprising behavior.
-
RE: hadoop 0.20 append - some clarificationsGokulakannan M 2011-02-15, 04:21
I agree that HDFS doesn't strongly follow POSIX semantics. But it would have
been better if this issue were fixed.

_____
From: Ted Dunning [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 14, 2011 10:18 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: hadoop 0.20 append - some clarifications

HDFS definitely doesn't follow anything like POSIX file semantics.

They may be a vague inspiration for what HDFS does, but generally the behavior of HDFS is not tightly specified. Even the unit tests have some real surprising behavior.

On Mon, Feb 14, 2011 at 7:21 AM, Gokulakannan M <[EMAIL PROTECTED]> wrote:

>> I think that in general, the behavior of any program reading data from an HDFS file before hsync or close is called is pretty much undefined.

In Unix, users can read a file in parallel while another user is writing it. And I suppose the sync feature design is based on that. So at any point of time during the file write, parallel users should be able to read the file.

https://issues.apache.org/jira/browse/HDFS-142?focusedCommentId=12663958&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12663958

_____
From: Ted Dunning [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 11, 2011 2:14 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: hadoop 0.20 append - some clarifications

I think that in general, the behavior of any program reading data from an HDFS file before hsync or close is called is pretty much undefined.

If you don't wait until some point where part of the file is defined, you can't expect any particular behavior.

On Fri, Feb 11, 2011 at 12:31 AM, Gokulakannan M <[EMAIL PROTECTED]> wrote:

I am not concerned about the sync behavior.

The thing is the reader reading non-flushed (non-synced) data from HDFS, as you explained in your previous post (in the hadoop 0.20 append branch).

I identified one specific scenario where the above statement does not hold true.

Following is how you can reproduce the problem.

1. Add a debug point at the createBlockOutputStream() method in DFSClient and run your HDFS write client in debug mode.
2. Allow the client to write 1 block to HDFS.
3. For the 2nd block, the flow will come to the debug point mentioned in 1 (do not execute the createBlockOutputStream() method). Hold here.
4. In parallel, try to read the file from another client.

Now you will get an error saying that the file cannot be read.

_____
From: Ted Dunning [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 11, 2011 11:04 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: hadoop 0.20 append - some clarifications

It is a bit confusing.

SequenceFile.Writer#sync isn't really sync.

There is SequenceFile.Writer#syncFs which is more what you might expect to be sync.

Then there is HADOOP-6313 which specifies hflush and hsync. Generally, if you want portable code, you have to reflect a bit to figure out what can be done.

On Thu, Feb 10, 2011 at 8:38 PM, Gokulakannan M <[EMAIL PROTECTED]> wrote:

Thanks Ted for clarifying.

So the sync is to just flush the current buffers to the datanode and persist the block info in the namenode once per block, isn't it?

Regarding the reader being able to see the unflushed data, I faced an issue in the following scenario:

1. A writer is writing a 10MB file (block size 2 MB).
2. It wrote the file up to 4MB (2 finalized blocks in current and nothing in the blocksBeingWritten directory in the DN). So 2 blocks are written.
3. The client calls addBlock for the 3rd block on the namenode and has not yet created an outputstream to the DN (or written anything to the DN). At this point of time, the namenode knows about the 3rd block but the datanode doesn't.
4. At point 3, a reader is trying to read the file, and he is getting an exception and is not able to read the file, as the datanode's getBlockInfo returns null to the client (of course the DN doesn't know about the 3rd block yet).

In this situation the reader cannot see the file. But when the block writing is in progress, the read is successful.

Is this a bug that needs to be handled in the append branch?

I was asking about the hadoop 0.20 append branch. I suppose HDFS-265's design doc won't apply to it.

_____
From: Ted Dunning [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 10, 2011 9:29 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: hadoop 0.20 append - some clarifications

Correct is a strong word here.

There is actually an HDFS unit test that checks to see if partially written and unflushed data is visible.

The basic rule of thumb is that you need to synchronize readers and writers outside of HDFS. There is no guarantee that data is visible or invisible after writing, but there is a guarantee that it will become visible after sync or close.

On Thu, Feb 10, 2011 at 7:11 AM, Gokulakannan M <[EMAIL PROTECTED]> wrote:

Is this the correct behavior or my understanding is wrong?
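On the remark quoted in this thread that portable code has to "reflect a bit" to find out which flush call is available: a minimal sketch of that idea follows. It assumes only what the thread states, namely that older streams expose sync() while HADOOP-6313 specifies hflush(). The classes OldStyleStream and NewStyleStream and the helper flush() are hypothetical stand-ins invented for illustration so the example runs without Hadoop on the classpath; they are not Hadoop classes. With a real cluster one would presumably pass the output stream returned by the file system instead.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class PortableFlush {

    /** Hypothetical stand-in for an older stream class that only exposes sync(). */
    public static class OldStyleStream {
        public boolean synced = false;
        public void sync() { synced = true; }
    }

    /** Hypothetical stand-in for a newer stream class that exposes hflush(). */
    public static class NewStyleStream {
        public boolean flushed = false;
        public void hflush() { flushed = true; }
    }

    /**
     * Calls the first no-argument flush method found on the stream,
     * preferring hflush() and falling back to sync().
     * Returns the name of the method that was invoked.
     */
    public static String flush(Object stream) {
        for (String name : new String[] {"hflush", "sync"}) {
            try {
                Method m = stream.getClass().getMethod(name);
                m.invoke(stream);
                return name;
            } catch (NoSuchMethodException e) {
                // This method is not available on this class; try the next name.
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException(name + "() failed", e);
            }
        }
        throw new UnsupportedOperationException(
                "no hflush() or sync() on " + stream.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(flush(new NewStyleStream())); // prints "hflush"
        System.out.println(flush(new OldStyleStream())); // prints "sync"
    }
}
```

Preferring hflush() over sync() is a choice made for this sketch, not something the thread prescribes. Either way, the guarantee discussed above still applies: readers should rely on data being visible only after such a call (or close) has returned.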
-
Re: hadoop 0.20 append - some clarificationsM. C. Srivas 2011-02-15, 06:29
The problem you describe occurs with NFS also.
Basically, single-site semantics are very hard to achieve on a networked file system.

On Mon, Feb 14, 2011 at 8:21 PM, Gokulakannan M <[EMAIL PROTECTED]> wrote:

> I agree that HDFS doesn't strongly follow POSIX semantics. But it would have been better if this issue were fixed.
>
> _____
> From: Ted Dunning [mailto:[EMAIL PROTECTED]]
> Sent: Monday, February 14, 2011 10:18 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: hadoop 0.20 append - some clarifications
>
> HDFS definitely doesn't follow anything like POSIX file semantics.
>
> They may be a vague inspiration for what HDFS does, but generally the behavior of HDFS is not tightly specified. Even the unit tests have some real surprising behavior.
>
> On Mon, Feb 14, 2011 at 7:21 AM, Gokulakannan M <[EMAIL PROTECTED]> wrote:
>
> >> I think that in general, the behavior of any program reading data from an HDFS file before hsync or close is called is pretty much undefined.
>
> In Unix, users can read a file in parallel while another user is writing it. And I suppose the sync feature design is based on that.
>
> So at any point of time during the file write, parallel users should be able to read the file.