Jean-Pierre OCALAN 2010-08-25, 00:21
Hi,
I would like to know if the *rename* operation (i.e. renaming a directory or a single file) can be considered an atomic operation in HDFS.
Basically, what I am trying to achieve is having one process that continuously adds new files to HDFS and another process that, every 15 minutes, starts a map/reduce flow on the files that were newly added to HDFS.
In other words, process A continuously reads a *local directory "A/in"*, into which new files are continuously moved, and puts each file in an *"A/tmp" directory on HDFS*. When A finishes putting a file in *"A/tmp"*, it will *move/rename that file into a "B/in" directory*. At the same time, process B will, every 15 minutes, push all the files present in "B/in" to a map/reduce flow.
Regards,
-- JP
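The staging pattern described in the message above (write to a temporary directory, then rename into the directory the reader scans) can be sketched on a local filesystem. This is only an illustration under stated assumptions: it uses Python's `os.replace`, which is atomic on POSIX when source and destination are on the same filesystem, and it says nothing by itself about HDFS semantics. The directory names mirror the ones in the message; the helper name `publish` and the file name `part-0001.log` are hypothetical.

```python
import os
import shutil
import tempfile


def publish(local_file, tmp_dir, in_dir):
    """Copy local_file into tmp_dir, then rename it into in_dir.

    A reader listing in_dir sees either no entry or the complete file,
    because the slow copy happens in a directory the reader never scans.
    """
    name = os.path.basename(local_file)
    staged = os.path.join(tmp_dir, name)
    final = os.path.join(in_dir, name)
    shutil.copyfile(local_file, staged)  # slow step, invisible to readers of in_dir
    os.replace(staged, final)            # single rename makes the file visible
    return final


def demo():
    # Build a throwaway A/in, A/tmp, B/in layout (local stand-in, not HDFS).
    root = tempfile.mkdtemp()
    a_in = os.path.join(root, "A", "in")
    a_tmp = os.path.join(root, "A", "tmp")
    b_in = os.path.join(root, "B", "in")
    for d in (a_in, a_tmp, b_in):
        os.makedirs(d)

    src = os.path.join(a_in, "part-0001.log")  # hypothetical input file
    with open(src, "w") as f:
        f.write("hello\n")

    publish(src, a_tmp, b_in)
    return sorted(os.listdir(a_tmp)), sorted(os.listdir(b_in))
```

After `demo()` runs, the staging directory is empty and the file appears only in "B/in". The same shape would apply on HDFS only if its rename gives the same all-or-nothing visibility, which is exactly the question being asked here.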
Friso van Vollenhoven 2010-08-26, 12:51
Hi JP,
I don't actually know the answer to your question, but we do a lot of things using files and directories on HDFS and use renames to move files out of directories which are periodically scanned by other processes. All I can say: it has never gone wrong. We are happily living with the assumption that the rename is atomic. Our directory scanning job runs every couple of seconds and has done so without any error for months.
Short answer: I don't know, but it appears to be that way (ignorance is a blessing).
Friso
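The scanning side mentioned in this reply (renaming files out of a periodically scanned directory) might look like the sketch below. It is again a local-filesystem stand-in using `os.replace`, not the HDFS API, and `claim_batch`, `in_dir`, and `work_dir` are hypothetical names chosen for illustration. The idea is that each scan claims files by renaming them into a working directory, so a file is handed to at most one run.

```python
import os
import tempfile


def claim_batch(in_dir, work_dir):
    """Rename every entry currently in in_dir into work_dir; return claimed paths."""
    claimed = []
    for name in sorted(os.listdir(in_dir)):
        src = os.path.join(in_dir, name)
        dst = os.path.join(work_dir, name)
        try:
            os.replace(src, dst)  # the rename is the claim
        except FileNotFoundError:
            continue              # another scanner renamed it first
        claimed.append(dst)
    return claimed


def demo_claim():
    # Local stand-in directories with two hypothetical input files.
    root = tempfile.mkdtemp()
    in_dir = os.path.join(root, "in")
    work_dir = os.path.join(root, "work")
    os.makedirs(in_dir)
    os.makedirs(work_dir)
    for name in ("a.log", "b.log"):
        with open(os.path.join(in_dir, name), "w") as f:
            f.write(name)

    first = claim_batch(in_dir, work_dir)   # claims both files
    second = claim_batch(in_dir, work_dir)  # nothing left to claim
    return [os.path.basename(p) for p in first], second, os.listdir(in_dir)
```

A second scan right after the first finds nothing to claim, which is the property that lets a scanner run every few seconds without processing a file twice.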
Jean-Pierre OCALAN 2010-08-26, 14:51
Hi Friso,
Thank you very much for your answer. I guess I will assume that it's atomic like you did. At least for now.
Again thank you,
JP.