Re: FSDataInputStream.read returns -1 with growing file and never continues reading
Thank you, Harsh. I appreciate it.
2012/12/20 Harsh J <[EMAIL PROTECTED]>
> Hi Christoph,
> If you use sync/hflush/hsync, the new length of data is only seen by a
> new reader, not an existing reader. The "workaround" you've done is
> exactly how we've implemented the "fs -tail <file>" utility. See code
> for that at
> (Note the looping at ~74).
> On Thu, Dec 20, 2012 at 5:51 PM, Christoph Rupp <[EMAIL PROTECTED]> wrote:
> > Hi,
> > I am experiencing an unexpected situation where FSDataInputStream.read()
> > returns -1 while reading data from a file that another process still
> > writes to. According to the documentation read() should never return -1
> > but throw Exceptions on errors. In addition, there's more data available,
> > and the read definitely should not fail.
> > The problem gets worse because the FSDataInputStream is not able to
> > recover from this. Once it returns -1 it will always return -1, even if
> > the file continues growing.
> > If, at the same time, other Java processes read other HDFS files, they
> > also return -1 immediately after opening the file. It smells like this
> > gets propagated to other client processes as well.
> > I found a workaround: close the FSDataInputStream, open it again and then
> > seek to the previous position. And then reading works fine.
> > Another problem that I have seen is that the FSDataInputStream returns -1
> > when reaching EOF. It will never return 0 (which I would expect when
> > reaching EOF).
> > I use CDH 4.1.2, but also saw this with CDH 3u5. I have attached samples
> > to reproduce this.
> > My cluster consists of 4 machines: 1 namenode and 3 datanodes. I run my
> > tests on the namenode machine. There are no other HDFS users, and the
> > load that is generated by my tests is fairly low, I would say.
> > One process writes to 6 files simultaneously, but with a 5 sec sleep
> > between each write. It uses an FSDataOutputStream, and after writing data
> > it calls sync(). Each write() appends 8 MB; it stops when the file grows
> > to 100 MB.
> > Six processes read files; each process reads one file. At first each
> > process loops until the file exists. Once it does, it opens the file
> > and starts reading. Usually the first process returns the first 8 MB in
> > the file before it starts returning -1. But the other processes
> > immediately return -1 without reading any data. I start the 6 reader
> > processes before I start the writer.
> > Search HdfsReader.java for "WORKAROUND" and remove the comments; this
> > will reopen the FSDataInputStream after -1 is returned, and then
> > everything works.
> > Sources are attached.
> > This is a very basic scenario and I wonder if I'm doing anything wrong
> > or if I found an HDFS bug.
> > bye
> > Christoph
> Harsh J
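
For readers who want to see the reopen-and-seek pattern discussed above in isolation, here is a minimal sketch. It is not the code from the attachments or from the "fs -tail" implementation. It uses a local file and java.io.RandomAccessFile as a stand-in so that it runs without a cluster; the class name ReopenTail and the helper drainFrom are hypothetical. With HDFS, the corresponding calls would presumably be FileSystem.open(Path) to get a new FSDataInputStream and FSDataInputStream.seek(long) to return to the remembered offset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReopenTail {

    /**
     * Opens a fresh reader, seeks to the remembered offset, copies every byte
     * that this reader can see into sink, and returns the new offset.
     * (Hypothetical helper; local-file stand-in for an HDFS reader.)
     */
    static long drainFrom(Path file, long offset, ByteArrayOutputStream sink) throws IOException {
        // A new reader per call: the point of the workaround is that only a
        // newly opened reader observes the length published by the writer.
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            in.seek(offset);
            byte[] buf = new byte[4096];
            int n;
            // read() returns -1 at end of file as seen by this reader.
            while ((n = in.read(buf)) != -1) {
                sink.write(buf, 0, n);
                offset += n;
            }
        }
        return offset;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("growing", ".log");
        ByteArrayOutputStream seen = new ByteArrayOutputStream();
        long offset = 0;

        // The writer appends, then the reader reopens and continues at its offset.
        Files.write(file, "first chunk\n".getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);
        offset = drainFrom(file, offset, seen);

        Files.write(file, "second chunk\n".getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);
        offset = drainFrom(file, offset, seen);

        System.out.print(new String(seen.toByteArray(), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

In a real tailing loop the caller would sleep between calls to drainFrom and treat -1 as "nothing new is visible yet" rather than as a permanent end of data.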