Re: FSDataInputStream.read returns -1 with growing file and never continues reading
Colin McCabe 2012-12-27, 20:37
Also, read() returning -1 is not an error; it signals EOF. This is the same
as for the regular Java InputStream.
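To illustrate that contract, here is a small sketch that uses only the standard java.io API (no HDFS involved; the class name EofContractDemo is hypothetical). A read loop stops on -1. A return value of 0 from read(byte[], int, int) means zero bytes were requested, not that the stream has ended.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class EofContractDemo {
    // Drains a stream the conventional way: -1 means EOF, not an error.
    public static int countBytes(InputStream in) throws IOException {
        byte[] buf = new byte[4];
        int total = 0;
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            total += n; // a short read (n < buf.length) is not EOF
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3});
        System.out.println(in.read(new byte[0], 0, 0)); // 0: zero bytes were requested
        System.out.println(countBytes(in));             // 3
        System.out.println(in.read());                  // -1: end of stream
        System.out.println(in.read());                  // -1 again: this stream's data never grows
    }
}
```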
On Thu, Dec 20, 2012 at 10:32 AM, Christoph Rupp <[EMAIL PROTECTED]> wrote:
> Thank you, Harsh. I appreciate it.
> 2012/12/20 Harsh J <[EMAIL PROTECTED]>
>> Hi Christoph,
>> If you use sync/hflush/hsync, the new length of the data is only seen by a
>> new reader, not by an existing reader. The "workaround" you've done is
>> exactly how we've implemented the "fs -tail <file>" utility. See the code
>> for that at
>> (Note the looping at ~74).
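The reopen-and-seek pattern can be sketched as follows. This is a minimal illustration that assumes a local file as a stand-in for HDFS and uses java.io.RandomAccessFile. It is not the fs -tail source, and the names ReopenTail, readFrom and follow are hypothetical. In HDFS client code the reopen would correspond to FileSystem.open(path) and the seek to FSDataInputStream.seek(pos). A local file does not actually need the reopen to see appended data; the sketch only shows the control flow: read to EOF, remember the offset, and open a fresh stream at that offset on the next poll.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

public class ReopenTail {
    // Opens the file fresh, seeks to pos, and copies every byte up to the
    // current EOF into out. Returns the new offset for the next call.
    // Local-file stand-in for FileSystem.open(path) + FSDataInputStream.seek(pos).
    public static long readFrom(String path, long pos, ByteArrayOutputStream out) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            f.seek(pos);
            byte[] buf = new byte[8192];
            int n;
            while ((n = f.read(buf)) != -1) { // -1 = EOF as of this open
                out.write(buf, 0, n);
                pos += n;
            }
        } // stream is closed here; the next poll reopens it
        return pos;
    }

    // Tail-style polling loop: after hitting EOF, sleep, then reopen at the
    // saved offset instead of reusing the old stream. Runs until interrupted.
    public static void follow(String path, long pollMillis) throws IOException, InterruptedException {
        long pos = 0;
        while (true) {
            ByteArrayOutputStream chunk = new ByteArrayOutputStream();
            pos = readFrom(path, pos, chunk);
            chunk.writeTo(System.out);
            System.out.flush();
            Thread.sleep(pollMillis); // back off before reopening
        }
    }
}
```

Closing and reopening on every poll is what lets a new stream pick up the length published by the writer's sync/hflush, which matches the behavior described above for new versus existing readers.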
>> On Thu, Dec 20, 2012 at 5:51 PM, Christoph Rupp <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> > I am experiencing an unexpected situation where FSDataInputStream.read()
>> > returns -1 while reading data from a file that another process is still
>> > writing to. According to the documentation, read() should never return -1
>> > but should throw Exceptions on errors. In addition, there's more data
>> > available, so reading definitely should not fail.
>> > The problem gets worse because the FSDataInputStream is not able to
>> > recover from this. Once it returns -1, it always returns -1, even if the
>> > file continues growing.
>> > If, at the same time, other Java processes read other HDFS files, they
>> > also return -1 immediately after opening the file. It smells like this
>> > gets propagated to other client processes as well.
>> > I found a workaround: close the FSDataInputStream, open it again, and then
>> > seek to the previous position. After that, reading works fine.
>> > Another problem that I have seen is that the FSDataInputStream returns -1
>> > when reaching EOF. It never returns 0 (which is what I would expect when
>> > reaching EOF).
>> > I use CDH 4.1.2, but I also saw this with CDH 3u5. I have attached samples
>> > that reproduce this.
>> > My cluster consists of 4 machines: 1 namenode and 3 datanodes. I run my
>> > tests on the namenode machine. There are no other HDFS users, and the
>> > load generated by my tests is fairly low, I would say.
>> > One process writes to 6 files simultaneously, but with a 5 sec sleep
>> > between each write. It uses an FSDataOutputStream, and after writing data
>> > it calls sync(). Each write() appends 8 MB; it stops when the file grows
>> > to 100 MB.
>> > Six processes read the files; each process reads one file. At first, each
>> > process loops until the file exists. Once it does, the process opens the
>> > file and starts reading. Usually the first process returns the first 8 MB
>> > of the file before it starts returning -1, but the other processes
>> > immediately return -1 without reading any data. I start the 6 reader
>> > processes before I start the writer.
>> > Search HdfsReader.java for "WORKAROUND" and remove the comments; this
>> > reopens the FSDataInputStream after -1 is returned, and then everything
>> > works.
>> > Sources are attached.
>> > This is a very basic scenario, and I wonder whether I'm doing something
>> > wrong or whether I have found an HDFS bug.
>> > bye
>> > Christoph
>> Harsh J