Folks often move files once they're closed into a directory where
they're processed to avoid issues with partially written data. Maybe
you could start a new log file every hour rather than every day?
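The move-on-close pattern can be sketched in Java. This is a minimal illustration, not anything from Avro or Hadoop itself: the class name, method, and paths are all hypothetical, and it uses a local filesystem move (on HDFS you'd use the equivalent FileSystem.rename), the point being that an atomic move means the M/R job's input directory only ever contains complete files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class LogRotation {
  // Hypothetical helper: once the writer closes a log file, move it into
  // the directory the periodic M/R job scans. Because the move is atomic,
  // the job never sees a partially written file.
  public static Path publish(Path closedLog, Path readyDir) throws IOException {
    Files.createDirectories(readyDir);
    Path target = readyDir.resolve(closedLog.getFileName());
    return Files.move(closedLog, target, StandardCopyOption.ATOMIC_MOVE);
  }

  public static void main(String[] args) throws IOException {
    // Illustrative names only: an hourly log that has just been closed.
    Path tmp = Files.createTempDirectory("logs");
    Path closed = Files.write(tmp.resolve("2013-01-14-10.avro"),
        "demo".getBytes());
    Path published = publish(closed, tmp.resolve("ready"));
    System.out.println("Published " + published.getFileName());
  }
}
```

With hourly rotation, at most the current hour's data is unavailable to the job, instead of the whole day's.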
We could add an ignoreTruncation or ignoreCorruption option to
DataFileReader that attempts to read files that might be truncated or
corrupt.

And yes, you can probably just catch those exceptions and exit the map
at that point.
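The catch-and-exit approach might look like the sketch below. Since the Avro classes aren't included here, a stand-in iterator simulates a DataFileReader whose hasNext() blows up when it reaches a partially written block; the shape of the loop is what matters: read until the stream fails, then return cleanly with the records seen so far instead of letting the exception fail the task.

```java
import java.util.Iterator;

public class PartialRead {
  // Stand-in for a DataFileReader over a file still being appended to:
  // yields `good` records, then throws the way a real reader does when it
  // hits a partially written block ("Invalid sync!").
  static Iterator<String> failingReader(final int good) {
    return new Iterator<String>() {
      int i = 0;
      public boolean hasNext() {
        if (i >= good) throw new RuntimeException("Invalid sync!");
        return true;
      }
      public String next() { return "record-" + i++; }
      public void remove() { throw new UnsupportedOperationException(); }
    };
  }

  // Process records until the stream fails, then stop gracefully,
  // keeping whatever was read before the truncated tail.
  public static int readAvailable(Iterator<String> reader) {
    int count = 0;
    try {
      while (reader.hasNext()) {
        reader.next();
        count++;
      }
    } catch (RuntimeException e) {
      // Truncated tail reached; exit the loop with the partial result.
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println(readAvailable(failingReader(3))); // prints 3
  }
}
```

The trade-off is that a genuinely corrupt file in the middle of the input would also be silently truncated, so it may be worth logging how many records were read per file.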
On Mon, Jan 14, 2013 at 11:22 AM, Terry Healy <[EMAIL PROTECTED]> wrote:
> I have a log collection application that writes .avro files within HDFS.
> Ideally I would like to include the current day's (open for append) file
> as one of the input files for a periodic M/R job.
> I tried this but the Map job exited in error with the dreaded "Invalid
> Sync!" IOException. I guess I should have expected this, but is there a
> reasonable way around it? Can I catch the exception and just exit the
> map at that point?
> All suggestions appreciated.