Avro, mail # user - Possible to include open .avro file in Map/Reduce job?


Terry Healy 2013-01-14, 19:22
Re: Possible to include open .avro file in Map/Reduce job?
Doug Cutting 2013-01-17, 21:36
Folks often move files once they're closed into a directory where
they're processed to avoid issues with partially written data.  Maybe
you could start a new log file every hour rather than every day?
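To illustrate the rollover-and-move idea above, here is a minimal sketch using only the JDK on a local filesystem. The class name `ClosedFilePublisher`, the `events-` file prefix, and the directory layout are hypothetical and not from this thread. On HDFS the analogous step would presumably be a rename through Hadoop's `FileSystem` API rather than `java.nio.file`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: names log files per hour and publishes closed files.
class ClosedFilePublisher {
    private static final DateTimeFormatter HOURLY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH");

    // One file per hour instead of one per day, so closed data becomes
    // available to the periodic job sooner.
    static String hourlyName(LocalDateTime time) {
        return "events-" + HOURLY.format(time) + ".avro";
    }

    // Move a file the writer has already closed into the directory the job
    // reads from. ATOMIC_MOVE means the job sees either the whole file or
    // nothing; it requires source and target to be on the same filesystem.
    static Path publish(Path closedFile, Path readyDir) throws IOException {
        Files.createDirectories(readyDir);
        Path target = readyDir.resolve(closedFile.getFileName());
        return Files.move(closedFile, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The job would then take only the ready directory as input, so it never opens a file that is still being appended to.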

We could add an ignoreTruncation or ignoreCorruption option to
DataFileReader that attempts to read files that might be truncated or
corrupted.

And yes, you can probably just catch those exceptions and exit the map
at that point.
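As a rough sketch of the catch-and-stop approach, the helper below drains any `java.util.Iterator` and stops quietly at the first read failure, keeping the records handled so far. It uses only the JDK. The class name `TolerantDrain` is hypothetical, and it assumes the failure surfaces as an unchecked exception from `hasNext()` or `next()`; check how your Avro version reports "Invalid sync!" before relying on that. In a Map/Reduce job the read loop lives in the record reader rather than in `map()`, so the equivalent catch would likely belong in a wrapper around the reader.

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical helper, not part of Avro or Hadoop.
class TolerantDrain {
    // Passes records to the handler until the iterator is exhausted or a
    // read fails. Returns the number of records handled.
    static <T> long drain(Iterator<T> records, Consumer<? super T> handler) {
        long handled = 0;
        while (true) {
            T record;
            try {
                if (!records.hasNext()) {
                    break; // clean end of input
                }
                record = records.next();
            } catch (RuntimeException e) {
                // Assumption: a partially written block shows up here.
                // Everything read before this point is kept.
                System.err.println("Stopping after " + handled + " records: " + e);
                break;
            }
            // Handler failures are deliberately not caught: only read
            // errors should end the loop silently.
            handler.accept(record);
            handled++;
        }
        return handled;
    }
}
```

Note that this treats real corruption in the middle of a closed file the same way as an unfinished tail, so logging the stop point is worth keeping.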

Doug

On Mon, Jan 14, 2013 at 11:22 AM, Terry Healy <[EMAIL PROTECTED]> wrote:
> I have a log collection application that writes .avro files within HDFS.
> Ideally I would like to include the current day's (open for append) file
> as one of the input files for a periodic M/R job.
>
> I tried this but the Map job exited in error with the dreaded "Invalid
> Sync!" IOException. I guess I should have expected this, but is there a
> reasonable way around it? Can I catch the exception and just exit the
> map at that point?
>
> All suggestions appreciated.
>
> -Terry
Terry Healy 2013-01-18, 14:51