The functionality you are looking for was added in 0.19 and above:
http://issues.apache.org/jira/browse/HADOOP-3828. If you upgrade your
cluster to CDH2, you should be good to go.
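Until you can upgrade, one rough workaround is to feed file paths (rather than file contents) to your mapper and probe each gzip yourself, skipping the ones that fail to decompress. A minimal sketch, assuming your mapper can open the files directly (e.g. after a local copy); `is_corrupt_gzip` is an illustrative helper name, not part of any Hadoop API:

```python
# Sketch only: detect a corrupt gzip file by decompressing it fully,
# so a streaming mapper can emit the bad path (to be moved later)
# instead of dying mid-job. Not the HADOOP-3828 skip-records feature.
import gzip

def is_corrupt_gzip(path):
    """Return True if the gzip file at `path` cannot be fully decompressed."""
    try:
        with gzip.open(path, "rb") as f:
            # Read in 1 MB chunks so memory stays bounded on large files.
            while f.read(1024 * 1024):
                pass
        return False
    except (EOFError, OSError):
        # Truncated or corrupt stream (bad CRC, bad header, short trailer).
        return True
```

Your mapper would then emit corrupt paths under a sentinel key, and a follow-up step could `hadoop fs -mv` them into a quarantine folder.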
On Mon, Oct 19, 2009 at 10:58 AM, <[EMAIL PROTECTED]>wrote:
> Hi Everybody,
> I'm doing a project where I have to read a large set of compressed
> files (gz). I'm using Python and Streaming to achieve my goals.
> However, I have a problem: there are corrupt compressed files that are
> killing my map/reduce jobs.
> My environment is the following:
> Hadoop-0.18.3 (CDH1)
> Do you guys have any recommendations on how to manage this case?
> How can I catch that exception using Python so that my jobs don't fail?
> How can I identify these files using Python and move them to a
> corrupt-file folder?
> I really appreciate any recommendations.