Hadoop >> mail # user >> How to IO catch exceptions using python


Re: How to IO catch exceptions using python
Hey Xavier,

The functionality you are looking for was added to 0.19 and above:
http://issues.apache.org/jira/browse/HADOOP-3828. If you upgrade your
cluster to CDH2, you should be good to go.

Regards,
Jeff

On Mon, Oct 19, 2009 at 10:58 AM, <[EMAIL PROTECTED]> wrote:

> Hi Everybody,
>
> I'm working on a project where I have to read a large set of compressed
> files (gz). I'm using Python and streaming to achieve my goals. However,
> I have a problem: some of the compressed files are corrupt, and they are
> killing my map/reduce jobs.
> My environment is the following:
> Hadoop-0.18.3 (CDH1)
>
>
> Do you guys have any recommendations on how to handle this case?
> How can I catch that exception using Python so that my jobs don't fail?
> How can I identify these files using Python and move them to a
> corrupt-file folder?
>
> I really appreciate any recommendations.
>
> Xavier
>
>
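For the second part of Xavier's question (identifying corrupt gz files and moving them aside), a plain-Python pre-check is one option that works regardless of Hadoop version. The sketch below is an assumption on my part, not something from the thread: it assumes the files are accessible on a local filesystem (for files on HDFS you would first copy them down, e.g. with `hadoop fs -get`), and the function name and directory arguments are hypothetical. It tries to decompress each `.gz` file to the end and quarantines any that raise a decompression error.

```python
import gzip
import shutil
import zlib
from pathlib import Path

def quarantine_corrupt_gz(src_dir, corrupt_dir):
    """Fully decompress each .gz file in src_dir; move unreadable
    ones into corrupt_dir and return their names.

    Hypothetical helper for illustration -- assumes local files.
    """
    src = Path(src_dir)
    bad = Path(corrupt_dir)
    bad.mkdir(parents=True, exist_ok=True)
    corrupt = []
    for path in sorted(src.glob("*.gz")):
        try:
            with gzip.open(path, "rb") as f:
                # Read in 1 MiB chunks; a truncated or corrupt stream
                # raises EOFError, OSError (bad header), or zlib.error.
                while f.read(1 << 20):
                    pass
        except (OSError, EOFError, zlib.error):
            shutil.move(str(path), str(bad / path.name))
            corrupt.append(path.name)
    return corrupt
```

Running this before submitting the streaming job means the mapper only ever sees files that decompress cleanly, so no exception handling is needed inside the Python mapper itself.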