Corbin Hoenes 2011-02-24, 18:12
Eric Yang 2011-02-24, 18:55
Jerome Boulon 2011-02-24, 19:20
Corbin Hoenes 2011-02-25, 00:24
Ariel Rabkin 2011-02-25, 00:27
Eric Yang 2011-02-25, 00:32
+1 for skipping _*
From: Eric Yang <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Date: Thu, 24 Feb 2011 16:32:49 -0800
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Re: _SUCCESS files appearing in demuxOutput
I also support skipping _*.
On 2/24/11 4:27 PM, "Ariel Rabkin" <[EMAIL PROTECTED]> wrote:
I don't think this has been fixed yet in trunk, let alone 0.4.
I would support skipping everything starting with _. Is there an
actual use case this would break?
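The "skip everything starting with _" rule proposed here can be sketched as a simple name filter. This is a minimal illustration under stated assumptions, not Chukwa's actual code: the class and method names (`UnderscoreFilterSketch`, `shouldSkip`, `entriesToProcess`) and the cluster names `myCluster`/`otherCluster` are hypothetical, and real code would work on Hadoop `Path`/`FileStatus` objects rather than plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UnderscoreFilterSketch {

    // Hypothetical helper: true if a demux output entry should be skipped.
    // Hadoop's bookkeeping entries mentioned in this thread (_temporary, _log,
    // and the _SUCCESS marker) all start with an underscore, so a prefix rule
    // covers them without listing each name.
    static boolean shouldSkip(String name) {
        return name.startsWith("_");
    }

    // Keep only the entries a MoveToRepository-style loop should process.
    static List<String> entriesToProcess(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (!shouldSkip(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical directory listing mixing Hadoop entries and cluster dirs.
        List<String> listing = Arrays.asList(
                "_SUCCESS", "_log", "_temporary", "myCluster", "otherCluster");
        System.out.println(entriesToProcess(listing)); // prints [myCluster, otherCluster]
    }
}
```

The prefix rule trades precision for robustness: it also catches any future "_" entries Hadoop might add, at the cost of ignoring a cluster whose name legitimately starts with "_". Hadoop's own `FileInputFormat` applies a similar rule when listing job input, ignoring names that begin with "_" or ".".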
On Thu, Feb 24, 2011 at 4:24 PM, Corbin Hoenes <[EMAIL PROTECTED]> wrote:
> We're using Cloudera's CDH3 beta 4 release. Maybe they've patched the
> FileOutputCommitter changes into their release, since it's based on Hadoop 0.20.2.
> Looking at the source for Chukwa 0.3 (the version we are on), the
> MoveToRepository class skips the _log and _temporary directories.
> Shouldn't Chukwa skip the _SUCCESS file as well? Or could a more general
> skip be used, such as skipping anything starting with an underscore?
> (Maybe too aggressive.)
> Does Chukwa 0.4 or 0.5 fix this issue? (I'll probably just have to
> patch 0.3, but this may be another reason to upgrade.)
> On Thu, Feb 24, 2011 at 12:20 PM, Jerome Boulon <[EMAIL PROTECTED]> wrote:
>> This filename is coming from
>> here: http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/constant-values.html
>> In general, with Hadoop you may want to avoid looking at any "_*" file, since
>> those are Hadoop-related files (_temporary, _log,…).
>> From: Eric Yang <[EMAIL PROTECTED]>
>> Reply-To: "[EMAIL PROTECTED]"
>> <[EMAIL PROTECTED]>
>> Date: Thu, 24 Feb 2011 10:55:57 -0800
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Subject: Re: _SUCCESS files appearing in demuxOutput
>> Hi Corbin,
>> I have not seen this. Which version of Hadoop are you using? Are you using
>> 0.21? It looks like the _SUCCESS file is written out after the Demux
>> MapReduce job. There are two possible causes for this file: either Demux
>> has been modified and is doing something unexpected, or the 0.21 MapReduce
>> framework put the file there.
>> If you are using 0.21, I would recommend avoiding it.
>> A more stable version of Hadoop is the 0.20.100 branch, and you can download
>> it from:
>> On 2/24/11 10:12 AM, "Corbin Hoenes" <[EMAIL PROTECTED]> wrote:
>> Anyone seen this?
>> I clean them out, but the same file keeps showing up, and Chukwa
>> doesn't know how to handle it:
>> 2011-02-21 06:51:55,027 INFO main MoveToRepository - main procesing Cluster (_SUCCESS)
>> 2011-02-21 06:51:55,027 INFO main MoveToRepository - processClutserDirectory (_SUCCESS,/chukwa/repos//_SUCCESS)
>> 2011-02-21 06:51:55,028 WARN main PostProcessorManager - Error in hdfs://cluster1/chukwa/postProcess/demuxOutputDir_1298061686862/_SUCCESS is not a directory!
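The "is not a directory!" warning in the log above arises because every entry in the demux output directory is treated as a cluster directory, including the _SUCCESS marker file. A minimal sketch of a more defensive scan follows, assuming local `java.nio.file` paths as a stand-in for HDFS (real Chukwa code uses Hadoop's `FileSystem` API). The class and method names (`DemuxOutputScanSketch`, `clusterDirectories`) and the entry names are hypothetical; it combines the underscore skip discussed in this thread with an explicit directory check.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DemuxOutputScanSketch {

    // Hypothetical stand-in for MoveToRepository's per-cluster loop: returns
    // the names of entries that would be handed on for per-cluster processing.
    static List<String> clusterDirectories(Path demuxOutputDir) throws IOException {
        List<String> clusters = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(demuxOutputDir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                // Skip Hadoop bookkeeping entries such as _SUCCESS, _log, _temporary.
                if (name.startsWith("_")) {
                    continue;
                }
                // Defensive check: only directories can be cluster directories,
                // so any other stray file is ignored instead of causing an error.
                if (!Files.isDirectory(entry)) {
                    continue;
                }
                clusters.add(name);
            }
        }
        // Directory listing order is unspecified; sort for stable output.
        Collections.sort(clusters);
        return clusters;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory that mimics a demux output layout.
        Path dir = Files.createTempDirectory("demuxOutputDir");
        Files.createDirectory(dir.resolve("myCluster"));
        Files.createDirectory(dir.resolve("_temporary"));
        Files.createFile(dir.resolve("_SUCCESS"));
        Files.createFile(dir.resolve("stray.txt"));
        System.out.println(clusterDirectories(dir)); // prints [myCluster]
    }
}
```

The directory check alone would silence this particular warning, but the underscore skip is still needed to keep directories such as _temporary out of the repository.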
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department
James Seigel 2011-02-25, 01:36