Chukwa >> mail # user >> _SUCCESS files appearing in demuxOutput


Re: _SUCCESS files appearing in demuxOutput
I don't think this has been fixed yet in trunk, let alone 0.4.

I would support skipping everything starting with _.  Is there an
actual use case this would break?
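
For illustration, a general skip could look roughly like the sketch below. This is not the actual MoveToRepository code: the class and method names (UnderscoreSkip, shouldSkip, clusterEntries) are hypothetical, and it works on plain entry names rather than HDFS paths so it can be run without Hadoop. It assumes that every entry starting with "_" is Hadoop bookkeeping and never a real cluster directory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Chukwa: decides which entries of a
// demux output directory should be ignored before processing.
final class UnderscoreSkip {
    private UnderscoreSkip() {}

    // Assumption: every entry whose name starts with "_" is Hadoop
    // bookkeeping (_SUCCESS, _temporary, ...) and never a cluster directory.
    static boolean shouldSkip(String entryName) {
        return entryName.startsWith("_");
    }

    // Returns the entry names that remain after dropping skipped ones,
    // preserving the original order.
    static List<String> clusterEntries(List<String> entryNames) {
        List<String> kept = new ArrayList<>();
        for (String name : entryNames) {
            if (!shouldSkip(name)) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

Passing the names found under a demuxOutputDir through clusterEntries would drop _SUCCESS before anything tries to treat it as a cluster directory. The trade-off is the one raised in this thread: a cluster whose name legitimately starts with an underscore would be skipped too.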

--Ari

On Thu, Feb 24, 2011 at 4:24 PM, Corbin Hoenes <[EMAIL PROTECTED]> wrote:
> We're using Cloudera's CDH3 beta 4 release.  Maybe they've patched the
> FileOutputCommitter stuff into their release, as it's based on Hadoop 0.20.2.
> Looking at the source for Chukwa 0.3 (version we are on) the
> MoveToRepository class skips the _log and _temporary directories.
>
> Seems like Chukwa should skip the _SUCCESS file as well?  Or could a
> more general skip be used, like skipping anything starting with an
> underscore?  (Maybe too aggressive.)
>
> Does Chukwa 0.4 or 0.5 fix this issue? (I'm probably going to just have to
> patch 0.3 but maybe just another reason to upgrade.)
>
> On Thu, Feb 24, 2011 at 12:20 PM, Jerome Boulon <[EMAIL PROTECTED]> wrote:
>>
>> This filename is coming from
>> here: http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/constant-values.html
>> In general, for Hadoop you may want to avoid looking at any "_*" file, since
>> those are Hadoop-related files (_temporary, _log,…)
>> /Jerome.
>> From: Eric Yang <[EMAIL PROTECTED]>
>> Reply-To: "[EMAIL PROTECTED]"
>> <[EMAIL PROTECTED]>
>> Date: Thu, 24 Feb 2011 10:55:57 -0800
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Subject: Re: _SUCCESS files appearing in demuxOutput
>>
>> Hi Corbin,
>>
>> I have not seen this.  What version of Hadoop are you using?  Is it 0.21?
>> It looks like the _SUCCESS file is spilled out after the demux mapreduce
>> job.  There are two possibilities leading to the creation of this file:
>> either Demux has been modified and is doing something unexpected, or the
>> mapreduce framework in 0.21 put that file there.
>> If you are using 0.21, I would recommend avoiding it.
>>
>> A more stable version of Hadoop is the 0.20.100 branch, and you can download
>> it from:
>>
>> http://people.apache.org/~eyang/
>>
>> Regards,
>> Eric
>>
>> On 2/24/11 10:12 AM, "Corbin Hoenes" <[EMAIL PROTECTED]> wrote:
>>
>> Anyone seen this?
>>
>> /chukwa/postProcess/demuxOutputDir_1298061686862/_SUCCESS
>>
>> I clean them out and I keep getting the same file showing up and chukwa
>> doesn't know how to handle it:
>>
>> postProcess.log:
>> 2011-02-21 06:51:55,027 INFO main MoveToRepository - main procesing
>> Cluster (_SUCCESS)
>> 2011-02-21 06:51:55,027 INFO main MoveToRepository -
>> processClutserDirectory (_SUCCESS,/chukwa/repos//_SUCCESS)
>> 2011-02-21 06:51:55,028 WARN main PostProcessorManager - Error in
>> processDemuxOutput:
>> java.io.IOException:
>> hdfs://cluster1/chukwa/postProcess/demuxOutputDir_1298061686862/_SUCCESS is
>> not a directory!
>>     at
>> org.apache.hadoop.chukwa.extraction.demux.MoveToRepository.processClutserDirectory(MoveToRepository.java:54)
>>     at
>> org.apache.hadoop.chukwa.extraction.demux.MoveToRepository.main(MoveToRepository.java:250)
>>     at
>> org.apache.hadoop.chukwa.extraction.demux.PostProcessorManager.movetoMainRepository(PostProcessorManager.java:201)
>>     at
>> org.apache.hadoop.chukwa.extraction.demux.PostProcessorManager.start(PostProcessorManager.java:146)
>>     at
>> org.apache.hadoop.chukwa.extraction.demux.PostProcessorManager.main(PostProcessorManager.java:80)
>>
>>
>
>

--
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department