Chukwa >> mail # user >> _SUCCESS files appearing in demuxOutput


Corbin Hoenes 2011-02-24, 18:12
Eric Yang 2011-02-24, 18:55
Jerome Boulon 2011-02-24, 19:20
Corbin Hoenes 2011-02-25, 00:24
Ariel Rabkin 2011-02-25, 00:27
Eric Yang 2011-02-25, 00:32
Re: _SUCCESS files appearing in demuxOutput
+1 for skipping _*
/Jerome

From: Eric Yang <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Date: Thu, 24 Feb 2011 16:32:49 -0800
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Re: _SUCCESS files appearing in demuxOutput

I also support skipping _*.

Regards,
Eric

On 2/24/11 4:27 PM, "Ariel Rabkin" <[EMAIL PROTECTED]> wrote:

I don't think this has been fixed yet in trunk, let alone 0.4.

I would support skipping everything starting with _.  Is there an
actual use case this would break?

--Ari
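
The "skip everything starting with _" proposal can be sketched as below. This is only an illustration under stated assumptions, not the actual MoveToRepository code: the class name UnderscoreSkip, the methods shouldProcess and clusterEntries, and the directory name "cluster1" are all hypothetical, and plain strings stand in for HDFS paths so the example runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnderscoreSkip {
    // Hypothetical helper, not part of Chukwa: returns true when an entry
    // in the demux output directory should be handled as a cluster directory.
    // Anything starting with "_" (_SUCCESS, _temporary, _log) is skipped.
    static boolean shouldProcess(String name) {
        return !name.startsWith("_");
    }

    // Keeps only the entries whose names do not start with an underscore.
    static List<String> clusterEntries(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (shouldProcess(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "cluster1" is an assumed directory name used only for illustration.
        List<String> entries =
            Arrays.asList("_SUCCESS", "_temporary", "_log", "cluster1");
        System.out.println(clusterEntries(entries)); // prints [cluster1]
    }
}
```

In a real patch the same check would presumably be applied to the last component of each HDFS path, for example through Hadoop's org.apache.hadoop.fs.PathFilter interface passed to FileSystem.listStatus, but how it fits into MoveToRepository is left open here.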

On Thu, Feb 24, 2011 at 4:24 PM, Corbin Hoenes <[EMAIL PROTECTED]> wrote:
> We're using Cloudera's CDH3 beta 4 release.  Maybe they've patched the
> FileOutputCommitter changes into their release, since it's based on Hadoop 0.20.2.
> Looking at the source for Chukwa 0.3 (the version we are on), the
> MoveToRepository class skips the _log and _temporary directories.
>
> Seems like Chukwa should skip the _SUCCESS file as well?  Or could a
> more general skip be used, like skipping anything starting with an underscore
> (maybe too aggressive)?
>
> Does Chukwa 0.4 or 0.5 fix this issue? (I'm probably going to just have to
> patch 0.3 but maybe just another reason to upgrade.)
>
> On Thu, Feb 24, 2011 at 12:20 PM, Jerome Boulon <[EMAIL PROTECTED]> wrote:
>>
>> This filename is coming from
>> here: http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/constant-values.html
>> In general, for Hadoop you may want to avoid looking at any "_*" file, since
>> those are Hadoop-related files (such as _temporary, _log, …).
>> /Jerome.
>> From: Eric Yang <[EMAIL PROTECTED]>
>> Reply-To: "[EMAIL PROTECTED]"
>> <[EMAIL PROTECTED]>
>> Date: Thu, 24 Feb 2011 10:55:57 -0800
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Subject: Re: _SUCCESS files appearing in demuxOutput
>>
>> Hi Corbin,
>>
>> I have not seen this.  What version of Hadoop are you using?
>> Are you using 0.21?  It looks like the _SUCCESS file is written out after
>> the demux mapreduce job.  There are two possible causes for
>> this file: either Demux was modified and is doing something unexpected,
>> or the mapreduce framework in 0.21 put that file there.
>> If you are using 0.21, I would recommend avoiding it.
>>
>> A more stable version of Hadoop is the 0.20.100 branch, and you can download
>> it from:
>>
>> http://people.apache.org/~eyang/
>>
>> Regards,
>> Eric
>>
>> On 2/24/11 10:12 AM, "Corbin Hoenes" <[EMAIL PROTECTED]> wrote:
>>
>> Anyone seen this?
>>
>> /chukwa/postProcess/demuxOutputDir_1298061686862/_SUCCESS
>>
>> I clean them out, but the same file keeps showing up and Chukwa
>> doesn't know how to handle it:
>>
>> postProcess.log:
>> 2011-02-21 06:51:55,027 INFO main MoveToRepository - main procesing Cluster (_SUCCESS)
>> 2011-02-21 06:51:55,027 INFO main MoveToRepository - processClutserDirectory (_SUCCESS,/chukwa/repos//_SUCCESS)
>> 2011-02-21 06:51:55,028 WARN main PostProcessorManager - Error in processDemuxOutput:
>> java.io.IOException: hdfs://cluster1/chukwa/postProcess/demuxOutputDir_1298061686862/_SUCCESS is not a directory!
>>     at org.apache.hadoop.chukwa.extraction.demux.MoveToRepository.processClutserDirectory(MoveToRepository.java:54)
>>     at org.apache.hadoop.chukwa.extraction.demux.MoveToRepository.main(MoveToRepository.java:250)
>>     at org.apache.hadoop.chukwa.extraction.demux.PostProcessorManager.movetoMainRepository(PostProcessorManager.java:201)
>>     at org.apache.hadoop.chukwa.extraction.demux.PostProcessorManager.start(PostProcessorManager.java:146)
>>     at org.apache.hadoop.chukwa.extraction.demux.PostProcessorManager.main(PostProcessorManager.java:80)

Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department
James Seigel 2011-02-25, 01:36