IvyTang 2012-03-15, 07:36
Re: Simple Archiver, Demux and PostProcessManager about the Raw Data sink file
Ariel Rabkin 2012-03-16, 18:33
On Thu, Mar 15, 2012 at 12:36 AM, IvyTang <[EMAIL PROTECTED]> wrote:
> As the wiki says, data in the sink may include duplicate and omitted
> chunks. So we need to demux and archive the raw data sink file.
> The start-data-processors.sh script runs three processes: ChukwaArchiveManager,
> PostProcessorManager and DemuxManager.
> The page http://incubator.apache.org/chukwa/docs/r0.4.0/dataflow.html explains
> the data workflow.
> First, DemuxManager moves raw *.done to
> Then, every half hour or so, ChukwaArchiveManager aggregates and removes
> dataSinkArchives data using M/R, from dataSinkArchives/[yyyyMMdd]/*/*.done
> to finalArchives/.
> The complete log flow is logs/*.done
> ==> dataSinkArchives/[yyyyMMdd]/*/*.done ==> finalArchives
> Here, I have a question. According to
> http://incubator.apache.org/chukwa/docs/r0.4.0/programming.html#Using+MapReduce ,
> Simple Archiver & Demux, the simple archiver removes the duplicates.
> Does the simple archiver refer to the ChukwaArchiveManager?
No, these are separate pieces. Back in the day, I found that
ChukwaArchiveManager was too complicated for my needs, and that I
wanted a simple command that would just archive whatever was in the
sink. And that's the simple archiver. It's found in
> 3. Can I just run the DemuxManager & ChukwaArchiveManager? I found I
> just need these two components.
Yes, you should be fine with just those if they meet your needs.
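
For a rough picture of what "removing duplicates" from the sink means, here
is a minimal Python sketch. It is not Chukwa code: the Chunk type, its fields,
and the choice of (source, stream, seq_id) as the identity key are all
assumptions made up for illustration. The real archiver jobs run as
MapReduce over the sink files.

```python
from typing import Iterable, Iterator, NamedTuple


class Chunk(NamedTuple):
    # Hypothetical fields for illustration only; not Chukwa's actual chunk schema.
    source: str   # host that produced the data
    stream: str   # name of the log stream on that host
    seq_id: int   # position of this chunk within the stream
    data: bytes


def drop_duplicates(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Yield each chunk once, keeping the first copy seen for each identity key."""
    seen = set()
    for chunk in chunks:
        # Assumed identity key: two chunks with the same key are treated as duplicates.
        key = (chunk.source, chunk.stream, chunk.seq_id)
        if key in seen:
            continue
        seen.add(key)
        yield chunk


sink = [
    Chunk("host1", "/var/log/app.log", 100, b"first"),
    Chunk("host1", "/var/log/app.log", 100, b"first"),   # retransmitted copy
    Chunk("host1", "/var/log/app.log", 200, b"second"),
]
archived = list(drop_duplicates(sink))
```

Note that this only handles the duplicates; omitted chunks cannot be
recovered at this stage, only detected (for example, as gaps in seq_id).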
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department