Simple Archiver, Demux and PostProcessManager about the Raw Data sink file
Ivy Tang 2012-03-15, 07:36
As the wiki says, *Data in the sink may include duplicate and omitted chunks.* So we need to demux and archive the raw data sink files.

start-data-processors.sh runs three processes: ChukwaArchiveManager, PostProcessorManager and DemuxManager. This page explains the data workflow: http://incubator.apache.org/chukwa/docs/r0.4.0/dataflow.html

First, DemuxManager moves raw *.done files to dataSinkArchives/[yyyyMMdd]/*/*.done. Then, every half hour or so, ChukwaArchiveManager uses M/R to aggregate the data in dataSinkArchives/[yyyyMMdd]/*/*.done into finalArchives/ and removes it from dataSinkArchives. The complete log flow is:

logs/*.done ==> dataSinkArchives/[yyyyMMdd]/*/*.done ==> finalArchives

1. Here I have a question. According to http://incubator.apache.org/chukwa/docs/r0.4.0/programming.html#Using+MapReduce (Simple Archiver & Demux), the simple archiver removes the duplicates. Does "simple archiver" refer to ChukwaArchiveManager?

2. PostProcessorManager moves logs from postProcess/demuxOutputDir to repos/[clusterName]/. But nothing writes logs into postProcess/demuxOutputDir. What does PostProcessorManager do?

3. Can I just run DemuxManager and ChukwaArchiveManager? I found that I only need these two components.

Thanks!

--
Best regards,
Ivy Tang
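The log flow described in the message above can be sketched as a small lookup. This is only an illustration of the directory layout as the poster describes it, not Chukwa code: the names `STAGES`, `flow_for` and `describe` are hypothetical, and the paths are written relative because the message does not give a root prefix.

```python
from datetime import date

# Stage names and path patterns are copied from the message above.
# The relative layout (no root prefix) is an assumption.
STAGES = [
    ("data sink", "logs/*.done"),
    ("DemuxManager", "dataSinkArchives/{day}/*/*.done"),
    ("ChukwaArchiveManager", "finalArchives"),
]


def flow_for(day):
    """Return the path pattern at each stage for the given day."""
    stamp = day.strftime("%Y%m%d")  # matches the [yyyyMMdd] directory name
    return [pattern.format(day=stamp) for _, pattern in STAGES]


def describe(day):
    """Join the stage patterns with the arrow notation used in the message."""
    return " ==> ".join(flow_for(day))


if __name__ == "__main__":
    print(describe(date(2012, 3, 15)))
```

For the date of the post, `describe` yields the same chain as in the message, with `[yyyyMMdd]` filled in as `20120315`.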
Ariel Rabkin 2012-03-16, 18:33