Chukwa, mail # user - Simple Archiver , Demux and PostProcessManager about the Raw Data sink file


Simple Archiver , Demux and PostProcessManager about the Raw Data sink file
IvyTang 2012-03-15, 07:36
As the wiki says, *Data in the sink may include duplicate and omitted
chunks.* So we need to demux and archive the raw data sink files.

The start-data-processors.sh script runs three processes: ChukwaArchiveManager,
PostProcessorManager and DemuxManager.

This page http://incubator.apache.org/chukwa/docs/r0.4.0/dataflow.html explains
the data workflow.

First, DemuxManager moves the raw *.done files to
dataSinkArchives/[yyyyMMdd]/*/*.done.

Then, every half hour or so, ChukwaArchiveManager uses M/R to aggregate the
data in dataSinkArchives/[yyyyMMdd]/*/*.done into finalArchives/ and removes
it from dataSinkArchives.

The complete log flow is: logs/*.done ==> dataSinkArchives/[yyyyMMdd]/*/*.done
==> finalArchives
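
To make my understanding of the flow concrete, here is a small Python sketch of
the three stages. It only builds the path patterns described above; the names
archive_glob and LOG_FLOW are my own for illustration and are not part of
Chukwa, and the example date is arbitrary.

```python
from datetime import date

# Hypothetical helper (not a Chukwa API): builds the glob for the .done files
# that DemuxManager has moved into dataSinkArchives for a given day.
def archive_glob(day: date) -> str:
    # [yyyyMMdd] in the dataflow description corresponds to %Y%m%d here.
    return f"dataSinkArchives/{day:%Y%m%d}/*/*.done"

# The three stages of the log flow as I understand them from the dataflow page.
LOG_FLOW = [
    "logs/*.done",                      # raw sink files written by the collector
    archive_glob(date(2012, 3, 15)),    # after DemuxManager moves them
    "finalArchives/",                   # after ChukwaArchiveManager aggregates them
]

print(" ==> ".join(LOG_FLOW))
```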

1.
        Here I have a question. According to the Simple Archiver & Demux part of
http://incubator.apache.org/chukwa/docs/r0.4.0/programming.html#Using+MapReduce ,
        the simple archiver removes the duplicates.
        Does "the simple archiver" refer to ChukwaArchiveManager?

2.
        PostProcessorManager moves logs from postProcess/demuxOutputDir
        to repos/[clusterName]/. But nothing seems to write logs
        into postProcess/demuxOutputDir.
        What does PostProcessorManager actually do?

3.     Can I run just DemuxManager and ChukwaArchiveManager? As far as I can
tell, I only need these two components.

Thanks!

--
Best regards,

Ivy Tang