Chukwa >> mail # user >> Simple Archiver , Demux and PostProcessManager about the Raw Data sink file


Simple Archiver , Demux and PostProcessManager about the Raw Data sink file
As the wiki says, *Data in the sink may include duplicate and omitted
chunks.* So we need to demux and archive the raw data sink files.

The start-data-processors.sh script runs three processes: ChukwaArchiveManager,
PostProcessorManager and DemuxManager.

This page http://incubator.apache.org/chukwa/docs/r0.4.0/dataflow.html explains
the data workflow.

First, DemuxManager moves the raw *.done files to
dataSinkArchives/[yyyyMMdd]/*/*.done.

Then, every half hour or so, ChukwaArchiveManager uses M/R to aggregate the
data in dataSinkArchives/[yyyyMMdd]/*/*.done into finalArchives/ and removes
it from dataSinkArchives.

The complete log flow is:
logs/*.done ==> dataSinkArchives/[yyyyMMdd]/*/*.done ==> finalArchives/
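
The first hop of this flow can be sketched as a simple path mapping. This is
only an illustration of the directory layout described above, not Chukwa code:
the function name archive_path and the subdir argument (standing in for the
unspecified "*" level) are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_path(done_file: str, day: date, subdir: str) -> PurePosixPath:
    """Map logs/<name>.done to dataSinkArchives/<yyyyMMdd>/<subdir>/<name>.done.

    `subdir` is a placeholder for the "*" directory level, whose real
    naming scheme is not stated in this message.
    """
    name = PurePosixPath(done_file).name
    if not name.endswith(".done"):
        raise ValueError(f"not a closed sink file: {done_file}")
    # [yyyyMMdd] in the message corresponds to %Y%m%d here.
    return PurePosixPath("dataSinkArchives") / day.strftime("%Y%m%d") / subdir / name


if __name__ == "__main__":
    # Hypothetical file name and subdirectory, for illustration only.
    print(archive_path("logs/example.done", date(2010, 5, 1), "some-subdir"))
```

The second hop (dataSinkArchives to finalArchives/) is an M/R aggregation
rather than a file move, so it is left out of the sketch.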

1. Here I have a question. According to
   http://incubator.apache.org/chukwa/docs/r0.4.0/programming.html#Using+MapReduce
   (Simple Archiver & Demux), the simple archiver removes the duplicates.
   Does "simple archiver" refer to the ChukwaArchiveManager?

2. The PostProcessorManager moves logs from postProcess/demuxOutputDir
   to repos/[clusterName]/. But nothing seems to write logs into
   postProcess/demuxOutputDir.
   What does the PostProcessorManager do?

3. Can I run just the DemuxManager and ChukwaArchiveManager? As far as I can
   tell, I only need these two components.

Thanks!

--
Best regards,

Ivy Tang