Chukwa >> mail # user >> Simple Archiver , Demux and PostProcessManager about the Raw Data sink file


Re: Simple Archiver , Demux and PostProcessManager about the Raw Data sink file
On Thu, Mar 15, 2012 at 12:36 AM, IvyTang <[EMAIL PROTECTED]> wrote:
> As the wiki says, data in the sink may include duplicate and omitted
> chunks, so we need to demux and archive the raw data sink files.
>
> start-data-processors.sh runs three processes: ChukwaArchiveManager,
> PostProcessorManager, and DemuxManager.
>
> This page explains the data workflow:
> http://incubator.apache.org/chukwa/docs/r0.4.0/dataflow.html
>
> First, DemuxManager moves the raw *.done files to
> dataSinkArchives/[yyyyMMdd]/*/*.done.
>
> Then ChukwaArchiveManager, every half hour or so, uses M/R to aggregate the
> data in dataSinkArchives/[yyyyMMdd]/*/*.done into finalArchives/ and removes
> it from dataSinkArchives.
>
> The complete log flow is:
> logs/*.done ==> dataSinkArchives/[yyyyMMdd]/*/*.done ==> finalArchives
>
> 1. Here I have a question. According to
> http://incubator.apache.org/chukwa/docs/r0.4.0/programming.html#Using+MapReduce
> (Simple Archiver & Demux), the simple archiver removes duplicates.
> Does the simple archiver refer to ChukwaArchiveManager?

No, these are separate pieces. Back in the day, I found that
ChukwaArchiveManager was too complicated for my needs, and that I
wanted a simple command that would just archive whatever was in the
sink. And that's the simple archiver. It's found in
org.apache.hadoop.chukwa.extraction.archive.SinkArchiver.
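
To make "removes the duplicates" concrete, here is a minimal Python sketch of the idea, not SinkArchiver's actual code. It assumes a chunk can be identified by its source host, stream name, and sequence ID; the `Chunk` type and its field names are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, NamedTuple


class Chunk(NamedTuple):
    """Hypothetical stand-in for a sink record; not Chukwa's Chunk class."""
    source: str   # host that produced the data (assumed field)
    stream: str   # name of the stream, e.g. a log file path (assumed field)
    seq_id: int   # position of the chunk within its stream (assumed field)
    data: bytes


def drop_duplicates(chunks: Iterable[Chunk]) -> List[Chunk]:
    """Keep the first chunk seen for each (source, stream, seq_id) key."""
    seen = set()
    unique = []
    for chunk in chunks:
        key = (chunk.source, chunk.stream, chunk.seq_id)
        if key in seen:
            continue  # the same chunk was written to the sink more than once
        seen.add(key)
        unique.append(chunk)
    return unique
```

Keeping the first occurrence preserves the original order of the sink data, which is the simplest choice for a sketch like this.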
> 3. Can I just run DemuxManager and ChukwaArchiveManager? I found that I
> need only these two components.

Yes, you should be fine with just those if they meet your needs.
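
If it helps to check that data is moving through the flow described in the quoted message, the three stages can be written as glob patterns and counted. This is only a sketch: it walks a local directory tree as a stand-in for HDFS, the function and dictionary names are made up for the example, and the layout under finalArchives is not specified in this thread, so everything below it is simply counted.

```python
from pathlib import Path

# One glob pattern per stage, relative to a root directory.
# The first two patterns follow the flow in the quoted message;
# [yyyyMMdd] is written as eight digit classes.
STAGE_PATTERNS = {
    "sink": "logs/*.done",
    "dataSinkArchives": "dataSinkArchives/" + "[0-9]" * 8 + "/*/*.done",
    # Assumption: count every file under finalArchives, whatever its layout.
    "finalArchives": "finalArchives/**/*",
}


def count_files_per_stage(root):
    """Return {stage name: number of regular files matching its pattern}."""
    root = Path(root)
    counts = {}
    for stage, pattern in STAGE_PATTERNS.items():
        counts[stage] = sum(1 for p in root.glob(pattern) if p.is_file())
    return counts
```

Calling `count_files_per_stage("/some/local/copy")` at intervals shows whether *.done files leave the sink and later appear in the archive directories.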

--
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department