
Re: Sink file has omitted chunks?
"Omitted chunks" is an error in the documentation. By definition, if
chunks are omitted, they won't be there. Duplicates and other
peculiarities will happen in the event of failures. As you say, it's a
consequence of the distributed environment.

SimpleArchiver should do the cleanup you want.
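
To illustrate the kind of cleanup meant here, below is a minimal sketch in
Python. It is not Chukwa's actual code and makes no claim about how
SimpleArchiver is implemented. It assumes each chunk can be identified by
its source host, stream name, and sequence ID; the Chunk type and the
drop_duplicates function are hypothetical names used only for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class Chunk(NamedTuple):
    """Hypothetical stand-in for a chunk read from a sink file."""
    source: str   # host that produced the data (assumed field)
    stream: str   # name of the stream, e.g. a log file path (assumed field)
    seq_id: int   # position of the chunk within the stream (assumed field)
    data: bytes


def drop_duplicates(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Yield each chunk once, keeping the first copy of any repeated key."""
    seen = set()
    for chunk in chunks:
        key = (chunk.source, chunk.stream, chunk.seq_id)
        if key in seen:
            # Same source, stream, and sequence ID: treat as a duplicate,
            # for example a chunk resent after a failure.
            continue
        seen.add(key)
        yield chunk
```

A job that reads the sink directly would need a step like this itself,
which is one reason to work from the archived output instead.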


On Mon, Nov 22, 2010 at 11:39 PM, Ying Tang <[EMAIL PROTECTED]> wrote:
> Hi all,
>     After reading the Chukwa docs, my understanding is that the log data
> flow is:
>     adaptor-->agent-->collector-->sink file--->....
>     The docs say, "Data in the sink may include duplicate and omitted
> chunks."
>     They also do not recommend writing MapReduce jobs that directly examine
> the data sink, "because jobs will likely discard most of their input".
>     Here are my questions:
>     1. Why does data in the sink file include duplicate and omitted chunks?
> Is it because of the distributed environment?
>     2. How can the problem above be solved? The Simple Archiver generates
> the archive file with duplicates removed. So the Simple Archiver can only
> solve the duplicate data problem, right?
> --
> Best regards,
> Ivy Tang

UC Berkeley Computer Science Department