Chukwa >> mail # user >> More about the removing of duplicate chunks

Re: More about the removing of duplicate chunks
TimePartition only shows you when the data showed up; it says nothing
about the contents of the chunk.

I think SeqID + StreamName is the right thing to match on. If the same
data is re-collected later, it shows up under a different TimePartition,
but you still want to treat it as a duplicate.
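
Here is a rough sketch of what I mean, in case it helps. It is not the
archiver's actual code: the ChunkKey class, its field names and
dropDuplicates() are made up for illustration, and I am assuming the key
can be split into the four parts you listed. It keeps DataType in the
match, as you proposed, which is a bit stricter than SeqID + StreamName
alone; the point is only that TimePartition is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ChunkDedupSketch {

    // Hypothetical stand-in for an archived chunk's key; not a Chukwa class.
    public static final class ChunkKey {
        public final long timePartition; // when the data showed up
        public final String dataType;
        public final String streamName;
        public final long seqId;

        public ChunkKey(long timePartition, String dataType,
                        String streamName, long seqId) {
            this.timePartition = timePartition;
            this.dataType = dataType;
            this.streamName = streamName;
            this.seqId = seqId;
        }

        // Identity used for matching duplicates: every field except
        // timePartition. The NUL separator keeps field values from running
        // together (assumes the names themselves contain no NUL character).
        public String dedupId() {
            return dataType + '\u0000' + streamName + '\u0000' + seqId;
        }
    }

    // Keeps the first key seen for each dedupId and drops later repeats,
    // preserving the input order of the keys that are kept.
    public static List<ChunkKey> dropDuplicates(List<ChunkKey> keys) {
        Set<String> seen = new HashSet<>();
        List<ChunkKey> kept = new ArrayList<>();
        for (ChunkKey key : keys) {
            if (seen.add(key.dedupId())) {
                kept.add(key);
            }
        }
        return kept;
    }
}
```

With this, two keys that differ only in TimePartition collapse into one,
while a different SeqId on the same stream is still kept as a separate
chunk.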

On Wed, Mar 28, 2012 at 12:56 AM, IvyTang <[EMAIL PROTECTED]> wrote:
> Thanks to the simple archiver, we do remove almost all the duplicate
> chunks.
> But we found that there are still a few, very few, duplicate chunks left.
> And strangely, these chunks' keys aren't the same. The DataType, StreamName
> and SeqId are the same, but the TimePartitions are different. The logs in
> these chunks are the same.
> Could we just distinguish the duplicate chunks using the DataType, StreamName
> and SeqId? What does the TimePartition mean?
> Thanks!
> --
> Best regards,
> Ivy Tang

UC Berkeley Computer Science Department