Chukwa >> mail # user >> More about the removing of duplicate chunks


IvyTang 2012-03-28, 07:56
Re: More about the removing of duplicate chunks
TimePartition shows when the data showed up.

I think SeqId + StreamName is the right thing to match on -- if the
data is re-collected later, but it's the same data, then yes, you want to
treat it as a duplicate.
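
A rough sketch of what matching on those fields could look like, in case it helps. This is not Chukwa code: the `ArchiveKey` class, the `dedup_chunks` helper, and the sample records are all made up for illustration, with the field names borrowed from the key fields mentioned in this thread. It treats two chunks as the same when DataType, StreamName and SeqId match, and ignores TimePartition.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class ArchiveKey:
    # Hypothetical stand-in for the key fields named in this thread.
    time_partition: int
    data_type: str
    stream_name: str
    seq_id: int


def dedup_chunks(
    records: Iterable[Tuple[ArchiveKey, bytes]],
) -> Iterator[Tuple[ArchiveKey, bytes]]:
    """Yield the first record seen for each (data_type, stream_name, seq_id)."""
    seen = set()
    for key, payload in records:
        # TimePartition is deliberately left out of the identity.
        identity = (key.data_type, key.stream_name, key.seq_id)
        if identity in seen:
            continue  # same chunk collected again, possibly in a later partition
        seen.add(identity)
        yield key, payload


# Made-up sample data: the first two records differ only in time_partition.
records = [
    (ArchiveKey(1332918000000, "SysLog", "host1/var/log/messages", 4096), b"line a\n"),
    (ArchiveKey(1332921600000, "SysLog", "host1/var/log/messages", 4096), b"line a\n"),
    (ArchiveKey(1332921600000, "SysLog", "host1/var/log/messages", 8192), b"line b\n"),
]
unique = list(dedup_chunks(records))
print(len(unique))
```

Keeping the first copy is an arbitrary choice here; since the payloads are identical, either copy would do. The in-memory `seen` set only works for data that fits on one machine, so a real job would more likely group by the same three fields instead.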

On Wed, Mar 28, 2012 at 12:56 AM, IvyTang <[EMAIL PROTECTED]> wrote:
> Thanks to the simple archiver, we do remove almost all the duplicate
> chunks.
>
> But we found that there are still a few, very few, duplicate chunks left.
>
> Strangely, these chunks' keys aren't the same. The DataType, StreamName
> and SeqId are the same, but the TimePartition values differ. The log data in
> these chunks is the same.
>
> Could we just distinguish the duplicate chunks using the DataType, StreamName
> and SeqId? What is the TimePartition for?
>
> Thanks!
>
>
> --
> Best regards,
>
> Ivy Tang
>
>
>

--
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department