Chukwa, mail # user - Missing logs in hbase because of same timestamp


Abhijit Dhar 2012-01-21, 04:20
Re: Missing logs in hbase because of same timestamp
Eric Yang 2012-01-21, 04:31
You might want to extend TsProcessor and put the chunk sequence ID into
the primary key to ensure that you get ordered entries in HBase.  Hope
this works for your use case.

regards,
Eric
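
A minimal sketch of the idea suggested above, written as plain Java with no Chukwa or HBase dependencies. The class name RowKeySketch, the rowKey helper, the key layout, and the sample timestamp and sequence values are all hypothetical and chosen only for illustration; how a TsProcessor subclass would actually obtain the chunk sequence ID and hand the key to the HBase writer is not shown in this thread and is not assumed here. A sorted map stands in for a table keyed by row key, where a second put with the same key replaces the first.

```java
import java.util.Map;
import java.util.TreeMap;

public class RowKeySketch {

    /**
     * Hypothetical composite key: zero-padded timestamp, then a zero-padded
     * sequence number. Zero padding keeps lexicographic order equal to
     * numeric order for non-negative values.
     */
    static String rowKey(long timestampMillis, long seqId) {
        return String.format("%013d-%019d", timestampMillis, seqId);
    }

    public static void main(String[] args) {
        long ts = 1327089794852L; // arbitrary epoch-millis value shared by three lines
        String[] lines = {
            "FetchMapper.doWork() ...",
            "FetchMapper.feedQueueManager() ...",
            "FetchMapper.logHeapUsage() ..."
        };

        // Keyed by timestamp only: later lines overwrite earlier ones.
        Map<String, String> byTimestamp = new TreeMap<>();
        // Keyed by timestamp plus a sequence number: every line is kept.
        Map<String, String> byComposite = new TreeMap<>();

        long seq = 100; // stand-in for a per-record sequence value (assumption)
        for (String line : lines) {
            byTimestamp.put(String.format("%013d", ts), line);
            byComposite.put(rowKey(ts, seq), line);
            seq++;
        }

        System.out.println("timestamp-only rows: " + byTimestamp.size()); // 1
        System.out.println("composite-key rows:  " + byComposite.size()); // 3
    }
}
```

One caveat, stated as an assumption rather than something confirmed in this thread: if several records in the same chunk share both the timestamp and the chunk sequence ID, the key would also need a per-record component (for example, the record's position within the chunk) to stay unique.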

On Fri, Jan 20, 2012 at 8:20 PM, Abhijit Dhar <[EMAIL PROTECTED]> wrote:
> I noticed that TsProcessor is using the timestamp as the key for putting logs
> into HBase. But my logs are coming in so fast that they have the same timestamp,
> like this:
>
> 2012-01-20 20:03:14,041 [INFO] [communication thread]
> [org.apache.hadoop.mapred.LocalJobRunner.statusUpdate()] 10 threads, 28
> requests, 0 errors, 0 forbidden, 0.6 pages/s, 80 kb/s,
> 2012-01-20 20:03:14,852 [INFO] [Thread-274]
> [jcrawler.fetch.mapreduce.FetchMapper.doWork()] -activeThreads=10,
> spinWaiting=7, fetchQueues.totalSize=649
> 2012-01-20 20:03:14,852 [INFO] [Thread-274]
> [jcrawler.fetch.mapreduce.FetchMapper.feedQueueManager()] feeding 649 input
> urls ...
> 2012-01-20 20:03:14,852 [INFO] [Thread-274]
> [jcrawler.fetch.mapreduce.FetchMapper.logHeapUsage()] Fetcher feeding queue
> manager. Heap usage: 327668152 out of 932118528 bytes.
>
> I think that because of this they are getting reduced, and only one log is
> kept for a given timestamp.
> Any idea how to fix this?
>
> Thanks,
>
> --
> View this message in context: http://apache-chukwa.679492.n3.nabble.com/Missing-logs-in-hbase-because-of-same-timestamp-tp3677271p3677271.html
> Sent from the Chukwa - Users mailing list archive at Nabble.com.
Abhijit Dhar 2012-01-26, 03:16
Abhijit Dhar 2012-01-26, 03:50
Eric Yang 2012-01-26, 05:39