HBase >> mail # user >> Inconsistent row count between mapreduce and shell count


kiran chitturi 2013-02-10, 00:14
Ted Yu 2013-02-10, 00:43
kiran chitturi 2013-02-10, 00:49
Re: Inconsistent row count between mapreduce and shell count
Hmm... Can you show us the exact commands you executed?

And just to rule out the obvious:
1. There were no writes while you did the row count?
2. In the RowCount M/R case you specified neither a range nor any columns?
Do you always get the exact same numbers in both cases? Or do they vary?

Thanks.

-- Lars
----- Original Message -----
From: kiran chitturi <[EMAIL PROTECTED]>
To: user <[EMAIL PROTECTED]>
Cc:
Sent: Saturday, February 9, 2013 4:49 PM
Subject: Re: Inconsistent row count between mapreduce and shell count

Yes. I just counted the regions listed at
'http://machine1:60010/table.jsp?name=documents' and there are 53, which
equals the number of completed tasks in Hadoop.
Thanks,
Kiran.
On Sat, Feb 9, 2013 at 7:43 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Apart from the 5 killed tasks, was the number of successful tasks equal to
> the number of regions in your table?
>
> Thanks
>
> On Sat, Feb 9, 2013 at 4:14 PM, kiran chitturi <[EMAIL PROTECTED]> wrote:
>
> > Hi!
> >
> > I am using HBase 0.94.1 on a distributed cluster of 20 nodes.
> >
> > When I ran count on a table in the HBase shell, I got a count of
> > 2152416 rows.
> >
> > When I did the same thing using the RowCounter MapReduce job, I got the
> > value below:
> >
> > org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
> > 13/02/10 00:05:06 INFO mapred.JobClient:     ROWS=1389991
> >
> > The same thing happened when I used Pig to count or do other operations.
> > The two results are inconsistent.
> >
> > During the MapReduce job, I noticed that 5 tasks were killed. When I
> > traced them back to the TaskTracker logs on the node, I found entries
> > similar to the ones below.
> >
> > 2013-02-09_23:58:58.40665 13/02/09 23:58:58 INFO mapred.TaskTracker: JVM with ID: jvm_201302090035_0015_m_1905604998 given task: attempt_201302090035_0015_m_000012_1
> > 2013-02-09_23:59:03.57016 13/02/09 23:59:03 INFO mapred.TaskTracker: Received KillTaskAction for task: attempt_201302090035_0015_m_000012_1
> > 2013-02-09_23:59:03.57034 13/02/09 23:59:03 INFO mapred.TaskTracker: About to purge task: attempt_201302090035_0015_m_000012_1
> > 2013-02-09_23:59:03.61003 13/02/09 23:59:03 INFO util.ProcessTree: Killing process group9745 with signal TERM. Exit code 0
> >
> > I have also tried to run the tool 'hbck' but it shows no inconsistencies.
> >
> > Can you please suggest why there is an inconsistency and how I can
> > correct it?
> >
> > Thanks,
> > --
> > Kiran Chitturi
> >
>

--
Kiran Chitturi
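
For readers following the numbers in this thread, here is a minimal Python sketch that pulls the ROWS counter out of the JobClient line quoted above, compares it with the shell count, and lists the task attempts that received a KillTaskAction. The constant and function names (SHELL_COUNT, parse_rows, killed_attempts) are made up for illustration and are not part of HBase or Hadoop. The log text is copied from the message above. The attempt-ID layout assumed by the regular expression (attempt_<jobtracker start>_<job>_<m|r>_<task>_<attempt>) is inferred from the single ID in that log.

```python
import re

# Row count reported by the HBase shell in the original message.
SHELL_COUNT = 2152416

# JobClient counter line quoted in the original message.
JOB_OUTPUT = "13/02/10 00:05:06 INFO mapred.JobClient:     ROWS=1389991"

# TaskTracker log lines quoted in the original message (wrapping removed).
TASKTRACKER_LOG = """\
2013-02-09_23:58:58.40665 13/02/09 23:58:58 INFO mapred.TaskTracker: JVM with ID: jvm_201302090035_0015_m_1905604998 given task: attempt_201302090035_0015_m_000012_1
2013-02-09_23:59:03.57016 13/02/09 23:59:03 INFO mapred.TaskTracker: Received KillTaskAction for task: attempt_201302090035_0015_m_000012_1
2013-02-09_23:59:03.57034 13/02/09 23:59:03 INFO mapred.TaskTracker: About to purge task: attempt_201302090035_0015_m_000012_1
"""

# Assumed attempt-ID layout; groups 2 and 3 are the task index and attempt number.
KILL_RE = re.compile(
    r"Received KillTaskAction for task: (attempt_\d+_\d+_[mr]_(\d+)_(\d+))"
)


def parse_rows(job_output):
    """Return the integer value of the ROWS counter found in job_output."""
    match = re.search(r"\bROWS=(\d+)", job_output)
    if match is None:
        raise ValueError("no ROWS counter found")
    return int(match.group(1))


def killed_attempts(log_text):
    """Return (attempt_id, task_index, attempt_number) for each killed attempt."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in KILL_RE.finditer(log_text)
    ]


mr_rows = parse_rows(JOB_OUTPUT)
missing = SHELL_COUNT - mr_rows
print(f"shell={SHELL_COUNT} mapreduce={mr_rows} missing={missing}")
for attempt_id, task, attempt in killed_attempts(TASKTRACKER_LOG):
    print(f"killed: task {task}, attempt {attempt} ({attempt_id})")
```

On the figures quoted here, the MapReduce count is 762425 rows short of the shell count. The one killed attempt visible in the log is attempt number 1 of map task 12, which is that task's second attempt because attempt numbers start at 0. Whether the killed attempts are related to the missing rows is not something this sketch can tell you; it only makes the two observations easier to line up.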
kiran chitturi 2013-02-10, 02:51
lars hofhansl 2013-02-10, 04:38
kiran chitturi 2013-02-10, 05:46
Ted Yu 2013-02-10, 07:05