HBase >> mail # user >> Inconsistent row count between mapreduce and shell count


kiran chitturi 2013-02-10, 00:14
Ted Yu 2013-02-10, 00:43
kiran chitturi 2013-02-10, 00:49
lars hofhansl 2013-02-10, 02:17
kiran chitturi 2013-02-10, 02:51
lars hofhansl 2013-02-10, 04:38
kiran chitturi 2013-02-10, 05:46

Re: Inconsistent row count between mapreduce and shell count
Kiran:
Take a look at src/main/ruby/shell/commands/move.rb

You will find help there on how to move a region.

Cheers
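
For reference, a minimal sketch of what that help text describes, assuming the 0.94-era shell syntax `move 'ENCODED_REGIONNAME', 'SERVER_NAME'`, where the encoded name is the hash suffix of the full region name and the server name is host, port and start code. The region and server values below are made-up placeholders, and the two function names are hypothetical helpers, not part of HBase. The script only prints the shell command; it does not contact the cluster.

```shell
#!/usr/bin/env bash
# Sketch: derive the encoded region name and print an HBase shell `move` command.

# Full region names look like: <table>,<start key>,<timestamp>.<encoded name>.
encoded_region_name() {
  local full="${1%.}"          # drop the trailing dot
  printf '%s\n' "${full##*.}"  # keep what follows the last remaining dot
}

# Print the command to paste into `hbase shell`.
# $1 = full region name, $2 = optional target server (host,port,startcode).
move_command() {
  local encoded
  encoded="$(encoded_region_name "$1")"
  if [ -n "${2:-}" ]; then
    printf "move '%s', '%s'\n" "$encoded" "$2"
  else
    # Without a target, the shell help says a server is chosen at random.
    printf "move '%s'\n" "$encoded"
  fi
}

# Placeholder values for illustration only.
region='documents,row-0500000,1360450000000.527db22f95c8a9e0116f0cc13c680396.'
server='LucidN2,60020,1360440000000'
move_command "$region" "$server"
```

A move only changes which region server serves the region; the region's files stay in HDFS, so no rows should be lost by it.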

On Sat, Feb 9, 2013 at 9:46 PM, kiran chitturi <[EMAIL PROTECTED]> wrote:

> Many thanks, Lars, for your suggestions! I have added them to the command:
>
> /opt/hadoop-1.0.4/bin/hadoop jar /opt/hbase-0.94.1/hbase-0.94.1.jar
> rowcounter -Dhbase.zookeeper.quorum="LucidN1,LucidN2,LucidN3"
> -Dhbase.client.scanner.caching=1000
> -Dmapred.map.tasks.speculative.execution=false documents
>
> I have stopped the data sources which write data into the table, but it did
> not help. There is not much difference in the row count that MapReduce is
> showing.
>
> However, the row count returned is consistent once I stopped writing data
> into the table (I ran the command 3 times). The shell count is also the same
> once I stopped writing.
>
> Since most of the rows are tweets, around 1.4 million rows are stored on a
> single data node (region server).
>
> Do you know of any way that I can reassign the regions in the table without
> losing the data? Will it make a difference then?
>
> Thank you,
> Kiran.
>
>
>
>
> On Sat, Feb 9, 2013 at 11:38 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > That all looks as it should.
> > Unless you somehow pointed the M/R job to another cluster, I have no good
> > explanation.
> >
> >
> > Would be interesting to see whether in the absence of writes you'd always
> > get precisely the same numbers.
> > (Looks like it might be the case; your 2nd run is not wildly different
> > from the first).
> >
> >
> > This is a bit disconcerting. Is there anything "interesting" in the logs?
> >
> >
> > Aside: For performance reasons you'd probably want to enable scanner
> > caching for the M/R: -Dhbase.client.scanner.caching=100 (or 1000)
> >
> > And also turn off speculative execution (we should do that by default):
> > -Dmapred.map.tasks.speculative.execution=false
> >
> > It might be the speculative execution that throws the job off, I am just
> > guessing now.
> >
> >
> > -- Lars
> >
> > ________________________________
> > From: kiran chitturi <[EMAIL PROTECTED]>
> > To: user <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> > Sent: Saturday, February 9, 2013 6:51 PM
> > Subject: Re: Inconsistent row count between mapreduce and shell count
> >
> >
> >
> >
> >
> >
> >
> > On Sat, Feb 9, 2013 at 9:17 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >
> > Hmm... Can you show us the exact commands you executed?
> > >
> > >
> > Below are the exact commands that I have used.
> >
> > In the hbase shell, for the table documents, I have used
> >    count 'documents'
> >
> > The mapreduce command is
> >     /opt/hadoop-1.0.4/bin/hadoop jar /opt/hbase-0.94.1/hbase-0.94.1.jar
> > rowcounter -Dhbase.zookeeper.quorum="LucidN1,LucidN2,LucidN3" documents
> >
> >
> > And just to rule out the obvious:
> > >1. There were no writes while you did the row count?
> > >
> >            Actually, we have a few automated programs which write tweets
> > to the table over time, so there might be writes while the row count is
> > running.
> >            Should I disable writes when running the MapReduce job?
> >
> > 2. In the RowCount M/R case you specified neither a range nor any columns?
> > >
> > >
> >     No
> >
> > >Do you always get the exact same numbers in both cases? Or do they vary?
> > >
> >    I just did another map reduce and this time the number is 1394234. The
> > actual count from shell is 2157447
> >
> > Thanks!
> >
> >
> > >
> > >----- Original Message -----
> > >From: kiran chitturi <[EMAIL PROTECTED]>
> > >To: user <[EMAIL PROTECTED]>
> > >Cc:
> > >Sent: Saturday, February 9, 2013 4:49 PM
> > >Subject: Re: Inconsistent row count between mapreduce and shell count
> > >
> > >Yes. I just counted the number of regions at
> > >http://machine1:60010/table.jsp?name=documents and the count is 53, which
> > >is equal to the number of completed tasks in Hadoop.
> > >
> > >
> > >Thanks,
> > >Kiran.
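
To recap the commands discussed in this thread, the sketch below assembles the RowCounter invocation with the two options Lars suggested and computes the gap between the two counts reported in the thread. The paths, quorum and table name are copied from the messages; the function names are hypothetical, and the script only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: build the RowCounter command line and compare the two reported counts.

HADOOP_BIN=/opt/hadoop-1.0.4/bin/hadoop
HBASE_JAR=/opt/hbase-0.94.1/hbase-0.94.1.jar

# $1 = table, $2 = ZooKeeper quorum, $3 = scanner caching (defaults to 1000).
rowcounter_command() {
  local table="$1" quorum="$2" caching="${3:-1000}"
  printf '%s jar %s rowcounter' "$HADOOP_BIN" "$HBASE_JAR"
  printf ' -Dhbase.zookeeper.quorum="%s"' "$quorum"
  # Fetch rows in batches instead of one RPC per row.
  printf ' -Dhbase.client.scanner.caching=%s' "$caching"
  # Avoid duplicate map attempts scanning the same region.
  printf ' -Dmapred.map.tasks.speculative.execution=false'
  printf ' %s\n' "$table"
}

# Difference between the shell count ($1) and the MapReduce count ($2).
count_gap() {
  echo $(( $1 - $2 ))
}

rowcounter_command documents "LucidN1,LucidN2,LucidN3"
echo "rows missing from the MapReduce count: $(count_gap 2157447 1394234)"
```

With the numbers from the thread, the gap is 763213 rows.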