HBase user mailing list


Re: Inconsistent row count between mapreduce and shell count
Kiran:
Take a look at src/main/ruby/shell/commands/move.rb

You will find help there on how to move a region.
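A minimal sketch of what the move.rb help describes: `move` takes the *encoded* region name (the hash suffix of the full region name) and, optionally, a target server given as host, port and startcode. The helper functions below are hypothetical, and the region and server names are the illustrative examples from that help text, not values from this cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: build an HBase shell `move` command from a full
# region name. Example names are placeholders, not real cluster values.

encoded_region_name() {
  local full="$1"
  local trimmed="${full%.}"       # drop the trailing '.' of the region name
  printf '%s\n' "${trimmed##*.}"  # keep the hash after the last '.'
}

move_command() {
  local encoded
  encoded="$(encoded_region_name "$1")"
  if [ -n "${2:-}" ]; then
    # Target server given: host,port,startcode
    printf "move '%s', '%s'\n" "$encoded" "$2"
  else
    # No target: HBase picks a region server at random
    printf "move '%s'\n" "$encoded"
  fi
}

move_command "TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396." \
  "host187.example.com,60020,1289493121758"
```

You should then be able to paste the printed line into `hbase shell`, or pipe it in (`move_command ... | hbase shell`); this has not been tried against the 0.94.1 cluster discussed here.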

Cheers

On Sat, Feb 9, 2013 at 9:46 PM, kiran chitturi <[EMAIL PROTECTED]> wrote:

> Many thanks, Lars, for your suggestions! I have added them to the command:
>
> /opt/hadoop-1.0.4/bin/hadoop jar /opt/hbase-0.94.1/hbase-0.94.1.jar
> rowcounter -Dhbase.zookeeper.quorum="LucidN1,LucidN2,LucidN3"
> -Dhbase.client.scanner.caching=1000
> -Dmapred.map.tasks.speculative.execution=false documents
>
> I have stopped the data sources that write data into the table, but it did
> not help. There is not much difference in the row count MapReduce is
> showing.
>
> However, the row count returned is stable once I stopped writing data into
> the table (I ran the command 3 times). The shell count also stays the same
> once I stopped writing.
>
> Since most of the rows are tweets, around 1.4 million rows are stored on a
> single data node (region server).
>
> Do you know of any way that I can reassign the regions in the table without
> losing the data? Will it make a difference then?
>
> Thank you,
> Kiran.
>
>
>
>
> On Sat, Feb 9, 2013 at 11:38 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > That all looks as it should.
> > Unless you somehow pointed the M/R job to another cluster, I have no good
> > explanation.
> >
> >
> > It would be interesting to see whether, in the absence of writes, you'd
> > always get precisely the same numbers.
> > (It looks like that might be the case; your 2nd run is not wildly
> > different from the first.)
> >
> >
> > This is a bit disconcerting. Is there anything "interesting" in the logs?
> >
> >
> > Aside: For performance reasons you'd probably want to enable scanner
> > caching for the M/R: -Dhbase.client.scanner.caching=100 (or 1000)
> >
> > And also turn off speculative execution (we should do that by default):
> > -Dmapred.map.tasks.speculative.execution=false
> >
> > It might be the speculative execution that throws the job off, I am just
> > guessing now.
> >
> >
> > -- Lars
> >
> > ________________________________
> > From: kiran chitturi <[EMAIL PROTECTED]>
> > To: user <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> > Sent: Saturday, February 9, 2013 6:51 PM
> > Subject: Re: Inconsistent row count between mapreduce and shell count
> >
> > On Sat, Feb 9, 2013 at 9:17 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >
> > > Hmm... Can you show us the exact commands you executed?
> > >
> > >
> > Below are the exact commands I used.
> >
> > In the HBase shell, for the table documents, I used
> >    count 'documents'
> >
> > The MapReduce command is
> >     /opt/hadoop-1.0.4/bin/hadoop jar /opt/hbase-0.94.1/hbase-0.94.1.jar
> > rowcounter -Dhbase.zookeeper.quorum="LucidN1,LucidN2,LucidN3" documents
> >
> >
> > > And just to rule out the obvious:
> > >1. There were no writes while you did the row count?
> > >
> >            Actually, we have a few automated programs that write tweets
> > to the table over time, so there might have been writes while the row
> > count was running.
> >            Should I disable writes while running the MapReduce job?
> >
> > >2. In the RowCount M/R case you specified neither a range nor any columns?
> > >
> > >
> >     No
> >
> > >Do you always get the exact same numbers in both cases? Or do they vary?
> > >
> >    I just ran another MapReduce job, and this time the number is 1394234.
> > The actual count from the shell is 2157447.
> >
> > Thanks!
> >
> >
> > >
> > >----- Original Message -----
> > >From: kiran chitturi <[EMAIL PROTECTED]>
> > >To: user <[EMAIL PROTECTED]>
> > >Cc:
> > >Sent: Saturday, February 9, 2013 4:49 PM
> > >Subject: Re: Inconsistent row count between mapreduce and shell count
> > >
> > >Yes. I just counted the number of regions at
> > >http://machine1:60010/table.jsp?name=documents and the count is 53, which
> > >is equal to the number of completed tasks in Hadoop.
> > >
> > >
> > >Thanks,
> > >Kiran.
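The RowCounter invocation from this thread, with the scanner caching and speculative execution settings Lars suggested, can be wrapped in a small script along these lines. This is a sketch, not an official tool: the function name `run_rowcounter` and the `DRY_RUN` switch are hypothetical, and the paths, ZooKeeper quorum and table name are simply the ones quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the RowCounter command quoted in this thread.
# Paths, quorum and table name come from the thread; adjust for your cluster.

HADOOP_BIN=/opt/hadoop-1.0.4/bin/hadoop
HBASE_JAR=/opt/hbase-0.94.1/hbase-0.94.1.jar

run_rowcounter() {
  local table="$1"
  local caching="${2:-1000}"   # rows fetched per scanner round trip
  local cmd=(
    "$HADOOP_BIN" jar "$HBASE_JAR" rowcounter
    "-Dhbase.zookeeper.quorum=LucidN1,LucidN2,LucidN3"
    "-Dhbase.client.scanner.caching=${caching}"
    "-Dmapred.map.tasks.speculative.execution=false"
    "$table"
  )
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"  # default: only print what would be run
  else
    "${cmd[@]}"                # DRY_RUN=0: actually launch the MapReduce job
  fi
}

run_rowcounter documents
```

With `DRY_RUN=0 run_rowcounter documents` it launches the job. Keeping the options in one array avoids retyping them and makes it easier to repeat identical runs with and without concurrent writes, as Lars suggested.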