kiran chitturi 2013-02-10, 00:14
Ted Yu 2013-02-10, 00:43
kiran chitturi 2013-02-10, 00:49
lars hofhansl 2013-02-10, 02:17
kiran chitturi 2013-02-10, 02:51
lars hofhansl 2013-02-10, 04:38
kiran chitturi 2013-02-10, 05:46

Re: Inconsistent row count between mapreduce and shell count
Ted Yu 2013-02-10, 07:05
Kiran:
Take a look at src/main/ruby/shell/commands/move.rb; you will see help there on how to move a region.

Cheers

On Sat, Feb 9, 2013 at 9:46 PM, kiran chitturi <[EMAIL PROTECTED]> wrote:

> Many thanks, Lars, for your suggestions! I have added them to the command:
>
> /opt/hadoop-1.0.4/bin/hadoop jar /opt/hbase-0.94.1/hbase-0.94.1.jar
> rowcounter -Dhbase.zookeeper.quorum="LucidN1,LucidN2,LucidN3"
> -Dhbase.client.scanner.caching=1000
> -Dmapred.map.tasks.speculative.execution=false documents
>
> I have stopped the data sources that write data into the table, but it did
> not work. There is not much difference in the row count the MapReduce job
> is showing.
>
> Though, the row count returned stays the same once I stopped writing data
> into the table (I ran the command 3 times). The shell count is also the
> same once I stopped writing.
>
> Since most of the rows are tweets, around 1.4 million rows are stored on a
> single data node (region server).
>
> Do you know of any way that I can reassign the regions in the table without
> losing the data? Will it make a difference then?
>
> Thank you,
> Kiran.
>
> On Sat, Feb 9, 2013 at 11:38 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > That all looks as it should.
> > Unless you somehow pointed the M/R job to another cluster, I have no good
> > explanation.
> >
> > It would be interesting to see whether in the absence of writes you'd
> > always get precisely the same numbers.
> > (Looks like it might be the case; your 2nd run is not wildly different
> > from the first.)
> >
> > This is a bit disconcerting. Is there anything "interesting" in the logs?
> >
> > Aside: For performance reasons you'd probably want to enable scanner
> > caching for the M/R: -Dhbase.client.scanner.caching=100 (or 1000)
> >
> > And also turn off speculative execution (we should do that by default):
> > -Dmapred.map.tasks.speculative.execution=false
> >
> > It might be the speculative execution that throws the job off; I am just
> > guessing now.
> >
> > -- Lars
> >
> > ________________________________
> > From: kiran chitturi <[EMAIL PROTECTED]>
> > To: user <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> > Sent: Saturday, February 9, 2013 6:51 PM
> > Subject: Re: Inconsistent row count between mapreduce and shell count
> >
> > On Sat, Feb 9, 2013 at 9:17 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >
> > > Hmm... Can you show us the exact commands you executed?
> >
> > Below are the exact commands that I have used.
> >
> > In the hbase shell, for the table documents, I have used:
> > count 'documents'
> >
> > The mapreduce command is:
> > /opt/hadoop-1.0.4/bin/hadoop jar /opt/hbase-0.94.1/hbase-0.94.1.jar
> > rowcounter -Dhbase.zookeeper.quorum="LucidN1,LucidN2,LucidN3" documents
> >
> > > And just to rule out the obvious:
> > > 1. There were no writes while you did the row count?
> >
> > Actually, we have a few automated programs which write tweets
> > to the table over time. So there might be writes while the row count is
> > running. Should I disable writes when doing the mapreduce?
> >
> > > 2. In the RowCount M/R case you specified neither a range nor any
> > > columns?
> >
> > No.
> >
> > > Do you always get the exact same numbers in both cases? Or do they vary?
> >
> > I just did another mapreduce and this time the number is 1394234. The
> > actual count from the shell is 2157447.
> >
> > Thanks!
> >
> > > ----- Original Message -----
> > > From: kiran chitturi <[EMAIL PROTECTED]>
> > > To: user <[EMAIL PROTECTED]>
> > > Cc:
> > > Sent: Saturday, February 9, 2013 4:49 PM
> > > Subject: Re: Inconsistent row count between mapreduce and shell count
> > >
> > > Yes. I just counted the number of regions in
> > > 'http://machine1:60010/table.jsp?name=documents' and the count is 53,
> > > which is equal to the number of completed tasks in Hadoop.
> > >
> > > Thanks,
> > > Kiran.
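Ted's pointer is to the hbase shell's move command. To the best of my recollection of its help text in the 0.94-era shell, it takes the encoded region name (the hash suffix at the end of the full region name, not the full name) and, optionally, a target server written as host,port,startcode; without a server, a target is chosen for you. The sketch below derives that argument from a full region name and prints the shell line. It assumes the newer region-name format 'table,startkey,regionid.ENCODED.' with a 32-character hex ENCODED part; the helpers encoded_region_name and move_command, and the example region and server names, are hypothetical and illustrative only.

```python
import re
from typing import Optional

# Assumption: an encoded region name is 32 lowercase hex characters
# (an MD5 hex digest in the newer region-name format).
_ENCODED = re.compile(r"[0-9a-f]{32}")


def encoded_region_name(region_name: str) -> str:
    """Return the encoded suffix of a full region name (hypothetical helper).

    Assumes 'table,startkey,regionid.ENCODED.', i.e. the encoded name sits
    between the last two dots. Start keys may themselves contain dots.
    """
    if not region_name.endswith("."):
        raise ValueError("no encoded suffix in %r" % region_name)
    # Drop the trailing '.', then take whatever follows the last remaining '.'.
    _, sep, encoded = region_name[:-1].rpartition(".")
    if not sep or not _ENCODED.fullmatch(encoded):
        raise ValueError("no encoded suffix in %r" % region_name)
    return encoded


def move_command(region_name: str, server_name: Optional[str] = None) -> str:
    """Build an hbase shell 'move' line (hypothetical helper).

    server_name is assumed to be 'host,port,startcode'; when it is omitted,
    the target server is left for HBase to choose.
    """
    parts = ["'%s'" % encoded_region_name(region_name)]
    if server_name is not None:
        parts.append("'%s'" % server_name)
    return "move " + ", ".join(parts)


if __name__ == "__main__":
    # Illustrative values only, not taken from the cluster in this thread.
    region = "TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396."
    print(move_command(region, "host187.example.com,60020,1289493121758"))
```

Run as a script, it prints: move '527db22f95c8a9e0116f0cc13c680396', 'host187.example.com,60020,1289493121758'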