2 tasks at the same time, for a total of 25 tasks at the end.
Maybe, as you say, I'm not talking to the right jobtracker? I'm
running the command line on the master server.
If I look at the map tasks, I can see that:
Input Split Locations /default-rack/node1
With different values depending on the task, but on the same page I
can see machine=/default-rack/node3 (which is my master).
How/where should I run this? Should I point it to the ZooKeeper instance instead?
2012/10/11 Jean-Daniel Cryans <[EMAIL PROTECTED]>:
> 2 tasks total, or 2 running at the same time? If the latter, it just
> means that you are using the local job tracker instead of your job
> tracker because HBase couldn't find your MR config.
> On Thu, Oct 11, 2012 at 1:36 PM, Jean-Marc Spaggiari
> <[EMAIL PROTECTED]> wrote:
>> Hi J-D,
>> I have about 20M rows over 25 regions on 6 nodes. So that means I
>> should see something like 6 tasks or even 25, right? And not just 2?
>> Keys are 128 bytes long. Values are 1 byte.
>> I tried also to update mapreduce.tasktracker.map.tasks.maximum but
>> this is "the number of map tasks that should be launched on each node,
>> not the number of nodes to be used for each map task.", so nothing
>> changed, as expected.
>> 2012/10/11 Jean-Daniel Cryans <[EMAIL PROTECTED]>:
>>> On Thu, Oct 11, 2012 at 1:20 PM, Jean-Marc Spaggiari
>>> <[EMAIL PROTECTED]> wrote:
>>>> I'm now using this command line and it's working fine (except for the
>>>> number of tasks).
>>>> classpath`:`/home/hadoop/hadoop-1.0.3/bin/hadoop classpath`
>>>> /home/hadoop/hadoop-1.0.3/bin/hadoop jar
>>>> /home/hbase/hbase-0.94.0/hbase-0.94.1.jar rowcounter
>>>> -Dhbase.client.scanner.caching=100 -Dmapred.map.tasks=6
>>>> -Dmapred.map.tasks.speculative.execution=false work_proposed
>>>> I simply don't know if the -D parameters are taken into consideration
>>>> since I get the same results (number of tasks, execution time, etc.)
>>>> with and without them.
>>> Using a higher caching value won't do much good if you don't have a
>>> lot of rows. Since you didn't include any data like that in your
>>> email, I won't guess how much 100 would help your case.
>>> The number of map tasks when mapping an HBase table will be the number
>>> of regions you have in that table. Unfortunately you can't change it
>>> unless you write your own input format for HBase.
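
To make the arithmetic behind the point above concrete, here is a rough
sketch of how the numbers in this thread would combine. It assumes one map
task per region (as stated above), a tasktracker on every node, and 2 map
slots per node. The slot count is an assumption, and map_waves is a
hypothetical helper, not part of Hadoop or HBase.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hadoop or HBase.
# Usage: map_waves REGIONS NODES SLOTS_PER_NODE
map_waves() {
    regions=$1
    nodes=$2
    slots=$3
    # One map task per region, so -Dmapred.map.tasks does not change this.
    tasks=$regions
    # Upper bound on concurrently running maps, assuming every node
    # runs a tasktracker with the given number of map slots.
    concurrent=$((nodes * slots))
    # Ceiling division: how many "waves" are needed to run all tasks.
    waves=$(( (tasks + concurrent - 1) / concurrent ))
    echo "tasks=$tasks concurrent=$concurrent waves=$waves"
}

# Numbers from this thread: 25 regions, 6 nodes.
# 2 slots per node is an assumption.
map_waves 25 6 2
```

Under these assumptions a correctly configured cluster would run up to 12
maps at once. Seeing only 2 at a time is what a single node with 2 slots
would give (map_waves 25 1 2), which suggests the job is not being spread
across all six nodes.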