Re: number of map and reduce tasks does not change in M/R program
Anseh,

Let's assume that your job is fully scalable; then it should take
100,000,000 / 600,000 times as long as the first job, which is 1000 / 6
= about 167 times longer. That is the ideal case; in practice it will
probably be more like 200 times. Also, try using units (and scientific
notation) in your questions: 10^8 records or 10^8 bytes?

Regards, irW
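
P.S. As a quick sanity check of that arithmetic, a tiny sketch (treating
both figures as record counts is an assumption, since no units were given):

    public class ScalingEstimate {
        public static void main(String[] args) {
            long small = 600000L;     // size of the fast test run
            long large = 100000000L;  // size of the slow run
            // Best case, if the job scales perfectly linearly: ~166.7x.
            System.out.printf("Ideal slowdown factor: %.1fx%n",
                    (double) large / small);
        }
    }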
2013/10/20 Anseh Danesh <[EMAIL PROTECTED]>

> OK... thanks a lot for the link... it is so useful... ;)
>
>
> On Sun, Oct 20, 2013 at 6:59 PM, Amr Shahin <[EMAIL PROTECTED]> wrote:
>
>> Try profiling the job (
>> http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling).
>> And yeah, the machine specs could be the reason; that's why Hadoop was
>> invented in the first place ;)
>>
>>
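
A minimal sketch of switching that profiling on via JobConf, assuming the
Hadoop 1.x mapred API from the tutorial linked above (the class name is
just a placeholder):

    import org.apache.hadoop.mapred.JobConf;

    public class ProfilingSetup {
        public static void main(String[] args) {
            JobConf conf = new JobConf(ProfilingSetup.class);
            conf.setProfileEnabled(true);            // sets mapred.task.profile=true
            conf.setProfileTaskRange(true, "0-2");   // profile map attempts 0-2
            conf.setProfileTaskRange(false, "0-2");  // profile reduce attempts 0-2
        }
    }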
>> On Sun, Oct 20, 2013 at 8:39 AM, Anseh Danesh <[EMAIL PROTECTED]> wrote:
>>
>>> I tried it on a small data set of about 600,000, and it did not take
>>> too long; the execution time was reasonable. But on the set of
>>> 100,000,000 it performs really badly. One more thing: I have 2
>>> processors in my machine, and I think this amount of data is simply too
>>> big for them, so processing takes too long... what do you think about this?
>>>
>>>
>>> On Sun, Oct 20, 2013 at 1:49 AM, Amr Shahin <[EMAIL PROTECTED]> wrote:
>>>
>>>> Try running the job locally on a small set of the data and see if it
>>>> takes too long. If so, your map code might have some performance issues.
>>>>
>>>>
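
A minimal sketch of such a local run, assuming Hadoop 1.x property names
(the class name is just a placeholder):

    import org.apache.hadoop.mapred.JobConf;

    public class LocalRunSetup {
        public static void main(String[] args) {
            JobConf conf = new JobConf(LocalRunSetup.class);
            conf.set("mapred.job.tracker", "local");  // run in-process via the LocalJobRunner
            conf.set("fs.default.name", "file:///");  // read the small sample from local disk
        }
    }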
>>>> On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi all.. I have a question. I have a MapReduce program that gets its
>>>>> input from Cassandra. My input is fairly big, about 100,000,000. My
>>>>> problem is that the program takes too long to process it, even though
>>>>> I thought MapReduce was good and fast for large volumes of data, so
>>>>> maybe the problem is the number of map and reduce tasks. I set the
>>>>> number of map and reduce tasks with JobConf, with Job, and also in
>>>>> conf/mapred-site.xml, but I don't see any changes. In my logs it
>>>>> starts with map 0% reduce 0%, and after about 2 hours of work it
>>>>> shows map 1% reduce 0%..!! What should I do? Please help me, I'm
>>>>> really confused...
>>>>>
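
A minimal sketch of those settings with the old mapred API (the class name
is just a placeholder). Note that the map-task count is only a hint, which
may be why changing it shows no effect:

    import org.apache.hadoop.mapred.JobConf;

    public class TaskCountSetup {
        public static void main(String[] args) {
            JobConf conf = new JobConf(TaskCountSetup.class);
            conf.setNumMapTasks(20);    // a hint only: the InputFormat's
                                        // splits decide the real map count
            conf.setNumReduceTasks(4);  // honored exactly
        }
    }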
>>>>
>>>>
>>>
>>
>