MapReduce, mail # user - Re: number of map and reduce task does not change in M/R program


Re: number of map and reduce task does not change in M/R program
Dieter De Witte 2013-10-21, 07:09
Anseh,

Let's assume that your job is fully scalable; then it should take
100,000,000 / 600,000 times the amount of time of the first job, which is
1000 / 6 ≈ 167 times longer. That is the ideal case; in practice it will
probably be more like 200 times. Also, try using units and scientific
notation in your questions: is it 10^8 records or 10^8 bytes?
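
As a quick back-of-envelope check of that estimate (the 5-minute baseline
below is a hypothetical figure, not one from this thread):

    // Back-of-envelope scaling estimate. Assumes perfectly linear scaling;
    // real jobs carry fixed startup overhead, so expect somewhat worse.
    public class ScalingEstimate {
        public static void main(String[] args) {
            long smallRecords = 600000L;       // records in the test run
            long largeRecords = 100000000L;    // records in the full run
            double idealFactor = (double) largeRecords / smallRecords; // ~166.7
            double smallRunMinutes = 5.0;      // hypothetical measurement
            System.out.printf("ideal factor: %.1f, projected: %.0f min%n",
                idealFactor, smallRunMinutes * idealFactor); // ~167, ~833 min
        }
    }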

Regards, irW
2013/10/20 Anseh Danesh <[EMAIL PROTECTED]>

> OK... thanks a lot for the link... it is so useful... ;)
>
>
> On Sun, Oct 20, 2013 at 6:59 PM, Amr Shahin <[EMAIL PROTECTED]> wrote:
>
>> Try profiling the job (
>> http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling).
>> And yeah, the machine specs could be the reason; that's why Hadoop was
>> invented in the first place ;)
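
For reference, enabling the built-in profiler with the classic JobConf API
looks roughly like this minimal sketch; the HPROF parameters are the defaults
described in the tutorial above, and the task ranges are just examples:

    import org.apache.hadoop.mapred.JobConf;

    public class ProfilingSketch {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.setProfileEnabled(true);            // turn task profiling on
            conf.setProfileTaskRange(true, "0-2");   // profile map tasks 0-2
            conf.setProfileTaskRange(false, "0-2");  // profile reduce tasks 0-2
            // HPROF options; %s is replaced with the per-task output file:
            conf.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,"
                + "depth=6,force=n,thread=y,verbose=n,file=%s");
            // ...configure mapper, reducer and input as usual, then submit.
        }
    }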
>>
>>
>> On Sun, Oct 20, 2013 at 8:39 AM, Anseh Danesh <[EMAIL PROTECTED]> wrote:
>>
>>> I tried it on a small set of data, about 600,000 records, and it did not
>>> take too long; the execution time was reasonable. But on the set of
>>> 100,000,000 records it performs really badly. One more thing: I have 2
>>> processors in my machine, and I think this amount of data is too large for
>>> them, which is why it takes so long to process... what do you think?
>>>
>>>
>>> On Sun, Oct 20, 2013 at 1:49 AM, Amr Shahin <[EMAIL PROTECTED]> wrote:
>>>
>>>> Try running the job locally on a small set of the data and see if it
>>>> takes too long. If so, your map code might have some performance issues.
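
A minimal sketch of such a local run with Hadoop 1.x-era properties (the
property names assume the classic configuration; newer releases use
mapreduce.framework.name=local instead):

    import org.apache.hadoop.mapred.JobConf;

    public class LocalRunSketch {
        public static void main(String[] args) {
            // Run in-process with the LocalJobRunner against local files, so
            // a small sample can be timed without any cluster overhead.
            JobConf conf = new JobConf();
            conf.set("mapred.job.tracker", "local");  // 1.x: LocalJobRunner
            conf.set("fs.default.name", "file:///");  // input from local disk
            // ...point the job at a small input sample and submit as usual.
        }
    }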
>>>>
>>>>
>>>> On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi all.. I have a question. I have a MapReduce program that gets its
>>>>> input from Cassandra. My input is a little big, about 100,000,000
>>>>> records. My problem is that my program takes too long to process it, but
>>>>> I thought MapReduce was good and fast for large volumes of data, so I
>>>>> think maybe I have a problem with the number of map and reduce tasks. I
>>>>> set the number of map and reduce tasks with JobConf, with Job, and also
>>>>> in conf/mapred-site.xml, but I don't see any changes. In my logs, at
>>>>> first there is map 0% reduce 0%, and after about 2 hours of working it
>>>>> shows map 1% reduce 0%..!! What should I do? Please help me, I am really
>>>>> confused...
>>>>>
>>>>
>>>>
>>>
>>
>
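
For context on the original question: with the classic mapred API, only the
reduce-task count is a hard setting; the number of map tasks is derived from
the InputFormat's input splits, and setNumMapTasks() is only a hint, which is
one reason changing it can appear to have no effect. A minimal sketch:

    import org.apache.hadoop.mapred.JobConf;

    public class TaskCountSketch {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.setNumReduceTasks(8);  // honored exactly
            conf.setNumMapTasks(100);   // only a hint: the actual map count is
                                        // the number of input splits, roughly
                                        // total input size / split size
        }
    }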