Bejoy KS
2011-09-07, 07:48
Sonal Goyal
2011-09-07, 07:52
Bejoy KS
2011-09-07, 08:00
Devaraj K
2011-09-07, 08:54
Harsh J
2011-09-07, 08:55
Sudharsan Sampath
2011-09-07, 09:39
Bejoy KS
2011-09-07, 11:21
Harsh J
2011-09-07, 11:39
Robert Hafner
2011-09-07, 16:33
GOEKE, MATTHEW
2011-09-07, 17:00
Bejoy KS
2011-09-08, 06:09
GOEKE, MATTHEW
2011-09-08, 14:40
-
No Mapper but Reducer
Bejoy KS 2011-09-07, 07:48
Hi,
I have a question. Is it possible to have no mappers but reducers alone? AFAIK, if we need to avoid triggering reducers we can set numReduceTasks to zero, but no such setting works for mappers. So how can it be achieved, if it is possible at all?

Thank you.
Regards,
Bejoy.K.S
-
Re: No Mapper but Reducer
Sonal Goyal 2011-09-07, 07:52
I don't think that is possible. Can you explain in what scenario you want to have no mappers, only reducers?

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>
-
Re: No Mapper but Reducer
Bejoy KS 2011-09-07, 08:00
Thanks Sonal. I was just thinking of some weird design and wanted to make sure whether such a possibility exists: no maps and all reducers.
-
RE: No Mapper but Reducer
Devaraj K 2011-09-07, 08:54
Hi Bejoy,
It is possible to execute a job with no mappers and reducers alone. You can try this by giving an empty directory as input for the job.

Devaraj K
-
Re: No Mapper but Reducer
Harsh J 2011-09-07, 08:55
Oh boy are you in for a surprise. Reducers _can_ run with 0 mappers in a job ;-)

/me puts his troll-mask on.

➜ ~HADOOP_HOME hadoop fs -mkdir abc
➜ ~HADOOP_HOME hadoop jar hadoop-examples-0.20.2-cdh3u1.jar wordcount abc out
11/09/07 14:24:14 INFO input.FileInputFormat: Total input paths to process : 0
11/09/07 14:24:14 INFO mapred.JobClient: Running job: job_201109071413_0001
11/09/07 14:24:15 INFO mapred.JobClient: map 0% reduce 0%
11/09/07 14:24:21 INFO mapred.JobClient: map 0% reduce 100%
11/09/07 14:24:22 INFO mapred.JobClient: Job complete: job_201109071413_0001
11/09/07 14:24:22 INFO mapred.JobClient: Counters: 13
11/09/07 14:24:22 INFO mapred.JobClient: Job Counters
11/09/07 14:24:22 INFO mapred.JobClient: Launched reduce tasks=1
11/09/07 14:24:22 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=2209
11/09/07 14:24:22 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
11/09/07 14:24:22 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
11/09/07 14:24:22 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=3113
11/09/07 14:24:22 INFO mapred.JobClient: FileSystemCounters
11/09/07 14:24:22 INFO mapred.JobClient: FILE_BYTES_WRITTEN=59220
11/09/07 14:24:22 INFO mapred.JobClient: Map-Reduce Framework
11/09/07 14:24:22 INFO mapred.JobClient: Reduce input groups=0
11/09/07 14:24:22 INFO mapred.JobClient: Combine output records=0
11/09/07 14:24:22 INFO mapred.JobClient: Reduce shuffle bytes=0
11/09/07 14:24:22 INFO mapred.JobClient: Reduce output records=0
11/09/07 14:24:22 INFO mapred.JobClient: Spilled Records=0
11/09/07 14:24:22 INFO mapred.JobClient: Combine input records=0
11/09/07 14:24:22 INFO mapred.JobClient: Reduce input records=0

/me takes off troll mask.

--
Harsh J
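The run above can be pictured with a toy model. This is only a sketch in plain Python, not Hadoop's scheduler code: `plan_job` and `run_reducers` are hypothetical names, and the assumption is simply that the map task count follows from the input paths while the reduce task count comes from the job configuration.

```python
from dataclasses import dataclass


@dataclass
class JobPlan:
    map_tasks: int
    reduce_tasks: int


def plan_job(input_paths, num_reduce_tasks=1):
    """Toy planner: map tasks follow the input, reduce tasks follow the config."""
    # No input paths means nothing to split, hence no map tasks.
    # The reducer count is taken from the configuration regardless.
    return JobPlan(map_tasks=len(input_paths), reduce_tasks=num_reduce_tasks)


def run_reducers(plan, map_outputs):
    """Launch every configured reducer, even when there is nothing to fetch."""
    launched = 0
    input_records = 0
    for partition in range(plan.reduce_tasks):
        launched += 1
        # A reducer only ever sees map output for its partition.
        input_records += len(map_outputs.get(partition, []))
    return {"launched_reduce_tasks": launched, "reduce_input_records": input_records}


# An empty input directory: zero paths to process, one configured reducer.
plan = plan_job(input_paths=[], num_reduce_tasks=1)
counters = run_reducers(plan, map_outputs={})
print(plan.map_tasks, counters)
```

In this model the reducer is launched and finishes with zero input records, which mirrors the "Launched reduce tasks=1" and "Reduce input records=0" counters in the log.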
-
Re: No Mapper but Reducer
Sudharsan Sampath 2011-09-07, 09:39
This is true, and it took us by surprise in the recent past. It also had quite some impact on our job cycles, where the size of the input is totally random and could be zero at times.

In one of our cycles we run a lot of jobs. Say we configure X as the number of reducers for a job which does not have any input.

Y -> number of tasktrackers in the cluster
H -> time interval for the heartbeat response

With the cdh2 version, the job takes

(X / Y) * H seconds

to complete without doing any work, since we assign only one reduce task per heartbeat.

If the cycle contains many such jobs, the total time that the cluster spends doing nothing accumulates.

I was thinking of raising this as a JIRA, but I am not sure. Should we raise and fix this as a JIRA request? Could the number of reducers set by the client be overridden if the number of mappers is 0?

We have a workaround: we verify the existence of the input path to the map phase ourselves. I just thought it would be more intuitive for the framework to handle this itself.

-Sudhan S
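To put numbers on the (X / Y) * H estimate and on the pre-submission check described above, here is a minimal sketch. The figures (100 reducers, 10 tasktrackers, a 3-second heartbeat) are illustrative assumptions, not measurements, and `has_input` is a hypothetical helper that uses the local filesystem as a stand-in for HDFS. It treats "has input" as "the directory exists and contains at least one file", which is stricter than the existence check mentioned in the message.

```python
import tempfile
from pathlib import Path


def idle_seconds(num_reducers, num_tasktrackers, heartbeat_seconds):
    """Time spent on a job with no input: (X / Y) * H, as stated in the thread."""
    return num_reducers / num_tasktrackers * heartbeat_seconds


def has_input(input_dir):
    """Hypothetical guard: True only if the directory exists and holds a file."""
    path = Path(input_dir)
    return path.is_dir() and any(p.is_file() for p in path.iterdir())


# Illustrative values only: X = 100 reducers, Y = 10 tasktrackers, H = 3 s.
wasted = idle_seconds(100, 10, 3)
print(wasted)  # 30.0

# Local-filesystem demonstration of the guard.
with tempfile.TemporaryDirectory() as d:
    empty_ok = has_input(d)
    (Path(d) / "part-00000").write_text("a~1\n")
    filled_ok = has_input(d)
print(empty_ok, filled_ok)
```

A driver could call such a guard before submitting each job and skip the submission when it returns False, so that no reducers are scheduled for an empty input.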
-
Re: No Mapper but Reducer
Bejoy KS 2011-09-07, 11:21
Thank you all. Even I noticed this strange behavior some time back.
Now my initial concern still remains. If I provide an empty input directory, yes, the map tasks won't be executed. But my reducer needs input to do the processing/aggregation. In such a scenario, is there an option to provide input just to the reducer?

Regards,
Bejoy.K.S
-
Re: No Mapper but Reducer
Harsh J 2011-09-07, 11:39
Nope. A reducer's input comes from the map outputs alone (fetched in by the shuffling code), which would not exist here. What are you looking to do? Why won't a map task suffice for doing that?

--
Harsh J
-
Re: No Mapper but Reducer
Robert Hafner 2011-09-07, 16:33
You could just have a mapper which sends off the exact values it takes in (i.e., outputs k1,v1 as k2,v2). I think that's the best you'll be able to do here.
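The identity-mapper idea can be sketched in a few lines of plain Python. This is an in-memory illustration, not Hadoop code: `run_job` is a hypothetical helper in which a sort followed by grouping stands in for the shuffle/sort phase, and `sum_reducer` is just an example aggregation.

```python
from itertools import groupby
from operator import itemgetter


def identity_mapper(key, value):
    """Emit each input pair unchanged (k1, v1 -> k2, v2)."""
    yield key, value


def sum_reducer(key, values):
    """Example aggregation: add up all values seen for a key."""
    yield key, sum(values)


def run_job(records, mapper, reducer):
    """Map every record, sort by key, then reduce each group of values."""
    mapped = [pair for k, v in records for pair in mapper(k, v)]
    mapped.sort(key=itemgetter(0))  # stand-in for the sort/shuffle phase
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, [value for _, value in group]))
    return output


records = [("b", 2), ("a", 1), ("b", 3)]
result = run_job(records, identity_mapper, sum_reducer)
print(result)  # [('a', 1), ('b', 5)]
```

The mapper does no work of its own here. Its only role is to hand the records to the sort and grouping step, which is what the reducer depends on.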
-
RE: No Mapper but Reducer
GOEKE, MATTHEW 2011-09-07, 17:00
Bejoy,
What exactly is your use case? I know that earlier in the thread you said you were just thinking of a weird design, but it would really help if we knew exactly what you were shooting for, because we might be able to refactor it.

I have a job that still required the input to be sorted for the reduce, but I did not need any transformation or filtering on the map side, so I just used an identity mapper, as Robert suggests, and it works perfectly. I do not think there is any way to pass data directly into the sort/shuffle (S/S) phase without going through the map phase (if that is what you were hinting at), and if you don't require the data to go through S/S then you can make it a map-only job.

Matt
-
Re: No Mapper but Reducer
Bejoy KS 2011-09-08, 06:09
Exactly, Matthew. The weird thought was in that direction. Basically I have a tilde-separated input which has to undergo some aggregation. So I was just giving it a shot to see whether it is possible to go straight into the sort/shuffle phase and then the reducer, without a mapper. I know I need to depend on at least an IdentityMapper.

A small query on top of this. If we take a basic MapReduce job, say word count without a combiner, what would be the percentage distribution of execution time across the map, reduce and sort/shuffle phases?
-
RE: No Mapper but Reducer
GOEKE, MATTHEW 2011-09-08, 14:40
Your last question is not as straightforward, and it would be better answered by running the job on your own cluster and looking at the JobTracker history. Data skew and partitioning, the number of map and reduce slots available, mapred.reduce.slowstart.completed.maps, and several other factors can all affect this distribution.
Matt

From: Bejoy KS [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 08, 2011 1:10 AM
To: [EMAIL PROTECTED]
Subject: Re: No Mapper but Reducer

Exactly Matthew, the weird thought was in that direction. Basically I do have a tilde-separated input that has to undergo an aggregation operation. So I was just trying to see whether it is possible to go directly into the sort/shuffle phase and then the reducer, without a mapper. I know I need to depend on at least an IdentityMapper.

A small query on top of this. If we take a basic MapReduce job, say word count without a combiner, what would be the percentage distribution of execution time across the map, sort/shuffle, and reduce phases?

On Wed, Sep 7, 2011 at 10:30 PM, GOEKE, MATTHEW (AG/1000) <[EMAIL PROTECTED]> wrote:

Bejoy,

What exactly is your use case? I know down below you said you were just thinking of a weird design, but it would really help if we knew exactly what you were shooting for because we might be able to refactor it.

I have a job that I developed that still required the input to be sorted for the reduce, but I did not need to do any transformation or filtering on the map side, so I just used an identity mapper, as Robert mentions below, and it works perfectly. I do not think that there is any way to pass data directly into the S/S phase without going through the map phase (if that is what you were hinting at), and if you don’t require the data to go through S/S then you can make it a map-only job.

Matt

From: Robert Hafner [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 07, 2011 11:34 AM
To: [EMAIL PROTECTED]
Subject: Re: No Mapper but Reducer

You could just have a mapper which sent off the exact values it took in (ie, output k1,v1 as k2,v2). I think that's the best you'll be able to do here.

On Sep 7, 2011, at 4:21 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:

Thank You All. Even I have noticed this strange behavior some time back. Now my initial concern still remains. If I provide an empty input directory, yes, the map tasks won't be executed. But my reducer needs input to do the processing/aggregation. In such a scenario, is there an option to provide input just to the reducer?

Regards
Bejoy.K.S

On Wed, Sep 7, 2011 at 3:09 PM, Sudharsan Sampath <[EMAIL PROTECTED]> wrote:

This is true and it took us by surprise in the recent past. Also, it had quite some impact on our job cycles, where the size of the input is totally random and could also be zero at times.

In one of our cycles, we run a lot of jobs. Say we configure X as the number of reducers for a job which does not have any input.

Y -> No of tasktrackers in the cluster
H -> Time Interval for Heartbeat response

With the cdh2 version, the job takes

( X / Y) * H seconds to complete without doing any work, since we assign only one reduce task per heartbeat

If the number of such jobs in the cycle is more, then the total time that the cluster spends doing nothing accumulates.

I was thinking of raising this as a jira but not sure. Should we raise and fix this as a jira request? Num of reducers set by the client can be overridden if the number of mappers is 0?

We have a way to hack, by verifying the existence of the input path to the Map phase ourselves, but just thought it would be more intuitive for the framework to handle it itself

-Sudhan S

On Wed, Sep 7, 2011 at 2:25 PM, Harsh J <[EMAIL PROTECTED]> wrote:

Oh boy are you in for a surprise. Reducers _can_ run with 0 mappers in a job ;-)

/me puts his troll-mask on.

➜ ~HADOOP_HOME hadoop fs -mkdir abc
➜ ~HADOOP_HOME hadoop jar hadoop-examples-0.20.2-cdh3u1.jar wordcount abc out
11/09/07 14:24:14 INFO input.FileInputFormat: Total input paths to process : 0
11/09/07 14:24:14 INFO mapred.JobClient: Running job: job_201109071413_0001
11/09/07 14:24:15 INFO mapred.JobClient: map 0% reduce 0%
11/09/07 14:24:21 INFO mapred.JobClient: map 0% reduce 100%
11/09/07 14:24:22 INFO mapred.JobClient: Job complete: job_201109071413_0001
11/09/07 14:24:22 INFO mapred.JobClient: Counters: 13
11/09/07 14:24:22 INFO mapred.JobClient: Job Counters
11/09/07 14:24:22 INFO mapred.JobClient: Launched reduce tasks=1
11/09/07 14:24:22 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=2209
11/09/07 14:24:22 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
11/09/07 14:24:22 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
11/09/07 14:24:22 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=3113
11/09/07 14:24:22 INFO mapred.JobClient: FileSystemCounters
11/09/07 14:24:22 INFO mapred.JobClient: FILE_BYTES_WRITTEN=59220
11/09/07 14:24:22 INFO mapred.JobClient: Map-Reduce Framework
11/09/07 14:24:22 INFO mapred.JobClient: Reduce input groups=0
11/09/07 14:24:22 INFO mapred.JobClient: Combine output records=0
11/09/07 14:24:22 INFO mapred.JobClient: Reduce shuffle bytes=0
11/09/07 14:24:22 INFO mapred.JobClient: Reduce output records=0
11/09/07 14:24:22 INFO mapred.JobClient: Spilled Records=0
11/09/07 14:24:22 INFO mapred.JobClient: Combine input records=0
11/09/07 14:24:22 INFO mapred.JobClient: Reduce input records=0

/me takes off troll mask.

On Wed, Sep 7, 2011 at 1:30 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:

Harsh J
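Sudharsan's estimate quoted in this thread, (X / Y) * H seconds for a job with no input, can be worked through with a small sketch. It is plain Python arithmetic, not a Hadoop API. The function names and all sample numbers below are hypothetical, and the formula is taken as stated in the thread (one reduce task assigned per heartbeat), not verified against any Hadoop release.

```python
def idle_job_seconds(num_reducers, num_tasktrackers, heartbeat_seconds):
    """Estimate from the thread: (X / Y) * H, where X is the configured
    number of reducers, Y the number of tasktrackers, and H the heartbeat
    interval in seconds."""
    if num_tasktrackers <= 0:
        raise ValueError("num_tasktrackers must be positive")
    return (num_reducers / num_tasktrackers) * heartbeat_seconds


def idle_cycle_seconds(reducers_per_job, num_tasktrackers, heartbeat_seconds):
    """Total idle time over a cycle of empty-input jobs, each with its
    own reducer count."""
    return sum(
        idle_job_seconds(x, num_tasktrackers, heartbeat_seconds)
        for x in reducers_per_job
    )


if __name__ == "__main__":
    # Hypothetical values: 20 reducers, 10 tasktrackers, 3-second heartbeat.
    print(idle_job_seconds(20, 10, 3))
    # Hypothetical cycle of three empty-input jobs on the same cluster.
    print(idle_cycle_seconds([20, 40, 10], 10, 3))
```

With these made-up numbers a single empty job costs 6 seconds and the three-job cycle costs 21 seconds, which shows how the idle time accumulates across a cycle as described.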
GOEKE, MATTHEW 2011-09-08, 14:40
|