Re: Is it possible to share a key across maps?
Actually, you can think of the mapper task as an instance of the template
method design pattern. Here's the pseudocode:

Mapper.configure(JobConf)
for each record in InputSplit:
      do Mapper.map(key, value, output, reporter)
Mapper.close()

Any subclass of the mapper can override these three methods, configure(),
map(), and close(), to customize its behavior.
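
To make the lifecycle concrete, here is a minimal plain-Java sketch of the
same template. It does not use Hadoop's real classes: SketchMapper,
TaggingMapper, runTask() and the "sketch.tag" property are all hypothetical
names, a Map stands in for JobConf, and a List stands in for the output
collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the mapper lifecycle; not Hadoop's real classes.
abstract class SketchMapper {
    // Hook: runs once per task, before any record is processed.
    void configure(Map<String, String> conf) {}

    // Hook: runs once for every input record.
    abstract void map(long key, String value, List<String> output);

    // Hook: runs once per task, after the last record.
    void close() {}

    // The template method: a fixed skeleton that calls the hooks in order.
    final List<String> runTask(Map<String, String> conf, List<String> split) {
        List<String> output = new ArrayList<>();
        configure(conf);
        long offset = 0;
        for (String record : split) {
            map(offset, record, output);
            offset += record.length() + 1; // +1 for the line terminator
        }
        close();
        return output;
    }
}

// A subclass customizes only the hooks it needs.
class TaggingMapper extends SketchMapper {
    private String tag;

    @Override
    void configure(Map<String, String> conf) {
        tag = conf.get("sketch.tag"); // hypothetical property name
    }

    @Override
    void map(long key, String value, List<String> output) {
        output.add(tag + "\t" + value);
    }
}
```

The point of the sketch is only the call order: whatever configure() stores
in a field is already there when the first map() call happens.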

2010/1/8 Gang Luo <[EMAIL PROTECTED]>

> I don't do that in map method, but in configure( JobConf ) method which
> runs ahead of any map method call in that map task.
> JobConf.get("map.input.file") can tell you which file this map task is
> processing. Use this path to read first line of corresponding file. All
> these are done in configure method, that means, before any map method is
> called.
>
>
> -Gang
>
>
>
> ----- Original Message ----
> From: Raymond Jennings III <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: 2010/1/8 (Fri) 7:54:30 PM
> Subject: Re: Is it possible to share a key across maps?
>
> Hi, do you do this in the map method (open the file and read the first
> line)? Could you explain a little more about how you do it with
> configure()? Thank you.
>
> --- On Fri, 1/8/10, Gang Luo <[EMAIL PROTECTED]> wrote:
>
> > From: Gang Luo <[EMAIL PROTECTED]>
> > Subject: Re: Is it possible to share a key across maps?
> > To: [EMAIL PROTECTED]
> > Date: Friday, January 8, 2010, 4:46 PM
> > I would do it like this: in each map task, I get the input file for
> > this mapper in configure() and manually read the first line of that
> > file to get the user ID. Then the map function starts running.
> >
> >
> > -Gang
> >
> >
> > ----- Original Message ----
> > From: Raymond Jennings III <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: 2010/1/8 (Fri) 4:23:15 PM
> > Subject: Is it possible to share a key across maps?
> >
> > I have large files where the userid is the first line of
> > each file.  I want to use that value as the output of
> > the map phase for each subsequent line of the file.  If
> > each map task gets a chunk of this file only one map task
> > will read the key value from the first line.  Is there
> > any way I can force the other map tasks to wait until this
> > key is read and then somehow pass this value to other map
> > tasks?  Or is my reasoning incorrect?  Thanks.
> >
> >
> >
> >
>
>
>
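
For the configure() approach described in the quoted messages, a rough
sketch in plain Java could look like the following. It is an illustration
under assumptions, not the actual job code: UserIdMapper is a hypothetical
class, a Map stands in for JobConf, and the file is read from the local
filesystem, whereas a real job would open the path through Hadoop's
FileSystem API. It also assumes the map key is the byte offset of the line
in the file (as with TextInputFormat), so the header line can be skipped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical mapper-like class; a plain Map stands in for JobConf.
class UserIdMapper {
    private String userId;

    // Runs once per task: find this task's input file and read only its
    // first line, which holds the user ID.
    void configure(Map<String, String> conf) {
        String inputFile = conf.get("map.input.file");
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(inputFile), StandardCharsets.UTF_8)) {
            userId = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs per record: pair the cached user ID with the line. The record at
    // offset 0 is the user ID line itself, so it produces no output.
    String map(long offset, String line) {
        if (offset == 0) {
            return null;
        }
        return userId + "\t" + line;
    }
}
```

Every map task does its own small read of the first line, so no task has to
wait for another one or pass the value along.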

--
Best Regards

Jeff Zhang