Re: the last job in the mapreduce plan
Ashutosh Chauhan 2010-06-16, 16:30
Without knowing much about your operator, I will suggest a couple of
things. In general, you should avoid designing operators that
require you to take explicit actions on the pipeline (which in your case
means closing the pipeline immediately). Currently there are
operators in Pig which actually do that, but we should think carefully
before adding more, as their interactions in the pipeline lead to
If that's not feasible, then one option for you could be to write a
visitor which will traverse the generated MR plan. If it finds your
operator in the pipeline, it checks whether any MR operator follows
the current one and whether it is safe to remove that operator or
readjust the pipeline.
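A rough sketch of that visitor idea, written in Python over a toy plan model rather than Pig's actual classes. Everything named here (MROper, IdleTailRemover, the operator name "my_op", the paths) is made up for illustration, and the sketch assumes each job has at most one predecessor, so redirecting the store cannot affect any other job.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MROper:
    """Toy stand-in for one MapReduce operator (one job) in an MR plan."""
    name: str
    ops: List[str]      # physical operators between the job's load and store
    store_path: str     # where this job writes its output
    successors: List["MROper"] = field(default_factory=list)


class IdleTailRemover:
    """Walks the plan and folds away a load-and-store-only job that
    directly follows a job ending in `target_op` (hypothetical name)."""

    def __init__(self, target_op: str):
        self.target_op = target_op
        self.removed: List[str] = []

    def visit(self, roots: List[MROper]) -> None:
        # Depth-first walk over the plan, visiting each job once.
        seen = set()
        stack = list(roots)
        while stack:
            mr = stack.pop()
            if id(mr) in seen:
                continue
            seen.add(id(mr))
            self.visit_mr_op(mr)
            stack.extend(mr.successors)

    def visit_mr_op(self, mr: MROper) -> None:
        # Only act on jobs whose pipeline ends with the custom operator.
        if not mr.ops or mr.ops[-1] != self.target_op:
            return
        if len(mr.successors) != 1:
            return
        nxt = mr.successors[0]
        # Safe to remove only if the follower does no work of its own
        # and is the tail of the plan.
        if nxt.ops or nxt.successors:
            return
        # Readjust: write the final output directly, skipping the temp hop.
        mr.store_path = nxt.store_path
        mr.successors = []
        self.removed.append(nxt.name)


# Example: job1 ends with "my_op"; job2 only loads job1's output and stores it.
tail = MROper("job2", ops=[], store_path="/out/final")
head = MROper("job1", ops=["filter", "my_op"], store_path="/tmp/inter",
              successors=[tail])
remover = IdleTailRemover("my_op")
remover.visit([head])
```

After the walk, job1 stores straight to job2's output location and job2 is dropped from the plan. In a real plan, the "safe to remove" check would also need to confirm that nothing else reads the intermediate output.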
This may be more complicated than it needs to be. If you can shed
more light on your operator, I may be able to suggest a better
Hope it helps,
On Wed, Jun 16, 2010 at 07:20, Gang Luo <[EMAIL PROTECTED]> wrote:
> Thanks for replying. Actually, I haven't observed this happening in Pig as it stands. But one of the operators I am implementing in Pig requires ending the current MR operator right after it, so the issue may arise in my case.
> ----- Original Message ----
> From: Ashutosh Chauhan <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: 2010/6/15 (Tue) 1:24:46 PM
> Subject: Re: the last job in the mapreduce plan
> What you are describing can never happen, because we create a new MR
> operator only when we hit a blocking operator which needs to go in
> the next MR operator. We don't create a new MR operator a priori without
> looking at the next physical operator in the pipeline. If you are seeing
> this happen, I would consider it a bug.
> On Tue, Jun 15, 2010 at 09:26, Alan Gates <[EMAIL PROTECTED]> wrote:
>> I've never seen a case where this happens. Is this a theoretical question
>> or are you seeing this issue?
>> On Jun 15, 2010, at 8:49 AM, Gang Luo wrote:
>>> Is it possible that the last MapReduce job in the MR plan only loads something
>>> and stores it, without any other processing in between? For example, when
>>> visiting some physical operator, we need to end the current MR operator
>>> after embedding the physical operator into it, and create a new MR
>>> operator for the later physical operators. Unfortunately, the following physical
>>> operator is a store, the end of the entire query. In this case, the last MR
>>> operator contains only a load and a store, without any meaningful work in between.
>>> This idle MapReduce job will degrade the performance. Will this happen in