Pig dev mailing list

Re: Are there any explanations of the implementation of illustrate?
The earlier implementation of illustrate used the Pig local-mode execution
engine (which corresponds to the time when the paper was published).

As part of the illustrate rework in PIG-1712, Yan replaced the default Map
and Reduce context objects with an IllustratorContext. Look for
IllustratorContext and LocalMapReduceSimulator in
https://issues.apache.org/jira/secure/attachment/12459267/illustrator_2.patch
These context objects read their input from memory and write their output
to memory.
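To make the idea concrete, here is a minimal, language-neutral sketch of map and reduce "context" objects that read input from, and write output to, in-memory lists. Every name in it (`InMemoryContext`, `simulate_mapreduce`) is hypothetical and made up for illustration. It is not Pig's IllustratorContext or LocalMapReduceSimulator, which are Java classes in the patch linked above. It only shows why swapping the context lets a map/reduce pipeline run without touching HDFS or local files.

```python
from collections import defaultdict
from typing import Any, Iterable, List, Tuple


class InMemoryContext:
    """Hypothetical stand-in for a MapReduce context object.

    Input records come from a Python list and emitted (key, value)
    pairs are appended to another list, so nothing touches disk.
    """

    def __init__(self, records: Iterable[Any]):
        self._input = list(records)
        self.output: List[Tuple[Any, Any]] = []

    def records(self):
        return iter(self._input)

    def write(self, key, value):
        self.output.append((key, value))


def simulate_mapreduce(records, map_fn, reduce_fn):
    # Map phase: the mapper reads from and writes to memory.
    map_ctx = InMemoryContext(records)
    for record in map_ctx.records():
        map_fn(record, map_ctx)

    # Shuffle/sort done in memory: group map output by key.
    groups = defaultdict(list)
    for key, value in map_ctx.output:
        groups[key].append(value)

    # Reduce phase: each key group goes to the reducer; output stays in memory.
    reduce_ctx = InMemoryContext(sorted(groups.items()))
    for key, values in reduce_ctx.records():
        reduce_fn(key, values, reduce_ctx)
    return reduce_ctx.output


# Usage: a word count run entirely in memory.
def wc_map(line, ctx):
    for word in line.split():
        ctx.write(word, 1)


def wc_reduce(word, counts, ctx):
    ctx.write(word, sum(counts))


print(simulate_mapreduce(["a b a", "b c"], wc_map, wc_reduce))
# [('a', 2), ('b', 2), ('c', 1)]
```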

We could consider using this for Pig local mode as well, by replacing the
in-memory list with something that can spill to disk.
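One way to picture "something that can spill to disk" is an append-only list that keeps at most a fixed number of items in memory and pickles each full chunk to a temporary file. This is a hypothetical sketch under that assumption: the `SpillableList` class, its `max_in_memory` threshold, and the pickle-per-chunk format are illustrative choices, not anything from Pig or the patch above.

```python
import os
import pickle
import tempfile


class SpillableList:
    """Hypothetical append-only list that spills chunks to disk.

    At most `max_in_memory` items are buffered; each full buffer is
    pickled to a temp file. Iteration preserves insertion order.
    """

    def __init__(self, max_in_memory=1000):
        self.max_in_memory = max_in_memory
        self._buffer = []
        self._spill_files = []  # paths of pickled chunks, oldest first
        self._size = 0

    def append(self, item):
        self._buffer.append(item)
        self._size += 1
        if len(self._buffer) >= self.max_in_memory:
            self._spill()

    def _spill(self):
        # Write the current in-memory chunk to a temp file and clear it.
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self._buffer, f)
        self._spill_files.append(path)
        self._buffer = []

    @property
    def spill_count(self):
        return len(self._spill_files)

    def __len__(self):
        return self._size

    def __iter__(self):
        # Spilled chunks are older, so yield them before the buffer.
        for path in self._spill_files:
            with open(path, "rb") as f:
                yield from pickle.load(f)
        yield from self._buffer

    def close(self):
        # Delete spill files and drop all data.
        for path in self._spill_files:
            os.remove(path)
        self._spill_files = []
        self._buffer = []
        self._size = 0


# Usage: a threshold of 3 forces two spills for 7 items.
spill = SpillableList(max_in_memory=3)
for i in range(7):
    spill.append(i)
print(spill.spill_count, list(spill))  # 2 [0, 1, 2, 3, 4, 5, 6]
spill.close()
```

In the earlier in-memory context sketch, the `output` list could be swapped for such a structure. The trade-off is extra serialization cost in exchange for bounded memory on large local-mode inputs.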

-Thejas
On 7/3/12 6:34 PM, Jonathan Coveney wrote:
> Jie, that's perfect, thanks. This doc, specifically:
> http://i.stanford.edu/~olston/publications/sigmod09.pdf is exactly the
> detailed explanation I was looking for.
>
> 2012/7/3 Jie Li <[EMAIL PROTECTED]>
>
>> Some document here: http://wiki.apache.org/pig/PigIllustrate
>>
>> I agree that more tests are needed for illustrate, otherwise it can be
>> easily broken without notice.
>>
>> Jie
>>
>> On Tue, Jul 3, 2012 at 12:45 PM, Jon Coveney <[EMAIL PROTECTED]> wrote:
>>> I was curious at a level slightly higher than "dig through the code" how
>>> illustrate is so fast, and how it deals with joins effectively. Are there
>>> any resources on this (or does anyone at Hortonworks want to write a
>>> tech-oriented blog post? :)
>>>
>>
>