RE: Are there any explanations of the implementation of illustrate?
We've made a number of updates to illustrate on a private 0.9 branch that we're looking to merge in as soon as we can, including making it much more testable and getting far more of the Pig operators working again.
One thing to note on the algorithm is that the upstream 'data synthesis' process doesn't work on scripts that include UDFs, as it cannot invert data through the UDF. This means that for many scripts, joins (and very selective filters) are still troublesome.
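To make the UDF limitation concrete, here is a toy Python sketch. It is not Pig's actual code, and every name in it (examples_for_filter, examples_for_join, opaque_udf, the sample rows) is made up for illustration. The assumption is simply that when the sampled data produces no example for an operator, a row can be fabricated only if the condition can be inverted; a UDF can be called on a row but gives no way to construct a row that satisfies it.

```python
# Toy sketch only; none of these names come from the Pig code base.

def examples_for_filter(sample, predicate, synthesize=None):
    """Return one example row that passes the filter, fabricating it if needed."""
    passing = [row for row in sample if predicate(row)]
    if passing:
        return passing[:1]
    if synthesize is None or not sample:
        # Opaque predicate (e.g. a UDF): rows can be tested but not constructed.
        return []
    fabricated = synthesize(sample[0])
    return [fabricated] if predicate(fabricated) else []


def examples_for_join(left, right, key, key_is_udf_output=False):
    """Return one matching (left, right) pair, fabricating the right row if needed."""
    pairs = [(l, r) for l in left for r in right if l[key] == r[key]]
    if pairs:
        return pairs[:1]
    if key_is_udf_output or not left or not right:
        # The key came out of a UDF, so no raw input can be built to produce it.
        return []
    fabricated = dict(right[0])
    fabricated[key] = left[0][key]  # copy the left key so the rows now join
    return [(left[0], fabricated)]


def opaque_udf(age):
    """Hypothetical stand-in for a UDF whose logic cannot be inverted."""
    return sum(int(d) for d in str(age * 7919)) == 3


users = [{"id": 1, "age": 25}, {"id": 2, "age": 31}]
orders = [{"id": 7, "total": 10}]

# Plain equality filter: invertible, so a passing row can be synthesized.
simple = examples_for_filter(users, lambda r: r["age"] == 99,
                             synthesize=lambda r: dict(r, age=99))
# UDF filter: no inverse is available, so the sample yields no example.
opaque = examples_for_filter(users, lambda r: opaque_udf(r["age"]))
# Join with no matching keys in the sample: a right-side row is fabricated.
joined = examples_for_join(users, orders, "id")
```

In this sketch the equality filter and the plain join both end up with an example row even though the sample had no match, while the UDF filter comes back empty. That is the situation described above for scripts with UDFs: a selective filter or a join downstream of a UDF can leave illustrate with nothing to show.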
From: Jonathan Coveney [[EMAIL PROTECTED]]
Sent: Tuesday, July 03, 2012 9:34 PM
To: [EMAIL PROTECTED]
Subject: Re: Are there any explanations of the implementation of illustrate?
Jie, that's perfect, thanks. This doc, specifically:
http://i.stanford.edu/~olston/publications/sigmod09.pdf is exactly the
detailed explanation I was looking for.
2012/7/3 Jie Li <[EMAIL PROTECTED]>
> Some document here: http://wiki.apache.org/pig/PigIllustrate
> I agree that more tests are needed for illustrate, otherwise it can be
> easily broken without notice.
> On Tue, Jul 3, 2012 at 12:45 PM, Jon Coveney <[EMAIL PROTECTED]> wrote:
> > I was curious at a level slightly higher than "dig through the code" how
> > illustrate is so fast, and how it deals with joins effectively. Are there
> > any resources on this (or does anyone at Hortonworks want to write a
> > tech-oriented blog post)? :)