Pig >> mail # dev >> Are there any explanations of the implementation of illustrate?


Jon Coveney 2012-07-03, 19:45
Jie Li 2012-07-03, 20:50
Jonathan Coveney 2012-07-04, 01:34
Thejas Nair 2012-07-06, 01:10
Jonathan Coveney 2012-07-07, 01:28

RE: Are there any explanations of the implementation of illustrate?
Hey Jonathan,

We've done a bunch of updates to illustrate on a private 0.9 branch that we're looking to merge in as soon as we can, including making it much more testable and getting far more of the Pig operators working again.

One thing to note on the algorithm is that the upstream 'data synthesis' process doesn't work on scripts that include UDFs, as it cannot invert data through the UDF.  This means that for many scripts, joins (and very selective filters) are still troublesome.
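
To make the UDF limitation concrete, here is a minimal sketch in Python. It is not Pig's actual code: `invert_comparison`, `illustrate_filter`, and `is_big` are hypothetical names made up for illustration, and the real synthesis step is considerably more involved. The idea is only that a plain comparison has visible structure that can be run backwards to invent a passing record, while a UDF is a black box with nothing to invert.

```python
from typing import Callable, List, Optional


def invert_comparison(op: str, constant: int) -> Optional[int]:
    """Invent a value satisfying `field <op> constant` (hypothetical helper)."""
    if op == "==":
        return constant
    if op == ">":
        return constant + 1
    if op == "<":
        return constant - 1
    return None  # unsupported operator: nothing we know how to invert


def illustrate_filter(
    sample: List[int],
    predicate: Callable[[int], bool],
    synthesize: Callable[[], Optional[int]],
) -> List[int]:
    """Run a filter over a small sample; if nothing survives, try to synthesize a record."""
    kept = [x for x in sample if predicate(x)]
    if kept:
        return kept
    made_up = synthesize()
    return [made_up] if made_up is not None else []


sample = [3, 7, 12]  # a tiny sample of the real input

# Transparent, very selective filter (think `FILTER data BY x > 1000`):
# the sample yields nothing, but the comparison can be inverted.
transparent = illustrate_filter(
    sample,
    lambda x: x > 1000,
    lambda: invert_comparison(">", 1000),
)


# The same condition hidden inside a UDF: the synthesizer only sees an
# opaque function, so (by assumption here) it has to give up.
def is_big(x: int) -> bool:
    return x > 1000


opaque = illustrate_filter(sample, is_big, lambda: None)

print(transparent)  # [1001]
print(opaque)       # []
```

The empty result in the second case is the situation described above: with a UDF in the way, a selective filter or a join whose keys don't match in the sample leaves the example output empty.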

-Doug
________________________________________
From: Jonathan Coveney [[EMAIL PROTECTED]]
Sent: Tuesday, July 03, 2012 9:34 PM
To: [EMAIL PROTECTED]
Subject: Re: Are there any explanations of the implementation of illustrate?

Jie, that's perfect, thanks. This doc, specifically:
http://i.stanford.edu/~olston/publications/sigmod09.pdf is exactly the
detailed explanation I was looking for.

2012/7/3 Jie Li <[EMAIL PROTECTED]>

> Some document here: http://wiki.apache.org/pig/PigIllustrate
>
> I agree that more tests are needed for illustrate, otherwise it can be
> easily broken without notice.
>
> Jie
>
> On Tue, Jul 3, 2012 at 12:45 PM, Jon Coveney <[EMAIL PROTECTED]> wrote:
> > I was curious at a level slightly higher than "dig through the code" how
> > illustrate is so fast, and how it deals with joins effectively. Are there
> > any resources on this (or does anyone at Hortonworks want to write a tech
> > oriented blog post? :)
> >
>