Re: [GSoC 2012] Self Introduction and interested projects
Russell Jurney 2012-03-26, 01:25
I suggest you create a simple, minimal web application that visualizes a
pig script file each time a url with the script filename is loaded.
For instance, the process to use the tool might go like this:
1) Run pigvisualizer.(pl/py/rb) locally, at the start of your pig work
2) Create a new pig script at /my/dir/filename.pig
3) Open http://localhost:4567/pigviz/my/dir/filename.pig in a web browser
5) Reload this web page each time you want to see a new visualization, OR
have the page try to reload the file periodically
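
As a rough illustration of the workflow above, here is a minimal sketch in Python using only the standard library (wsgiref) rather than any particular framework. The names PREFIX, render_script, app and main are hypothetical, and the page only echoes the script text where a real tool would draw the graph. It assumes the URL path after /pigviz is the absolute path of the script on the local machine.

```python
import html
import os
from wsgiref.simple_server import make_server

PREFIX = "/pigviz"  # URL prefix taken from the example URL in step 3


def render_script(path):
    """Return a bare HTML page for the script; a real tool would draw a graph here."""
    with open(path, encoding="utf-8") as handle:
        source = handle.read()
    page = "<html><body><h1>{}</h1><pre>{}</pre></body></html>"
    return page.format(html.escape(path), html.escape(source))


def app(environ, start_response):
    """WSGI app: /pigviz/my/dir/filename.pig maps to the local file /my/dir/filename.pig."""
    url_path = environ.get("PATH_INFO", "")
    script_path = url_path[len(PREFIX):] if url_path.startswith(PREFIX + "/") else ""
    if not script_path.endswith(".pig") or not os.path.isfile(script_path):
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"No such pig script\n"]
    body = render_script(script_path).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body]


def main():
    # Port 4567 matches the example URL. The script is re-read on every
    # request, so reloading the page always shows the current file.
    make_server("localhost", 4567, app).serve_forever()
```

Calling main() starts the server on localhost only. Periodic reloading could be handled on the browser side and is not shown here.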
There are several sources of data:
1) Start a pig session, via Grunt, PigServer or HCatalog, and use
ILLUSTRATE/EXPLAIN. An old example of doing this is available at
2) Use the explain or -dot commands from the pig command line. In looking at
the dot output, the graph is not as helpful as I had thought :(
3) Use the PigPen code to get ILLUSTRATE data for visualization
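
For option 2, a sketch of turning dot text into a node/edge structure that a browser-side graph library could consume. This is an assumption-heavy illustration: SAMPLE_DOT is a made-up graph, not real Pig output, and the regular expression only handles simple "a -> b" edge lines (no subgraphs, no multi-line statements). The function name dot_to_graph and the JSON shape are hypothetical.

```python
import json
import re

# Matches simple edge lines such as:  a -> b;   or   "Load 1" -> "Filter 2" [label="x"];
EDGE = re.compile(r'^\s*("[^"]+"|[\w.]+)\s*->\s*("[^"]+"|[\w.]+)')


def dot_to_graph(dot_text):
    """Collect nodes and edges from simple dot edge lines, in order of appearance."""
    nodes, edges = [], []
    for line in dot_text.splitlines():
        match = EDGE.match(line)
        if not match:
            continue  # skip graph headers, braces, attribute lines
        source, target = (name.strip('"') for name in match.groups())
        for name in (source, target):
            if name not in nodes:
                nodes.append(name)
        edges.append({"source": source, "target": target})
    return {"nodes": [{"id": n} for n in nodes], "edges": edges}


# Hypothetical input, written by hand for this example.
SAMPLE_DOT = """
digraph plan {
    load -> filter;
    filter -> "group by";
    "group by" -> store [label="reduce"];
}
"""

print(json.dumps(dot_to_graph(SAMPLE_DOT), indent=2))
```

A full dot parser would be more robust; the point here is only that the edge list is cheap to extract once the dot file exists.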
The ideal situation is that you get the data plan via EXPLAIN, and sample
data via ILLUSTRATE, and combine them to produce an even better version of
figure 2 in the paper
[image: Inline image 1]
As to the presentation of the data in an interface, I suggest you AVOID
eclipse and the UI code to PigPen, as there is little utility in having
this visualization there. Not all Pig users use Eclipse, and there is
little utility in editing scripts in the diagrams. There is great utility
in visualizing, understanding and debugging this way, but not so much in
On the other hand, anyone can edit Pig in their favorite tool and view
their pig graph in a simple web application on their localhost by directing
a web browser at it. This is why a simple, small web application seems
best. You can use ruby/sinatra or python/bottle/flask or perl/catalyst to
make a simple web app. Check out sigma.js for graph visualization:
http://sigmajs.org/examples.html or http://neyric.github.com/wireit/ for
something more fully featured.
Perhaps the best plan is to fix ILLUSTRATE (see
http://wiki.apache.org/pig/ExampleGenerator and talk to the guys at
mortardata.com who have a patch for this), and edit the PigPen code to
remove the Eclipse dependencies and have it output simple JSON for a web
application to consume. It could write to a file, or you could create a
simple web service that publishes JSON for the current pig session.
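
To make the "simple JSON" idea concrete, here is one possible shape, sketched in Python. Everything in it is an assumption for illustration: the function names, the field names (nodes, edges, sample) and the idea of keying sample rows by alias are mine, not an existing PigPen or Pig format.

```python
import json


def build_session_json(edges, samples):
    """Combine plan edges (from EXPLAIN) with sample rows per alias (from ILLUSTRATE)."""
    aliases = []
    for source, target in edges:
        for alias in (source, target):
            if alias not in aliases:
                aliases.append(alias)
    return {
        # Aliases without sample data get an empty list.
        "nodes": [{"id": a, "sample": samples.get(a, [])} for a in aliases],
        "edges": [{"source": s, "target": t} for s, t in edges],
    }


def write_session_json(path, edges, samples):
    """Write the combined structure to a file for the web application to read."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_session_json(edges, samples), handle, indent=2)


# Hypothetical data standing in for a real pig session.
example_edges = [("raw", "filtered"), ("filtered", "grouped")]
example_samples = {"raw": [[1, "a"], [2, "b"]], "filtered": [[2, "b"]]}
print(json.dumps(build_session_json(example_edges, example_samples), indent=2))
```

Writing to a file keeps the two halves decoupled; a web service publishing the same structure would be a drop-in replacement on the consuming side.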
Once we have JSON of ILLUSTRATE... getting a web visualization is easy. I
can help, I've done it before in Cloud Stenography by parsing data in
Grunt. Which you could do, btw. Old Perl code is available on github (see
Interested in thoughts of others.
On Fri, Mar 23, 2012 at 11:21 PM, Shasha Liu <[EMAIL PROTECTED]> wrote:
> Hi Daniel,
> Thanks a lot for the reply.
> I installed the latest Pig and read through the book of "programming in
> I managed to use "-dot -out filename" to produce three graphs in dot file
> Based on the existing dot file, my next question is what is the
> requirement regarding a better visualizer?
> Are we going to generate a picture (e.g., .png) for different plans
> (logical plan, physical plan, map reduce plan), or provide a web interface
> to visualize these graphs of plans?
> Best regards,
> Shasha(Amy) Liu
> On Sun, Mar 18, 2012 at 3:30 AM, Daniel Dai <[EMAIL PROTECTED]> wrote:
>> See comments inline.
>> On Sat, Mar 17, 2012 at 6:52 AM, grassonsand <[EMAIL PROTECTED]>
>> > Dear all,
>> > I am a Ph.D. student in Computer Science and have 4-year Java
>> > experience focusing on Java Web development.
>> > In the candidate projects in PIG, I am interested in PIG-2586 (A better
>> > plan/data flow visualizer) and PIG-2599 (Mavenize Pig).
>> > In my on-going research project, I am in charge of (1). web user
>> > development and (2). build system. Now I am working on adding hadoop
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com