Pig, mail # dev - [GSoC 2012] Self Introduction and interested projects


Re: [GSoC 2012] Self Introduction and interested projects
Russell Jurney 2012-04-06, 14:29
That's great, I'll review it now!

Russell Jurney http://datasyndrome.com

On Apr 6, 2012, at 7:03 AM, Shasha Liu <[EMAIL PROTECTED]> wrote:

> Hi Russell,
>
> Based on the email discussions, I wrote my proposal for this Pig visualizer project and submitted it to google-melange. Please take a look at it at your convenience. Any further feedback or comments would be much appreciated.
>
> Thank you very much.
> Best,
>
> On Sun, Mar 25, 2012 at 9:25 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> I suggest you create a simple, minimal web application that visualizes a pig script file each time a url with the script filename is loaded.  
>
> For instance, the process to use the tool might go like this:
>
> 1) Run pigvisualizer.(pl/py/rb) locally, at the start of your pig work session
> 2) Create a new pig script at /my/dir/filename.pig
> 3) Open http://localhost:4567/pigviz/my/dir/filename.pig in a web browser
> 4) See a javascript-based visualization of your pig script
> 5) Reload this web page each time you want to see a new visualization OR have the page try to reload the file periodically
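
The steps above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the proposed tool: the names `script_path_from_url`, `PigVizHandler` and `serve` are hypothetical, the port and `/pigviz` prefix are taken from the example URL, and the page simply echoes the script text where a real visualization would go.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

PREFIX = "/pigviz"  # URL prefix from the example URL in step 3


def script_path_from_url(url_path):
    """Map /pigviz/my/dir/filename.pig to /my/dir/filename.pig.

    Returns None when the URL does not point at a .pig file under the prefix.
    """
    if not url_path.startswith(PREFIX + "/"):
        return None
    candidate = url_path[len(PREFIX):]
    if not candidate.endswith(".pig"):
        return None
    return Path(candidate)


class PigVizHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        script = script_path_from_url(self.path)
        if script is None or not script.is_file():
            self.send_error(404, "No such pig script")
            return
        # Placeholder: a real tool would render a graph here.
        # The file is re-read on every request, so a browser reload
        # always shows the current version of the script (step 5).
        body = "<pre>" + html.escape(script.read_text()) + "</pre>"
        payload = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=4567):
    """Serve on localhost only, since this exposes local files."""
    HTTPServer(("localhost", port), PigVizHandler).serve_forever()
```

Calling `serve()` at the start of a work session and opening http://localhost:4567/pigviz/my/dir/filename.pig would then correspond to steps 1 and 3.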
>
> There are several sources of data:
>
> 1) Start a pig session, via Grunt, PigServer or HCatalog, and use ILLUSTRATE/EXPLAIN.  An old example of doing this is available at https://github.com/rjurney/Cloud-Stenography
> 2) Use the explain or -dot commands from pig command line. In looking at the dot output, the graph is not as helpful as I had thought :(
> 3) Use the PigPen code to get ILLUSTRATE data for visualization
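
For the second source, one possible first step is to turn the edges of a dot file into a node/edge structure that a JavaScript graph library can load. The sketch below is an assumption-laden illustration: `dot_to_graph` is a hypothetical helper, `SAMPLE_DOT` is made-up input rather than real Pig output, and the regular expression only handles simple `a -> b` statements with one edge each.

```python
import json
import re

# Matches simple edge statements such as:  load -> filter;
# Quoted names are accepted, but chained edges (a -> b -> c) are not handled.
EDGE_RE = re.compile(r'"?([\w.]+)"?\s*->\s*"?([\w.]+)"?')

# Hypothetical input for illustration only; not actual output from Pig.
SAMPLE_DOT = """
digraph plan {
    load -> filter;
    filter -> group;
    group -> store;
}
"""


def dot_to_graph(dot_text):
    """Collect nodes and edges from dot text into a JSON-friendly dict."""
    nodes, edges = [], []
    for src, dst in EDGE_RE.findall(dot_text):
        for name in (src, dst):
            if name not in nodes:  # keep first-seen order, no duplicates
                nodes.append(name)
        edges.append({"source": src, "target": dst})
    return {"nodes": [{"id": n} for n in nodes], "edges": edges}


graph = dot_to_graph(SAMPLE_DOT)
graph_json = json.dumps(graph, indent=2)
```

A full dot parser would be needed for attributes, subgraphs and labels; this only shows the shape of the conversion.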
>
> The ideal situation is that you get the data plan via EXPLAIN, and sample data via ILLUSTRATE, and combine them to produce an even better version of figure 2 in the paper http://infolab.stanford.edu/~olston/publications/sigmod09.pdf
>
> <image.png>
>
> As to the presentation of the data in an interface, I suggest you AVOID eclipse and the UI code to PigPen, as there is little utility in having this visualization there.  Not all Pig users use Eclipse, and there is little utility in editing scripts in the diagrams.  There is great utility in visualizing, understanding and debugging this way, but not so much in editing.
>
> On the other hand, anyone can edit Pig in their favorite tool and view their pig graph in a simple web application on their localhost by directing a web browser at it.  This is why a simple, small web application seems best. You can use ruby/sinatra or python/bottle/flask or perl/catalyst to make a simple web app.  Check out sigma.js for graph visualization: http://sigmajs.org/examples.html or http://neyric.github.com/wireit/ for something more fully featured.
>
> Perhaps the best plan is to fix ILLUSTRATE (see http://wiki.apache.org/pig/ExampleGenerator and talk to the guys at mortardata.com who have a patch for this), and edit the PigPen code to remove the Eclipse dependencies and have it output simple JSON for a web application to consume.  It could write to a file, or you could create a simple web service that publishes JSON for the current pig session.
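
To make the "simple JSON" idea concrete, here is one hypothetical shape for that output, combining plan operators with sample rows. None of this comes from PigPen or ILLUSTRATE: the field names (`alias`, `type`, `inputs`, `sample`) and the helpers `build_payload` and `to_json` are assumptions chosen for illustration.

```python
import json


def build_payload(operators, samples):
    """Combine plan operators with sample rows per alias.

    operators: list of (alias, op_type, input_aliases) tuples
    samples:   dict mapping alias -> list of example rows
    """
    return {
        "operators": [
            {
                "alias": alias,
                "type": op_type,
                "inputs": list(inputs),
                "sample": samples.get(alias, []),  # empty if no rows known
            }
            for alias, op_type, inputs in operators
        ]
    }


def to_json(payload):
    """Serialize the payload for a web application to consume."""
    return json.dumps(payload, indent=2)


# Made-up example data, not output from a real Pig session.
example = build_payload(
    [("raw", "LOAD", []), ("adults", "FILTER", ["raw"])],
    {"raw": [["amy", 31], ["bob", 12]], "adults": [["amy", 31]]},
)
example_json = to_json(example)
```

The same string could be written to a file or returned by a small web service, as suggested above.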
>
> Once we have JSON of ILLUSTRATE... getting a web visualization is easy.  I can help, I've done it before in Cloud Stenography by parsing data in Grunt.  Which you could do, btw.  Old Perl code is available on github (see above link).
>
> Interested in thoughts of others.
>
> On Fri, Mar 23, 2012 at 11:21 PM, Shasha Liu <[EMAIL PROTECTED]> wrote:
> Hi Daniel,
>
> Thanks a lot for the reply.
> I installed the latest Pig and read through the book "Programming Pig".
> I managed to use "-dot -out filename" to produce three graphs in dot file format.
>
> Based on the existing dot files, my next question is: what are the requirements for a better visualizer?
> Are we going to generate a picture (e.g., .png) for each plan (logical plan, physical plan, map reduce plan), or provide a web interface to visualize these plan graphs?
>
> Best regards,
> --
> Shasha(Amy) Liu
>
>
> On Sun, Mar 18, 2012 at 3:30 AM, Daniel Dai <[EMAIL PROTECTED]> wrote: