Pig >> mail # user >> Introducing rPig

Re: Introducing rPig
On Sun, Jun 16, 2013 at 3:15 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:

> I mentioned a few months ago that I was interested in creating a new
> scripting engine for Pig based on the R language. I have finally gotten
> that project to a point where I feel comfortable sharing it with the Pig
> community.
> This project can be found at: http://www.github.com/cd-wood/pigaddons
> RScriptEngine is a scripting engine for Apache Pig that interprets the R
> language <http://www.r-project.org/>. The goal behind this scripting engine
> is to make the R language compatible with, and easy to use in, Amazon EMR
> jobs.
> Included in /scripts is the rpig-bootstrap.sh script, which is meant as a
> bootstrap script for Amazon EMR instances; it can also be used on personal
> instances to set up an environment compatible with the scripting engine.
> This interpreter uses JRI <http://www.rforge.net/JRI/> to run an
> instance of R inside the Java process.
> By combining R with Pig, I feel that a large number of new analyses become
> possible that cannot be done natively in Pig; while other languages already
> exist for writing UDFs, the more options the better.
> A cool feature made possible by including R in a big-data analysis
> package is how easily R can generate images and plot data.
> While not currently implemented, one upcoming feature is the integration of
> JavaGD, which will allow all images generated by the R script to be rendered
> into a Java class, from which it might be possible to save, email, or
> otherwise process those images.
> To showcase using R with Pig, I've included a (contrived) Naive Bayes
> example: a simplistic way of classifying emails as spam based on the
> presence of certain words.
> I have tested this scripting engine on Pig 0.9.2 to make sure that it will
> work in Amazon EMR; however, I haven't had a chance to test it in EMR itself
> yet. If someone does, please let me know how it goes, and if anyone has more
> cool examples of using R, I'd be happy to include them.
> And of course, please let me know of any bugs you find or any other
> suggestions you may have.
> Thanks,
> - Connor
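For readers unfamiliar with the technique behind the spam example mentioned above, here is a minimal sketch of a Bernoulli Naive Bayes classifier, which scores an email only on which vocabulary words are present or absent. This is not the code from the pigaddons repository (which uses R through rPig); it is a generic Python illustration, and the function names `train` and `classify`, the toy training data, and the Laplace smoothing parameter `alpha` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def train(docs, vocab, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model (illustrative sketch).

    docs:  list of (set_of_words, label) pairs, e.g. label in {"spam", "ham"}.
    vocab: set of words whose presence/absence is used as features.
    alpha: Laplace smoothing constant (hypothetical default of 1.0).
    """
    class_counts = Counter(label for _, label in docs)
    word_counts = {c: Counter() for c in class_counts}
    for words, label in docs:
        # Count each vocabulary word at most once per document (presence only).
        for w in words & vocab:
            word_counts[label][w] += 1
    # Prior P(class) from class frequencies.
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    # Smoothed P(word present | class); two outcomes, so 2 * alpha.
    likelihoods = {
        c: {w: (word_counts[c][w] + alpha) / (class_counts[c] + 2 * alpha)
            for w in vocab}
        for c in class_counts
    }
    return priors, likelihoods


def classify(words, vocab, priors, likelihoods):
    """Return the class with the highest log-posterior score."""
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for w in vocab:
            p = likelihoods[c][w]
            # Bernoulli model: absent words contribute log(1 - p).
            score += math.log(p if w in words else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy data for illustration only.
VOCAB = {"free", "winner", "meeting", "report"}
TRAINING = [
    ({"free", "winner"}, "spam"),
    ({"free", "offer"}, "spam"),
    ({"meeting", "report"}, "ham"),
    ({"report", "lunch"}, "ham"),
]

if __name__ == "__main__":
    priors, likelihoods = train(TRAINING, VOCAB)
    print(classify({"free", "winner", "now"}, VOCAB, priors, likelihoods))
    print(classify({"meeting", "report"}, VOCAB, priors, likelihoods))
```

Working in log space avoids numeric underflow when the vocabulary is large; in a Pig pipeline, the training counts would typically be produced by grouping and counting over the email relation, with only the scoring step done per record in a UDF.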

Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com