Pig >> mail # user >> Introducing rPig


Re: Introducing rPig
Very cool!
2013/6/17 Russell Jurney <[EMAIL PROTECTED]>

> Awesome!
>
>
> On Sun, Jun 16, 2013 at 3:15 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:
>
> > I mentioned a few months ago that I was interested in creating a new
> > scripting engine for Pig based on the R language. I have finally gotten
> > that project to a point where I feel comfortable sharing it with the Pig
> > community.
> >
> > This project can be found at: http://www.github.com/cd-wood/pigaddons
> >
> > RScriptEngine is a scripting engine for Apache Pig that interprets the R
> > language <http://www.r-project.org/>. The goal behind this scripting
> > engine is to make the R language compatible with, and easy to use in,
> > Amazon EMR jobs. Included in /scripts is the rpig-bootstrap.sh script,
> > which is meant as a bootstrap script for Amazon EMR instances; it can also
> > be used on personal instances to set up an environment compatible with
> > the scripting engine. This interpreter uses JRI <http://www.rforge.net/JRI/>
> > to run an instance of R inside the Java process.
> >
> > By combining R with Pig, I feel that a large number of new analyses
> > become possible that cannot be done natively in Pig; while there are
> > already other languages for writing UDFs, the more options the better.
> >
> > A cool feature made possible by including R in a big-data analysis
> > package is how easily R generates images and plots data. While not
> > currently implemented, one upcoming feature is the integration of JavaGD,
> > which will allow all images generated by the R script to be rendered into
> > a Java class, from which it might be possible to save, email, or otherwise
> > process those images.
> >
> > To showcase using R with Pig, I've included a (contrived) Naive Bayes
> > example that classifies emails as spam in a simplistic way, based on the
> > presence of certain words.
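For readers unfamiliar with the technique, here is a minimal sketch of word-presence Naive Bayes spam classification. It is written in Python rather than R and is not the code shipped in the pigaddons repository. It assumes a Bernoulli-style model (each vocabulary word is either present or absent) with Laplace (add-one) smoothing. The vocabulary `VOCAB`, the helper names `features`, `train` and `classify`, and the tiny training set are all hypothetical illustrations.

```python
import math
from collections import Counter

# Hypothetical vocabulary of "certain words" whose presence is checked;
# the word list used by the actual rPig example may differ.
VOCAB = ["free", "winner", "meeting", "report", "offer"]


def features(text, vocab=VOCAB):
    """Map each vocabulary word to True if it occurs as a lowercase,
    whitespace-separated token of text, else False."""
    words = set(text.lower().split())
    return {w: (w in words) for w in vocab}


def train(emails, vocab=VOCAB):
    """Estimate class priors and per-class word-presence probabilities.

    emails: list of (text, label) pairs, e.g. label in {"spam", "ham"}.
    Add-one smoothing keeps every probability strictly between 0 and 1.
    """
    class_counts = Counter(label for _, label in emails)
    word_counts = {c: Counter() for c in class_counts}
    for text, label in emails:
        for w, present in features(text, vocab).items():
            if present:
                word_counts[label][w] += 1
    total = len(emails)
    priors = {c: n / total for c, n in class_counts.items()}
    cond = {
        c: {w: (word_counts[c][w] + 1) / (class_counts[c] + 2) for w in vocab}
        for c in class_counts
    }
    return priors, cond


def classify(text, priors, cond, vocab=VOCAB):
    """Return the label with the highest log-posterior score for text."""
    feats = features(text, vocab)
    best_label, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for w, present in feats.items():
            p = cond[c][w]
            # Absent words contribute log(1 - p) in the Bernoulli model.
            score += math.log(p if present else 1.0 - p)
        if score > best_score:
            best_label, best_score = c, score
    return best_label


if __name__ == "__main__":
    training = [
        ("free offer winner", "spam"),
        ("claim your free prize winner", "spam"),
        ("meeting report attached", "ham"),
        ("quarterly report for the meeting", "ham"),
    ]
    priors, cond = train(training)
    print(classify("free winner offer inside", priors, cond))  # -> spam
    print(classify("see the report before the meeting", priors, cond))  # -> ham
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow once the vocabulary grows, and the add-one smoothing prevents a word that never appeared in one class from forcing that class's score to zero.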
> >
> > I have tested this scripting engine on Pig 0.9.2 so that it should work
> > in Amazon EMR; however, I haven't had a chance to test it in EMR itself
> > yet. If someone does, please let me know how it goes, and if anyone has
> > more cool examples of using R, I'd be happy to include them.
> >
> > And of course, please let me know of any bugs you find or any other
> > suggestions you may have.
> >
> > Thanks,
> >
> > - Connor
> >
>
>
>
> --
> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED]
> datasyndrome.com
>