NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Pig >> mail # user >> Introducing rPig


Re: Introducing rPig
Very cool!
2013/6/17 Russell Jurney <[EMAIL PROTECTED]>

> Awesome!
>
>
> On Sun, Jun 16, 2013 at 3:15 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:
>
> > I mentioned a few months ago that I was interested in creating a new
> > Scripting Engine for Pig based off of the R language. I have finally gotten
> > that project to a point where I feel comfortable sharing it with the Pig
> > community.
> >
> > This project can be found at: http://www.github.com/cd-wood/pigaddons
> >
> > RScriptEngine is a scripting engine for Apache Pig that interprets the R
> > language <http://www.r-project.org/>. The goal behind this scripting engine
> > is compatibility and ease of use of the R language in Amazon EMR jobs.
> > Included in /scripts is the rpig-bootstrap.sh script, which is meant as a
> > bootstrap script for Amazon EMR instances; it can also be used on personal
> > instances to set up an environment compatible with the scripting engine.
> > This interpreter uses JRI <http://www.rforge.net/JRI/> to run an
> > instance of R inside the Java process.
> >
> > By combining R with Pig, I feel that a large number of new analyses are
> > possible that cannot be done natively in Pig; while there are already
> > other languages for creating UDFs, the more options the better.
> >
> > A cool feature made possible by including R in a big-data analysis
> > package is how easily R can generate images and plot data.
> > While not currently implemented, one upcoming feature is the integration of
> > JavaGD, which will allow all images generated by the R script to be rendered
> > into a Java class, from which it might be possible to save, email, or
> > otherwise use those images.
> >
> > To showcase using R with Pig, I've included a Naive Bayes (contrived)
> > example that is a simplistic form of classifying emails as spam based off
> > of the presence of certain words.
> >
> > I have tested this scripting engine on Pig 0.9.2 to make sure that it should
> > work in Amazon EMR; however, I haven't had a chance to test it in EMR itself
> > yet. If someone does, please let me know how it goes, and if anyone has more
> > cool examples of using R, I'd be happy to include them.
> >
> > And of course, please let me know of any bugs you find or any other
> > suggestions you may have.
> >
> > Thanks,
> >
> > - Connor
> >
>
>
>
> --
> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED]
> datasyndrome.com
>
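The announcement describes the bundled spam example only in outline ("a simplistic form of classifying emails as spam based off of the presence of certain words"). As a rough illustration of that technique, and explicitly not the R code shipped in the pigaddons repository, here is a minimal Bernoulli Naive Bayes sketch in Python. The vocabulary, the training emails, and the names `features`, `train`, and `classify` are hypothetical, chosen only for illustration; add-one (Laplace) smoothing is an assumption so that a word never seen in one class does not force a zero probability.

```python
import math
from collections import Counter

# Hypothetical vocabulary: the "certain words" whose presence is checked.
VOCAB = ["free", "winner", "money", "meeting", "report"]


def features(text):
    """Return the set of vocabulary words present in the text (presence only, not counts)."""
    tokens = set(text.lower().split())
    return {w for w in VOCAB if w in tokens}


def train(examples):
    """Estimate class priors and per-word presence probabilities.

    examples: list of (text, label) pairs, e.g. label in {"spam", "ham"}.
    Add-one smoothing keeps every probability strictly between 0 and 1.
    """
    class_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in examples:
        word_counts[label].update(features(text))
    total = sum(class_counts.values())
    priors = {c: class_counts[c] / total for c in class_counts}
    likelihoods = {
        c: {w: (word_counts[c][w] + 1) / (class_counts[c] + 2) for w in VOCAB}
        for c in class_counts
    }
    return priors, likelihoods


def classify(text, priors, likelihoods):
    """Return the label with the highest log-posterior under Bernoulli Naive Bayes."""
    present = features(text)
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for w in VOCAB:
            p = likelihoods[c][w]
            # Bernoulli model: absent words also count, via (1 - p).
            score += math.log(p if w in present else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    training = [
        ("free money for the winner", "spam"),
        ("claim your free prize winner", "spam"),
        ("meeting notes and weekly report", "ham"),
        ("report due before the meeting", "ham"),
    ]
    priors, likelihoods = train(training)
    print(classify("you are a winner of free money", priors, likelihoods))  # → spam
    print(classify("please review the report", priors, likelihoods))        # → ham
```

Working in log space avoids numeric underflow when the vocabulary grows; in a Pig job, a function like `classify` would presumably run once per record, with the trained priors and likelihoods loaded beforehand, but how rPig itself wires this up is not described in the thread.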