Pig >> mail # user >> Introducing rPig


Connor Woodson 2013-06-16, 22:15
Re: Introducing rPig
Awesome!
On Sun, Jun 16, 2013 at 3:15 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:

> I mentioned a few months ago that I was interested in creating a new
> Scripting Engine for Pig based off of the R language. I have finally gotten
> that project to a point where I feel comfortable sharing it with the Pig
> community.
>
> This project can be found at: http://www.github.com/cd-wood/pigaddons
>
> RScriptEngine is a scripting engine for Apache Pig that interprets the R
> language <http://www.r-project.org/>. The goal behind this scripting engine
> is compatibility and ease of use of the R language in Amazon EMR jobs.
> Included in /scripts is the rpig-bootstrap.sh script, which is meant as a
> bootstrap script for Amazon EMR instances; it can also be used on personal
> instances to set up an environment compatible with the scripting engine.
> This interpreter makes use of JRI <http://www.rforge.net/JRI/> to allow an
> instance of R to run inside of the Java process.
>
> By combining R with Pig, I feel that a large number of new analyses are
> possible that cannot be done natively in Pig; while there are already
> other languages for creating UDFs, the more options the better.
>
> A cool benefit of including R in a big-data analysis package is how easy R
> makes it to generate images and plot data. While not currently implemented,
> one upcoming feature is the integration of JavaGD, which will allow all
> images generated by the R script to be rendered into a Java class, from
> which it might be possible to save, email, or do other stuff with those
> images.
>
> To showcase using R with Pig, I've included a (contrived) Naive Bayes
> example: a simplistic classifier that marks emails as spam based on the
> presence of certain words.
>
> I have tested this scripting engine on Pig 0.9.2 with the aim of making
> sure it works in Amazon EMR; however, I haven't had a chance to test it in
> EMR itself yet. If someone does, please let me know how it goes, and if
> anyone has more cool examples of using R, I'd be happy to include them.
>
> And of course, please let me know of any bugs you find or any other
> suggestions you may have.
>
> Thanks,
>
> - Connor
>

--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
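
For readers unfamiliar with the kind of example the announcement mentions, here is a minimal sketch of a word-presence Naive Bayes spam classifier. It is written in plain Python rather than R, it is not the code from the pigaddons repository, and the training messages, vocabulary, and function names are all made up for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data; not taken from the rPig example.
TRAINING = [
    ("win free money now", "spam"),
    ("free offer click now", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
VOCAB = ["free", "money", "meeting", "now"]


def train(messages, vocabulary):
    """Estimate P(class) and P(word present | class).

    Assumes every class appears at least once in `messages`.
    """
    class_counts = Counter()
    word_counts = {"spam": Counter(), "ham": Counter()}
    for text, label in messages:
        class_counts[label] += 1
        present = set(text.lower().split())
        for word in vocabulary:
            if word in present:
                word_counts[label][word] += 1

    total = sum(class_counts.values())
    model = {}
    for label in ("spam", "ham"):
        n = class_counts[label]
        model[label] = {
            "prior": n / total,
            # Laplace (add-one) smoothing keeps probabilities away from 0 and 1.
            "p_word": {
                w: (word_counts[label][w] + 1) / (n + 2) for w in vocabulary
            },
        }
    return model


def classify(model, text):
    """Return the class with the highest log-posterior score."""
    present = set(text.lower().split())
    scores = {}
    for label, params in model.items():
        score = math.log(params["prior"])
        for word, p in params["p_word"].items():
            # Absent vocabulary words also count as evidence (Bernoulli model).
            score += math.log(p if word in present else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING, VOCAB)
print(classify(model, "free money"))
print(classify(model, "meeting tomorrow"))
```

The "naive" part is the assumption that words occur independently given the class, which is what lets the score be a simple sum of per-word log-probabilities.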
+
Jonathan Coveney 2013-06-17, 21:05