Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hadoop >> mail # user >> ANN: R and Hadoop = RHIPE 0.1


Copy link to this message
-
ANN: R and Hadoop = RHIPE 0.1
Hello,
I'd like to announce the release of the 0.1 version of RHIPE -R and
Hadoop Integrated Processing Environment. Using RHIPE, it is possible
to write map-reduce algorithms using the R language and start them
from within R.
RHIPE is built on Hadoop and so benefits from Hadoop's fault
tolerance, distributed file system and job scheduling features.
For the R user, there is rhlapply which runs an lapply across the cluster.
For the Hadoop user, there is rhmr which runs a general map-reduce program.

The tired example of counting words:

m <- function(key,val){
  words <- substr(val," +")[[1]]
  wc <- table(words)
  cln <- names(wc)
  return(sapply(1:length(wc),function(r)
list(key=cln[r],value=wc[[r]]),simplify=F))
}
r <- function(key,value){
  value <- do.call("rbind",value)
  return(list(list(key=key,value=sum(value))))
}
rhmr(mapper=m,reduce=r,input.folder="X",output.folder="Y")

URL: http://ml.stat.purdue.edu/rhipe

There are some downsides to RHIPE which are described at
http://ml.stat.purdue.edu/rhipe/install.html#sec-5

Regards
Saptarshi Guha
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB