Hadoop >> mail # general >> pydoop -- Python MapReduce and HDFS API for Hadoop


pydoop -- Python MapReduce and HDFS API for Hadoop
Hello everybody,

We recently released pydoop, a Python MapReduce and HDFS API for Hadoop:

http://pydoop.sourceforge.net

It is implemented as a Boost.Python wrapper around the C++ code (pipes
and libhdfs). It allows you to write complete MapReduce applications in
CPython, with the same capabilities as the C++ API. Here is a minimal
wordcount example:

from pydoop.pipes import Mapper, Reducer, Factory, runTask

class WordCountMapper(Mapper):

  def __init__(self, context):
    super(WordCountMapper, self).__init__(context)

  def map(self, context):
    # Split the input value into words and emit a count of "1" per word.
    words = context.getInputValue().split()
    for w in words:
      context.emit(w, "1")

class WordCountReducer(Reducer):

  def __init__(self, context):
    super(WordCountReducer, self).__init__(context)

  def reduce(self, context):
    # Sum all counts received for the current key.
    s = 0
    while context.nextValue():
      s += int(context.getInputValue())
    # Emit the total as a string, like the counts emitted by the mapper.
    context.emit(context.getInputKey(), str(s))

# Hand the mapper and reducer classes to the pipes runtime.
runTask(Factory(WordCountMapper, WordCountReducer))
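
To follow the data flow of the example without a Hadoop installation, the same map, group-by-key, and reduce steps can be sketched in plain Python. This is only an illustration and does not use pydoop: map_words, reduce_counts and run_local are hypothetical names, and run_local is an assumed, much simplified stand-in for what the framework does between the two phases.

```python
from collections import defaultdict

def map_words(line):
    """Mimic WordCountMapper.map: one (word, "1") pair per word."""
    return [(w, "1") for w in line.split()]

def reduce_counts(key, values):
    """Mimic WordCountReducer.reduce: sum the string counts for one key."""
    return key, str(sum(int(v) for v in values))

def run_local(lines):
    """Hypothetical stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            groups[key].append(value)
    # Reduce each key's values, visiting keys in sorted order.
    return dict(reduce_counts(k, vs) for k, vs in sorted(groups.items()))

if __name__ == "__main__":
    print(run_local(["a b a", "b c"]))  # {'a': '2', 'b': '2', 'c': '1'}
```

Counts stay strings between the two steps, as in the pydoop example, and are converted to integers only inside the reducer.
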
Any feedback would be greatly appreciated.

--
Simone Leo
Distributed Computing group
Advanced Computing and Communications program
CRS4
POLARIS - Building #1
Piscina Manna
I-09010 Pula (CA) - Italy
e-mail: [EMAIL PROTECTED]
http://www.crs4.it