pydoop -- Python MapReduce and HDFS API for Hadoop
Simone Leo 2009-11-06, 17:20
Hello everybody,
we recently released pydoop, a Python MapReduce and HDFS API for Hadoop:

http://pydoop.sourceforge.net

It is implemented as a Boost.Python wrapper around the C++ code (pipes and libhdfs). It allows you to write complete MapReduce applications in CPython, with the same capabilities as the C++ API. Here is a minimal wordcount example:

  from pydoop.pipes import Mapper, Reducer, Factory, runTask

  class WordCountMapper(Mapper):

      def __init__(self, context):
          super(WordCountMapper, self).__init__(context)

      def map(self, context):
          # emit every whitespace-separated word with a count of "1"
          words = context.getInputValue().split()
          for w in words:
              context.emit(w, "1")

  class WordCountReducer(Reducer):

      def __init__(self, context):
          super(WordCountReducer, self).__init__(context)

      def reduce(self, context):
          # sum the counts received for the current key
          s = 0
          while context.nextValue():
              s += int(context.getInputValue())
          context.emit(context.getInputKey(), str(s))

  runTask(Factory(WordCountMapper, WordCountReducer))

Any feedback would be greatly appreciated.

-- 
Simone Leo
Distributed Computing group
Advanced Computing and Communications program
CRS4
POLARIS - Building #1
Piscina Manna
I-09010 Pula (CA) - Italy
e-mail: [EMAIL PROTECTED]
http://www.crs4.it