You'd have to do something about shipping the UDFs to Mongo. But try
it -- the generalization code that was pulled was stuff like fs
abstraction (want to work on something that's not HDFS? just implement
the FileSystem interface from hadoop, like S3 and Cassandra did) and
"slices" (just use an InputFormat!). You'd have to essentially write a
parallel MRCompiler and switch to using it if mongo mode is set. There
may be other problems, of course, but see how far you can get, it'd be
..

Also, for smaller datasets in realtime, I just use local mode. It
can read from remote file systems, and is fast again.
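
To make the "parallel MRCompiler, switched on by mode" idea concrete, here is a
rough sketch in Python. It is not Pig's real code. Every name in it
(compile_for_mongo, compile_for_hadoop, compile_plan, the "mongo" mode string)
is made up for illustration, and the "plan" is just a toy GROUP BY + COUNT. The
only thing borrowed from MongoDB is the general shape of its mapReduce command
(mapReduce / map / reduce / out), with the map and reduce bodies as JavaScript
strings. Nothing here talks to a server.

```python
import json


def compile_for_mongo(collection, group_field):
    """Toy 'Mongo compiler': GROUP BY one field, COUNT each group.

    Returns a dict shaped like a MongoDB mapReduce command. The map and
    reduce functions are JavaScript source strings. Nothing is executed.
    """
    # json.dumps quotes the field name so it is a valid JS string literal.
    map_js = "function() { emit(this[%s], 1); }" % json.dumps(group_field)
    reduce_js = "function(key, values) { return Array.sum(values); }"
    return {
        "mapReduce": collection,
        "map": map_js,
        "reduce": reduce_js,
        "out": {"inline": 1},
    }


def compile_for_hadoop(collection, group_field):
    """Placeholder standing in for the existing Hadoop compilation path."""
    return {"engine": "hadoop", "input": collection, "group_by": group_field}


# Hypothetical mode names; the switch is the point, not the strings.
COMPILERS = {"mongo": compile_for_mongo, "mapreduce": compile_for_hadoop}


def compile_plan(exec_mode, collection, group_field):
    """Pick a compiler based on the execution mode and run it."""
    try:
        compiler = COMPILERS[exec_mode]
    except KeyError:
        raise ValueError("unknown exec mode: %r" % exec_mode)
    return compiler(collection, group_field)


if __name__ == "__main__":
    print(json.dumps(compile_plan("mongo", "emails", "from"), indent=2))
```

The real work would be in covering every operator a Pig plan can contain
(joins, UDF calls, multi-stage jobs), which this toy skips entirely.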
On Sat, Jul 7, 2012 at 5:53 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> I'm actually talking about implementing another system underneath Pig,
> MongoDB along with Hadoop. Write a pig script, pig translates it to Mongo
> MapReduce instead of Hadoop MapReduce, if you so desire. I know the
> generalization code was pulled a long time ago (for multiple engines
> underneath Pig, Hadoop + some), so I'm wondering how hard Pig/MongoDB would
> be to implement.
> I'd like to see Pig spread beyond Hadoop, and MongoDB's simple json
> MapReduce system might make this easy?
> On Sat, Jul 7, 2012 at 5:39 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
>> There are mongo load and store functions for pig at
>> https://github.com/mongodb/mongo-hadoop/ Is this what you were looking
>> for or were you more asking if pig and mongo play well together?
>> On Jul 7, 2012, at 2:56 PM, Russell Jurney wrote:
>> > I want Pig for MongoDB, for acting on smaller datasets in realtime. Is
>> > this crazy? Given that the MR code is just JSON, isn't this easier than
>> > Hadoop MapReduce?
>> > Crazy idea, I'm just curious if this might not be too hard owing to the
>> > json interface to Mongo MapReduce.
>> > --
>> > Russell Jurney twitter.com/rjurney [EMAIL PROTECTED]
> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com