Hive >> mail # user >> question about machine learning on Hive


qiaoresearcher 2013-01-17, 21:23
Igor Tatarinov 2013-01-17, 21:29
Re: question about machine learning on Hive
In a similar way, ML algorithms can be put into a Hive UDAF. I'm working on this at the moment, and it has proved quite straightforward to integrate liblinear into a UDAF. As Igor notes, the number of reducers you set determines the number of parallel learners.
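
To make the aggregate-function idea concrete, here is a minimal sketch in plain Python rather than Hive's Java UDAF API. The class name `LogisticAggregator`, its parameters, and the `predict` helper are hypothetical, and the learner is a toy logistic regression trained by SGD, not liblinear. The `iterate` and `terminate` methods only loosely mirror the per-row and final steps of a UDAF evaluator.

```python
import math
import random


class LogisticAggregator:
    """Toy learner shaped like an aggregate function: one instance per group."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate
        self.rows_seen = 0

    def iterate(self, features, label):
        """Consume one row and take a single SGD step on the logistic loss."""
        score = sum(w * x for w, x in zip(self.weights, features))
        score = max(-30.0, min(30.0, score))  # clamp to keep exp() well-behaved
        predicted = 1.0 / (1.0 + math.exp(-score))
        error = label - predicted
        for i, x in enumerate(features):
            self.weights[i] += self.learning_rate * error * x
        self.rows_seen += 1

    def terminate(self):
        """Return the trained weights as the aggregate's result."""
        return list(self.weights)


def predict(weights, features):
    """Classify a row as 1 if the linear score is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0


# Synthetic rows (an assumption for illustration): a bias term plus one
# feature x, with label 1 when x > 0.
rng = random.Random(0)
aggregator = LogisticAggregator(n_features=2)
for _ in range(500):
    x = rng.uniform(-3, 3)
    aggregator.iterate([1.0, x], 1 if x > 0 else 0)
model = aggregator.terminate()
```

Each row is seen exactly once, which matches how an aggregate consumes its input. A learner that needs several passes over the data would have to buffer rows inside the aggregator.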

Robin
www.baynote.com

From: Igor Tatarinov <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Date: Thursday, January 17, 2013 1:29 PM
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Re: question about machine learning on Hive

Here is how Twitter does it with Pig:
http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf

We use a similar approach and I think that Pig, being somewhat lower-level with better support of nested objects, is a better tool than Hive. It should be possible to do something similar with Hive but we haven't tried. The trick is to implement the learner as a serializer. Then, the number of reducers will determine how many parallel learners (bags) you can run.
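
The link between reducer count and parallel learners can be sketched outside Pig or Hive as follows. This is a plain-Python simulation under stated assumptions: the function names, the synthetic data, and the nearest-centroid learner are all hypothetical stand-ins, and the rows are split into disjoint partitions rather than sampled with replacement as in classical bagging.

```python
import random


def make_rows(n, seed):
    """Synthetic rows: two features in [-1, 1], label 1 when x1 + x2 > 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        rows.append(((x1, x2), 1 if x1 + x2 > 0 else 0))
    return rows


def train_centroids(rows):
    """Learner for one partition: the mean feature vector of each class.

    Assumes the partition contains at least one row of each class.
    """
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    counts = {0: 0, 1: 0}
    for (x1, x2), label in rows:
        sums[label][0] += x1
        sums[label][1] += x2
        counts[label] += 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in (0, 1)}


def predict_one(model, point):
    """Assign the class whose centroid is closer to the point."""
    def dist2(c):
        return (point[0] - model[c][0]) ** 2 + (point[1] - model[c][1]) ** 2
    return 1 if dist2(1) < dist2(0) else 0


def train_partitioned(rows, num_reducers, seed=0):
    """Shuffle rows, deal them to num_reducers partitions, train one model each."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    partitions = [shuffled[i::num_reducers] for i in range(num_reducers)]
    return [train_centroids(part) for part in partitions]


def predict_ensemble(models, point):
    """Majority vote over the per-partition models (ties go to class 0)."""
    votes = sum(predict_one(m, point) for m in models)
    return 1 if 2 * votes > len(models) else 0


rows = make_rows(400, seed=1)
models = train_partitioned(rows, num_reducers=4)
```

Changing `num_reducers` changes how many models come back, which is the same knob the message describes: more reducers means more learners, each trained on a smaller share of the data.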

igor
decide.com

On Thu, Jan 17, 2013 at 1:23 PM, qiaoresearcher <[EMAIL PROTECTED]> wrote:

How can I run machine learning algorithms (any ML algorithm) directly in Hive? Assume the input and output are already stored as Hive tables.

P.S. I know Mahout is available, but I would prefer to run machine learning algorithms directly in Hive.

Many thanks,
