Hive >> mail # user >> question about machine learning on Hive


Re: question about machine learning on Hive
Here is how Twitter does it with Pig:
http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf

We use a similar approach, and I think Pig, being somewhat lower-level and
having better support for nested objects, is a better tool for this than
Hive. It should be possible to do something similar with Hive, but we
haven't tried. The trick is to implement the learner as a serializer (in
Pig, a storage function), so that training happens as the records are
written out. The number of reducers then determines how many learners
(bags) run in parallel.
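
To make the idea concrete, here is a rough, framework-free sketch in plain Python rather than Pig or Hive. Everything in it is an assumption for illustration: the LearnerSink class, its put method, the random partitioning, and the choice of online logistic regression are hypothetical stand-ins, not the code from the paper or from our system. Each "reducer" gets its own sink, every record written to a sink updates that sink's model, and the resulting models are combined by averaging.

```python
import math
import random


class LearnerSink:
    """Hypothetical stand-in for a storage function: each record
    'written' to it updates a model instead of being serialized."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def put(self, x, y):
        # One online logistic-regression (SGD) step per record written.
        err = y - self.predict_proba(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b += self.lr * err


def train_ensemble(records, n_reducers, seed=0):
    """Randomly partition records across n_reducers sinks (assumed
    partitioning scheme); each sink trains an independent model."""
    rng = random.Random(seed)
    sinks = [LearnerSink(len(records[0][0])) for _ in range(n_reducers)]
    for x, y in records:
        sinks[rng.randrange(n_reducers)].put(x, y)
    return sinks


def ensemble_predict(sinks, x):
    # Average the per-model probabilities and threshold at 0.5.
    avg = sum(s.predict_proba(x) for s in sinks) / len(sinks)
    return 1 if avg >= 0.5 else 0


def make_data(n, seed=1):
    """Synthetic toy data: label is 1 when x0 + x1 > 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, 1 if x[0] + x[1] > 0 else 0))
    return data


# Four "reducers" give four independently trained models.
models = train_ensemble(make_data(2000), n_reducers=4)
```

Changing n_reducers changes the number of models in the ensemble, which mirrors how the reducer count controls the number of parallel learners.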

igor
decide.com

On Thu, Jan 17, 2013 at 1:23 PM, qiaoresearcher <[EMAIL PROTECTED]> wrote:

>
> How can I run machine learning algorithms (any ML algorithm) directly
> in Hive? Assume the input and output are already stored as Hive tables.
>
> PS: I know Mahout is available, but I would prefer to run machine
> learning algorithms directly in Hive.
>
> many thanks,
>
>
>