Hive >> mail # user >> [ANN] Hivemall: Hive scalable machine learning library

Re: [ANN] Hivemall: Hive scalable machine learning library
This is great news! I know that Twitter has done something similar with
UDFs for Pig, as described in this paper:

I'm glad to see the same thing start with Hive.

On Wed, Oct 2, 2013 at 10:21 AM, Makoto YUI <[EMAIL PROTECTED]> wrote:

> Hello all,
> My employer, AIST, has given the thumbs up to open source our machine
> learning library, named Hivemall.
> Hivemall is a scalable machine learning library running on Hive/Hadoop,
> licensed under the LGPL 2.1.
>   https://github.com/myui/hivemall
> Hivemall provides machine learning functionality as well as feature
> engineering functions through UDFs/UDAFs/UDTFs of Hive. It is designed
> to be scalable to the number of training instances as well as the number
> of training features.
> Hivemall is very easy to use as every machine learning step is done
> within HiveQL.
> -- Installation is just as follows:
> add jar /tmp/hivemall.jar;
> source /tmp/define-all.hive;
> -- Logistic regression is performed by a query.
> SELECT
>   feature,
>   avg(weight) as weight
> FROM
>  (SELECT logress(features,label) as (feature,weight) FROM
> training_features) t
> GROUP BY feature;
> You can find detailed examples on our wiki pages.
> https://github.com/myui/hivemall/wiki/_pages
> Though we consider Hivemall much easier to use and more scalable
> than Mahout for classification/regression tasks, please check it for
> yourself. If you have a Hive environment, you can evaluate Hivemall
> within 5 minutes or so.
> Hope you enjoy the release! Feedback (and pull requests) are always welcome.
> Thank you,
> Makoto
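
For readers unfamiliar with the pattern in the quoted query, here is a rough Python sketch of what "train in parallel, then avg(weight) GROUP BY feature" amounts to. It is only an illustration under assumptions: train_partition is a hypothetical stand-in for a single logress() task (assumed here to be one pass of plain SGD for logistic regression with a made-up learning rate eta), the dict-based feature format is invented for the example, and none of this is Hivemall's actual code.

```python
import math
from collections import defaultdict


def train_partition(rows, eta=0.1):
    """Hypothetical stand-in for one logress() task: a single SGD pass
    over the task's rows, emitting (feature, weight) pairs."""
    w = defaultdict(float)
    for features, label in rows:
        # Predicted probability under the current weights.
        z = sum(w[f] * v for f, v in features.items())
        p = 1.0 / (1.0 + math.exp(-z))
        # Gradient step for the log-loss.
        for f, v in features.items():
            w[f] += eta * (label - p) * v
    return list(w.items())


def average_weights(emitted):
    """Mirrors: SELECT feature, avg(weight) ... GROUP BY feature."""
    sums, counts = defaultdict(float), defaultdict(int)
    for feature, weight in emitted:
        sums[feature] += weight
        counts[feature] += 1
    return {f: sums[f] / counts[f] for f in sums}


# Two made-up partitions of training_features, one per parallel task.
partitions = [
    [({"a": 1.0, "b": 1.0}, 1), ({"b": 1.0}, 0)],
    [({"a": 1.0}, 1), ({"c": 1.0}, 0)],
]
emitted = [pair for part in partitions for pair in train_partition(part)]
model = average_weights(emitted)
print(model)
```

Each task only ever sees its own slice of the data, and the outer GROUP BY is what merges the per-task weights into one model, so the training step itself needs no coordination between tasks.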

Dean Wampler, Ph.D.