
Hive, mail # user - [ANN] Hivemall: Hive scalable machine learning library


Re: [ANN] Hivemall: Hive scalable machine learning library
Edward Capriolo 2013-10-04, 14:02
Looks cool. I'm already starting to play with it.

On Friday, October 4, 2013, Makoto Yui <[EMAIL PROTECTED]> wrote:
> Hi Dean,
>
> Thank you for your interest in Hivemall.
>
> Twitter's paper actually influenced me in developing Hivemall, and I
> initially implemented similar functionality as Pig UDFs.
>
> Although my Pig ML library has not been released, you can find a similar
> attempt for Pig at
> https://github.com/y-tag/java-pig-MyUDFs
>
> Thanks,
> Makoto
>
> 2013/10/3 Dean Wampler <[EMAIL PROTECTED]>:
>> This is great news! I know that Twitter has done something similar with
>> UDFs for Pig, as described in this paper:
>> http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf
>>
>> I'm glad to see the same thing start with Hive.
>>
>> Dean
>>
>>
>> On Wed, Oct 2, 2013 at 10:21 AM, Makoto YUI <[EMAIL PROTECTED]> wrote:
>>>
>>> Hello all,
>>>
>>> My employer, AIST, has given the thumbs up to open source our machine
>>> learning library, named Hivemall.
>>>
>>> Hivemall is a scalable machine learning library running on Hive/Hadoop,
>>> licensed under the LGPL 2.1.
>>>
>>>   https://github.com/myui/hivemall
>>>
>>> Hivemall provides machine learning functionality as well as feature
>>> engineering functions through Hive UDFs/UDAFs/UDTFs. It is designed
>>> to scale with both the number of training instances and the number
>>> of training features.
>>>
>>> Hivemall is very easy to use because every machine learning step is
>>> done within HiveQL.
>>>
>>> -- Installation is just as follows:
>>> add jar /tmp/hivemall.jar;
>>> source /tmp/define-all.hive;
>>>
>>> -- Logistic regression is performed by a single query.
>>> SELECT
>>>   feature,
>>>   avg(weight) AS weight
>>> FROM
>>>   (SELECT logress(features, label) AS (feature, weight)
>>>    FROM training_features) t
>>> GROUP BY feature;
>>>
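The query above trains with `logress`, which (judging by the `AS (feature, weight)` alias) is a table-generating function emitting one (feature, weight) row per feature, and then combines those rows with `avg(weight) ... GROUP BY feature`. The Python sketch below illustrates that "train per partition, then average weights per feature" idea outside Hive. It is a hedged illustration, not Hivemall's implementation: it assumes each task runs plain SGD logistic regression over its own partition and emits weights only for the features it saw. The function names, learning rate, epoch count, and dict-based feature format are all hypothetical.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_partition(rows, eta=0.1, epochs=5):
    """Hypothetical per-task trainer: plain SGD logistic regression.

    rows: list of (features, label), where features is a dict
    name -> value and label is 0 or 1. Returns (feature, weight)
    pairs, standing in for the rows a training UDTF might emit.
    """
    w = defaultdict(float)
    for _ in range(epochs):
        for features, label in rows:
            z = sum(w[f] * v for f, v in features.items())
            grad = label - sigmoid(z)  # d(log-likelihood)/dz
            for f, v in features.items():
                w[f] += eta * grad * v
    return list(w.items())


def average_models(emitted):
    """Python analogue of: SELECT feature, avg(weight) ... GROUP BY feature."""
    sums, counts = defaultdict(float), defaultdict(int)
    for feature, weight in emitted:
        sums[feature] += weight
        counts[feature] += 1
    return {f: sums[f] / counts[f] for f in sums}


def predict(model, features):
    # Features missing from the model contribute a weight of 0.
    return sigmoid(sum(model.get(f, 0.0) * v for f, v in features.items()))


# Usage: two "tasks" each train on half of a toy dataset, then merge.
data = [({"bias": 1.0, "x": 1.0}, 1), ({"bias": 1.0, "x": -1.0}, 0)] * 20
emitted = train_partition(data[:20]) + train_partition(data[20:])
model = average_models(emitted)
print(model)
print(predict(model, {"bias": 1.0, "x": 1.0}))
```

Averaging independently trained weights is a simple, one-pass way to merge models in a MapReduce setting; it trades some accuracy against a single sequential SGD run for the ability to train every partition in parallel.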
>>> You can find detailed examples on our wiki pages.
>>> https://github.com/myui/hivemall/wiki/_pages
>>>
>>> Although we believe Hivemall is much easier to use and more scalable
>>> than Mahout for classification/regression tasks, please check for
>>> yourself. If you have a Hive environment, you can evaluate Hivemall
>>> in about 5 minutes.
>>>
>>> Hope you enjoy the release! Feedback (and pull requests) is always
>>> welcome.
>>>
>>> Thank you,
>>> Makoto
>>
>>
>>
>>
>> --
>> Dean Wampler, Ph.D.
>> @deanwampler
>> http://polyglotprogramming.com
>