Hive >> mail # user >> [ANN] Hivemall: Hive scalable machine learning library


Makoto YUI 2013-10-02, 08:21
Dean Wampler 2013-10-03, 14:27
Makoto Yui 2013-10-04, 04:14
Re: [ANN] Hivemall: Hive scalable machine learning library
Looks cool. I'm already starting to play with it.

On Friday, October 4, 2013, Makoto Yui <[EMAIL PROTECTED]> wrote:
> Hi Dean,
>
> Thank you for your interest in Hivemall.
>
> Twitter's paper actually influenced me in developing Hivemall and I
> initially implemented such functionality as Pig UDFs.
>
> Though my Pig ML library is not released, you can find a similar
> attempt for Pig in
> https://github.com/y-tag/java-pig-MyUDFs
>
> Thanks,
> Makoto
>
> 2013/10/3 Dean Wampler <[EMAIL PROTECTED]>:
>> This is great news! I know that Twitter has done something similar with
>> UDFs for Pig, as described in this paper:
>> http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf
>>
>> I'm glad to see the same thing start with Hive.
>>
>> Dean
>>
>>
>> On Wed, Oct 2, 2013 at 10:21 AM, Makoto YUI <[EMAIL PROTECTED]> wrote:
>>>
>>> Hello all,
>>>
>>> My employer, AIST, has given the thumbs up to open source our machine
>>> learning library, named Hivemall.
>>>
>>> Hivemall is a scalable machine learning library running on Hive/Hadoop,
>>> licensed under the LGPL 2.1.
>>>
>>>   https://github.com/myui/hivemall
>>>
>>> Hivemall provides machine learning functionality as well as feature
>>> engineering functions through UDFs/UDAFs/UDTFs of Hive. It is designed
>>> to be scalable to the number of training instances as well as the number
>>> of training features.
>>>
>>> Hivemall is very easy to use as every machine learning step is done
>>> within HiveQL.
>>>
>>> -- Installation is just as follows:
>>> add jar /tmp/hivemall.jar;
>>> source /tmp/define-all.hive;
>>>
>>> -- Logistic regression is performed by a query.
>>> SELECT
>>>   feature,
>>>   avg(weight) as weight
>>> FROM
>>>  (SELECT logress(features,label) as (feature,weight)
>>>   FROM training_features) t
>>> GROUP BY feature;
>>>
>>> You can find detailed examples on our wiki pages.
>>> https://github.com/myui/hivemall/wiki/_pages
>>>
>>> Though we believe Hivemall is much easier to use and more scalable
>>> than Mahout for classification/regression tasks, please check for
>>> yourself. If you have a Hive environment, you can evaluate Hivemall
>>> within 5 minutes or so.
>>>
>>> Hope you enjoy the release! Feedback (and pull requests) is always
>>> welcome.
>>>
>>> Thank you,
>>> Makoto
>>
>>
>>
>>
>> --
>> Dean Wampler, Ph.D.
>> @deanwampler
>> http://polyglotprogramming.com
>
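
For anyone reading the logistic regression query quoted above for the first time: one way to read it is that each task running logress() trains on its own slice of training_features and emits (feature, weight) rows, and the outer avg(weight) ... GROUP BY feature then averages those per-task weights into a single model. The plain-Python sketch below only illustrates that reading and is not Hivemall's code. The function names (train_partition, average_models), the learning rate, and the assumption of binary features are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_partition(rows, eta=0.1):
    """One SGD pass of logistic regression over a single partition.

    rows: iterable of (features, label), where features is a list of
    feature names (binary features, an assumption of this sketch) and
    label is 0 or 1. Returns {feature: weight}, playing the role of the
    (feature, weight) rows emitted by the inner SELECT.
    """
    w = defaultdict(float)
    for features, label in rows:
        p = sigmoid(sum(w[f] for f in features))
        grad = label - p
        for f in features:
            w[f] += eta * grad
    return dict(w)


def average_models(models):
    """Mimics: SELECT feature, avg(weight) ... GROUP BY feature.

    A feature is averaged only over the partitions that emitted it,
    just as avg() only sees the rows present in each group.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for model in models:
        for f, weight in model.items():
            sums[f] += weight
            counts[f] += 1
    return {f: sums[f] / counts[f] for f in sums}


if __name__ == "__main__":
    # Two toy partitions standing in for two map tasks.
    partitions = [
        [(["a", "b"], 1), (["c"], 0)],
        [(["a"], 1), (["c", "d"], 0)],
    ]
    model = average_models([train_partition(p) for p in partitions])
    for feature in sorted(model):
        print(feature, model[feature])
```

Averaging independently trained weights keeps the whole training step expressible as a single SELECT with a GROUP BY, which fits the point made in the announcement that every machine learning step is done within HiveQL.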
Makoto YUI 2013-10-04, 16:42
Clark Yang 2013-10-10, 19:28
Makoto YUI 2013-10-11, 07:28
Nitin Pawar 2013-10-11, 09:28