MapReduce >> mail # user >> Logistic regression package on Hadoop


Re: Logistic regression package on Hadoop
Hi Rajesh,

You may want to use the Mahout mailing list for Mahout-related questions:
http://mahout.apache.org/mailinglists.html

Regards

Bertrand

On Mon, Oct 15, 2012 at 2:34 PM, Rajesh Nikam <[EMAIL PROTECTED]> wrote:

> Hi Harsh,
>
> Thanks for the link to sgd from Mahout.
>
> I have asked a question about an issue with using sgd. Below is a description of it.
> Ted Dunning has mentioned there may be some issue with data encoding.
>
> However, I am not able to pinpoint the issue. Could you please let me know
> whether the issue is with the format or the usage?
>
> The input files used are attached.
>
> I am using the Iris Plants Database from Michael Marshall. PFA iris.arff.
> I converted this to a csv file just by updating the header: iris-3-classes.csv
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> /usr/local/mahout/trunk/iris-3-classes.csv --features 4 --output
> /usr/local/mahout/trunk/iris-3-classes.model --target class --categories 3
> --predictors sepallength sepalwidth petallength petalwidth --types n
>
> >> It gave the following error:
> Exception in thread "main" java.lang.IllegalArgumentException: Can only
> call classifyScalar with two categories
>
> I then created a csv with only 2 classes. PFA iris-2-classes.csv
>
> >> Trained iris-2-classes.csv with sgd:
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output
> /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2
> --predictors sepallength sepalwidth petallength petalwidth --types n
>
>
> mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
> --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
>
> AUC = 0.14
> confusion: [[50.0, 50.0], [0.0, 0.0]]
> entropy: [[-0.6, -0.3], [-0.8, -0.4]]
>
> >> The AUC seems too poor. I then changed --predictors:
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic --input
> /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output
> /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2
> --predictors sepalwidth petallength --types n n
>
> mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv
> --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
> --scores
>
> AUC = 0.80
> confusion: [[50.0, 50.0], [0.0, 0.0]]
> entropy: [[-0.7, -0.3], [-0.7, -0.4]]
>
> The AUC has improved; however, from the confusion matrix it seems that
> everything is classified as class a.
>
> Below is the output.
>
> "target","model-output","log-likelihood"
> 0,0.492,-0.677017
> 0,0.493,-0.679192
> 0,0.493,-0.678355
> 0,0.493,-0.678724
> 0,0.492,-0.676583
> 0,0.491,-0.675182
> 0,0.492,-0.677452
> 0,0.492,-0.677419
> 0,0.493,-0.679628
> 0,0.493,-0.678724
> 0,0.491,-0.676116
> 0,0.492,-0.677386
> 0,0.493,-0.679192
> 0,0.493,-0.679291
> 0,0.491,-0.674912
> 0,0.490,-0.673081
> 0,0.491,-0.675313
> 0,0.492,-0.677017
> 0,0.491,-0.675616
> 0,0.491,-0.675682
> 0,0.492,-0.677353
> 0,0.491,-0.676116
> 0,0.492,-0.676714
> 0,0.492,-0.677788
> 0,0.492,-0.677287
> 0,0.493,-0.679126
> 0,0.492,-0.677386
> 0,0.492,-0.676984
> 0,0.492,-0.677452
> 0,0.492,-0.678256
> 0,0.493,-0.678691
> 0,0.492,-0.677419
> 0,0.491,-0.674381
> 0,0.490,-0.673980
> 0,0.493,-0.678724
> 0,0.493,-0.678387
> 0,0.492,-0.677050
> 0,0.493,-0.678724
> 0,0.493,-0.679225
> 0,0.492,-0.677419
> 0,0.492,-0.677050
> 0,0.495,-0.682279
> 0,0.493,-0.678355
> 0,0.492,-0.676951
> 0,0.491,-0.675550
> 0,0.493,-0.679192
> 0,0.491,-0.675649
> 0,0.493,-0.678322
> 0,0.491,-0.676116
> 0,0.492,-0.677887
> 1,0.492,-0.709316
> 1,0.492,-0.709248
> 1,0.492,-0.708935
> 1,0.494,-0.705048
> 1,0.493,-0.707488
> 1,0.493,-0.707454
> 1,0.492,-0.709765
> 1,0.494,-0.705258
> 1,0.493,-0.707936
> 1,0.493,-0.706803
> 1,0.495,-0.703539
> 1,0.493,-0.708249
> 1,0.494,-0.704601
> 1,0.493,-0.707970
> 1,0.493,-0.707597
> 1,0.492,-0.708765
> 1,0.492,-0.708351
> 1,0.493,-0.706871
> 1,0.494,-0.704770
> 1,0.494,-0.705908
> 1,0.492,-0.709350
> 1,0.493,-0.707285
> 1,0.493,-0.706247
> 1,0.493,-0.707522
Bertrand Dechoux
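
A note on reading the quoted output: every model-output value in the listing is just under 0.5, which would explain why a confusion matrix built with a 0.5 cutoff puts all rows in one predicted class even though the AUC is well above 0.5. AUC depends only on how the scores rank the two classes, not on where the cutoff sits. The following is a minimal Python sketch of that effect, not Mahout's implementation. The helper names, the 0.5 threshold, and the matrix orientation (rows indexed by predicted class) are assumptions made for illustration; the eight rows are copied from the listing above. It also checks that the log-likelihood column is consistent with log(p) for target 1 and log(1 - p) for target 0.

```python
import math

# A few (target, model-output) rows copied from the score listing above.
rows = [
    (0, 0.492), (0, 0.493), (0, 0.491), (0, 0.490),
    (1, 0.492), (1, 0.494), (1, 0.493), (1, 0.495),
]


def log_likelihood(target, p):
    """Per-row log-likelihood of a binary outcome given predicted P(target=1)."""
    return math.log(p) if target == 1 else math.log(1.0 - p)


def confusion(rows, threshold=0.5):
    """2x2 count matrix indexed as m[predicted][actual].

    The orientation and the 0.5 threshold are assumptions for this sketch.
    """
    m = [[0, 0], [0, 0]]
    for target, p in rows:
        predicted = 1 if p >= threshold else 0
        m[predicted][target] += 1
    return m


def auc(rows):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive row has the higher score, counting ties as half."""
    pos = [p for t, p in rows if t == 1]
    neg = [p for t, p in rows if t == 0]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in pos
        for b in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print("confusion:", confusion(rows))
    print("auc:", auc(rows))
    print("log-likelihood (target 0, p=0.492):", log_likelihood(0, 0.492))
    print("log-likelihood (target 1, p=0.492):", log_likelihood(1, 0.492))
```

On this sample no score reaches 0.5, so the second row of the matrix stays at zero, while the positives still tend to score slightly higher than the negatives. The scores barely move away from 0.5, which suggests the model has learned very little separation between the classes.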