MapReduce, mail # user - Logistic regression package on Hadoop

Re: Logistic regression package on Hadoop
Bertrand Dechoux 2012-10-15, 12:53
Hi Rajesh,

You may want to use the Mahout mailing list for Mahout-related questions.
http://mahout.apache.org/mailinglists.html

Regards

Bertrand

On Mon, Oct 15, 2012 at 2:34 PM, Rajesh Nikam <[EMAIL PROTECTED]> wrote:

> Hi Harsh,
>
> Thanks for the link to SGD in Mahout.
>
> I have asked a question about an issue I ran into using SGD; a description follows below.
> Ted Dunning mentioned that there may be an issue with the data encoding.
>
> However, I am not able to pinpoint the issue. Could you please let me know
> whether it lies in the input format or in how I am using the tools?
>
> Attached are the input files I used.
>
> I am using the Iris Plants Database from Michael Marshall (attached: iris.arff).
> I converted it to a CSV file just by updating the header: iris-3-classes.csv
>
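The ARFF-to-CSV step described above can be sketched in Python as follows. This is a minimal illustration, not Mahout tooling: the function name `arff_to_csv` is hypothetical, and it assumes a simple dense ARFF file whose attribute names contain no spaces or quotes. It builds the CSV header from the `@attribute` lines, so the column names are the ones `--target` and `--predictors` refer to in the commands that follow. The sample input is a toy two-row file, not the attached data.

```python
import csv
import io
from typing import List, TextIO


def arff_to_csv(arff: TextIO, out: TextIO) -> None:
    """Hypothetical helper: copy a simple dense ARFF file to CSV with a header row."""
    names: List[str] = []
    in_data = False
    writer = csv.writer(out, lineterminator="\n")
    for raw in arff:
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        if not in_data:
            keyword = line.split()[0].lower()
            if keyword == "@attribute":
                # "@ATTRIBUTE sepallength REAL" -> "sepallength" (quoted names not handled)
                names.append(line.split()[1])
            elif keyword == "@data":
                writer.writerow(names)  # header row: the column names used by --target/--predictors
                in_data = True
        else:
            writer.writerow([field.strip() for field in line.split(",")])


sample = """@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor}
@DATA
5.1,Iris-setosa
7.0,Iris-versicolor
"""
buf = io.StringIO()
arff_to_csv(io.StringIO(sample), buf)
print(buf.getvalue(), end="")
```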
> mahout org.apache.mahout.classifier.sgd.TrainLogistic \
>   --input /usr/local/mahout/trunk/iris-3-classes.csv \
>   --features 4 \
>   --output /usr/local/mahout/trunk/iris-3-classes.model \
>   --target class --categories 3 \
>   --predictors sepallength sepalwidth petallength petalwidth \
>   --types n
>
> It gave the following error:
> Exception in thread "main" java.lang.IllegalArgumentException: Can only call classifyScalar with two categories
>
> Next, I created a CSV with only 2 classes (attached: iris-2-classes.csv).
>
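The error above indicates that the scalar (single-probability) scoring path only handles a binary target, so one workaround, the one taken here, is to keep just two of the three classes. A minimal Python sketch of that filtering step follows; the function name `keep_classes` is hypothetical, and it assumes the CSV has a header row with the target column named `class`, as in the commands in this thread. Which two species were kept in iris-2-classes.csv is not stated, so the class names in the usage lines are only an example.

```python
import csv
import io
from typing import Iterable, TextIO


def keep_classes(src: TextIO, dst: TextIO, keep: Iterable[str], target: str = "class") -> int:
    """Hypothetical helper: copy only rows whose target value is in `keep`; return rows kept."""
    wanted = set(keep)
    reader = csv.DictReader(src)
    if reader.fieldnames is None:
        raise ValueError("input has no header row")
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()  # same header, so --target/--predictors still match the columns
    kept = 0
    for row in reader:
        if row[target] in wanted:
            writer.writerow(row)
            kept += 1
    return kept


src = io.StringIO(
    "sepallength,class\n"
    "5.1,Iris-setosa\n"
    "7.0,Iris-versicolor\n"
    "6.3,Iris-virginica\n"
)
dst = io.StringIO()
print(keep_classes(src, dst, ["Iris-setosa", "Iris-versicolor"]))
print(dst.getvalue(), end="")
```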
> I trained on iris-2-classes.csv with SGD:
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic \
>   --input /usr/local/mahout/trunk/iris-2-classes.csv \
>   --features 4 \
>   --output /usr/local/mahout/trunk/iris-2-classes.model \
>   --target class --categories 2 \
>   --predictors sepallength sepalwidth petallength petalwidth \
>   --types n
>
>
> mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv \
>   --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
>
> AUC = 0.14
> confusion: [[50.0, 50.0], [0.0, 0.0]]
> entropy: [[-0.6, -0.3], [-0.8, -0.4]]
>
> The AUC seems too poor, so I changed --predictors:
>
> mahout org.apache.mahout.classifier.sgd.TrainLogistic \
>   --input /usr/local/mahout/trunk/iris-2-classes.csv \
>   --features 4 \
>   --output /usr/local/mahout/trunk/iris-2-classes.model \
>   --target class --categories 2 \
>   --predictors sepalwidth petallength \
>   --types n n
>
> mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv \
>   --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion \
>   --scores
>
> AUC = 0.80
> confusion: [[50.0, 50.0], [0.0, 0.0]]
> entropy: [[-0.7, -0.3], [-0.7, -0.4]]
>
> The AUC has improved; however, the confusion matrix suggests that everything
> is classified as a single class.
>
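The two metrics answer different questions, which is how an AUC of 0.80 can coexist with a confusion matrix in which every example lands in one predicted class. AUC only measures ranking: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one (ties count half), so it ignores any threshold. The confusion matrix instead counts predictions after cutting the scores at a threshold, and with every model-output in the listing below sitting around 0.49, a 0.5 cut-off sends everything to the same class even when the ordering is partly right. By the same definition, the earlier AUC of 0.14 means those scores ranked the classes mostly the wrong way round. The Python sketch below is illustrative, not Mahout's implementation: the 0.5 threshold and the `matrix[predicted][actual]` layout are assumptions, the latter chosen because it reproduces the shape of the [[50.0, 50.0], [0.0, 0.0]] matrices above when everything is predicted as class 0.

```python
from typing import List, Sequence


def auc(targets: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion(targets: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> List[List[int]]:
    """2x2 counts laid out as matrix[predicted][actual] (the layout is an assumption)."""
    matrix = [[0, 0], [0, 0]]
    for t, s in zip(targets, scores):
        predicted = 1 if s > threshold else 0
        matrix[predicted][t] += 1
    return matrix


# Toy data: perfectly ranked, but every score is below the 0.5 threshold.
targets = [0, 0, 1, 1]
scores = [0.40, 0.45, 0.46, 0.48]
print(auc(targets, scores))        # 1.0
print(confusion(targets, scores))  # [[2, 2], [0, 0]]
```

The toy data gets a perfect AUC yet puts every example in predicted class 0, the same pattern as the runs above.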
> Below is the output.
>
> "target","model-output","log-likelihood"
> 0,0.492,-0.677017
> 0,0.493,-0.679192
> 0,0.493,-0.678355
> 0,0.493,-0.678724
> 0,0.492,-0.676583
> 0,0.491,-0.675182
> 0,0.492,-0.677452
> 0,0.492,-0.677419
> 0,0.493,-0.679628
> 0,0.493,-0.678724
> 0,0.491,-0.676116
> 0,0.492,-0.677386
> 0,0.493,-0.679192
> 0,0.493,-0.679291
> 0,0.491,-0.674912
> 0,0.490,-0.673081
> 0,0.491,-0.675313
> 0,0.492,-0.677017
> 0,0.491,-0.675616
> 0,0.491,-0.675682
> 0,0.492,-0.677353
> 0,0.491,-0.676116
> 0,0.492,-0.676714
> 0,0.492,-0.677788
> 0,0.492,-0.677287
> 0,0.493,-0.679126
> 0,0.492,-0.677386
> 0,0.492,-0.676984
> 0,0.492,-0.677452
> 0,0.492,-0.678256
> 0,0.493,-0.678691
> 0,0.492,-0.677419
> 0,0.491,-0.674381
> 0,0.490,-0.673980
> 0,0.493,-0.678724
> 0,0.493,-0.678387
> 0,0.492,-0.677050
> 0,0.493,-0.678724
> 0,0.493,-0.679225
> 0,0.492,-0.677419
> 0,0.492,-0.677050
> 0,0.495,-0.682279
> 0,0.493,-0.678355
> 0,0.492,-0.676951
> 0,0.491,-0.675550
> 0,0.493,-0.679192
> 0,0.491,-0.675649
> 0,0.493,-0.678322
> 0,0.491,-0.676116
> 0,0.492,-0.677887
> 1,0.492,-0.709316
> 1,0.492,-0.709248
> 1,0.492,-0.708935
> 1,0.494,-0.705048
> 1,0.493,-0.707488
> 1,0.493,-0.707454
> 1,0.492,-0.709765
> 1,0.494,-0.705258
> 1,0.493,-0.707936
> 1,0.493,-0.706803
> 1,0.495,-0.703539
> 1,0.493,-0.708249
> 1,0.494,-0.704601
> 1,0.493,-0.707970
> 1,0.493,-0.707597
> 1,0.492,-0.708765
> 1,0.492,-0.708351
> 1,0.493,-0.706871
> 1,0.494,-0.704770
> 1,0.494,-0.705908
> 1,0.492,-0.709350
> 1,0.493,-0.707285
> 1,0.493,-0.706247
> 1,0.493,-0.707522
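The log-likelihood column appears to be the natural log of the probability the model gave to the true class. Assuming, as the rows above are consistent with, that model-output is the predicted probability of class 1, a row scores ln(p) when target is 1 and ln(1 - p) when target is 0; the first row, for example, gives ln(1 - 0.492) ≈ -0.677. Because model-output is printed to three decimals, recomputing from it matches the log-likelihood column only to about three decimals. Values hovering around ln(0.5) ≈ -0.693 are what a model that always predicts 0.5 would score, which fits the near-0.5 outputs. A minimal Python sketch (the function names are hypothetical):

```python
import math
from typing import Sequence


def log_likelihood(target: int, p: float) -> float:
    """Log of the probability assigned to the true class, where p = P(target == 1)."""
    return math.log(p) if target == 1 else math.log(1.0 - p)


def mean_log_likelihood(targets: Sequence[int], probs: Sequence[float]) -> float:
    """Average per-example log-likelihood; higher (closer to 0) is better."""
    return sum(log_likelihood(t, p) for t, p in zip(targets, probs)) / len(targets)


# First row of each class in the listing above (model-output is rounded to 3 decimals).
print(round(log_likelihood(0, 0.492), 3))  # -0.677
print(round(log_likelihood(1, 0.492), 3))  # -0.709
# Baseline: always predicting 0.5 gives ln(0.5) for every row.
print(round(mean_log_likelihood([0, 1], [0.5, 0.5]), 3))  # -0.693
```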