Rajesh Nikam, 2012-10-12, 13:06
Harsh J, 2012-10-12, 15:36
Ted Dunning, 2012-10-12, 17:21
Rajesh Nikam, 2012-10-15, 12:34
Bertrand Dechoux, 2012-10-15, 12:53
-
Logistic regression package on Hadoop (Rajesh Nikam, 2012-10-12, 13:06)
Hi,
Could you please suggest a logistic regression package that can be used on Hadoop? I have a large dataset and am looking for an LR package with kernel support.
Thanks,
Rajesh
-
Re: Logistic regression package on Hadoop (Harsh J, 2012-10-12, 15:36)
Hi Rajesh,
Please head over to the Apache Mahout project; see https://cwiki.apache.org/MAHOUT/logistic-regression.html
Apache Mahout is hosted at http://mahout.apache.org and works well with Hadoop MapReduce, etc.
--
Harsh J
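Rajesh's follow-up refers to this page as Mahout's SGD (stochastic gradient descent) logistic regression. To show roughly what that technique does, here is a minimal plain-Python sketch of binary logistic regression fitted by SGD. It is an illustration under stated assumptions, not Mahout's implementation: the function names, the learning rate, the number of passes, the optional L2 penalty and the per-pass shuffling are all choices made for this sketch.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """P(y = 1 | x) under a linear logistic model with weights w and bias b."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_sgd_logistic(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    rate: float = 0.1,
    passes: int = 50,
    l2: float = 0.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Fit binary (0/1) logistic regression by stochastic gradient ascent
    on the log-likelihood, updating after every single example.
    (Illustrative sketch; hyperparameters are assumptions.)"""
    w = [0.0] * len(rows[0])
    b = 0.0
    order = list(range(len(rows)))
    rng = random.Random(seed)
    for _ in range(passes):
        rng.shuffle(order)  # fresh random example order on every pass
        for i in order:
            x, y = rows[i], labels[i]
            g = y - predict_proba(w, b, x)  # d(log-likelihood)/d(linear score)
            b += rate * g                   # the bias is not regularized
            for j, xj in enumerate(x):
                w[j] += rate * (g * xj - l2 * w[j])
    return w, b
```

For example, on a toy one-feature dataset with x = 0.0, 0.5, 1.0, 3.0, 3.5, 4.0 and labels [0, 0, 0, 1, 1, 1], `train_sgd_logistic(rows, labels, passes=200)` yields a positive weight, and `predict_proba` stays below 0.5 for the first three points and above it for the last three. Making several passes with a fresh random order each time is a common SGD practice, since a fixed ordering (for example, all examples of one class first) can bias the final weights of constant-step SGD.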
-
Re: Logistic regression package on Hadoop (Ted Dunning, 2012-10-12, 17:21)
Harsh,
Thanks for the plug. Rajesh has been talking to us.
-
Re: Logistic regression package on Hadoop (Rajesh Nikam, 2012-10-15, 12:34)
Hi Harsh,
Thanks for the link to SGD in Mahout.
I have asked a question about an issue I ran into while using SGD; the description is below. Ted Dunning mentioned that there may be an issue with the data encoding, but I have not been able to pinpoint it. Could you please let me know whether the problem is in my data format or in my usage?
The input files I am using are attached. I am using the Iris Plants Database from Michael Marshall (attached: iris.arff). I converted it to a CSV file just by updating the header (iris-3-classes.csv) and trained on it:
mahout org.apache.mahout.classifier.sgd.TrainLogistic --input /usr/local/mahout/trunk/iris-3-classes.csv --features 4 --output /usr/local/mahout/trunk/iris-3-classes.model --target class --categories 3 --predictors sepallength sepalwidth petallength petalwidth --types n
It gave the following error:
Exception in thread "main" java.lang.IllegalArgumentException: Can only call classifyScalar with two categories
Next I created a CSV file with only 2 classes (attached: iris-2-classes.csv) and trained on it with SGD:
mahout org.apache.mahout.classifier.sgd.TrainLogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2 --predictors sepallength sepalwidth petallength petalwidth --types n
mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion
AUC = 0.14
confusion: [[50.0, 50.0], [0.0, 0.0]]
entropy: [[-0.6, -0.3], [-0.8, -0.4]]
The AUC seems too poor.
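The three-class error is consistent with how logistic regression outputs are shaped: a two-category model needs only one number, P(category 1), while a k-category model needs k probabilities, so any code path that asks for a single scalar can only serve the two-category case. The plain-Python sketch below illustrates that difference. The function names are hypothetical and only loosely echo the method named in the exception; they are not Mahout's API. Treating category 0 as a reference category with an implicit score of 0 is also an assumption of this sketch.

```python
import math
from typing import List, Sequence


def classify_scalar(scores: Sequence[float]) -> float:
    """P(category 1) for a two-category model (hypothetical helper).

    `scores` holds one linear score per non-reference category, so a
    two-category model has exactly one score and its prediction fits in
    a single number.
    """
    if len(scores) != 1:
        raise ValueError("a single probability only describes a two-category model")
    s = scores[0]
    # numerically stable logistic function
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def classify_full(scores: Sequence[float]) -> List[float]:
    """One probability per category (multinomial logistic / softmax).

    Category 0 is the reference category with an implicit score of 0,
    so k categories need k - 1 scores (an assumption of this sketch).
    """
    all_scores = [0.0] + list(scores)
    top = max(all_scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - top) for s in all_scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With three categories, `classify_full([0.2, -1.3])` returns three probabilities that sum to 1, while `classify_scalar([0.2, -1.3])` raises `ValueError`; with two categories, `classify_full([0.7])[1]` and `classify_scalar([0.7])` agree.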
Then I changed --predictors:
mahout org.apache.mahout.classifier.sgd.TrainLogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --features 4 --output /usr/local/mahout/trunk/iris-2-classes.model --target class --categories 2 --predictors sepalwidth petallength --types n n
mahout runlogistic --input /usr/local/mahout/trunk/iris-2-classes.csv --model /usr/local/mahout/trunk/iris-2-classes.model --auc --confusion --scores
AUC = 0.80
confusion: [[50.0, 50.0], [0.0, 0.0]]
entropy: [[-0.7, -0.3], [-0.7, -0.4]]
The AUC has improved; however, the confusion matrix suggests that everything is classified as class a.
Below is the output:
"target","model-output","log-likelihood"
0,0.492,-0.677017
0,0.493,-0.679192
0,0.493,-0.678355
0,0.493,-0.678724
0,0.492,-0.676583
0,0.491,-0.675182
0,0.492,-0.677452
0,0.492,-0.677419
0,0.493,-0.679628
0,0.493,-0.678724
0,0.491,-0.676116
0,0.492,-0.677386
0,0.493,-0.679192
0,0.493,-0.679291
0,0.491,-0.674912
0,0.490,-0.673081
0,0.491,-0.675313
0,0.492,-0.677017
0,0.491,-0.675616
0,0.491,-0.675682
0,0.492,-0.677353
0,0.491,-0.676116
0,0.492,-0.676714
0,0.492,-0.677788
0,0.492,-0.677287
0,0.493,-0.679126
0,0.492,-0.677386
0,0.492,-0.676984
0,0.492,-0.677452
0,0.492,-0.678256
0,0.493,-0.678691
0,0.492,-0.677419
0,0.491,-0.674381
0,0.490,-0.673980
0,0.493,-0.678724
0,0.493,-0.678387
0,0.492,-0.677050
0,0.493,-0.678724
0,0.493,-0.679225
0,0.492,-0.677419
0,0.492,-0.677050
0,0.495,-0.682279
0,0.493,-0.678355
0,0.492,-0.676951
0,0.491,-0.675550
0,0.493,-0.679192
0,0.491,-0.675649
0,0.493,-0.678322
0,0.491,-0.676116
0,0.492,-0.677887
1,0.492,-0.709316
1,0.492,-0.709248
1,0.492,-0.708935
1,0.494,-0.705048
1,0.493,-0.707488
1,0.493,-0.707454
1,0.492,-0.709765
1,0.494,-0.705258
1,0.493,-0.707936
1,0.493,-0.706803
1,0.495,-0.703539
1,0.493,-0.708249
1,0.494,-0.704601
1,0.493,-0.707970
1,0.493,-0.707597
1,0.492,-0.708765
1,0.492,-0.708351
1,0.493,-0.706871
1,0.494,-0.704770
1,0.494,-0.705908
1,0.492,-0.709350
1,0.493,-0.707285
1,0.493,-0.706247
1,0.493,-0.707522
1,0.493,-0.707835
1,0.492,-0.708317
1,0.493,-0.707556
1,0.492,-0.708520
1,0.493,-0.707902
1,0.494,-0.706220
1,0.494,-0.705427
1,0.494,-0.705393
1,0.493,-0.706803
1,0.493,-0.707210
1,0.492,-0.708351
1,0.492,-0.710146
1,0.492,-0.708867
1,0.494,-0.705183
1,0.493,-0.708215
1,0.494,-0.705942
1,0.493,-0.706525
1,0.492,-0.708385
1,0.493,-0.706389
1,0.494,-0.704811
1,0.493,-0.706905
1,0.493,-0.708249
1,0.493,-0.707801
1,0.493,-0.707835
1,0.494,-0.705604
1,0.493,-0.707319
AUC = 0.80
confusion: [[50.0, 50.0], [0.0, 0.0]]
entropy: [[-0.7, -0.3], [-0.7, -0.4]]
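AUC and the confusion matrix measure different things, which is how an AUC of 0.80 can sit next to a confusion matrix in which every example lands in one row. AUC depends only on how the scores rank the two classes, while a confusion matrix applies a cutoff. In the posted output every model-output value lies between 0.490 and 0.495, so with a 0.5 cutoff all 100 rows are predicted as 0 regardless of their ranking. An AUC well below 0.5, such as the 0.14 from the first run, means the scores mostly rank the classes the wrong way round; reversing the scores turns AUC into 1 - AUC. The plain-Python sketch below is illustrative, not Mahout's evaluation code. The 0.5 threshold and the matrix layout (rows = predicted class, columns = actual class, which is the layout consistent with 50 examples of each class producing [[50.0, 50.0], [0.0, 0.0]]) are assumptions.

```python
from typing import List, Sequence


def auc(targets: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the probability that a randomly chosen positive
    (target 1) scores higher than a randomly chosen negative (target 0);
    ties count one half."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion(targets: Sequence[int], scores: Sequence[float],
              threshold: float = 0.5) -> List[List[int]]:
    """2x2 counts; rows are the predicted class, columns the actual class
    (layout and threshold are assumptions of this sketch)."""
    m = [[0, 0], [0, 0]]
    for t, s in zip(targets, scores):
        predicted = 1 if s > threshold else 0
        m[predicted][t] += 1
    return m
```

For instance, with targets [0, 0, 1, 1] and scores [0.491, 0.492, 0.493, 0.494], every positive outranks every negative, so `auc` returns 1.0, yet `confusion` returns [[2, 2], [0, 0]] because no score exceeds 0.5.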
-
Re: Logistic regression package on Hadoop (Bertrand Dechoux, 2012-10-15, 12:53)
Hi Rajesh,
You may want to use the Mahout mailing list for Mahout-related questions: http://mahout.apache.org/mailinglists.html
Regards,
Bertrand