Re: Decision Tree Implementation in Hadoop MapReduce
In my opinion:

1. Build the decision tree model with the training data.
2. Store it somewhere.
3. When the unlabeled data is available:
   3.1 If the unlabeled data is huge, write another MapReduce job to process
       it: load the model in the setup stage and use it to label the records
       one by one in the map stage. No reducer is necessary.
   3.2 If the unlabeled data is small, prediction is trivial: load the model
       and label the records directly.
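
The steps above could be sketched roughly as follows. This is only an illustration, not code from any project mentioned in this thread. It assumes the tree is stored as JSON, that records are comma-separated numeric features, and that the labeling step runs as a map-only job (for example a Hadoop Streaming mapper reading standard input). The model layout and the names EXAMPLE_MODEL, save_model, load_model, predict and run_mapper are all hypothetical.

```python
import json

# Hypothetical model format: nested dicts. An internal node holds a feature
# index, a numeric threshold and two children; a leaf holds a class label.
EXAMPLE_MODEL = {
    "feature": 0, "threshold": 2.5,
    "left": {"label": "A"},
    "right": {
        "feature": 1, "threshold": 1.0,
        "left": {"label": "B"},
        "right": {"label": "C"},
    },
}


def save_model(model, path):
    """Step 2: store the trained tree somewhere (here, a JSON file)."""
    with open(path, "w") as f:
        json.dump(model, f)


def load_model(path):
    """Setup stage: load the tree once per mapper, before any record is read."""
    with open(path) as f:
        return json.load(f)


def predict(model, features):
    """Walk from the root to a leaf and return the leaf's class label."""
    node = model
    while "label" not in node:
        go_left = features[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


def run_mapper(model, lines):
    """Map stage: label records one by one. No reducer is involved."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank input lines
        features = [float(x) for x in line.split(",")]
        yield line + "\t" + predict(model, features)
```

A streaming mapper script would call load_model once and then print each string produced by run_mapper(model, sys.stdin), with the job configured to use zero reducers. For small unlabeled data (case 3.2), the same predict function can be called directly on each record without any job.
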
2013/12/1 unmesha sreeveni <[EMAIL PROTECTED]>

> Thanks Yexi,
>
> But how can it be accomplished?
> The input to the decision tree MR job will be a set of data. But when
> predicting, the input will be a single line of data without a class
> label, right?
> So what changes will be needed in the mrjob? Should we design it like this?
> 1. When a set of data comes in, build the decision tree.
> 2. Else, if a single line of data comes in, check the output of the decision
> tree (the decision tree generated by the MR job) and predict the class label.
>
> -------
>
> M1_train - dataset for training.
> M1_test - test data or prediction.
> 1. Will the input for prediction be a single record, or a set of records
> given all at once?
> 2. We also need to ensure in our program that M1_test belongs to M1_train
> only. We should check that as well, right? If M1_test is given to
> M2_train, it should show an error, shouldn't it?
>
> Please tell me if my thinking is wrong.
>
> On 11/30/13, Yexi Jiang <[EMAIL PROTECTED]> wrote:
> > I watched the video there, but I cannot access the source code due to a
> > permission issue.
> > In my opinion, once the decision tree model is built, the model is small
> > enough to be loaded into memory and can be used directly, without another
> > mrjob for prediction. The prediction can be conducted in a streaming way.
> >
> >
> > 2013/11/30 unmesha sreeveni <[EMAIL PROTECTED]>
> >
> >> I have gone through a MapReduce implementation of C4.5 at
> >>
> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
> >>
> >> Here a decision tree is built. So my question is:
> >> can we also include the prediction along with that?
> >>
> >>
> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <[EMAIL PROTECTED]> wrote:
> >>
> >>> You are welcome :)
> >>>
> >>>
> >>> 2013/11/25 unmesha sreeveni <[EMAIL PROTECTED]>
> >>>
> >>>> ok . Thx Yexi
> >>>>
> >>>>
> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <[EMAIL PROTECTED]>
> >>>> wrote:
> >>>>
> >>>>> As far as I know, there is currently no ID3 implementation in Mahout,
> >>>>> but you can use the decision forest instead:
> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example
> >>>>>
> >>>>>
> >>>>> 2013/11/25 unmesha sreeveni <[EMAIL PROTECTED]>
> >>>>>
> >>>>>> Is that ID3 classification?
> >>>>>> Does it include prediction as well?
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang
> >>>>>> <[EMAIL PROTECTED]> wrote:
> >>>>>>
> >>>>>>> You can find it directly at https://github.com/apache/mahout, or you
> >>>>>>> can check it out from svn by following
> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control
> >>>>>>>
> >>>>>>>
> >>>>>>> 2013/11/23 unmesha sreeveni <[EMAIL PROTECTED]>
> >>>>>>>
> >>>>>>>> I want to go through the decision tree implementation in Mahout.
> >>>>>>>> I referred to Apache Mahout <http://mahout.apache.org/>:
> >>>>>>>>
> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
> >>>>>>>> Apache Mahout has reached version 0.6. All developers are encouraged
> >>>>>>>> to begin using version 0.6. Highlights include:
> >>>>>>>> Improved Decision Tree performance and added support for regression
> >>>>>>>> problems
> >>>>>>>>
> >>>>>>>> Where can I find its source code and documentation?
> >>>>>>>>
> >>>>>>>> Should I download Mahout?
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> *Thanks & Regards*
> >>>>>>>>
> >>>>>>>> Unmesha Sreeveni U.B
> >>>>>>>>
> >>>>>>>> *Junior Developer*
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> ------
> >>>>>>> Yexi Jiang,
> >>>>>>> ECS 251,  [EMAIL PROTECTED]
> >>>>>>> School of Computer and Information Science,

Yexi Jiang,
ECS 251,  [EMAIL PROTECTED]
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/