Re: Text Analysis
If you've got existing R code, you might want to look at this Quora posting, also by Cloudera (http://www.quora.com/How-can-R-and-Hadoop-be-used-together),
or the RHIPE R Hadoop package (https://github.com/saptarshiguha/RHIPE/wiki).
Mahout and Lucene/Solr offer some level of text analysis, although I would not call these complete text analysis packages.
What I've found are specific algorithms as opposed to a complete package. For example, for LDA (topic discovery), both Mahout and Yahoo Research (https://github.com/shravanmn/Yahoo_LDA) have Hadoop-based implementations. In the case of Yahoo_LDA, the data is stored in HDFS while the computation is essentially MPI based. Whether an algorithm that reads its data from HDFS but computes with something other than MapReduce counts as a Hadoop implementation is another question.
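
To give a feel for what "rolling your own" on top of plain Map/Reduce looks like, here is a minimal sketch of lexicon-based sentiment scoring written as a mapper and a reducer in Python. This is only an illustration, not something taken from Mahout, RHIPE, or Yahoo_LDA: the tiny LEXICON, the function names, and the in-memory run_local driver (which stands in for Hadoop's shuffle) are all assumptions made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon for illustration; a real one would be far larger.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}


def mapper(doc_id, text):
    """Emit a (doc_id, score) pair for every lexicon word found in the text."""
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in LEXICON:
            yield doc_id, LEXICON[token]


def reducer(doc_id, scores):
    """Sum the per-word scores belonging to one document."""
    return doc_id, sum(scores)


def run_local(docs):
    """Simulate the shuffle phase in memory by grouping mapper output by key."""
    grouped = defaultdict(list)
    for doc_id, text in docs:
        for key, score in mapper(doc_id, text):
            grouped[key].append(score)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = [
        ("a", "I love this, it is great"),
        ("b", "awful service, bad food"),
    ]
    print(run_local(docs))
```

Documents containing no lexicon words produce no mapper output, so they are simply absent from the result. On a real cluster the same two functions could be adapted to read and write lines on stdin/stdout and be run through Hadoop Streaming.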

On Apr 25, 2012, at 12:47 PM, Jagat wrote:

> There are APIs which you can use; of course, they are third party.
> -----------
> Sent from Mobile, short and crisp.
> On 25-Apr-2012 8:57 PM, "Robert Evans" <[EMAIL PROTECTED]> wrote:
>> Hadoop itself is the core Map/Reduce and HDFS functionality.  The higher
>> level algorithms like sentiment analysis are often done by others.
>> Cloudera has a video from HadoopWorld 2010 about it
>> http://www.cloudera.com/resource/hw10_video_sentiment_analysis_powered_by_hadoop/
>> And there are likely to be other tools like R that can help you out with
>> it.  I am not really sure if Mahout offers sentiment analysis or not, but
>> you might want to look there too http://mahout.apache.org/
>> --Bobby Evans
>> On 4/25/12 7:50 AM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
>> Hi,
>> I wanted to know if there are any existing APIs within Hadoop for us to
>> do some text analysis like sentiment analysis, etc., or are we to rely on
>> tools like R, etc. for this?
>> Regards,
>> Karanveer