Re: Reading multiple lines from a microsoft doc in hadoop
It's much easier if you convert the documents to text first


or some other doc parser
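
Once the documents are plain text, the "entries separated by an empty line" part of the question becomes simple string handling. Below is a minimal sketch of that step only, assuming the text has already been extracted by some converter; the class name BlankLineRecords and the method splitRecords are hypothetical names made up for this illustration, not part of Hadoop or of any parser library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Hadoop class): splits already-converted plain
// text into entries, where one or more blank lines separate two entries.
class BlankLineRecords {

    static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Split on Unix or Windows line endings; -1 keeps trailing empty lines.
        for (String line : text.split("\\r?\\n", -1)) {
            if (line.trim().isEmpty()) {
                // A blank line closes the entry collected so far, if any.
                if (current.length() > 0) {
                    records.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // Non-blank lines belong to the current entry.
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
            }
        }
        // Flush the last entry when the text does not end with a blank line.
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "first entry, line 1\nfirst entry, line 2\n\nsecond entry\n";
        for (String record : splitRecords(text)) {
            System.out.println("[" + record + "]");
        }
    }
}
```

Inside a MapReduce job the same grouping could be done in a custom record reader, or, on Hadoop versions that support it, by setting the textinputformat.record.delimiter property of TextInputFormat to a blank-line sequence; whether that property is available depends on the installed release, so treat it as something to check rather than a given.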

On Fri, Aug 24, 2012 at 7:52 AM, Siddharth Tiwari wrote:
> Hi,
> I have files in MS Word doc and docx format. Their entries are
> separated by an empty line. Is it possible for me to read
> these entries, delimited by the empty lines, one at a time? Also, which
> InputFormat should I use to read doc and docx files? Please help.
> *------------------------*
> Cheers !!!
> Siddharth Tiwari
> Have a refreshing day !!!
> "Every duty is holy, and devotion to duty is the highest form of worship of
> God."
> "Maybe other people will try to limit me but I don't limit myself"

Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences