MapReduce >> mail # user >> XML to TEXT


Re: XML to TEXT
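Hi,

You can use org.apache.hadoop.streaming.StreamInputFormat in a MapReduce job
to convert XML to text.

For example, if your XML looks like this:
<xml>
  <name>lll</name>
</xml>

you need to set stream.recordreader.begin and stream.recordreader.end in the
Configuration, so that each <xml>...</xml> block is read as one record:
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// As far as I know, StreamInputFormat only honours begin/end when this reader class is set.
conf.set("stream.recordreader.class", "org.apache.hadoop.streaming.StreamXmlRecordReader");
conf.set("stream.recordreader.begin", "<xml>");
conf.set("stream.recordreader.end", "</xml>");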
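
Once each record reaches the mapper as one XML string, turning it into the
"id,name" / "100,RR" text asked for below could look roughly like the sketch
here. It is plain Java with no Hadoop classes, so it can be tried on its own.
The class name XmlRecordToCsv and the method toCsv are hypothetical names made
up for this illustration, and it assumes every record is a flat element whose
direct children hold the values (for the example in the question, the
begin/end markers would presumably be <Emp> and </Emp>).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: converts one XML record into a CSV header and a CSV row.
public class XmlRecordToCsv {

    // Returns {header, row}; assumes the record's direct children are simple elements.
    static String[] toCsv(String recordXml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(recordXml)));

        // The document element is the record itself (e.g. <Emp>), not its children.
        Element record = doc.getDocumentElement();

        List<String> names = new ArrayList<>();
        List<String> values = new ArrayList<>();
        NodeList children = record.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes between the child elements.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
                values.add(child.getTextContent().trim());
            }
        }
        return new String[] { String.join(",", names), String.join(",", values) };
    }

    public static void main(String[] args) throws Exception {
        String[] csv = toCsv("<Emp><id>100</id><name>RR</name></Emp>");
        System.out.println(csv[0]); // id,name
        System.out.println(csv[1]); // 100,RR
    }
}
```

In a real job the body of toCsv would sit inside map(), with the row written
out as the output value. The header line would need to be emitted once rather
than per record, which this sketch does not handle.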

On Fri, Jan 3, 2014 at 1:16 PM, Ranjini Rathinam <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Need to convert XML into text using mapreduce.
>
> I have used DOM and SAX parser.
>
> After using SAX Builder in the mapper class, the child node acts as the
> root element.
>
> Looking at the sys out, I found that the root element is taking the child
> element and printing it.
>
> For Eg,
>
> <Comp><Emp><id>100</id><name>RR</name></Emp></Comp>
> When this XML is passed to the mapper and the root element is printed to
> sys out, I am getting the root element as
>
> <id>
> <name>
>
> Please suggest and help to fix this.
>
> I need to convert the XML into text using MapReduce code. Please provide
> an example.
>
> Required output is
>
> id,name
> 100,RR
>
> Please help.
>
> Thanks in advance,
> Ranjini R