MapReduce >> mail # user >> XML to TEXT


Re: XML to TEXT
Hi,

You can use org.apache.hadoop.streaming.StreamInputFormat in a MapReduce job
to convert XML to text.

For example, if your XML looks like this:
<xml>
  <name>lll</name>
</xml>

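You need to set stream.recordreader.class to StreamXmlRecordReader, specify
stream.recordreader.begin and stream.recordreader.end in the configuration,
and use StreamInputFormat as the job's input format:
Configuration conf = new Configuration();
conf.set("stream.recordreader.class", "org.apache.hadoop.streaming.StreamXmlRecordReader");
conf.set("stream.recordreader.begin", "<xml>");
conf.set("stream.recordreader.end", "</xml>");
JobConf job = new JobConf(conf);
job.setInputFormat(StreamInputFormat.class);
For your <Comp><Emp>... input, the markers would be <Emp> and </Emp>.
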
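The begin/end markers tell the record reader where each record starts and stops, so the idea is that each map() call receives one whole XML fragment rather than one line at a time. As a rough, in-memory illustration of that idea only (plain Java, no Hadoop classes; XmlRecordSplitter and split are hypothetical names, and this is not how StreamXmlRecordReader is actually implemented), splitting a string on literal markers could look like this:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): illustrates splitting text into
// records delimited by a begin marker and an end marker.
final class XmlRecordSplitter {

    private XmlRecordSplitter() {}

    // Returns every substring that starts with `begin` and ends with `end`,
    // markers included. Matching is literal; text between records is skipped.
    static List<String> split(String text, String begin, String end) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(begin, from);
            if (start < 0) {
                break; // no further begin marker
            }
            int stop = text.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // unterminated record: ignore the remainder
            }
            int afterEnd = stop + end.length();
            records.add(text.substring(start, afterEnd));
            from = afterEnd; // continue scanning after this record
        }
        return records;
    }
}
```

For example, split("<Comp><Emp><id>100</id><name>RR</name></Emp></Comp>", "<Emp>", "</Emp>") returns a one-element list containing "<Emp><id>100</id><name>RR</name></Emp>". Because the match is literal in this sketch, a begin tag with attributes (such as <Emp type="x">) would not match "<Emp>".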
On Fri, Jan 3, 2014 at 1:16 PM, Ranjini Rathinam <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I need to convert XML into text using MapReduce.
>
> I have used DOM and SAX parsers.
>
> After using SAXBuilder in the mapper class, the child nodes act as the root
> element.
>
> Looking at the System.out output, I found that the root element is taking
> the child elements and printing them.
>
> For example,
>
> <Comp><Emp><id>100</id><name>RR</name></Emp></Comp>
>
> When this XML is passed to the mapper and I print the root element with
> System.out, I get the root element as:
>
> <id>
> <name>
>
> Please suggest how to fix this.
>
> I need to convert the XML into text using MapReduce code. Please provide
> an example.
>
> The required output is:
>
> id,name
> 100,RR
>
> Please help.
>
> Thanks in advance,
> Ranjini R
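For the required id,name output, the mapper only has to turn each <Emp> record into one line of text. A minimal sketch using the JDK's built-in DOM parser, assuming each record arrives as a well-formed XML string; EmpRecordToCsv, toCsv and HEADER are hypothetical names, and commas inside values are not escaped:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of Hadoop): turns one <Emp> record into "id,name" text.
final class EmpRecordToCsv {

    // Header line matching the required output.
    static final String HEADER = "id,name";

    private EmpRecordToCsv() {}

    // Parses one XML record and returns "<id>,<name>"; values are not CSV-escaped.
    static String toCsv(String recordXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element root = builder.parse(new InputSource(new StringReader(recordXml)))
                                  .getDocumentElement();
            // root is the outermost element (<Emp> or <Comp>); <id> and <name> are descendants.
            return childText(root, "id") + "," + childText(root, "name");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a well-formed XML record: " + recordXml, e);
        }
    }

    // Text of the first descendant element with the given tag, or "" if it is missing.
    private static String childText(Element parent, String tag) {
        Node node = parent.getElementsByTagName(tag).item(0);
        return node == null ? "" : node.getTextContent().trim();
    }
}
```

For example, toCsv("<Emp><id>100</id><name>RR</name></Emp>") returns "100,RR". In the DOM, getDocumentElement() is the outermost element of whatever string is parsed, and <id> and <name> are looked up as its descendants, so the same call also works on the full <Comp>... document shown above. Writing the HEADER line only once (rather than once per record) is left to the surrounding job.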