HDFS >> mail # user >> Processing xml documents using StreamXmlRecordReader


Re: Processing xml documents using StreamXmlRecordReader
Hi,
 Set the following properties in your driver class:

  jobConf.set("stream.recordreader.class",
      "org.apache.hadoop.streaming.StreamXmlRecordReader");
  // Replace "start-tag" and "end-tag" with the tags that delimit one record
  jobConf.set("stream.recordreader.begin", "start-tag");
  jobConf.set("stream.recordreader.end", "end-tag");
  jobConf.setInputFormat(StreamInputFormat.class);
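
 To give a feel for what the begin and end settings mean, here is a small stand-alone sketch that cuts an input string into records between two markers. It is only an illustration of the idea, not the Hadoop code: the class name XmlRecordSplitter, the split method and the <page> tags are made up for this example, and the markers are matched as plain literal strings.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, in-memory illustration of begin/end marker record splitting.
// This is NOT the Hadoop implementation; all names here are hypothetical.
class XmlRecordSplitter {

    // Returns every substring that starts at beginMark and runs through the
    // next endMark (inclusive). Text outside the markers is skipped.
    static List<String> split(String input, String beginMark, String endMark) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = input.indexOf(beginMark, pos);
            if (start < 0) {
                break; // no further records
            }
            int end = input.indexOf(endMark, start + beginMark.length());
            if (end < 0) {
                break; // unterminated record: drop it
            }
            end += endMark.length();
            records.add(input.substring(start, end));
            pos = end;
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<root><page><id>1</id></page>\n<page><id>2</id></page></root>";
        for (String record : split(xml, "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

 With "<page>" and "</page>" as the markers, each <page>...</page> block becomes one record and the surrounding <root> element is ignored.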

 In the mapper, each XML record arrives as the key, of type Text, so your
mapper will look like:

  public class MyMapper<K,V> implements Mapper<Text,Text,K,V>

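
 Inside map() you would typically call key.toString() to get the record text and then parse it. Below is a rough sketch of that parsing step using only the JDK's DOM parser, so it runs without Hadoop. The class name XmlRecordParser, the helper firstElementText and the <page>/<id> element names are assumptions made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper showing how one XML record (the mapper's key, as a
// String) could be parsed. Element names are illustrative only.
class XmlRecordParser {

    // Parses a single well-formed XML record and returns the text of the
    // first element named tagName, or null if no such element exists.
    static String firstElementText(String record, String tagName) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(record)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers get one failure type
            throw new IllegalArgumentException("Could not parse record", e);
        }
    }

    public static void main(String[] args) {
        // In a real mapper this string would come from key.toString()
        String record = "<page><id>42</id><title>Hadoop</title></page>";
        System.out.println(firstElementText(record, "id"));
    }
}
```

 Building a new parser per record keeps the sketch short; in a real job you may prefer to create the DocumentBuilder once and reuse it across map() calls.
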
On Tue, Jun 19, 2012 at 2:49 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Hello list,
>
>        Could anyone who has written MapReduce jobs to process XML
> documents stored in their cluster using "StreamXmlRecordReader" share
> his/her experience? Or, if you can, provide me with some pointers
> on that. Many thanks.
>
> Regards,
>     Mohammad Tariq
>

--
https://github.com/zinnia-phatak-dev/Nectar