HDFS >> mail # user >> Processing xml documents using StreamXmlRecordReader


Mohammad Tariq 2012-06-18, 21:19
Re: Processing xml documents using StreamXmlRecordReader
Hi,
Set the following properties in your driver class:

  jobConf.set("stream.recordreader.class",
      "org.apache.hadoop.streaming.StreamXmlRecordReader");
  jobConf.set("stream.recordreader.begin", "start-tag");  // text that opens one record
  jobConf.set("stream.recordreader.end", "end-tag");      // text that closes one record
  jobConf.setInputFormat(StreamInputFormat.class);

In the mapper, each XML record arrives as the key, of type Text, so your
mapper will look like:

  public class MyMapper<K, V> implements Mapper<Text, Text, K, V>
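
To make the begin/end settings a bit more concrete, here is a rough plain-Java sketch of the idea: cut the input into chunks that start at the begin string and run through the next end string. Each chunk corresponds to what would show up as the Text key in the mapper. This is only an illustration and not Hadoop's actual StreamXmlRecordReader code; the class name XmlRecordSketch, the method extractRecords, and the sample <doc> tags are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: NOT the Hadoop StreamXmlRecordReader implementation.
class XmlRecordSketch {

    // Returns every substring that starts at `begin` and runs through the next `end`.
    static List<String> extractRecords(String input, String begin, String end) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = input.indexOf(begin, from);
            if (start < 0) {
                break; // no further record starts
            }
            int stop = input.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // unterminated record: simply dropped in this sketch
            }
            stop += end.length();
            records.add(input.substring(start, stop));
            from = stop; // continue scanning after the record just emitted
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<docs><doc id=\"1\">a</doc>\n<doc id=\"2\">b</doc></docs>";
        // "<doc " and "</doc>" play the role of the begin and end settings.
        for (String record : extractRecords(xml, "<doc ", "</doc>")) {
            System.out.println(record);
        }
    }
}
```

Note the trailing space in "<doc ": with a plain substring match, it keeps the wrapper element <docs> from being mistaken for the start of a record.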
On Tue, Jun 19, 2012 at 2:49 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Hello list,
>
>        Could anyone who has written MapReduce jobs to process XML
> documents stored in their cluster using "StreamXmlRecordReader" share
> his/her experience? Or, if you can, provide me some pointers
> addressing that. Many thanks.
>
> Regards,
>     Mohammad Tariq
>

--
https://github.com/zinnia-phatak-dev/Nectar