HDFS >> mail # user >> Processing xml documents using StreamXmlRecordReader


Mohammad Tariq 2012-06-18, 21:19
Re: Processing xml documents using StreamXmlRecordReader
Hi,

Set the following properties in your driver class, replacing start-tag and
end-tag with the opening and closing tags of your record element:

    jobConf.set("stream.recordreader.class",
        "org.apache.hadoop.streaming.StreamXmlRecordReader");
    jobConf.set("stream.recordreader.begin", "start-tag");
    jobConf.set("stream.recordreader.end", "end-tag");
    jobConf.setInputFormat(StreamInputFormat.class);
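
To illustrate what the begin/end settings mean, here is a minimal stdlib-only
sketch of cutting a text into records between a start marker and an end marker.
It is not Hadoop's StreamXmlRecordReader code and ignores input splits and
streaming; the class name XmlRecordSplitSketch, the method splitRecords, and the
<doc> element are made up for illustration. The assumption is that each record
runs from the begin string through the next end string, markers included.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only illustration; NOT the Hadoop StreamXmlRecordReader implementation.
class XmlRecordSplitSketch {

    // Returns every substring that starts at `begin` and runs through the
    // next occurrence of `end`, with both markers included.
    static List<String> splitRecords(String input, String begin, String end) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = input.indexOf(begin, pos);
            if (start < 0) {
                break; // no further record
            }
            int stop = input.indexOf(end, start + begin.length());
            if (stop < 0) {
                break; // unterminated record: dropped in this sketch
            }
            records.add(input.substring(start, stop + end.length()));
            pos = stop + end.length();
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical input: two <doc> records inside a <docs> wrapper.
        String xml = "<docs><doc><id>1</id></doc>\n<doc><id>2</id></doc></docs>";
        for (String record : splitRecords(xml, "<doc>", "</doc>")) {
            System.out.println(record);
        }
    }
}
```

With this picture in mind, "start-tag" and "end-tag" would be strings such as
"<doc>" and "</doc>" for the hypothetical input above.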

In the mapper, each XML record arrives as the key, of type Text, so your mapper
will look like:

    public class MyMapper<K, V> implements Mapper<Text, Text, K, V>
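
Since the whole record arrives as text in the key, one possible next step inside
map() is to turn the key into a String and parse it. The sketch below does that
with the JDK's DOM parser only, so it runs without Hadoop; the class
XmlRecordFieldSketch, the helper firstField, and the <doc>/<id> elements are
hypothetical, and in a real mapper the record string would presumably come from
key.toString().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Stdlib-only illustration of handling one XML record; no Hadoop classes used.
class XmlRecordFieldSketch {

    // Parses a single XML record and returns the text of the first element
    // named `tag`, or null if the record has no such element.
    static String firstField(String record, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(record)));
            NodeList nodes = doc.getElementsByTagName(tag);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (Exception e) {
            // Malformed record: surfaced as an unchecked exception in this sketch.
            throw new IllegalArgumentException("Could not parse record", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical record, as it might look after key.toString().
        String record = "<doc><id>1</id><title>hello</title></doc>";
        System.out.println(firstField(record, "title"));
    }
}
```

Building a new parser per record keeps the sketch short; for large inputs you
would likely want to reuse the DocumentBuilder or use a streaming parser.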

On Tue, Jun 19, 2012 at 2:49 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Hello list,
>
>        Could anyone who has written MapReduce jobs to process XML
> documents stored in their cluster using "StreamXmlRecordReader" share
> his/her experience? Or could you provide me with some pointers
> on that? Many thanks.
>
> Regards,
>     Mohammad Tariq
>

--
https://github.com/zinnia-phatak-dev/Nectar
Mohammad Tariq 2012-06-19, 11:05
Mohammad Tariq 2012-06-19, 11:19
madhu phatak 2012-06-19, 12:13
Mohammad Tariq 2012-06-19, 12:24
Mohammad Tariq 2012-06-19, 12:28
madhu phatak 2012-06-19, 12:41
madhu phatak 2012-06-21, 07:07
Mohammad Tariq 2012-06-21, 07:12