HDFS >> mail # user >> Processing xml documents using StreamXmlRecordReader
Re: Processing xml documents using StreamXmlRecordReader
Hello Madhu,

Thanks for the response. Actually, I was trying to use the new API
(Job). Have you tried that? I was not able to set the InputFormat
using the Job API.

Regards,
    Mohammad Tariq
On Tue, Jun 19, 2012 at 4:28 PM, madhu phatak <[EMAIL PROTECTED]> wrote:
> Hi,
>  Set the following properties in the driver class:
>
>   jobConf.set("stream.recordreader.class",
>       "org.apache.hadoop.streaming.StreamXmlRecordReader");
>   jobConf.set("stream.recordreader.begin", "start-tag");
>   jobConf.set("stream.recordreader.end", "end-tag");
>   jobConf.setInputFormat(StreamInputFormat.class);
>
>  In the Mapper, each XML record arrives as the key, of type Text, so your
> mapper will look like:
>
>   public class MyMapper<K,V> implements Mapper<Text,Text,K,V>
>
>
> On Tue, Jun 19, 2012 at 2:49 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>
>> Hello list,
>>
>>        Could anyone who has written MapReduce jobs to process XML
>> documents stored in their cluster using "StreamXmlRecordReader" share
>> his/her experience, or provide some pointers on that? Many thanks.
>>
>> Regards,
>>     Mohammad Tariq
>
>
>
>
> --
> https://github.com/zinnia-phatak-dev/Nectar
>
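
For anyone reading this thread later: the begin and end properties quoted
above name the strings that mark where one record starts and stops. Below is
a minimal sketch of that idea in plain Java, with no Hadoop on the classpath.
The class name XmlRecordSplitter and its split method are hypothetical, made
up for illustration; this is not the StreamXmlRecordReader source. It also
assumes the whole input fits in one in-memory string, which a real record
reader working on input splits would not.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of begin/end-tag record splitting.
// Hypothetical helper, not part of Hadoop.
class XmlRecordSplitter {

    // Returns every substring that starts at beginTag and runs through
    // the next occurrence of endTag (both tags included).
    static List<String> split(String input, String beginTag, String endTag) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = input.indexOf(beginTag, pos);
            if (start < 0) {
                break; // no further records
            }
            int end = input.indexOf(endTag, start + beginTag.length());
            if (end < 0) {
                break; // unterminated record: dropped in this sketch
            }
            end += endTag.length();
            records.add(input.substring(start, end));
            pos = end; // continue scanning after this record
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<docs><doc><id>1</id></doc>\n<doc><id>2</id></doc></docs>";
        // Each element corresponds to what a mapper would see as one key.
        for (String record : split(xml, "<doc>", "</doc>")) {
            System.out.println(record);
        }
    }
}
```

With "<doc>" and "</doc>" as the two tags, each <doc>...</doc> element comes
out as one record, and the surrounding <docs> wrapper is skipped because it
never matches the begin string exactly.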