HDFS >> mail # user >> Processing xml documents using StreamXmlRecordReader


Re: Processing xml documents using StreamXmlRecordReader
Hello Madhu,

             Thanks for the response. Actually, I was trying to use the
new API (Job). Have you tried that? I was not able to set the
InputFormat using the Job API.

Regards,
    Mohammad Tariq
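
(A note on the problem above: in the Hadoop releases of that era, StreamInputFormat ships in the hadoop-streaming jar and implements the old org.apache.hadoop.mapred.InputFormat interface, which is why it cannot be passed to the new API's Job.setInputFormatClass. The usual workaround is to drive the job with the old JobConf/JobClient API, as in the sketch below. The class name, tag names, and paths are placeholders, not taken from the thread.)

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.streaming.StreamInputFormat;

// Sketch of a driver using the old mapred API, which is what
// StreamInputFormat implements. All names below are placeholders.
public class XmlJobDriver {
  public static void main(String[] args) throws Exception {
    JobConf jobConf = new JobConf(XmlJobDriver.class);
    jobConf.setJobName("xml-processing");

    // Tell the streaming record reader how to carve records out of the file.
    jobConf.set("stream.recordreader.class",
        "org.apache.hadoop.streaming.StreamXmlRecordReader");
    jobConf.set("stream.recordreader.begin", "<record>");  // placeholder begin tag
    jobConf.set("stream.recordreader.end", "</record>");   // placeholder end tag

    jobConf.setInputFormat(StreamInputFormat.class);
    // jobConf.setMapperClass(...); // your mapper class here
    jobConf.setOutputKeyClass(Text.class);
    jobConf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(jobConf, new Path(args[0]));
    FileOutputFormat.setOutputPath(jobConf, new Path(args[1]));

    JobClient.runJob(jobConf);
  }
}
```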
On Tue, Jun 19, 2012 at 4:28 PM, madhu phatak <[EMAIL PROTECTED]> wrote:
> Hi,
>  Set the following properties in driver class
>
>   jobConf.set("stream.recordreader.class",
>       "org.apache.hadoop.streaming.StreamXmlRecordReader");
>   jobConf.set("stream.recordreader.begin", "start-tag");
>   jobConf.set("stream.recordreader.end", "end-tag");
>   jobConf.setInputFormat(StreamInputFormat.class);
>
>  In the Mapper, each xml record will come in as the key, of type Text, so
> your mapper will look like
>
>   public class MyMapper<K,V>  implements Mapper<Text,Text,K,V>
>
>
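
(A concrete sketch of the mapper shape described above, using the old mapred API to match the JobConf-based driver; the whole xml record arrives as the Text key and the value is empty. The output types and the emitted key are assumptions for illustration only.)

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch: the xml record is the key; parse it and emit whatever your job
// needs. Output key/value types here are placeholders.
public class MyMapper extends MapReduceBase
    implements Mapper<Text, Text, Text, Text> {
  @Override
  public void map(Text key, Text value, OutputCollector<Text, Text> output,
      Reporter reporter) throws IOException {
    String xmlRecord = key.toString();  // one complete xml record per call
    output.collect(new Text("record"), new Text(xmlRecord));
  }
}
```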
> On Tue, Jun 19, 2012 at 2:49 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>
>> Hello list,
>>
>>        Could anyone who has written MapReduce jobs to process xml
>> documents stored in their cluster using "StreamXmlRecordReader" share
>> his/her experience, or provide me some pointers on that? Many thanks.
>>
>> Regards,
>>     Mohammad Tariq
>
>
>
>
> --
> https://github.com/zinnia-phatak-dev/Nectar
>
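
(To illustrate what the stream.recordreader.begin/end properties discussed above control, here is a small self-contained sketch, not the actual StreamXmlRecordReader implementation, that extracts the spans between a begin tag and an end tag from an in-memory string. The real reader works on byte streams and handles file-split boundaries, which this sketch does not.)

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, in-memory version of the tag-based record splitting that
// StreamXmlRecordReader performs on file splits.
public class XmlRecordSplitter {
  public static List<String> records(String input, String begin, String end) {
    List<String> out = new ArrayList<>();
    int from = 0;
    while (true) {
      int b = input.indexOf(begin, from);
      if (b < 0) break;
      int e = input.indexOf(end, b + begin.length());
      if (e < 0) break;
      // Each extracted record includes its begin and end tags.
      out.add(input.substring(b, e + end.length()));
      from = e + end.length();
    }
    return out;
  }

  public static void main(String[] args) {
    String xml = "<doc><rec>a</rec><rec>b</rec></doc>";
    System.out.println(records(xml, "<rec>", "</rec>"));
    // prints [<rec>a</rec>, <rec>b</rec>]
  }
}
```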