HDFS, mail # user - Processing xml documents using StreamXmlRecordReader


Re: Processing xml documents using StreamXmlRecordReader
Mohammad Tariq 2012-06-19, 11:19
My driver function looks like this -

public static void main(String[] args) throws IOException,
        InterruptedException, ClassNotFoundException {

    Configuration conf = new Configuration();
    Job job = new Job();
    conf.set("stream.recordreader.class",
            "org.apache.hadoop.streaming.StreamXmlRecordReader");
    conf.set("stream.recordreader.begin", "<info>");
    conf.set("stream.recordreader.end", "</info>");
    job.setInputFormatClass(StreamInputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path("/mapin/demo.xml"));
    FileOutputFormat.setOutputPath(job, new Path("/mapout/demo"));
    job.waitForCompletion(true);
}
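A likely culprit in the driver above is that `Job job = new Job()` gives the job its own fresh Configuration, so the later `conf.set(...)` calls never reach it; setting the properties first and then constructing the job with `new Job(conf)` (or calling `job.getConfiguration().set(...)`) avoids this. The snapshot behaviour can be shown without Hadoop on the classpath; `FakeConf` and `FakeJob` below are hypothetical stand-ins for Hadoop's `Configuration` and `Job`, used only to illustrate the copy-vs-miss semantics:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration: a string property map
// with a copy constructor, mirroring how Job snapshots the conf it is given.
class FakeConf {
    final Map<String, String> props = new HashMap<>();
    FakeConf() { }
    FakeConf(FakeConf other) { props.putAll(other.props); }
    void set(String key, String value) { props.put(key, value); }
    String get(String key) { return props.get(key); }
}

// Hypothetical stand-in for Hadoop's Job.
class FakeJob {
    final FakeConf conf;
    FakeJob() { this.conf = new FakeConf(); }             // like new Job(): its own empty conf
    FakeJob(FakeConf c) { this.conf = new FakeConf(c); }  // like new Job(conf): snapshot of c
}

public class ConfCopyDemo {
    public static void main(String[] args) {
        FakeConf conf = new FakeConf();

        // Order as in the driver above: job built first, properties set after.
        FakeJob detached = new FakeJob();
        conf.set("stream.recordreader.begin", "<info>");
        System.out.println(detached.conf.get("stream.recordreader.begin")); // null

        // Properties set first, then the job built from conf.
        FakeJob attached = new FakeJob(conf);
        System.out.println(attached.conf.get("stream.recordreader.begin")); // <info>
    }
}
```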

Could you please point out my mistake?

Regards,
    Mohammad Tariq
On Tue, Jun 19, 2012 at 4:35 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> Hello Madhu,
>
> Thanks for the response. Actually I was trying to use the
> new API (Job). Have you tried that? I was not able to set the
> InputFormat using the Job API.
>
> Regards,
>     Mohammad Tariq
>
>
> On Tue, Jun 19, 2012 at 4:28 PM, madhu phatak <[EMAIL PROTECTED]> wrote:
>> Hi,
>>  Set the following properties in driver class
>>
>>   jobConf.set("stream.recordreader.class",
>>       "org.apache.hadoop.streaming.StreamXmlRecordReader");
>>   jobConf.set("stream.recordreader.begin", "start-tag");
>>   jobConf.set("stream.recordreader.end", "end-tag");
>>   jobConf.setInputFormat(StreamInputFormat.class);
>>
>>  In the Mapper, the xml record will come in as the key, of type Text, so
>> your mapper will look like
>>
>>   public class MyMapper<K,V>  implements Mapper<Text,Text,K,V>
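The record handed to the mapper is the raw text between the configured begin and end tags. A plain-Java sketch of the kind of parsing such a mapper body might do (not Hadoop code: the record arrives here as an ordinary string, and the `<name>` element is assumed purely for illustration; a real old-API mapper would emit results through its OutputCollector):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls one field out of the XML record a mapper
// would receive as its Text key.
public class XmlRecordParser {
    private static final Pattern NAME =
            Pattern.compile("<name>(.*?)</name>", Pattern.DOTALL);

    // Returns the text of the first <name> element, or null if absent.
    public static String extractName(String xmlRecord) {
        Matcher m = NAME.matcher(xmlRecord);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String record = "<info><name>Tariq</name><id>1</id></info>";
        System.out.println(extractName(record)); // Tariq
    }
}
```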
>>
>>
>> On Tue, Jun 19, 2012 at 2:49 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>
>>> Hello list,
>>>
>>>        Could anyone who has written MapReduce jobs to process xml
>>> documents stored in their cluster using "StreamXmlRecordReader" share
>>> his/her experience, or provide me some pointers
>>> addressing that? Many thanks.
>>>
>>> Regards,
>>>     Mohammad Tariq
>>
>>
>>
>>
>> --
>> https://github.com/zinnia-phatak-dev/Nectar
>>