Re: Hadoop and XML
From an earlier post:
http://oobaloo.co.uk/articles/2010/1/20/processing-xml-in-hadoop.html

On Fri, Jul 16, 2010 at 3:07 PM, Peter Minearo <
[EMAIL PROTECTED]> wrote:

> Moving the variable to a local variable did not seem to work:
>
>
> </PrivateRateSet>vateRateSet>
>
>
>
> public void map(Object key, Object value, OutputCollector output, Reporter
> reporter) throws IOException {
>                Text valueText = (Text)value;
>                String valueString = new String(valueText.getBytes(),
> "UTF-8");
>                String keyString = getXmlKey(valueString);
>                 Text returnKeyText = new Text();
>                Text returnValueText = new Text();
>                returnKeyText.set(keyString);
>                returnValueText.set(valueString);
>                output.collect(returnKeyText, returnValueText);
> }
>
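A note on the likely cause, which the thread itself never spells out: Text.getBytes() returns the Text object's whole backing array, and that buffer is reused between records, so bytes past getLength() can be leftovers from a previous, longer record. That would produce exactly the duplicated tail ("...vateRateSet>") shown above. A minimal sketch of the map method decoding only the valid bytes (getXmlKey() is the helper from the snippet above; everything else mirrors it):

    public void map(Object key, Object value, OutputCollector output, Reporter reporter)
            throws IOException {
        Text valueText = (Text) value;
        // Decode only the first getLength() bytes; getBytes() alone hands back the
        // whole backing array, which may still contain stale bytes from a prior record.
        String valueString = new String(valueText.getBytes(), 0, valueText.getLength(), "UTF-8");
        // valueText.toString() would give the same result.
        String keyString = getXmlKey(valueString);
        output.collect(new Text(keyString), new Text(valueString));
    }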
> -----Original Message-----
> From: Peter Minearo [mailto:[EMAIL PROTECTED]]
> Sent: Fri 7/16/2010 2:51 PM
> To: [EMAIL PROTECTED]
> Subject: RE: Hadoop and XML
>
> Whoops... right after I sent it, someone else made a suggestion and I
> realized what question 2 was about.  I can try that, but wouldn't that
> cause object bloat?  During the Hadoop training I went through, it was
> mentioned to reuse the returning Key and Value objects to keep the
> number of objects created down to a minimum.  Is this not really a valid
> point?
>
>
>
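For reference, the reuse pattern mentioned in that training usually looks like the sketch below (not code from this thread; the class name is only borrowed from the job configuration further down): the output Text objects are instance fields refilled with set() on each call, which is safe because collect() serializes the record before returning. Reuse by itself should not cause the duplicated-tail output, provided the value string is built from the correct number of bytes.

    public class PrdsMapper extends MapReduceBase implements Mapper<Object, Text, Text, Text> {

        // Reused across map() calls to keep per-record object creation to a minimum.
        private final Text outKey = new Text();
        private final Text outValue = new Text();

        public void map(Object key, Text value, OutputCollector<Text, Text> output,
                        Reporter reporter) throws IOException {
            // toString() only decodes the valid getLength() bytes of the buffer.
            String xml = value.toString();
            outKey.set(getXmlKey(xml));   // getXmlKey() as in the earlier snippet
            outValue.set(xml);            // set() replaces the previous contents entirely
            output.collect(outKey, outValue);
        }

        // Stands in for the poster's helper that pulls the key out of the XML snippet.
        private String getXmlKey(String xml) {
            return xml;  // placeholder only
        }
    }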
> -----Original Message-----
> From: Peter Minearo [mailto:[EMAIL PROTECTED]]
> Sent: Friday, July 16, 2010 2:44 PM
> To: [EMAIL PROTECTED]
> Subject: RE: Hadoop and XML
>
>
> I am not using multi-threaded Map tasks.  Also, if I understand your
> second question correctly:
> "Also, can you try creating the output key and values in the map
> method (method local)?"
> In the first code snippet I am doing exactly that.
>
> Below is the class that runs the Job.
>
> public class HadoopJobClient {
>
>        private static final Log LOGGER = LogFactory.getLog(Prds.class.getName());
>
>        public static void main(String[] args) {
>                JobConf conf = new JobConf(Prds.class);
>
>                conf.set("xmlinput.start", "<PrivateRateSet>");
>                conf.set("xmlinput.end", "</PrivateRateSet>");
>
>                conf.setJobName("PRDS Parse");
>
>                conf.setOutputKeyClass(Text.class);
>                conf.setOutputValueClass(Text.class);
>
>                conf.setMapperClass(PrdsMapper.class);
>                conf.setReducerClass(PrdsReducer.class);
>
>                conf.setInputFormat(XmlInputFormat.class);
>                conf.setOutputFormat(TextOutputFormat.class);
>
>                FileInputFormat.setInputPaths(conf, new Path(args[0]));
>                FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>
>                // Run the job
>                try {
>                        JobClient.runJob(conf);
>                } catch (IOException e) {
>                        LOGGER.error(e.getMessage(), e);
>                }
>
>        }
>
>
> }
>
>
>
>
> -----Original Message-----
> From: Soumya Banerjee [mailto:[EMAIL PROTECTED]]
> Sent: Fri 7/16/2010 2:29 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Hadoop and XML
>
> Hi,
>
> Can you please share the code of the job submission client?
>
> Also, can you try creating the output key and values in the map
> method (method local)?
> Make sure you are not using a multi-threaded map task configuration.
>
> map()
> {
> private Text keyText = new Text();
>  private Text valueText = new Text();
>
> //rest of the code
> }
>
> Soumya.
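Soumya's sketch spelled out (an illustration, not code from the thread): the output objects are created inside map() itself, so nothing can carry state between calls. Note, though, that local objects alone will not remove the trailing bytes if the value string is still built from getBytes() without getLength(); Text.toString() sidesteps that.

    public void map(Object key, Object value, OutputCollector output, Reporter reporter)
            throws IOException {
        // Fresh output objects for every record, as suggested above.
        String xml = ((Text) value).toString();   // respects getLength()
        Text outKey = new Text(getXmlKey(xml));   // getXmlKey() from the earlier snippet
        Text outValue = new Text(xml);
        output.collect(outKey, outValue);
    }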
>
> On Sat, Jul 17, 2010 at 2:30 AM, Peter Minearo <
> [EMAIL PROTECTED]> wrote:
>
> > I have an XML file that has sparse data in it.  I am running a
> > MapReduce Job that reads in an XML file, pulls out a Key from within
> > the XML snippet and then hands back the Key and the XML snippet (as
> > the Value) to the OutputCollector.  The reason is to sort the file