MapReduce >> mail # user >> Reading fields from a Text line


Re: Reading fields from a Text line
Thank you everyone. Here is the code from the driver :

Configuration conf = new Configuration();
conf.addResource("/home/cluster/hadoop-1.0.3/conf/core-site.xml");
conf.addResource("/home/cluster/hadoop-1.0.3/conf/hdfs-site.xml");
Job job = new Job(conf, "XPTReader");
job.setJarByClass(XPTReader.class);
job.setMapperClass(XPTMapper.class);
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
Path inPath = new Path("/mapin/TX.xpt");
FileInputFormat.addInputPath(job, inPath);
// inPath.getName() is safer than splitting the path string on "/";
// the random suffix keeps output directories unique across runs
FileOutputFormat.setOutputPath(job, new Path("/mapout/" + inPath.getName()
        + new java.util.Random().nextInt()));
System.exit(job.waitForCompletion(true) ? 0 : 1);

Bejoy: I have observed one strange thing. When I use
IntWritable, the output file contains the entire content of the input
file, but when I use LongWritable, the output file is empty.

Sri, the code works outside MR.

Regards,
    Mohammad Tariq
On Thu, Aug 2, 2012 at 4:38 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
> Hi Tariq
>
> I assume the mapper being used is IdentityMapper instead of XPTMapper class. Can you share your main class?
>
> If you are using TextInputFormat and reading from a file in HDFS, it should have LongWritable keys as input, but your code has IntWritable as the input key type. Have a check on that as well.
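The identity-output symptom Bejoy diagnoses can be reproduced without Hadoop. This is a minimal, hypothetical sketch in plain Java (no Hadoop classes, all names made up): when the parameter type of map() does not match the class's declared generic input-key type, the method becomes an overload rather than an override, so the inherited identity implementation runs and the whole input value passes through unchanged.

```java
// Stand-in for org.apache.hadoop.mapreduce.Mapper: the default map()
// passes the value through unchanged (identity behavior).
class FakeMapper<KEYIN, VALUEIN> {
    public String map(KEYIN key, VALUEIN value) {
        return value.toString();
    }
}

// Declares KEYIN = Integer but writes map(Long, ...): this is a new
// overload, NOT an override of map(Integer, String), so a framework
// that calls map(KEYIN, VALUEIN) never reaches this method.
class WrongMapper extends FakeMapper<Integer, String> {
    public String map(Long key, String value) {
        return value.substring(0, 7);   // never invoked via the base type
    }
}

public class OverrideDemo {
    public static void main(String[] args) {
        FakeMapper<Integer, String> m = new WrongMapper();
        // Dispatches to the inherited identity map(): the whole line
        // comes back, not the 7-character prefix.
        System.out.println(m.map(0, "TT00123 rest of the line"));
    }
}
```

In real Hadoop code, putting @Override on map() turns exactly this mistake into a compile-time error instead of a silent identity job.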
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Mohammad Tariq <[EMAIL PROTECTED]>
> Date: Thu, 2 Aug 2012 15:48:42
> To: <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: Re: Reading fields from a Text line
>
> Thanks for the response Harsh and Sri. Actually, I was trying to prepare
> a template for my application, using which I read one
> line at a time, extract the first field from it, and emit that
> extracted value from the mapper. I have these few lines of code for
> that:
>
> public static class XPTMapper extends Mapper<IntWritable, Text,
> LongWritable, Text>{
>
>                 public void map(LongWritable key, Text value, Context context)
> throws IOException, InterruptedException{
>
>                         Text word = new Text();
>                         String line = value.toString();
>                         if (!line.startsWith("TT")){
>                                 context.setStatus("INVALID LINE..SKIPPING........");
>                         }else{
>                                 String stdid = line.substring(0, 7);
>                                 word.set(stdid);
>                                 context.write(key, word);
>                         }
>                 }
> }
>
> But the output file contains all the rows of the input file including
> the lines which I was expecting to get skipped. Also, I was expecting
> only the fields I am emitting but the file contains entire lines.
> Could you guys please point out the mistake I might have made.
> (Pardon my ignorance, as I am not very good at MapReduce). Many thanks.
>
> Regards,
>     Mohammad Tariq
>
>
> On Thu, Aug 2, 2012 at 10:58 AM, Sriram Ramachandrasekaran
> <[EMAIL PROTECTED]> wrote:
>> Wouldn't it be better if you could skip those unwanted lines
>> upfront(preprocess) and have a file which is ready to be processed by the MR
>> system? In any case, more details are needed.
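Sriram's suggestion can be sketched as a trivial pre-filter run before job submission. This is a hypothetical illustration (the class and method names are made up, and the "TT" prefix check mirrors the one in XPTMapper): only valid records survive, so the MR job never sees the rest.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Preprocess {
    // Keep only the records the job cares about: lines starting with
    // "TT", matching the validity check inside the mapper.
    public static List<String> keepValid(List<String> lines) {
        return lines.stream()
                    .filter(l -> l.startsWith("TT"))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("HEADER", "TT00123 foo",
                                        "junk", "TT00456 bar");
        for (String l : keepValid(in)) {
            System.out.println(l);   // prints only the two "TT" lines
        }
    }
}
```

In practice the filtered output would be written back to HDFS and used as the job's input path.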
>>
>>
>> On Thu, Aug 2, 2012 at 8:23 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>
>>> Mohammad,
>>>
>>> > But it seems I am not doing  things in correct way. Need some guidance.
>>>
>>> What do you mean by the above? What is your written code exactly
>>> expected to do and what is it not doing? Perhaps since you ask for a
>>> code question here, can you share it with us (pastebin or gists,
>>> etc.)?
>>>
>>> For skipping 8 lines, if you are using splits, you need to detect
>>> within the mapper or your record reader if the map task filesplit has
>>> an offset of 0 and skip 8 line reads if so (because it's the first split
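Harsh's technique, skipping header lines only in the split that begins at byte offset 0, can be sketched without Hadoop. This is a hypothetical illustration: in a real mapper the offset would come from the task's input split (via FileSplit's getStart()); here it is simply a parameter.

```java
import java.util.ArrayList;
import java.util.List;

public class HeaderSkipDemo {
    // Emits every record except the first 8 lines of the file, which
    // can only appear in the split whose start offset is 0.
    public static List<String> processSplit(long splitStart, List<String> records) {
        List<String> out = new ArrayList<>();
        int skipped = 0;
        for (String line : records) {
            if (splitStart == 0 && skipped < 8) {
                skipped++;          // header line in the first split: drop it
                continue;
            }
            out.add(line);          // would be context.write(...) in a real job
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 10; i++) lines.add("line" + i);
        System.out.println(processSplit(0, lines));    // first split: headers dropped
        System.out.println(processSplit(4096, lines)); // later split: all lines kept
    }
}
```

The same check could live in a custom record reader instead of the mapper, which keeps the map logic free of positional bookkeeping.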