Re: Reading fields from a Text line
Thank you everyone. Here is the code from the driver:

Configuration conf = new Configuration();
conf.addResource("/home/cluster/hadoop-1.0.3/conf/core-site.xml");
conf.addResource("/home/cluster/hadoop-1.0.3/conf/hdfs-site.xml");
Job job = new Job(conf, "XPTReader");
job.setJarByClass(XPTReader.class);
job.setMapperClass(XPTMapper.class);
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
Path inPath = new Path("/mapin/TX.xpt");
FileInputFormat.addInputPath(job, inPath);
// unique output dir: input file name plus a random suffix
FileOutputFormat.setOutputPath(job, new Path("/mapout/"
        + inPath.getName() + new java.util.Random().nextInt()));
System.exit(job.waitForCompletion(true) ? 0 : 1);
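
A quick way to sanity-check that the job is really configured with XPTMapper
(a sketch; note that getMapperClass() throws ClassNotFoundException, so it
must be declared or caught):

// print the mapper class the Job object will actually use
System.out.println("Mapper in use: " + job.getMapperClass().getName());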

Bejoy: I have observed one strange thing. When I use IntWritable, the
output file contains the entire content of the input file, but when I use
LongWritable, the output file is empty.

Sri: the code works fine outside MR.

Regards,
    Mohammad Tariq
On Thu, Aug 2, 2012 at 4:38 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
> Hi Tariq
>
> I assume the mapper being used is the IdentityMapper rather than your XPTMapper class. Can you share your main class?
>
> If you are using TextInputFormat and reading from a file in HDFS, your mapper should have LongWritable keys as input, but your code has IntWritable as the input key type. Have a check on that as well.
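>
> For instance, a sketch of a matching declaration (with @Override added so
> the compiler flags any mismatch):
>
> public static class XPTMapper
>         extends Mapper<LongWritable, Text, LongWritable, Text> {
>
>     @Override
>     public void map(LongWritable key, Text value, Context context)
>             throws IOException, InterruptedException {
>         String line = value.toString();
>         if (line.startsWith("TT")) {
>             context.write(key, new Text(line.substring(0, 7)));
>         }
>     }
> }
>
> If map()'s parameters don't match the class's declared type parameters,
> the method never overrides Mapper.map(), so the base class's identity
> implementation runs and every input line is passed through unchanged.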
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Mohammad Tariq <[EMAIL PROTECTED]>
> Date: Thu, 2 Aug 2012 15:48:42
> To: <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: Re: Reading fields from a Text line
>
> Thanks for the response, Harsh and Sri. Actually, I was trying to prepare
> a template for my application that reads one line at a time, extracts the
> first field from it, and emits that extracted value from the mapper. I
> have these few lines of code for that:
>
> public static class XPTMapper
>         extends Mapper<IntWritable, Text, LongWritable, Text> {
>
>     public void map(LongWritable key, Text value, Context context)
>             throws IOException, InterruptedException {
>         Text word = new Text();
>         String line = value.toString();
>         if (!line.startsWith("TT")) {
>             context.setStatus("INVALID LINE..SKIPPING........");
>         } else {
>             // first field: the leading 7 characters of the line
>             String stdid = line.substring(0, 7);
>             word.set(stdid);
>             context.write(key, word);
>         }
>     }
> }
>
> But the output file contains all the rows of the input file, including
> the lines which I was expecting to get skipped. Also, I was expecting
> only the fields I am emitting, but the file contains the entire lines.
> Could you guys please point out the mistake I might have made.
> (Pardon my ignorance, as I am not very good at MapReduce.) Many thanks.
>
> Regards,
>     Mohammad Tariq
>
>
> On Thu, Aug 2, 2012 at 10:58 AM, Sriram Ramachandrasekaran
> <[EMAIL PROTECTED]> wrote:
>> Wouldn't it be better if you could skip those unwanted lines upfront
>> (preprocess) and have a file which is ready to be processed by the MR
>> system? In any case, more details are needed.
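>>
>> For example, a one-off filter along these lines (a rough sketch; the
>> file names are hypothetical):
>>
>> import java.io.*;
>>
>> // keep only the records that start with "TT"
>> public class FilterTT {
>>     public static void main(String[] args) throws IOException {
>>         BufferedReader in = new BufferedReader(new FileReader("TX.xpt"));
>>         PrintWriter out = new PrintWriter(new FileWriter("TX.filtered.xpt"));
>>         String line;
>>         while ((line = in.readLine()) != null) {
>>             if (line.startsWith("TT")) {
>>                 out.println(line);
>>             }
>>         }
>>         out.close();
>>         in.close();
>>     }
>> }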
>>
>>
>> On Thu, Aug 2, 2012 at 8:23 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>
>>> Mohammad,
>>>
>>> > But it seems I am not doing things in correct way. Need some guidance.
>>>
>>> What do you mean by the above? What exactly is your code expected to
>>> do, and what is it not doing? Since you're asking a code question here,
>>> perhaps you can share the code with us (pastebin or gists, etc.)?
>>>
>>> For skipping 8 lines, if you are using splits, you need to detect
>>> within the mapper or your record reader whether the map task's filesplit
>>> has an offset of 0, and skip 8 line reads if so (because it's the first
>>> split, the one that starts at the beginning of the file).