|
Mohammad Tariq
2012-08-01, 20:24
Harsh J
2012-08-02, 02:53
Sriram Ramachandrasekaran...
2012-08-02, 05:28
Mohammad Tariq
2012-08-02, 10:18
Sriram Ramachandrasekaran...
2012-08-02, 10:48
Alok Kumar
2012-08-02, 10:52
Bejoy KS
2012-08-02, 11:08
Mohammad Tariq
2012-08-02, 12:39
Bejoy Ks
2012-08-02, 15:27
Bejoy Ks
2012-08-02, 23:11
Harsh J
2012-08-03, 07:32
Bejoy KS
2012-08-03, 07:37
Harsh J
2012-08-03, 10:58
Bejoy KS
2012-08-03, 15:23
|
-
Reading fields from a Text line
Mohammad Tariq 2012-08-01, 20:24

Hello list,

I have a flat file in which data is stored as lines of 107 bytes each. I need to skip the first 8 lines (as they don't contain any valuable info). Thereafter, I have to read each line and extract the information from it, field by field rather than as a whole line. Each line is composed of several fields without any delimiter between them. For example, the first field is 8 bytes, the second is 2 bytes, and so on. I was trying to read each line as a Text value, convert it into a String, and use the String.substring() method to extract the value of each field. But it seems I am not doing things in the correct way. Need some guidance. Many thanks.

Regards,
Mohammad Tariq
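For readers following along, the substring approach described above can be sketched in plain Java, outside MapReduce. This is only an illustration: the field names "id" and "code" are hypothetical, only the widths 8 and 2 come from the post, and it assumes a single-byte encoding so that byte positions and character positions coincide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: slicing a fixed-width record into named fields with substring(). */
class FixedWidthRecord {
    // Hypothetical layout: the names are made up, and only the first two
    // widths (8 and 2) are taken from the original post.
    static final String[] NAMES = {"id", "code"};
    static final int[] WIDTHS = {8, 2};

    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        int start = 0;
        for (int i = 0; i < NAMES.length; i++) {
            int end = start + WIDTHS[i];
            if (end > line.length()) {
                throw new IllegalArgumentException("line too short for field " + NAMES[i]);
            }
            // substring(begin, end) excludes the end index,
            // so an 8-character field starting at 0 is substring(0, 8).
            fields.put(NAMES[i], line.substring(start, end));
            start = end;
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints {id=TT000001, code=AB}
        System.out.println(parse("TT000001AB rest of the record"));
    }
}
```

Keeping the layout in one table of widths means each field's start index is derived rather than typed by hand, which avoids off-by-one slips in the end index.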
-
Re: Reading fields from a Text line
Harsh J 2012-08-02, 02:53

Mohammad,

> But it seems I am not doing things in the correct way. Need some guidance.

What do you mean by the above? What exactly is your code expected to do, and what is it not doing? Since this is a code question, can you share the code with us (pastebin, gist, etc.)?

For skipping 8 lines: if the input is split, you need to detect, within the mapper or your record reader, whether the map task's file split has an offset of 0, and skip the first 8 line reads if so (because it is the first split of the file).

--
Harsh J
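A variation on the suggestion above, sketched in plain Java: with TextInputFormat the map input key is the byte offset of the line within the file, so for fixed-length records the header can be recognised from that offset alone, whichever split the line lands in. This is not the split-offset check described above but a related shortcut, and it rests on an assumption the post does not confirm: that each record occupies exactly 107 bytes on disk including its line terminator.

```java
/** Sketch: deciding from a line's byte offset whether it belongs to the header. */
class HeaderSkip {
    // Assumption: every record occupies exactly RECORD_BYTES bytes on disk,
    // line terminator included. Adjust if the 107 bytes exclude the newline.
    static final int RECORD_BYTES = 107;
    static final int HEADER_LINES = 8;

    /** @param byteOffset offset of the start of the line within the file */
    static boolean isHeader(long byteOffset) {
        return byteOffset < (long) HEADER_LINES * RECORD_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(isHeader(0));        // first header line
        System.out.println(isHeader(7 * 107));  // eighth header line
        System.out.println(isHeader(8 * 107));  // first data line
    }
}
```

Inside a mapper, the same test would be applied to the value of the LongWritable key before doing any field extraction.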
-
Re: Reading fields from a Text line
Sriram Ramachandrasekaran... 2012-08-02, 05:28

Wouldn't it be better to skip those unwanted lines upfront (preprocess) and have a file which is ready to be processed by the MR system? In any case, more details are needed.

--
It's just about how deep your longing is!
-
Re: Reading fields from a Text line
Mohammad Tariq 2012-08-02, 10:18

Thanks for the response, Harsh and Sri. Actually, I was trying to prepare a template for my application, with which I read one line at a time, extract the first field from it, and emit that extracted value from the mapper. I have these few lines of code for that:

    public static class XPTMapper extends Mapper<IntWritable, Text, LongWritable, Text>{

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException{

            Text word = new Text();
            String line = value.toString();
            if (!line.startsWith("TT")){
                context.setStatus("INVALID LINE..SKIPPING........");
            }else{
                String stdid = line.substring(0, 7);
                word.set(stdid);
                context.write(key, word);
            }
        }

But the output file contains all the rows of the input file, including the lines which I was expecting to be skipped. Also, I was expecting only the fields I am emitting, but the file contains entire lines. Could you guys please point out the mistake I might have made? (Pardon my ignorance, as I am not very good at MapReduce.) Many thanks.

Regards,
Mohammad Tariq
-
Re: Reading fields from a Text line
Sriram Ramachandrasekaran... 2012-08-02, 10:48

I would suggest checking whether your code works outside MR. I do not see why MR would bring in the whole line instead of your substring(0,7) unless the code does something else. Did you try running your map(k,v) emitting code separately to see what it does?
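One way to act on this suggestion is to pull the per-line logic out of map() into a plain static method that can be run and asserted on without a cluster. The sketch below mirrors the startsWith("TT") check and the substring(0, 7) call from the posted mapper; the class and method names are hypothetical, and the length guard is an addition that the posted code does not have.

```java
import java.util.Optional;

/** Sketch: the mapper's per-line decision, isolated so it can run outside MR. */
class LineLogic {
    /** Returns the leading field of a data line, or empty for lines to skip. */
    static Optional<String> extractId(String line) {
        // Added guard (not in the posted code): avoid an exception on short lines.
        if (!line.startsWith("TT") || line.length() < 7) {
            return Optional.empty();
        }
        // Same call as the posted mapper. Note that it yields 7 characters,
        // whereas the first field was described as 8 bytes wide.
        return Optional.of(line.substring(0, 7));
    }

    public static void main(String[] args) {
        System.out.println(extractId("TT000001AB"));  // a data line
        System.out.println(extractId("HEADER LINE")); // a line to skip
    }
}
```

If this method behaves as expected on sample lines, the problem is more likely in how the framework invokes the mapper than in the string handling.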
-
Re: Reading fields from a Text line
Alok Kumar 2012-08-02, 10:52

Hi Tariq,

Is your file splittable? If it's not, a single Mapper will process the entire file in one go!
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html#isSplitable%28org.apache.hadoop.mapreduce.JobContext,%20org.apache.hadoop.fs.Path%29

How many mappers are being created? See if that helps.

Regards,
Alok
-
Re: Reading fields from a Text line
Bejoy KS 2012-08-02, 11:08

Hi Tariq,

I assume the mapper being used is IdentityMapper instead of the XPTMapper class. Can you share your main class?

If you are using TextInputFormat and reading from a file in HDFS, it should have LongWritable keys as input, but your code has IntWritable as the input key type. Have a check on that as well.

Regards
Bejoy KS

Sent from handheld, please excuse typos.
-
Re: Reading fields from a Text line
Mohammad Tariq 2012-08-02, 12:39

Thank you everyone. Here is the code from the driver:

    Configuration conf = new Configuration();
    conf.addResource("/home/cluster/hadoop-1.0.3/conf/core-site.xml");
    conf.addResource("/home/cluster/hadoop-1.0.3/conf/hdfs-site.xml");
    Job job = new Job(conf, "XPTReader");
    job.setJarByClass(XPTReader.class);
    job.setMapperClass(XPTMapper.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    job.setInputFormatClass(TextInputFormat.class);
    Path inPath = new Path("/mapin/TX.xpt");
    FileInputFormat.addInputPath(job, inPath);
    FileOutputFormat.setOutputPath(job, new Path("/mapout/"+inPath.toString().split("/")[4]+java.util.Random.class.newInstance().nextInt()));
    System.exit(job.waitForCompletion(true) ? 0 : 1);

Bejoy: I have observed one strange thing. When I use IntWritable, the output file contains the entire content of the input file, but if I use LongWritable, the output file is empty.

Sri: The code is working outside MR.

Regards,
Mohammad Tariq
-
Re: Reading fields from a Text line
Bejoy Ks 2012-08-02, 15:27

Hi Tariq,

Again, I strongly suspect the IdentityMapper is in play here. The reason I suspect so: when you have the whole data in the output file, it should be the IdentityMapper. Due to the mismatch in input key type between class level and method level, the framework is falling back to IdentityMapper. I have noticed this fallback while using the new mapreduce API.

    public static class XPTMapper extends Mapper<*IntWritable*, Text, LongWritable, Text>{

        public void map(*LongWritable* key, Text value, Context context)
                throws IOException, InterruptedException{

When you change the input key type to LongWritable at class level, it is your custom mapper (XPTMapper) being called. Because of some exceptional cases it is only going into the if branch, where you are not writing anything out of the Mapper, hence the empty output file.

    public static class XPTMapper extends Mapper<*LongWritable*, Text, LongWritable, Text>{

        public void map(*LongWritable* key, Text value, Context context)
                throws IOException, InterruptedException{

To cross-check this, try enabling some logging in your code to see exactly what is happening. By the way, are you getting the output of this line in your logs when you change the input key type to LongWritable?

    context.setStatus("INVALID LINE..SKIPPING........");

If so, that confirms my assumption. :) Try adding more logs to trace the flow and see what is going wrong. Or you can use MRUnit to unit test your code as the first step.

Hope it helps!

Regards
Bejoy KS
-
Re: Reading fields from a Text line
Bejoy Ks 2012-08-02, 23:11

Hi Tariq,

On further analysis I noticed an odd behavior in this context.

If we use the default InputFormat (TextInputFormat) but specify the key type in the mapper as IntWritable instead of LongWritable, the framework is supposed to throw a class cast exception. Such an exception is thrown only if the key types at class level and method level are the same (IntWritable) in the Mapper. But if we provide the input key type as IntWritable at class level and LongWritable at method level (map method), the code compiles fine instead of raising a compile-time error. In addition, on execution the framework triggers the Identity Mapper instead of the custom mapper provided with the configuration.

This seems like a bug to me. Filed a JIRA to track this issue:
https://issues.apache.org/jira/browse/MAPREDUCE-4507

Regards
Bejoy KS
-
Re: Reading fields from a Text line
Harsh J 2012-08-03, 07:32

That is not really a bug. Only if you use @Override will you really be asserting that you've overridden the right method (since the new API uses inheritance instead of interfaces). Without that kind of check, it's easy to make mistakes and add methods that won't get considered by the framework (and hence the default IdentityMapper comes into play).

Always use @Override annotations when inheriting and overriding methods.

--
Harsh J
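The effect described above can be reproduced in plain Java without Hadoop. The sketch below uses a hypothetical BaseMapper class as a stand-in for a framework base class whose default map() passes records through unchanged; none of these class names come from Hadoop. A subclass whose map() parameter type disagrees with the class-level type parameter compiles, but it only adds an overload that the base class never calls.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a framework base class with an identity map(). */
class BaseMapper<K, V> {
    final List<String> out = new ArrayList<>();

    // Default behaviour: pass the record through unchanged.
    void map(K key, V value) {
        out.add(key + "\t" + value);
    }

    // The "framework" only ever calls map(K, V).
    void run(K key, V value) {
        map(key, value);
    }
}

/** Class-level key type is Integer but the method takes Long: an overload. */
class MismatchedMapper extends BaseMapper<Integer, String> {
    // Putting @Override on this method would be a compile error,
    // which is exactly how the annotation exposes the mistake.
    void map(Long key, String value) {
        out.add(key + "\t" + value.substring(0, 2));
    }
}

/** Types agree, so this is a real override and the compiler verifies it. */
class MatchingMapper extends BaseMapper<Long, String> {
    @Override
    void map(Long key, String value) {
        out.add(key + "\t" + value.substring(0, 2));
    }
}

class OverrideDemo {
    static List<String> runMismatched() {
        MismatchedMapper m = new MismatchedMapper();
        m.run(0, "TT000001AB");  // dispatches to the inherited identity map()
        return m.out;
    }

    static List<String> runMatching() {
        MatchingMapper m = new MatchingMapper();
        m.run(0L, "TT000001AB"); // dispatches to the overriding map()
        return m.out;
    }

    public static void main(String[] args) {
        System.out.println(runMismatched()); // whole value passes through
        System.out.println(runMatching());   // only the first two characters
    }
}
```

The mismatched subclass emits the full input value, which matches the symptom reported earlier in the thread: entire lines in the output and no skipped rows.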
-
Re: Reading fields from a Text line
Bejoy KS 2012-08-03, 07:37

That is a good pointer, Harsh. Thanks a lot.

But if IdentityMapper is being used, shouldn't the job.xml reflect that? job.xml always shows the mapper as our CustomMapper.

Regards
Bejoy KS
-
Re: Reading fields from a Text line
Harsh J 2012-08-03, 10:58

Bejoy,

In the new API, the default map() function, if not properly overridden, is the identity map function. There is no IdentityMapper class in the new API; the Mapper class itself is the identity by default.

--
Harsh J
-
Re: Reading fields from a Text line
Bejoy KS 2012-08-03, 15:23

OK, got it now. That is a good piece of information. Thank you :)

Regards
Bejoy KS
|