MapReduce >> mail # user >> Reading fields from a Text line


Re: Reading fields from a Text line
Thanks for the responses, Harsh and Sri. I was trying to prepare a
template for my application that reads one line at a time, extracts
the first field from it, and emits that extracted value from the
mapper. I have these few lines of code for that:

public static class XPTMapper
        extends Mapper<IntWritable, Text, LongWritable, Text> {

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        Text word = new Text();
        String line = value.toString();
        if (!line.startsWith("TT")) {
            context.setStatus("INVALID LINE..SKIPPING........");
        } else {
            String stdid = line.substring(0, 7);
            word.set(stdid);
            context.write(key, word);
        }
    }
}

But the output file contains all the rows of the input file, including
the lines I expected to be skipped. Also, I expected only the fields I
am emitting, but the file contains entire lines. Could you please
point out the mistake I might have made? (Pardon my ignorance, as I am
not very good at MapReduce.) Many thanks.

Regards,
    Mohammad Tariq
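
One possible explanation for the symptoms described above, offered as a sketch rather than a confirmed diagnosis: the class is declared as Mapper<IntWritable, ...> while map() takes a LongWritable key. In that case map() is an overload rather than an override, so the framework keeps calling the inherited map(), which by default passes each record through unchanged. The plain-Java example below reproduces the effect without Hadoop. BaseMapper, MismatchedMapper, MatchingMapper and feed are hypothetical stand-ins invented for illustration, with Long, Integer and String standing in for LongWritable, IntWritable and Text.

```java
import java.util.ArrayList;
import java.util.List;

public class OverrideSketch {

    /** Hypothetical stand-in for Hadoop's Mapper: the default map() is an identity. */
    static class BaseMapper<K, V> {
        final List<String> out = new ArrayList<>();

        public void map(K key, V value) {
            out.add(String.valueOf(value)); // pass the whole record through
        }

        // The "framework" always calls map(K, V) declared on the base class.
        public final void run(K key, V value) {
            map(key, value);
        }
    }

    /** Declared with Integer keys, but map() takes a Long: an overload, never called by run(). */
    static class MismatchedMapper extends BaseMapper<Integer, String> {
        public void map(Long key, String value) {
            if (value.startsWith("TT")) {
                out.add(value.substring(0, 7));
            }
        }
    }

    /** Key type matches the declaration, so @Override compiles and run() reaches this map(). */
    static class MatchingMapper extends BaseMapper<Long, String> {
        @Override
        public void map(Long key, String value) {
            if (value.startsWith("TT")) {
                out.add(value.substring(0, 7));
            }
        }
    }

    /** Feeds lines keyed by a running offset, loosely imitating a text input format. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<String> feed(BaseMapper mapper, String... lines) {
        long offset = 0;
        for (String line : lines) {
            mapper.run(Long.valueOf(offset), line);
            offset += line.length() + 1;
        }
        return mapper.out;
    }

    public static void main(String[] args) {
        String[] lines = {"HEADER LINE", "TT12345 rest of record"};
        System.out.println(feed(new MismatchedMapper(), lines)); // [HEADER LINE, TT12345 rest of record]
        System.out.println(feed(new MatchingMapper(), lines));   // [TT12345]
    }
}
```

Adding @Override to MismatchedMapper.map would turn the silent mismatch into a compile error, which is why annotating map() in a real mapper is a cheap safeguard. Note also that substring(0, 7) yields seven characters; if the first field is really eight bytes wide, as the earlier message in this thread says, substring(0, 8) may be what is wanted.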
On Thu, Aug 2, 2012 at 10:58 AM, Sriram Ramachandrasekaran
<[EMAIL PROTECTED]> wrote:
> Wouldn't it be better if you could skip those unwanted lines
> up front (preprocess) and have a file which is ready to be processed by the MR
> system? In any case, more details are needed.
>
>
> On Thu, Aug 2, 2012 at 8:23 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> Mohammad,
>>
>> > But it seems I am not doing things in the correct way. Need some guidance.
>>
>> What do you mean by the above? What exactly is the code you have
>> written expected to do, and what is it not doing? Since you are
>> asking a code question here, can you share the code with us
>> (pastebin, gist, etc.)?
>>
>> For skipping 8 lines, if you are using splits, you need to detect
>> within the mapper or your record reader whether the map task's file
>> split has an offset of 0, and skip 8 line reads if so (because it is
>> the first split of some file).
>>
>> On Thu, Aug 2, 2012 at 1:54 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>> > Hello list,
>> >
>> >        I have a flat file in which data is stored as lines of 107
>> > bytes each. I need to skip the first 8 lines (as they don't contain
>> > any valuable info). Thereafter, I have to read each line and extract
>> > the information from it, but not the line as a whole. Each line is
>> > composed of several fields without any delimiter between them. For
>> > example, the first field is 8 bytes, the second is 2 bytes, and so
>> > on. I was trying to read each line as a Text value, convert it into
>> > a String, and use the String.substring() method to extract the value
>> > of each field. But it seems I am not doing things in the correct
>> > way. Need some guidance. Many thanks.
>> >
>> > Regards,
>> >     Mohammad Tariq
>>
>>
>>
>> --
>> Harsh J
>
>
>
>
> --
> It's just about how deep your longing is!
>
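
The two ideas quoted in this thread, skipping the 8 header lines only in the split that starts at offset 0 and slicing undelimited fixed-width fields, can be sketched in plain Java as follows. This is an illustration under stated assumptions, not code from the thread. HeaderSkipper and field are hypothetical helpers, and the sample record is made up. In a real mapper the split start would typically come from ((FileSplit) context.getInputSplit()).getStart(). The sketch also assumes the whole header falls inside the first split and that the data uses a single-byte encoding, so that character positions equal byte positions.

```java
public class FixedWidthSketch {

    // From the thread: the first 8 lines of the file carry no useful data.
    static final int HEADER_LINES = 8;

    /** Decides, per line, whether a map task should skip it as part of the file header. */
    static class HeaderSkipper {
        private final boolean firstSplit;
        private int skipped = 0;

        HeaderSkipper(long splitStart) {
            // Only the split that begins at byte 0 contains the header lines.
            this.firstSplit = (splitStart == 0L);
        }

        boolean shouldSkip() {
            if (firstSplit && skipped < HEADER_LINES) {
                skipped++;
                return true;
            }
            return false;
        }
    }

    /** Returns the field occupying positions [start, start + length) of a fixed-width line. */
    static String field(String line, int start, int length) {
        return line.substring(start, start + length);
    }

    public static void main(String[] args) {
        HeaderSkipper skipper = new HeaderSkipper(0L);
        String record = "TT000001XYremaining fields"; // made-up sample record

        // Simulate 8 header lines followed by one data line in the first split.
        for (int i = 0; i < HEADER_LINES + 1; i++) {
            String line = (i < HEADER_LINES) ? "header " + i : record;
            if (skipper.shouldSkip()) {
                continue;
            }
            System.out.println(field(line, 0, 8)); // first field, 8 wide: TT000001
            System.out.println(field(line, 8, 2)); // second field, 2 wide: XY
        }
    }
}
```

Keeping the skip decision in a small stateful object means it can be created once per task (for example in the mapper's setup) and consulted at the top of every map() call.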