RE: Hadoop 101
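If you use TextInputFormat, you'll get the following key<LongWritable>,
value<Text> pairs in your mapper:

file_position, your_input

where file_position is the byte offset at which the line starts and
your_input is the text of the line itself.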

Example:
0, "0\t[356:0.3481597,359:0.3481597,358:0.3481597,361:0.3481597,360:0.3481597]"
100, "8\t[356:0.34786037,359:0.34786037,358:0.34786037,361:0.34786037,360:0.34786037]"
200, "25\t[284:0.34821576,286:0.34821576,287:0.34821576,288:0.34821576,289:0.34821576]"

Then just parse it out in your mapper.
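
As a rough sketch of that parsing step, something like the following could work. It uses plain Java only, so it can be tried without a cluster. The class name RecommendationLine is hypothetical (it is not part of Hadoop or Mahout), and it assumes the item IDs fit in an int and the scores parse as doubles, which matches the sample lines but is not confirmed anywhere in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Hadoop or Mahout class.
// Holds one parsed line of the form: userId<TAB>[item:score,item:score,...]
class RecommendationLine {
    final long userId;
    final Map<Integer, Double> scores;

    RecommendationLine(long userId, Map<Integer, Double> scores) {
        this.userId = userId;
        this.scores = scores;
    }

    // Splits on the first tab, then parses the bracketed item:score list.
    // Assumption: item IDs are ints and scores are doubles.
    static RecommendationLine parse(String line) {
        String[] parts = line.split("\t", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected key<TAB>value: " + line);
        }
        long userId = Long.parseLong(parts[0].trim());

        String body = parts[1].trim();
        if (!body.startsWith("[") || !body.endsWith("]")) {
            throw new IllegalArgumentException("expected [..] value: " + line);
        }
        body = body.substring(1, body.length() - 1);

        // LinkedHashMap keeps the items in the order they appear in the file.
        Map<Integer, Double> scores = new LinkedHashMap<>();
        if (!body.isEmpty()) {
            for (String entry : body.split(",")) {
                String[] kv = entry.split(":", 2);
                if (kv.length != 2) {
                    throw new IllegalArgumentException("expected item:score: " + entry);
                }
                scores.put(Integer.parseInt(kv[0].trim()),
                           Double.parseDouble(kv[1].trim()));
            }
        }
        return new RecommendationLine(userId, scores);
    }
}
```

Inside map() you would pass it value.toString() and ignore the LongWritable offset key, since the user ID you care about is the first field of the line.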
-----Original Message-----
From: Pat Ferrel [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 12, 2012 7:50 AM
To: [EMAIL PROTECTED]
Subject: Hadoop 101

Stupid question for the day.

I have a file created by a mahout job of the form:

0	[356:0.3481597,359:0.3481597,358:0.3481597,361:0.3481597,360:0.3481597]
8	[356:0.34786037,359:0.34786037,358:0.34786037,361:0.34786037,360:0.34786037]
25	[284:0.34821576,286:0.34821576,287:0.34821576,288:0.34821576,289:0.34821576]
28	[452:0.34802154,454:0.34802154,453:0.34802154,456:0.34802154,455:0.34802154]
.

If this were a SequenceFile I could read it and be merrily on my way but
it's a text file. The classes written are key, value pairs <LongWritable,
VectorWritable> but the file is tab delimited text.

I was hoping to do something like:

SequenceFile.Reader reader = new SequenceFile.Reader(fs, inputFile, conf);
Writable userId = new LongWritable();
VectorWritable recommendations = new VectorWritable();
while (reader.next(userId, recommendations)) {
    // do something with each pair
}

But alas Google fails me. How do you read in key, value pairs from text
files outside of a map or reduce?
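
One way this could be approached outside of a map or reduce is to treat the file as ordinary text: read it line by line and split each line on the first tab. SequenceFile.Reader will not help here because a text file is not in SequenceFile format. The sketch below is plain Java; TextPairReader and readPairs are hypothetical names, and the value is left as a raw string rather than rebuilt into a VectorWritable. On HDFS, the BufferedReader could presumably be built by wrapping the stream returned by fs.open(inputFile) in an InputStreamReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Hadoop or Mahout class.
class TextPairReader {

    // Reads tab-delimited "key<TAB>value" lines and returns them in file order.
    // Assumption: the key is a long, as in the sample file; the value is
    // returned unparsed so the caller can decide how to interpret it.
    static List<Map.Entry<Long, String>> readPairs(BufferedReader reader)
            throws IOException {
        List<Map.Entry<Long, String>> pairs = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            int tab = line.indexOf('\t');
            if (tab < 0) {
                throw new IOException("no tab delimiter in line: " + line);
            }
            Long key = Long.valueOf(line.substring(0, tab).trim());
            String value = line.substring(tab + 1);
            pairs.add(new SimpleEntry<Long, String>(key, value));
        }
        return pairs;
    }
}
```

Each returned entry then plays the role of the (userId, recommendations) pair in the loop above, with the recommendations still in their "[item:score,...]" text form.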
Chris Embree 2012-12-13, 05:11