Re: Transpose
Sandy Ryza 2013-03-05, 17:27
Hi,
Essentially, you want to group your data points by their position in the column and have each reduce call assemble the data points for one row into an output row.

To make each record that the mapper processes one of the columns, you can use TextInputFormat with conf.set("textinputformat.record.delimiter", ";"). Your mapper will receive keys as LongWritables specifying the byte offset into the input file, and Text as values. The mapper tokenizes the input string on commas and emits a map output for each data point in each column. You can then use secondary sort to send the data to the right place in the right order (see http://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/).

Your composite key would look like (index of the data point within its column, which is the row index; the LongWritable passed in as the map input key). Each reduce call would then get all the points in a single row. You would sort and group by row index, and within a reduce call's values, sort by byte offset so that entries from earlier columns come before later ones.

Does that make sense?

Sandy

On Tue, Mar 5, 2013 at 7:11 AM, Mix Nin <[EMAIL PROTECTED]> wrote:
> Hi
>
> I have data in a file as follows. There are 3 columns separated by
> semicolon (;). Each column would have multiple values separated by comma
> (,).
>
> 11,22,33;144,244,344;yny;
>
> I need output data in the format below. It is like transposing the values
> of each column.
>
> 11 144 y
> 22 244 n
> 33 344 y
>
> Can we write a MapReduce program to achieve this? Could you help with
> how to write the code?
>
>
> Thanks
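A minimal sketch of the job described in the reply, written as a plain-Java simulation so it runs without a Hadoop cluster. It does not use Hadoop's Mapper, Reducer, or Writable classes, and the names TransposeSketch, RowKey, MapOutput, map, shuffleAndReduce and transpose are hypothetical. It splits the input on ";" the way the record delimiter setting would, records each column's starting byte offset (standing in for the LongWritable map input key), emits (row index, byte offset) composite keys, sorts on the full key, and groups on the row index alone. It assumes the third column is comma-separated like the others (y,n,y), as the question's description says, rather than the literal yny shown in the sample.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java simulation of the transpose job (map -> secondary sort -> reduce).
// No Hadoop classes are used; all names here are hypothetical.
final class TransposeSketch {

    // Composite key: row = position of the value inside its column (the output row),
    // columnOffset = byte offset of the column record, i.e. what a real mapper
    // would receive as its LongWritable input key.
    record RowKey(int row, long columnOffset) {}

    record MapOutput(RowKey key, String value) {}

    // Sort order: row index first, then column byte offset (the secondary sort).
    static final Comparator<RowKey> SORT_ORDER =
            Comparator.comparingInt(RowKey::row).thenComparingLong(RowKey::columnOffset);

    // Map phase: split the input into records as a ";" record delimiter would,
    // track each record's starting byte offset, and emit one output per value.
    static List<MapOutput> map(String input, String recordDelimiter) {
        List<MapOutput> outputs = new ArrayList<>();
        int delimiterBytes = recordDelimiter.getBytes(StandardCharsets.UTF_8).length;
        long offset = 0;
        for (String columnRecord : input.split(Pattern.quote(recordDelimiter), -1)) {
            // Skip blank records, e.g. the newline left after the final ";".
            if (!columnRecord.isBlank()) {
                // Assumption: values inside a column are comma-separated.
                String[] values = columnRecord.trim().split(",");
                for (int row = 0; row < values.length; row++) {
                    outputs.add(new MapOutput(new RowKey(row, offset), values[row].trim()));
                }
            }
            offset += columnRecord.getBytes(StandardCharsets.UTF_8).length + delimiterBytes;
        }
        return outputs;
    }

    // Shuffle + reduce: sort by the full composite key, then group on the row index
    // only (what a grouping comparator does), so each "reduce call" sees one row's
    // values ordered by column.
    static List<String> shuffleAndReduce(List<MapOutput> outputs) {
        List<MapOutput> sorted = new ArrayList<>(outputs);
        sorted.sort(Comparator.comparing(MapOutput::key, SORT_ORDER));
        List<String> rows = new ArrayList<>();
        StringBuilder current = null;
        int currentRow = -1;
        for (MapOutput out : sorted) {
            if (out.key().row() != currentRow) { // new group => new reduce call
                if (current != null) {
                    rows.add(current.toString());
                }
                current = new StringBuilder(out.value());
                currentRow = out.key().row();
            } else {
                current.append(' ').append(out.value());
            }
        }
        if (current != null) {
            rows.add(current.toString());
        }
        return rows;
    }

    static List<String> transpose(String input) {
        return shuffleAndReduce(map(input, ";"));
    }

    public static void main(String[] args) {
        for (String row : transpose("11,22,33;144,244,344;y,n,y;\n")) {
            System.out.println(row);
        }
    }
}
```

Running main prints the three rows 11 144 y, 22 244 n and 33 344 y. In a real Hadoop job the same ordering would typically be expressed with a WritableComparable composite key, a Partitioner on the row index (so all points of one row reach the same reducer), and sort and grouping comparators registered with Job.setSortComparatorClass and Job.setGroupingComparatorClass; treat that mapping as a pointer rather than a tested implementation.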