HDFS, mail # user - Transpose


Mix Nin 2013-03-05, 15:11
Re: Transpose
Sandy Ryza 2013-03-05, 17:27
Hi,

Essentially, what you want to do is group your data points by their position
in the column, and have each reduce call assemble the data points for one
position into an output row.  To make each record that the mapper processes
one of the columns, you can use TextInputFormat with
conf.set("textinputformat.record.delimiter", ";").  Your mapper will
receive its keys as LongWritables giving the byte offset of the record in
the input file, and its values as Text.  The mapper then tokenizes each
value on commas.
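
To make the map side concrete, here is a minimal sketch in plain Java with
no Hadoop dependency. It only imitates what TextInputFormat with a ";"
record delimiter would hand to the mapper, and what the mapper would emit.
The class and method names (TransposeMapSketch, Emit, mapAll, map) are
made up for illustration. It assumes ASCII input, so a character index
equals a byte offset, and it assumes the third column is comma-separated
like the others ("y,n,y"), although the sample in the question shows "yny".

```java
import java.util.ArrayList;
import java.util.List;

public class TransposeMapSketch {
    /** One map output: row index, byte offset of the source column, and the value. */
    static final class Emit {
        final int rowIndex;
        final long byteOffset;
        final String value;

        Emit(int rowIndex, long byteOffset, String value) {
            this.rowIndex = rowIndex;
            this.byteOffset = byteOffset;
            this.value = value;
        }
    }

    /** Imitates the record reader: splits on ';' and tracks where each record starts. */
    static List<Emit> mapAll(String input) {
        List<Emit> out = new ArrayList<>();
        long offset = 0;
        for (String record : input.split(";")) {
            out.addAll(map(offset, record));
            offset += record.length() + 1; // +1 for the ';' delimiter (ASCII assumed)
        }
        return out;
    }

    /** Imitates the mapper: one output per data point, tagged with its position in the column. */
    static List<Emit> map(long byteOffset, String record) {
        List<Emit> out = new ArrayList<>();
        String trimmed = record.trim();
        if (trimmed.isEmpty()) {
            return out; // e.g. the newline left over after the final ';'
        }
        String[] tokens = trimmed.split(",");
        for (int row = 0; row < tokens.length; row++) {
            out.add(new Emit(row, byteOffset, tokens[row]));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Emit e : mapAll("11,22,33;144,244,344;y,n,y;")) {
            System.out.println("(" + e.rowIndex + ", " + e.byteOffset + ") -> " + e.value);
        }
    }
}
```

In a real job the row index and byte offset would go into the map output
key rather than a plain object, but the bookkeeping is the same.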

By emitting a map output for each data point in each column, you can then use
secondary sort to send the data to the right place in the right order (see
http://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/).
Your composite key would look like (index of the data point in its column,
which is the row index; the LongWritable passed in as the map input key).
You would sort and group by row index, so that each reduce call gets all
the points in a single row, and within a reduce call's values you would
sort by byte offset so that entries from earlier columns come before later
ones.
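
The ordering that the composite key is meant to produce can be sketched in
plain Java, again without any Hadoop classes. Sorting on (row index, byte
offset) and then cutting the sorted stream wherever the row index changes
stands in for what the sort comparator and grouping comparator would do in
a real secondary-sort job. The names here (TransposeReduceSketch, Keyed,
shuffleAndReduce) are hypothetical, and the offsets 0, 9 and 21 are the
column start positions for the assumed input "11,22,33;144,244,344;y,n,y;".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TransposeReduceSketch {
    /** Composite key (row index, byte offset) carried together with its value. */
    static final class Keyed implements Comparable<Keyed> {
        final int rowIndex;
        final long byteOffset;
        final String value;

        Keyed(int rowIndex, long byteOffset, String value) {
            this.rowIndex = rowIndex;
            this.byteOffset = byteOffset;
            this.value = value;
        }

        /** Row index first, then byte offset, so earlier columns come first within a row. */
        @Override
        public int compareTo(Keyed other) {
            int byRow = Integer.compare(rowIndex, other.rowIndex);
            return byRow != 0 ? byRow : Long.compare(byteOffset, other.byteOffset);
        }
    }

    /** Sorts the map outputs, groups them by row index, and joins each group into one line. */
    static List<String> shuffleAndReduce(List<Keyed> mapOutputs) {
        List<Keyed> sorted = new ArrayList<>(mapOutputs);
        Collections.sort(sorted);

        List<String> rows = new ArrayList<>();
        StringBuilder current = null;
        int currentRow = -1;
        for (Keyed k : sorted) {
            if (current == null || k.rowIndex != currentRow) {
                // Row index changed: this is where a new reduce call would begin.
                if (current != null) {
                    rows.add(current.toString());
                }
                current = new StringBuilder();
                currentRow = k.rowIndex;
            } else {
                current.append(' ');
            }
            current.append(k.value);
        }
        if (current != null) {
            rows.add(current.toString());
        }
        return rows;
    }

    public static void main(String[] args) {
        // Map outputs listed out of order on purpose; the sort restores the order.
        List<Keyed> in = new ArrayList<>();
        in.add(new Keyed(2, 21L, "y"));
        in.add(new Keyed(0, 9L, "144"));
        in.add(new Keyed(1, 0L, "22"));
        in.add(new Keyed(0, 0L, "11"));
        in.add(new Keyed(1, 21L, "n"));
        in.add(new Keyed(2, 0L, "33"));
        in.add(new Keyed(0, 21L, "y"));
        in.add(new Keyed(2, 9L, "344"));
        in.add(new Keyed(1, 9L, "244"));
        for (String row : shuffleAndReduce(in)) {
            System.out.println(row);
        }
    }
}
```

In an actual job you would also need the partitioner to look only at the
row index, so that every key for a given row reaches the same reducer.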

Does that make sense?

Sandy

On Tue, Mar 5, 2013 at 7:11 AM, Mix Nin <[EMAIL PROTECTED]> wrote:

> Hi
>
> I have data in a file as follows. There are 3 columns separated by
> semicolons (;). Each column has multiple values separated by commas
> (,).
>
> 11,22,33;144,244,344;yny;
>
> I need the output data in the format below. It is like transposing the
> values of each column.
>
> 11 144 y
> 22 244 n
> 33 344 y
>
> Can we write a MapReduce program to achieve this? Could you help with
> how to write the code?
>
>
> Thanks
>
Michel Segel 2013-03-05, 15:18