Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
HDFS >> mail # user >> Transpose


Copy link to this message
-
Re: Transpose
Hi,

Essentially what you want to do is group your data points by their position
in the column, and have each reduce call construct the data for each row
into a row.  To have each record that the mapper processes be one of the
columns, you can use TextInputFormat with
conf.set("textinputformat.record.delimiter", ";").  Your mapper will
receive keys as LongWritables specifying the byte index into the input
file, and Text as values.  The mapper will tokenize the input string.

Emiting a map output for each data point in each column, you can then use
secondary sort to send the data to the right place in the right order (see
http://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/).
Your composite key would look like (index of data point in column, which is
the row index; the LongWritable passed in as the map input key).  Each
reduce call would get all the points in a single row. You would sort/group
by row index, and within a reduce's values, sort by byte index so that
entries from earlier columns come before later ones.

Does that make sense?

Sandy

On Tue, Mar 5, 2013 at 7:11 AM, Mix Nin <[EMAIL PROTECTED]> wrote:

> Hi
>
> I have data in a file as follows . There are 3 columns separated by
> semicolon(;). Each column would have multiple values separated by comma
> (,).
>
> 11,22,33;144,244,344;yny;
>
> I need output data in below format. It is like transposing  values of each
> column.
>
> 11 144 y
> 22 244 n
> 33 344 y
>
> Can we write map reduce program to achieve this. Could you help on the
> code on how to write.
>
>
> Thanks
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB