Re: RCFile in java MapReduce
I have some experience using RCFile with the new MapReduce API, from the
HCatalog project ( http://incubator.apache.org/hcatalog/ ).

For the output part, in your main you need:

job.setOutputFormatClass(RCFileMapReduceOutputFormat.class);

// numCols is the total number of columns of your output table
RCFileMapReduceOutputFormat.setColumnNumber(job.getConfiguration(), numCols);

RCFileMapReduceOutputFormat.setOutputPath(job, new Path(outputPath));
RCFileMapReduceOutputFormat.setCompressOutput(job, true);
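For context, here is a minimal driver sketch around those four calls. The
surrounding boilerplate is my own assumption, not from the original snippet:
the class name RCFileWriteJob, reading the paths and column count from args,
and the map-only setup are all hypothetical, and the HCatalog package name
may differ in your version.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hcatalog.rcfile.RCFileMapReduceOutputFormat; // package per HCatalog 0.x

public class RCFileWriteJob {
  public static void main(String[] args) throws Exception {
    // Hypothetical arguments: input dir, output dir, column count.
    int numCols = Integer.parseInt(args[2]);
    Job job = new Job(new Configuration(), "text-to-rcfile");
    job.setJarByClass(RCFileWriteJob.class);

    // Read plain delimited text; the Map class shown below is assumed to be
    // a nested class of this driver.
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    job.setMapperClass(Map.class);
    job.setMapOutputKeyClass(NullWritable.class);
    job.setMapOutputValueClass(BytesRefArrayWritable.class);
    job.setNumReduceTasks(0); // map-only: rows go straight to the RCFile writer

    job.setOutputFormatClass(RCFileMapReduceOutputFormat.class);
    RCFileMapReduceOutputFormat.setColumnNumber(job.getConfiguration(), numCols);
    RCFileMapReduceOutputFormat.setOutputPath(job, new Path(args[1]));
    RCFileMapReduceOutputFormat.setCompressOutput(job, true);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}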
The Map class would look like:

// Imports needed by this class:
import java.io.IOException;

import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class Map
    extends Mapper<Object, Text, NullWritable, BytesRefArrayWritable> {

  private byte[] fieldData;
  private int numCols;
  private BytesRefArrayWritable bytes;

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // setColumnNumber() in the driver stores the column count under this key.
    numCols = context.getConfiguration().getInt(
        "hive.io.rcfile.column.number.conf", 0);
    bytes = new BytesRefArrayWritable(numCols);
  }

  @Override
  public void map(Object key, Text line, Context context)
      throws IOException, InterruptedException {
    bytes.clear();
    // Split a pipe-delimited text row into its columns.
    String[] cols = line.toString().split("\\|");
    for (int i = 0; i < numCols; i++) {
      fieldData = cols[i].getBytes("UTF-8");
      BytesRefWritable cu = new BytesRefWritable(fieldData, 0, fieldData.length);
      bytes.set(i, cu);
    }
    context.write(NullWritable.get(), bytes);
  }
}
>
Basically, you need to convert each row to a BytesRefArrayWritable object
(which is bytes in the above example).

For the input part, I do not know how to use RCFileMapReduceInputFormat to
write a MapReduce job for a join operation, so I customized a new
InputFormat and RecordReader.
You can find these two classes (MultiRCFileMapReduceInputFormat and
MultiRCFileMapReduceRecordReader) at
http://www.cse.ohio-state.edu/~huai/RCFile/ .
At this link, TestPrintTables.java is an example program that you can use
to convert tables in RCFile format to text. I hope this example is
self-explanatory. If you need to
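For a plain single-table scan (not the join case above), a minimal mapper
sketch of the input side might look like the following. This is my own
assumption of how RCFileMapReduceInputFormat is wired up, not code from the
original mail; the class name PrintMap and the pipe-delimited output are
hypothetical.

// In the driver, using the FileInputFormat-style helpers:
//   job.setInputFormatClass(RCFileMapReduceInputFormat.class);
//   RCFileMapReduceInputFormat.addInputPath(job, new Path(inputPath));

public static class PrintMap
    extends Mapper<LongWritable, BytesRefArrayWritable, Text, NullWritable> {

  @Override
  public void map(LongWritable key, BytesRefArrayWritable row, Context context)
      throws IOException, InterruptedException {
    // Each value is one row; each BytesRefWritable holds one column's raw bytes.
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < row.size(); i++) {
      BytesRefWritable field = row.get(i);
      sb.append(Text.decode(field.getData(), field.getStart(), field.getLength()));
      if (i < row.size() - 1) {
        sb.append('|');
      }
    }
    context.write(new Text(sb.toString()), NullWritable.get());
  }
}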

I hope this helps you.

Thanks,

Yin

On Wed, Dec 14, 2011 at 8:54 AM, Dominik Wiernicki <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Can someone show me how to use RCFile in a plain MapReduce job (as input
> and output format)?
> Please.