Re: RCFile in java MapReduce
Aniket Mokashi, 2012-01-10, 06:49
A better way would be to mount a table on top of the RCFiles and use HCatInputFormat:
http://incubator.apache.org/hcatalog/docs/r0.2.0/inputoutput.html#HCatInputFormat
But you will have to install and run the HCatalog server for it. (Note: by default, HCatalog assumes the underlying storage is RCFile, so you do not need to patch any metadata, i.e. you do not need to create the table through hcat.)

Thanks,
Aniket

On Mon, Jan 9, 2012 at 7:16 PM, Yin Huai <[EMAIL PROTECTED]> wrote:
> I have some experience using RCFile with the new MapReduce API from the
> HCatalog project (http://incubator.apache.org/hcatalog/).
>
> For the output part, in your main you need:
>
>     job.setOutputFormatClass(RCFileMapReduceOutputFormat.class);
>     // numCols is the total number of columns of your output table
>     RCFileMapReduceOutputFormat.setColumnNumber(job.getConfiguration(), numCols);
>     RCFileMapReduceOutputFormat.setOutputPath(job, new Path(outputPath));
>     RCFileMapReduceOutputFormat.setCompressOutput(job, true);
>
> The Map class would look like:
>
>     import java.io.IOException;
>     import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
>     import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;
>     import org.apache.hadoop.io.NullWritable;
>     import org.apache.hadoop.io.Text;
>     import org.apache.hadoop.mapreduce.Mapper;
>
>     public static class Map
>             extends Mapper<Object, Text, NullWritable, BytesRefArrayWritable> {
>
>         private byte[] fieldData;
>         private int numCols;
>         private BytesRefArrayWritable bytes;
>
>         @Override
>         protected void setup(Context context) throws IOException, InterruptedException {
>             // Read back the column count stored by setColumnNumber() in main
>             numCols = context.getConfiguration().getInt("hive.io.rcfile.column.number.conf", 0);
>             bytes = new BytesRefArrayWritable(numCols);
>         }
>
>         @Override
>         public void map(Object key, Text line, Context context)
>                 throws IOException, InterruptedException {
>             bytes.clear();
>             // Limit -1 keeps trailing empty fields, so "a|b|" still yields 3 columns
>             String[] cols = line.toString().split("\\|", -1);
>             for (int i = 0; i < numCols; i++) {
>                 // Each column becomes one BytesRefWritable holding its UTF-8 bytes
>                 fieldData = cols[i].getBytes("UTF-8");
>                 BytesRefWritable cu = new BytesRefWritable(fieldData, 0, fieldData.length);
>                 bytes.set(i, cu);
>             }
>             context.write(NullWritable.get(), bytes);
>         }
>     }
>
> Basically, you need to convert each row to a BytesRefArrayWritable object
> (`bytes` in the example above).
>
> For the input part, I do not know how to use RCFileMapReduceInputFormat to
> write a MapReduce job for a join operation, so I wrote a custom
> InputFormat and RecordReader. You can find these two classes
> (MultiRCFileMapReduceInputFormat and MultiRCFileMapReduceRecordReader) at
> http://www.cse.ohio-state.edu/~huai/RCFile/ .
> At that link, TestPrintTables.java is an example program that converts
> tables in RCFile format to text. I hope the example is self-explanatory.
> If you need to
>
> Hope this helps.
>
> Thanks,
> Yin
>
> On Wed, Dec 14, 2011 at 8:54 AM, Dominik Wiernicki <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> Can someone show me how to use RCFile in a plain MapReduce job (as the input
>> and output format)? Please.
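On the read side, each row presumably reaches the mapper as a `BytesRefArrayWritable` again, with one `BytesRefWritable` per column. Converting a row back to text, the kind of conversion TestPrintTables is described as doing, means decoding each column from its `(data, start, length)` range. A column may be a view into a larger shared buffer, so decoding the whole backing array would be wrong. The sketch below is minimal and pure Java. `ColumnSlice` is a hypothetical stand-in for `BytesRefWritable`, whose `getData()`, `getStart()` and `getLength()` accessors play the same roles. This lets the logic run without Hadoop on the classpath.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for BytesRefWritable: `length` bytes of `data` starting at `start`. */
final class ColumnSlice {
    final byte[] data;
    final int start;
    final int length;

    ColumnSlice(byte[] data, int start, int length) {
        this.data = data;
        this.start = start;
        this.length = length;
    }
}

/** Hypothetical helper that turns one decoded row back into a '|'-delimited line. */
final class RowPrinter {
    private RowPrinter() {}

    static String toLine(ColumnSlice[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                sb.append('|');
            }
            ColumnSlice col = row[i];
            // Decode only this column's range, not the whole shared buffer
            sb.append(new String(col.data, col.start, col.length, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }
}
```

With real Hadoop types, the loop body would read from `value.get(i)` and use its `getData()`, `getStart()` and `getLength()`.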