Hive, mail # user - RCFile in java MapReduce


Re: RCFile in java MapReduce
Aniket Mokashi 2012-01-10, 06:49
A better way would be to mount a table on top of the RCFiles and use
http://incubator.apache.org/hcatalog/docs/r0.2.0/inputoutput.html#HCatInputFormat
However, you will have to install and run an HCatalog server for it.

(Note: by default, HCatalog assumes the underlying storage is RCFile, so you
do not need to patch any metadata; that is, you do not need to create the table
through HCatalog.)
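
For reference, the driver wiring for the HCatInputFormat approach looks roughly like the sketch below (Java-style pseudocode only; the exact `setInput` signature varies across HCatalog releases, so check the linked document for the 0.2 API, and the database/table names here are placeholders):

```
// Sketch: read an existing HCatalog-backed table as mapper input.
Job job = new Job(conf, "read-rcfile-table");
// Tell HCatalog which table to read; "default" and "my_rcfile_table"
// are placeholder names for illustration.
HCatInputFormat.setInput(job, /* db */ "default", /* table */ "my_rcfile_table");
job.setInputFormatClass(HCatInputFormat.class);
// The mapper then receives HCatRecord values instead of raw text lines.
```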

Thanks,
Aniket

On Mon, Jan 9, 2012 at 7:16 PM, Yin Huai <[EMAIL PROTECTED]> wrote:

> I have some experience using RCFile with the new MapReduce API from the
> HCatalog project ( http://incubator.apache.org/hcatalog/ ).
>
> For the output part, in your main you need ...
>
>> job.setOutputFormatClass(RCFileMapReduceOutputFormat.class);
>>
>> // numCols is the total number of columns of your output table
>> RCFileMapReduceOutputFormat.setColumnNumber(job.getConfiguration(), numCols);
>>
>> RCFileMapReduceOutputFormat.setOutputPath(job, new Path(outputPath));
>>
>> RCFileMapReduceOutputFormat.setCompressOutput(job, true);
>>
> The Map class would look like ...
>
>> public static class Map
>>     extends Mapper<Object, Text, NullWritable, BytesRefArrayWritable> {
>>
>>   private byte[] fieldData;
>>   private int numCols;
>>   private BytesRefArrayWritable bytes;
>>
>>   @Override
>>   protected void setup(Context context) throws IOException, InterruptedException {
>>     numCols = context.getConfiguration().getInt("hive.io.rcfile.column.number.conf", 0);
>>     bytes = new BytesRefArrayWritable(numCols);
>>   }
>>
>>   public void map(Object key, Text line, Context context)
>>       throws IOException, InterruptedException {
>>     bytes.clear();
>>     String[] cols = line.toString().split("\\|");
>>     for (int i = 0; i < numCols; i++) {
>>       fieldData = cols[i].getBytes("UTF-8");
>>       BytesRefWritable cu = new BytesRefWritable(fieldData, 0, fieldData.length);
>>       bytes.set(i, cu);
>>     }
>>     context.write(NullWritable.get(), bytes);
>>   }
>> }
> Basically, you need to convert each row into a BytesRefArrayWritable object
> (bytes in the example above).
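
One detail worth double-checking in a mapper like the one above: Java's `String.split` drops trailing empty strings by default, so a row whose last columns are empty yields fewer array elements than columns, and `cols[i]` can then throw `ArrayIndexOutOfBoundsException`. Passing a negative limit keeps the trailing empties. A small self-contained illustration (plain Java, no Hadoop dependencies):

```java
public class SplitDemo {
    // Split a pipe-delimited row, keeping trailing empty fields.
    // The negative limit tells split() not to discard trailing empty strings.
    static String[] splitRow(String line) {
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        String row = "a|b|";                          // three columns, last one empty
        System.out.println(row.split("\\|").length);  // 2: trailing empty dropped
        System.out.println(splitRow(row).length);     // 3: all columns kept
    }
}
```

If the input is guaranteed to have no empty trailing columns, the original `split("\\|")` is fine; otherwise the `-1` limit makes the column count reliable.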
>
> For the input part, I do not know how to use RCFileMapReduceInputFormat to
> write a MapReduce job for a join operation, so I wrote a custom
> InputFormat and RecordReader.
> You can find these two classes (MultiRCFileMapReduceInputFormat and
> MultiRCFileMapReduceRecordReader) at
> http://www.cse.ohio-state.edu/~huai/RCFile/ .
> At that link, TestPrintTables.java is an example program that you can use
> to convert tables in RCFile format to text. I hope that this example is
> self-explanatory.
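
For the RCFile-to-text direction described above, the per-row work is essentially the reverse of the mapper shown earlier: decode each field back to a string and join the columns with the delimiter. A minimal sketch of just the string-side logic (plain Java; the decoding from an actual BytesRefArrayWritable is omitted, and the column values are hypothetical):

```java
public class RowJoinDemo {
    // Join decoded column values back into a pipe-delimited text row.
    static String toTextRow(String[] cols) {
        return String.join("|", cols);
    }

    public static void main(String[] args) {
        String[] cols = {"a", "b", ""};      // hypothetical decoded columns
        System.out.println(toTextRow(cols)); // prints "a|b|"
    }
}
```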
>
> Hope these can help you.
>
> Thanks,
>
> Yin
>
> On Wed, Dec 14, 2011 at 8:54 AM, Dominik Wiernicki <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Can someone show me how to use RCFile in a plain MapReduce job (as the
>> input and output format)?
>> Please.
>>
>>
>>
>
--
"...:::Aniket:::... Quetzalco@tl"