Converting types from java HashMap, Long and Array to BytesWritable for RCFileOutputFormat
Hi all,
  I am working on an M/R program to convert Zebra data to Hive RC
format.

The TableInputFormat (Zebra) returns keys and values in the form of
BytesWritable and (Pig) Tuple.

To convert to RCFileOutputFormat, whose key is "BytesWritable" and whose value is
"BytesRefArrayWritable", I need to take a Pig Tuple, iterate over each of its
fields, and convert each one to a "BytesRefWritable".

The easy part is for Strings, which can be converted to BytesRefWritable
as:

BytesRefArrayWritable myvalue = new BytesRefArrayWritable(10);
// value is a Pig Tuple; get(0) returns a String
String s = (String) value.get(0);
myvalue.set(0, new BytesRefWritable(s.getBytes("UTF-8")));

How do I do it for java "Long", "HashMap" and "Arrays"? For Long I have:

// value is a Pig Tuple; get(1) returns a Long
Long l = (Long) value.get(1);
myvalue.set(1, new BytesRefWritable(l.toString().getBytes("UTF-8")));

(Long has no getBytes() method, so the string form seems to be the only
option.) Similarly for HashMap:

// get(2) returns a HashMap
HashMap<String, Object> hm = new HashMap<String, Object>((HashMap) value.get(2));
myvalue.set(2, new BytesRefWritable(hm.toString().getBytes("UTF-8")));
Would the toString() method work? If I need to read the RC format back
through "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe", would it
interpret the values correctly?
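For what it's worth, ColumnarSerDe deserializes each column the same way LazySimpleSerDe does, i.e. as delimited text: field delimiter \001, collection-item delimiter \002, map-key delimiter \003 by default. HashMap.toString() produces "{k=v, k2=v2}", which those defaults will not parse back. A minimal sketch, assuming the default delimiters, of building each column's bytes by hand (plain Java so it stands alone; the class and method names are made up for illustration, and in the real job each byte[] would be wrapped in a BytesRefWritable):

```java
import java.io.UnsupportedEncodingException;
import java.util.List;
import java.util.Map;

public class HiveColumnBytes {
    // Hive's default LazySimpleSerDe separators at nesting levels 1 and 2
    static final char COLLECTION_DELIM = '\u0002'; // between list items / map entries
    static final char MAPKEY_DELIM = '\u0003';     // between a map key and its value

    // A long column is just its decimal string form.
    static byte[] longColumn(Long l) throws UnsupportedEncodingException {
        return l.toString().getBytes("UTF-8");
    }

    // A map column: k1\003v1\002k2\003v2...
    static byte[] mapColumn(Map<String, ?> m) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (Map.Entry<String, ?> e : m.entrySet()) {
            if (!first) sb.append(COLLECTION_DELIM);
            sb.append(e.getKey()).append(MAPKEY_DELIM).append(e.getValue());
            first = false;
        }
        return sb.toString().getBytes("UTF-8");
    }

    // An array column: item1\002item2...
    static byte[] arrayColumn(List<?> items) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) sb.append(COLLECTION_DELIM);
            sb.append(items.get(i));
        }
        return sb.toString().getBytes("UTF-8");
    }
}
```

Then myvalue.set(1, new BytesRefWritable(longColumn(l))) and so on, and the Hive table reading the files would have to be declared with the same (default) delimiters for the map and array columns to come back intact.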

Is there any documentation on this?

Any suggestions would be appreciated.

Viraj