HDFS >> mail # user >> Re: Wrapping around BitSet with the Writable interface


Re: Wrapping around BitSet with the Writable interface
You could consider the experimental JavaSerialization [1] support, which
lets you skip converting your objects to Writables or other
serialization formats. It may be slower, but it looks like you want a
way to avoid transforming objects.

Enable it by adding the class
org.apache.hadoop.io.serializer.JavaSerialization to the
io.serializations list in your client configuration, like so:

<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization,org.apache.hadoop.io.serializer.JavaSerialization</value>
</property>

You should then be able to rely on Java's built-in serialization to
serialize your BitSet objects directly.
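
As a rough illustration of what Java's built-in serialization does with a
BitSet, here is a minimal standalone sketch with no Hadoop involved. The
class name BitSetJavaSerializationDemo and its helper methods are made up
for this example, and it assumes Java 7 or later for try-with-resources.
It only shows that java.util.BitSet is Serializable and survives a round
trip, not how Hadoop frames the bytes internally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.BitSet;

// Hypothetical demo class, not part of Hadoop.
class BitSetJavaSerializationDemo {

  // Serialize a BitSet with plain Java serialization.
  static byte[] serialize(BitSet bits) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(bits);
    } // closing the stream flushes it before the buffer is read
    return bos.toByteArray();
  }

  // Rebuild a BitSet from bytes produced by serialize().
  static BitSet deserialize(byte[] bytes)
      throws IOException, ClassNotFoundException {
    try (ObjectInputStream ois =
             new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (BitSet) ois.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    BitSet a = new BitSet();
    a.set(1);
    a.set(64);
    BitSet b = new BitSet();
    b.set(2);

    // Round-trip one operand, then bitwise-OR it with the other.
    BitSet restored = deserialize(serialize(a));
    restored.or(b);
    System.out.println(restored); // prints {1, 2, 64}
  }
}
```

The serialized form carries Java class metadata as well as the bits, which
is one reason this route may be slower and larger than a hand-written
Writable.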

[1] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/serializer/JavaSerialization.html

On Sun, May 12, 2013 at 11:54 PM, Jim Twensky <[EMAIL PROTECTED]> wrote:
> I have large java.util.BitSet objects that I want to bitwise-OR using a
> MapReduce job. I decided to wrap around each object using the Writable
> interface. Right now I convert each BitSet to a byte array and serialize the
> byte array on disk.
>
> Converting them to byte arrays is a bit inefficient, but I could not find a
> workaround to write them directly to the DataOutput. Is there a way to skip
> this and serialize the object directly? Here is what my current
> implementation looks like:
>
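> import java.io.ByteArrayInputStream;
> import java.io.ByteArrayOutputStream;
> import java.io.DataInput;
> import java.io.DataOutput;
> import java.io.IOException;
> import java.io.ObjectInputStream;
> import java.io.ObjectOutputStream;
> import java.util.BitSet;
> import org.apache.hadoop.io.Writable;
>
> public class BitSetWritable implements Writable {
>
>   private BitSet bs;
>
>   public BitSetWritable() {
>     this.bs = new BitSet();
>   }
>
>   @Override
>   public void write(DataOutput out) throws IOException {
>     // Java-serialize the BitSet into a temporary buffer.
>     ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size() / 8);
>     ObjectOutputStream oos = new ObjectOutputStream(bos);
>     oos.writeObject(bs);
>     // Close (and so flush) the stream before reading the buffer.
>     oos.close();
>     byte[] bytes = bos.toByteArray();
>     // Length-prefix the payload so readFields knows how much to read.
>     out.writeInt(bytes.length);
>     out.write(bytes);
>   }
>
>   @Override
>   public void readFields(DataInput in) throws IOException {
>     int len = in.readInt();
>     byte[] bytes = new byte[len];
>     in.readFully(bytes);
>
>     ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>     ObjectInputStream ois = new ObjectInputStream(bis);
>     try {
>       bs = (BitSet) ois.readObject();
>     } catch (ClassNotFoundException e) {
>       throw new IOException(e);
>     } finally {
>       ois.close();
>     }
>   }
>
> }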
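
If you would rather stay with a Writable, one alternative to the
ObjectOutputStream detour is to write the BitSet's underlying words
straight to the DataOutput. This is only a sketch of a different
technique, not something taken from your code or from Hadoop: the class
BitSetDataIO and its write/read helpers are hypothetical names, and it
assumes Java 7 or later for BitSet.toLongArray() and
BitSet.valueOf(long[]). The two helper bodies could be moved into write()
and readFields() of a Writable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.BitSet;

// Hypothetical helper class, shown without the Hadoop dependency.
class BitSetDataIO {

  // Write the word count, then each 64-bit word of the BitSet.
  static void write(BitSet bits, DataOutput out) throws IOException {
    long[] words = bits.toLongArray();
    out.writeInt(words.length);
    for (long word : words) {
      out.writeLong(word);
    }
  }

  // Read back what write() produced and rebuild the BitSet.
  static BitSet read(DataInput in) throws IOException {
    int count = in.readInt();
    long[] words = new long[count];
    for (int i = 0; i < count; i++) {
      words[i] = in.readLong();
    }
    return BitSet.valueOf(words);
  }

  public static void main(String[] args) throws IOException {
    BitSet original = new BitSet();
    original.set(0);
    original.set(63);
    original.set(64);
    original.set(1000);

    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    write(original, new DataOutputStream(bos));

    BitSet restored =
        read(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
    System.out.println(restored); // prints {0, 63, 64, 1000}
  }
}
```

This still copies the bits into a long[] once, but it avoids the Java
serialization stream header and class metadata on every record.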

--
Harsh J