MapReduce, mail # user - Wrapping around BitSet with the Writable interface


Wrapping around BitSet with the Writable interface
Jim Twensky 2013-05-12, 18:24
I have large java.util.BitSet objects that I want to bitwise-OR using a
MapReduce job. I decided to wrap each object in a class that implements the
Writable interface. Right now I convert each BitSet to a byte array and
serialize the byte array to disk.

Converting them to byte arrays is a bit inefficient, but I could not find a
workaround to write them directly to the DataOutput. Is there a way to
skip this step and serialize the object directly? Here is what my current
implementation looks like:

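import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.BitSet;

import org.apache.hadoop.io.Writable;

public class BitSetWritable implements Writable {

  private BitSet bs;

  public BitSetWritable() {
    this.bs = new BitSet();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    // Serialize the BitSet into an in-memory buffer first.
    ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size() / 8);
    ObjectOutputStream oos = new ObjectOutputStream(bos);
    oos.writeObject(bs);
    // Close (and thereby flush) the stream before taking the bytes.
    oos.close();
    byte[] bytes = bos.toByteArray();
    // Length prefix so that readFields knows how many bytes to read.
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    int len = in.readInt();
    byte[] bytes = new byte[len];
    in.readFully(bytes);

    ObjectInputStream ois =
        new ObjectInputStream(new ByteArrayInputStream(bytes));
    try {
      bs = (BitSet) ois.readObject();
    } catch (ClassNotFoundException e) {
      throw new IOException(e);
    } finally {
      ois.close();
    }
  }

}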
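
For comparison, here is a rough sketch of what writing the bits straight to the DataOutput might look like, without going through ObjectOutputStream. It assumes Java 7 or later, where BitSet.toLongArray() and BitSet.valueOf(long[]) exist. BitSetCodec and its methods are hypothetical names made up for this sketch, not part of Hadoop; in a real job the same write/read bodies would presumably go into BitSetWritable.write and BitSetWritable.readFields. Note that toLongArray() still copies the underlying words once, so this trims the Java serialization overhead rather than eliminating every copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical helper for illustration only; not a Hadoop class.
final class BitSetCodec {

  private BitSetCodec() {}

  // Writes the number of 64-bit words, followed by the words themselves.
  static void write(BitSet bs, DataOutput out) throws IOException {
    long[] words = bs.toLongArray();  // requires Java 7+
    out.writeInt(words.length);
    for (long word : words) {
      out.writeLong(word);
    }
  }

  // Reads back what write() produced and rebuilds the BitSet.
  static BitSet read(DataInput in) throws IOException {
    int count = in.readInt();
    long[] words = new long[count];
    for (int i = 0; i < count; i++) {
      words[i] = in.readLong();
    }
    return BitSet.valueOf(words);
  }

  // In-memory round trip, only here to demonstrate the two methods above.
  static BitSet roundTrip(BitSet bs) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      write(bs, new DataOutputStream(bos));
      byte[] bytes = bos.toByteArray();
      return read(new DataInputStream(new ByteArrayInputStream(bytes)));
    } catch (IOException e) {
      // Not expected for in-memory streams.
      throw new UncheckedIOException(e);
    }
  }
}
```

With this layout the serialized size is 4 bytes plus 8 bytes per word in use, and the reducer side could OR the deserialized sets together with BitSet.or.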