HDFS >> mail # user >> Re: Wrapping around BitSet with the Writable interface


Re: Wrapping around BitSet with the Writable interface
You could consider the experimental JavaSerialization [1] support, which
lets you skip converting your objects to Writables or other serialization
formats. It may be slower, but it looks like you want a way to avoid
transforming objects.

Enable it by adding the class
org.apache.hadoop.io.serializer.JavaSerialization to the
io.serializations list in your client configuration, like so:

<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization,org.apache.hadoop.io.serializer.JavaSerialization</value>
</property>

You should then be able to rely on Java's built-in serialization to
serialize your BitSet object directly.
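
JavaSerialization only handles classes that implement
java.io.Serializable, which java.util.BitSet does. As a rough,
Hadoop-free sketch of the round trip this relies on (the class and
method names below are made up for illustration, and this is plain JDK
code, not what Hadoop runs internally):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.BitSet;

// Hypothetical demo class; not part of Hadoop.
public class BitSetJavaSerializationDemo {

  // Serialize a BitSet with standard Java serialization.
  // Closing the ObjectOutputStream flushes it before the bytes are copied.
  public static byte[] serialize(BitSet bs) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(bs);
    }
    return bos.toByteArray();
  }

  // Rebuild a BitSet from bytes produced by serialize().
  public static BitSet deserialize(byte[] bytes)
      throws IOException, ClassNotFoundException {
    try (ObjectInputStream ois =
        new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (BitSet) ois.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    BitSet a = new BitSet();
    a.set(3);
    a.set(1000);
    BitSet b = new BitSet();
    b.set(7);

    // Round-trip both sets, then bitwise-OR them as a reducer might.
    BitSet merged = deserialize(serialize(a));
    merged.or(deserialize(serialize(b)));
    System.out.println(merged);
  }
}
```

If this round trip works for your objects, the same class should be
acceptable to JavaSerialization once it is listed in io.serializations.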

[1] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/serializer/JavaSerialization.html

On Sun, May 12, 2013 at 11:54 PM, Jim Twensky <[EMAIL PROTECTED]> wrote:
> I have large java.util.BitSet objects that I want to bitwise-OR using a
> MapReduce job. I decided to wrap around each object using the Writable
> interface. Right now I convert each BitSet to a byte array and serialize the
> byte array on disk.
>
> Converting them to byte arrays is a bit inefficient but I could not find a
> workaround to write them directly to the DataOutput. Is there a way to skip
> this and serialize the object directly? Here is what my current
> implementation looks like:
>
> public class BitSetWritable implements Writable {
>
>   private BitSet bs;
>
>   public BitSetWritable() {
>     this.bs = new BitSet();
>   }
>
>   @Override
>   public void write(DataOutput out) throws IOException {
>
>     ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>     ObjectOutputStream oos = new ObjectOutputStream(bos);
>     oos.writeObject(bs);
>     oos.close();  // flush any buffered data before copying the bytes
>     byte[] bytes = bos.toByteArray();
>     out.writeInt(bytes.length);
>     out.write(bytes);
>
>   }
>
>   @Override
>   public void readFields(DataInput in) throws IOException {
>
>     int len = in.readInt();
>     byte[] bytes = new byte[len];
>     in.readFully(bytes);
>
>     ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>     ObjectInputStream ois = new ObjectInputStream(bis);
>     try {
>       bs = (BitSet) ois.readObject();
>     } catch (ClassNotFoundException e) {
>       throw new IOException(e);
>     }
>
>     ois.close();
>   }
>
> }
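
If you would rather stay with a Writable, one possible way to avoid the
intermediate ObjectOutputStream is to write the BitSet's backing words
straight to the DataOutput. This is only a sketch and assumes Java 7 or
later (for BitSet.toLongArray and BitSet.valueOf); the BitSetWords class
and its method names are hypothetical, and the two static methods would
become the bodies of write() and readFields() in your BitSetWritable:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.BitSet;

// Hypothetical helper; uses only java.io and java.util (Java 7+).
public class BitSetWords {

  // Write the word count followed by each 64-bit word.
  public static void write(BitSet bs, DataOutput out) throws IOException {
    long[] words = bs.toLongArray();
    out.writeInt(words.length);
    for (long w : words) {
      out.writeLong(w);
    }
  }

  // Read the word count, then the words, and rebuild the BitSet.
  public static BitSet read(DataInput in) throws IOException {
    int n = in.readInt();
    long[] words = new long[n];
    for (int i = 0; i < n; i++) {
      words[i] = in.readLong();
    }
    return BitSet.valueOf(words);
  }

  public static void main(String[] args) throws IOException {
    BitSet original = new BitSet();
    original.set(3);
    original.set(1000);

    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    write(original, new DataOutputStream(bos));

    BitSet copy = read(new DataInputStream(
        new ByteArrayInputStream(bos.toByteArray())));
    System.out.println(copy.equals(original));
  }
}
```

This writes 4 bytes for the count plus 8 bytes per word, with none of the
class metadata that Java serialization adds to every record.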

--
Harsh J