HDFS, mail # user - Re: Wrapping around BitSet with the Writable interface
Re: Wrapping around BitSet with the Writable interface
Bertrand Dechoux 2013-05-12, 20:40
You can disregard my links, as they are only valid for Java 1.7+.
JavaSerialization might clean up your code, but it shouldn't bring a
significant boost in performance.
The EWAH implementation has, at least, the methods you are looking for:
serialize / deserialize.

Regards

Bertrand

Note to myself : I have to remember this one.
On Sun, May 12, 2013 at 10:27 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> Another interesting alternative is the EWAH implementation of Java bitsets,
> which provides efficient compressed bitsets with very fast OR operations.
>
> https://github.com/lemire/javaewah
>
> See also https://code.google.com/p/sparsebitmap/ by the same authors.
>
>
> On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>
>> In order to make the code more readable, you could start by using the
>> methods toByteArray() and valueOf(bytes)
>>
>>
>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
>>
>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>>
>> Regards
>>
>> Bertrand
>>
>>
>> On Sun, May 12, 2013 at 8:24 PM, Jim Twensky <[EMAIL PROTECTED]> wrote:
>>
>>> I have large java.util.BitSet objects that I want to bitwise-OR using a
>>> MapReduce job. I decided to wrap around each object using the Writable
>>> interface. Right now I convert each BitSet to a byte array and serialize
>>> the byte array on disk.
>>>
>>> Converting them to byte arrays is a bit inefficient, but I could not find
>>> a workaround to write them directly to the DataOutput. Is there a way to
>>> skip this and serialize the object directly? Here is what my current
>>> implementation looks like:
>>>
>>> public class BitSetWritable implements Writable {
>>>
>>>   private BitSet bs;
>>>
>>>   public BitSetWritable() {
>>>     this.bs = new BitSet();
>>>   }
>>>
>>>   @Override
>>>   public void write(DataOutput out) throws IOException {
>>>
>>>     ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>>>     ObjectOutputStream oos = new ObjectOutputStream(bos);
>>>     oos.writeObject(bs);
>>>     // Close (and thereby flush) the object stream before copying the bytes.
>>>     oos.close();
>>>     byte[] bytes = bos.toByteArray();
>>>     out.writeInt(bytes.length);
>>>     out.write(bytes);
>>>
>>>   }
>>>
>>>   @Override
>>>   public void readFields(DataInput in) throws IOException {
>>>
>>>     int len = in.readInt();
>>>     byte[] bytes = new byte[len];
>>>     in.readFully(bytes);
>>>
>>>     ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>>>     ObjectInputStream ois = new ObjectInputStream(bis);
>>>     try {
>>>       bs = (BitSet) ois.readObject();
>>>     } catch (ClassNotFoundException e) {
>>>       throw new IOException(e);
>>>     }
>>>
>>>     ois.close();
>>>   }
>>>
>>> }
>>>
>>
>>
>>
>> --
>> Bertrand Dechoux
>>
>
>
--
Bertrand Dechoux
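
For readers on Java 7 or later, the toByteArray() / valueOf(bytes) idea
suggested in the thread can be taken one step further with the sibling
methods BitSet.toLongArray() and BitSet.valueOf(long[]), which lets the words
go straight to the DataOutput without an intermediate ObjectOutputStream.
The following is only a minimal sketch under those assumptions, not the
implementation anyone in the thread ended up using. The class name
LongArrayBitSetWritable, the extra constructor and the get() accessor are
hypothetical, and "implements Writable" is left out so that the sketch
compiles with the JDK alone; in a Hadoop job the class would declare it.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.BitSet;

// Hypothetical variant of the BitSetWritable from the thread (Java 7+ only).
// In a Hadoop job this class would also declare "implements Writable";
// the two methods below already match that interface's signatures.
final class LongArrayBitSetWritable {

  private BitSet bs;

  LongArrayBitSetWritable() {
    this.bs = new BitSet();
  }

  LongArrayBitSetWritable(BitSet bs) {
    this.bs = bs;
  }

  BitSet get() {
    return bs;
  }

  // Writes the number of 64-bit words, then each word in order.
  public void write(DataOutput out) throws IOException {
    long[] words = bs.toLongArray();
    out.writeInt(words.length);
    for (long word : words) {
      out.writeLong(word);
    }
  }

  // Reads the word count, then the words, and rebuilds the BitSet.
  public void readFields(DataInput in) throws IOException {
    int count = in.readInt();
    long[] words = new long[count];
    for (int i = 0; i < count; i++) {
      words[i] = in.readLong();
    }
    bs = BitSet.valueOf(words);
  }
}
```

Each record is a 4-byte word count followed by 8 bytes per word up to the
highest set bit, so no Java serialization header is written. This does not
compress anything; for sparse or very large bitsets the EWAH library
mentioned above may still be the better fit.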