MapReduce, mail # user - Wrapping around BitSet with the Writable interface


Jim Twensky 2013-05-12, 18:24
Re: Wrapping around BitSet with the Writable interface
Jim Twensky 2013-05-13, 15:51
Thanks for the suggestions. I ended up switching to JDK 1.7+ just to make
the code more readable. I will take a look at the EWAH implementation as
well.

Jim
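
For readers following the thread, the JDK 1.7+ approach suggested below (toByteArray() and valueOf(bytes)) might look roughly like the following. This is only a sketch under stated assumptions, not the code Jim ended up with: the class name BitSetIo and its static helpers are hypothetical, and the helpers take plain DataOutput/DataInput so that the write() and readFields() methods of a Writable could delegate to them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.BitSet;

// Hypothetical helper class; requires Java 7 or later.
final class BitSetIo {

    // Writes a 4-byte length prefix followed by the BitSet's raw bytes.
    static void write(BitSet bs, DataOutput out) throws IOException {
        byte[] bytes = bs.toByteArray();
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads back a BitSet written by write().
    static BitSet read(DataInput in) throws IOException {
        int len = in.readInt();
        byte[] bytes = new byte[len];
        in.readFully(bytes);
        return BitSet.valueOf(bytes);
    }

    public static void main(String[] args) throws IOException {
        BitSet original = new BitSet();
        original.set(3);
        original.set(64);
        original.set(1000);

        // Round-trip through an in-memory buffer.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        write(original, new DataOutputStream(buffer));
        BitSet copy = read(new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray())));

        System.out.println(original.equals(copy));
    }
}
```

This still builds an intermediate byte array, but it skips the ObjectOutputStream layer and its stream header, so the encoded form is just the length prefix plus the bits themselves.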
On Sun, May 12, 2013 at 3:40 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:

> You can disregard my links, as they are only valid for Java 1.7+.
> JavaSerialization might clean up your code but shouldn't bring a
> significant boost in performance.
> The EWAH implementation has, at least, the methods you are looking for:
> serialize / deserialize.
>
> Regards
>
> Bertrand
>
> Note to myself: I have to remember this one.
>
>
> On Sun, May 12, 2013 at 10:27 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>> Another interesting alternative is the EWAH implementation of Java
>> bitsets, which allows efficient compressed bitsets with very fast OR
>> operations.
>>
>> https://github.com/lemire/javaewah
>>
>> See also https://code.google.com/p/sparsebitmap/ by the same authors.
>>
>>
>> On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>>
>>> In order to make the code more readable, you could start by using the
>>> methods toByteArray() and valueOf(bytes)
>>>
>>>
>>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
>>>
>>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>>>
>>> Regards
>>>
>>> Bertrand
>>>
>>>
>>> On Sun, May 12, 2013 at 8:24 PM, Jim Twensky <[EMAIL PROTECTED]> wrote:
>>>
>>>> I have large java.util.BitSet objects that I want to bitwise-OR using a
>>>> MapReduce job. I decided to wrap around each object using the Writable
>>>> interface. Right now I convert each BitSet to a byte array and serialize
>>>> the byte array on disk.
>>>>
>>>> Converting them to byte arrays is a bit inefficient, but I could not
>>>> find a workaround to write them directly to the DataOutput. Is there a way
>>>> to skip this and serialize the object directly? Here is what my current
>>>> implementation looks like:
>>>>
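>>>> import java.io.ByteArrayInputStream;
>>>> import java.io.ByteArrayOutputStream;
>>>> import java.io.DataInput;
>>>> import java.io.DataOutput;
>>>> import java.io.IOException;
>>>> import java.io.ObjectInputStream;
>>>> import java.io.ObjectOutputStream;
>>>> import java.util.BitSet;
>>>>
>>>> import org.apache.hadoop.io.Writable;
>>>>
>>>> public class BitSetWritable implements Writable {
>>>>
>>>>   private BitSet bs;
>>>>
>>>>   public BitSetWritable() {
>>>>     this.bs = new BitSet();
>>>>   }
>>>>
>>>>   @Override
>>>>   public void write(DataOutput out) throws IOException {
>>>>     // Java-serialize the BitSet into an in-memory buffer.
>>>>     ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>>>>     ObjectOutputStream oos = new ObjectOutputStream(bos);
>>>>     oos.writeObject(bs);
>>>>     // Close (and therefore flush) the stream before reading the buffer.
>>>>     oos.close();
>>>>     byte[] bytes = bos.toByteArray();
>>>>     // Length prefix first, then the serialized payload.
>>>>     out.writeInt(bytes.length);
>>>>     out.write(bytes);
>>>>   }
>>>>
>>>>   @Override
>>>>   public void readFields(DataInput in) throws IOException {
>>>>     // Read the length prefix, then exactly that many payload bytes.
>>>>     int len = in.readInt();
>>>>     byte[] bytes = new byte[len];
>>>>     in.readFully(bytes);
>>>>
>>>>     ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>>>>     ObjectInputStream ois = new ObjectInputStream(bis);
>>>>     try {
>>>>       bs = (BitSet) ois.readObject();
>>>>     } catch (ClassNotFoundException e) {
>>>>       throw new IOException(e);
>>>>     } finally {
>>>>       // Close the stream even if deserialization fails.
>>>>       ois.close();
>>>>     }
>>>>   }
>>>>
>>>> }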
>>>>
>>>
>>>
>>>
>>> --
>>> Bertrand Dechoux
>>>
>>
>>
>
>
> --
> Bertrand Dechoux
>
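
The original question is about bitwise-ORing many BitSet values in a MapReduce job. As a rough illustration of that combining step only, here is a minimal sketch using plain java.util.BitSet. The class name BitSetOr and the method orAll are hypothetical, and the Iterable stands in for the values a reducer would receive; no Hadoop types are used here.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical helper illustrating the OR-combining step.
final class BitSetOr {

    // ORs every input into a fresh accumulator and leaves the inputs untouched.
    static BitSet orAll(Iterable<BitSet> values) {
        BitSet acc = new BitSet();
        for (BitSet v : values) {
            acc.or(v);
        }
        return acc;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(1);
        a.set(5);

        BitSet b = new BitSet();
        b.set(5);
        b.set(200);

        BitSet combined = orAll(Arrays.asList(a, b));
        System.out.println(combined);
    }
}
```

Accumulating into a separate BitSet, rather than keeping references to the incoming values, matters in a reducer because Hadoop may reuse the same value object across iterations.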