MapReduce >> mail # user >> Wrapping around BitSet with the Writable interface


Jim Twensky 2013-05-12, 18:24
Re: Wrapping around BitSet with the Writable interface
Thanks for the suggestions. I ended up switching to JDK 1.7+ just to make
the code more readable. I will take a look at the EWAH implementation as
well.

Jim
On Sun, May 12, 2013 at 3:40 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:

> You can disregard my links as they are only valid for Java 1.7+.
> The JavaSerialization might clean up your code but shouldn't bring a
> significant boost in performance.
> The EWAH implementation has, at least, the methods you are looking for:
> serialize / deserialize.
>
> Regards
>
> Bertrand
>
> Note to myself: I have to remember this one.
>
>
> On Sun, May 12, 2013 at 10:27 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>> Another interesting alternative is the EWAH implementation of Java
>> bitsets, which provides efficient compressed bitsets with very fast OR
>> operations.
>>
>> https://github.com/lemire/javaewah
>>
>> See also https://code.google.com/p/sparsebitmap/ by the same authors.
>>
>>
>> On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>>
>>> In order to make the code more readable, you could start by using the
>>> methods toByteArray() and valueOf(bytes)
>>>
>>>
>>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
>>>
>>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>>>
>>> Regards
>>>
>>> Bertrand
>>>
>>>
>>> On Sun, May 12, 2013 at 8:24 PM, Jim Twensky <[EMAIL PROTECTED]> wrote:
>>>
>>>> I have large java.util.BitSet objects that I want to bitwise-OR using a
>>>> MapReduce job. I decided to wrap around each object using the Writable
>>>> interface. Right now I convert each BitSet to a byte array and serialize
>>>> the byte array on disk.
>>>>
>>>> Converting them to byte arrays is a bit inefficient but I could not
>>>> find a workaround to write them directly to the DataOutput. Is there a way
>>>> to skip this and serialize the object directly? Here is what my current
>>>> implementation looks like:
>>>>
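>>>> import java.io.ByteArrayInputStream;
>>>> import java.io.ByteArrayOutputStream;
>>>> import java.io.DataInput;
>>>> import java.io.DataOutput;
>>>> import java.io.IOException;
>>>> import java.io.ObjectInputStream;
>>>> import java.io.ObjectOutputStream;
>>>> import java.util.BitSet;
>>>>
>>>> import org.apache.hadoop.io.Writable;
>>>>
>>>> public class BitSetWritable implements Writable {
>>>>
>>>>   private BitSet bs;
>>>>
>>>>   public BitSetWritable() {
>>>>     this.bs = new BitSet();
>>>>   }
>>>>
>>>>   @Override
>>>>   public void write(DataOutput out) throws IOException {
>>>>     // Serialize the BitSet into an in-memory buffer first.
>>>>     ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size() / 8);
>>>>     ObjectOutputStream oos = new ObjectOutputStream(bos);
>>>>     oos.writeObject(bs);
>>>>     // Close (and thereby flush) before copying the buffer; otherwise
>>>>     // bytes still held by the ObjectOutputStream could be missing.
>>>>     oos.close();
>>>>     byte[] bytes = bos.toByteArray();
>>>>     // Length prefix, then the payload.
>>>>     out.writeInt(bytes.length);
>>>>     out.write(bytes);
>>>>   }
>>>>
>>>>   @Override
>>>>   public void readFields(DataInput in) throws IOException {
>>>>     // Read the length prefix, then exactly that many payload bytes.
>>>>     int len = in.readInt();
>>>>     byte[] bytes = new byte[len];
>>>>     in.readFully(bytes);
>>>>
>>>>     ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>>>>     ObjectInputStream ois = new ObjectInputStream(bis);
>>>>     try {
>>>>       bs = (BitSet) ois.readObject();
>>>>     } catch (ClassNotFoundException e) {
>>>>       throw new IOException(e);
>>>>     } finally {
>>>>       ois.close();
>>>>     }
>>>>   }
>>>>
>>>> }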
>>>>
>>>
>>>
>>>
>>> --
>>> Bertrand Dechoux
>>>
>>
>>
>
>
> --
> Bertrand Dechoux
>
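
For readers following the thread, here is a minimal sketch of what the Java 1.7+ approach suggested above (toByteArray() and valueOf(bytes)) might look like. It is not the code Jim ended up with. The class name BitSetWritableSketch and the helpers toBytes/fromBytes are hypothetical, added only so the round trip can be tried without a cluster. The class deliberately does not declare "implements org.apache.hadoop.io.Writable" so that it compiles with the JDK alone; in a real job you would add that clause, and the write/readFields signatures already match the interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical name. In a Hadoop job this class would also declare
// "implements org.apache.hadoop.io.Writable" (assumption: omitted here
// so the sketch has no dependency beyond the JDK).
class BitSetWritableSketch {

  private BitSet bs = new BitSet();

  BitSetWritableSketch() {}

  BitSetWritableSketch(BitSet bs) {
    this.bs = bs;
  }

  BitSet get() {
    return bs;
  }

  // Same signature as Writable.write: a length prefix, then the raw bits.
  public void write(DataOutput out) throws IOException {
    byte[] bytes = bs.toByteArray(); // available since Java 7
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  // Same signature as Writable.readFields: mirror image of write.
  public void readFields(DataInput in) throws IOException {
    byte[] bytes = new byte[in.readInt()];
    in.readFully(bytes);
    bs = BitSet.valueOf(bytes); // available since Java 7
  }

  // Bitwise OR into this object, as a reducer or combiner might do.
  void or(BitSetWritableSketch other) {
    bs.or(other.bs);
  }

  // Hypothetical test helper: serialize to an in-memory byte array.
  static byte[] toBytes(BitSetWritableSketch w) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bos);
      w.write(out);
      out.flush();
      return bos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Hypothetical test helper: rebuild an object from serialized bytes.
  static BitSetWritableSketch fromBytes(byte[] data) {
    try {
      BitSetWritableSketch w = new BitSetWritableSketch();
      w.readFields(new DataInputStream(new ByteArrayInputStream(data)));
      return w;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Compared with the ObjectOutputStream version quoted above, this skips Java serialization entirely, so the payload is just the bits plus a 4-byte length. It still copies the bits into a temporary byte array, so as noted in the thread, expect cleaner code rather than a large speedup.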