Hadoop user mailing list: what is the code for WritableComparator.readVInt and WritableUtils.decodeVIntSize doing?


Jane Wayne 2012-03-31, 04:38
Chris White 2012-03-31, 17:17
Re: what is the code for WritableComparator.readVInt and WritableUtils.decodeVIntSize doing?
Chris,

Thanks. I see now.

Internally, I use String instead of Text, so I use
WritableUtils.writeString(...) and not Text.write(...). In the latter
method, I see that it calls WritableUtils.writeVInt(...) before
out.write(byte[], start, length), whereas writeString(...) writes the
length as a fixed 4-byte int.
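The two length prefixes can be compared in a minimal standalone Java sketch that has no Hadoop dependency. `writeVLong` re-implements the variable-length scheme that, as far as I know, Hadoop's `WritableUtils.writeVLong`/`writeVInt` use. Values in [-112, 127] take one byte. Any other value gets a marker byte followed by 1 to 8 big-endian bytes. `intPrefixed` mirrors the layout of `WritableUtils.writeString(...)`, which is a fixed 4-byte int length followed by the UTF-8 bytes. The class and method names are hypothetical, and this is an illustration rather than Hadoop's code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: a standalone sketch, not Hadoop's implementation.
class VIntWriter {
    // Variable-length encoding in the style of WritableUtils.writeVLong:
    // values in [-112, 127] are a single byte; otherwise a marker byte
    // (encoding sign and payload size) precedes 1-8 big-endian bytes.
    static void writeVLong(ByteArrayOutputStream out, long i) {
        if (i >= -112 && i <= 127) {
            out.write((int) i);          // low 8 bits hold the value itself
            return;
        }
        int len = -112;
        if (i < 0) {
            i ^= -1L;                    // one's complement, so the magnitude is non-negative
            len = -120;
        }
        for (long tmp = i; tmp != 0; tmp >>= 8) {
            len--;                       // one more payload byte needed
        }
        out.write(len);                  // marker byte
        int payloadBytes = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = payloadBytes; idx != 0; idx--) {
            out.write((int) ((i >> ((idx - 1) * 8)) & 0xFF)); // most significant byte first
        }
    }

    // Text-style layout: vint byte count, then the UTF-8 bytes.
    static byte[] vIntPrefixed(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // writeString-style layout: fixed 4-byte big-endian int length, then the UTF-8 bytes.
    static byte[] intPrefixed(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + utf8.length).putInt(utf8.length).put(utf8).array();
    }

    // Number of bytes the vint encoding of `value` occupies.
    static int vIntSize(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVLong(out, value);
        return out.size();
    }

    public static void main(String[] args) {
        System.out.println(vIntPrefixed("hadoop").length); // 7  (1 + 6)
        System.out.println(intPrefixed("hadoop").length);  // 10 (4 + 6)
        System.out.println(vIntSize(200));                 // 2
        System.out.println(vIntSize(70000));               // 4
    }
}
```

For a short string such as "hadoop", the Text-style layout spends one byte on the length where the writeString layout spends four. A length needs more vint bytes only once it exceeds 127. This is why code that reads one layout cannot blindly be used on the other.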

Tom White uses Text internally to represent strings (which is maybe what I
should do), so his example is correct and works. I think I was just
confusing myself.
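The book's example adds the two calls for a specific reason. `decodeVIntSize(b1[s1])` is the size of the vint header, and `readVInt(b1, s1)` is the number of payload bytes that follow it. Their sum is therefore the total serialized size of the first Text field, which is also the offset (relative to `s1`) where any following field begins. A minimal standalone sketch of that arithmetic follows. `PairOffsets` and `fieldLength` are hypothetical names, and the two decoding helpers re-implement, rather than call, Hadoop's `WritableUtils.decodeVIntSize` and `WritableComparator.readVInt`.

```java
// Hypothetical class: decoding helpers re-implemented here, not imported from Hadoop.
class PairOffsets {
    // Size in bytes of the vint whose first byte is `value` (marker byte included).
    static int decodeVIntSize(byte value) {
        if (value >= -112) {
            return 1;
        }
        return (value < -120) ? -119 - value : -111 - value;
    }

    // Value of the vint that starts at bytes[start].
    static int readVInt(byte[] bytes, int start) {
        int len = bytes[start];
        if (len >= -112) {
            return len;                          // single-byte vint
        }
        boolean isNegative = len < -120;
        len = isNegative ? -(len + 120) : -(len + 112);
        long i = 0;
        for (int idx = 0; idx < len; idx++) {
            i = (i << 8) | (bytes[start + 1 + idx] & 0xFF); // big-endian payload
        }
        return (int) (isNegative ? (i ^ -1L) : i);
    }

    // Total serialized size of the Text-style field at `start`:
    // vint header bytes + the payload length the vint announces.
    static int fieldLength(byte[] b, int start) {
        return decodeVIntSize(b[start]) + readVInt(b, start);
    }

    public static void main(String[] args) {
        // Two Text-style fields back to back: "ab" then "xyz".
        byte[] key = {2, 'a', 'b', 3, 'x', 'y', 'z'};
        int s1 = 0;
        int firstL1 = fieldLength(key, s1);          // 1 header byte + 2 payload bytes
        int secondStart = s1 + firstL1;              // the second field's vint sits here
        System.out.println(firstL1);                 // 3
        System.out.println(readVInt(key, secondStart)); // 3, the byte count of "xyz"
    }
}
```

If the first field were 200 bytes long, its header would grow to two bytes (-113, 200), so `firstL1` would be 202. Assuming any fixed header size would misplace the second field.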

Thanks for the last paragraph too; that really helped a lot.

On Sat, Mar 31, 2012 at 1:17 PM, Chris White <[EMAIL PROTECTED]> wrote:

> A Text object is written out as a vint representing the number of bytes,
> followed by the byte array contents of the Text object.
>
> Because a vint can be between 1 and 5 bytes long, the decodeVIntSize
> method examines the first byte of the vint to work out how many bytes to
> skip over before the text bytes start.
>
> readVInt then actually reads the vint bytes to get the length of the
> following byte array.
>
> So when you call the compareBytes method, you need to pass in where the
> actual bytes start (s1 + vIntLen) and how many bytes to compare (vint).
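Chris's recipe skips `vIntLen` header bytes and then compares exactly `vint` payload bytes. It can be written as a standalone sketch. `compareBytes` here is a local stand-in that compares bytes as unsigned values, lexicographically, with a shorter prefix sorting first. As I understand it, that is how `WritableComparator.compareBytes` orders bytes, but only the sign of the result should be relied on. The class and method names are hypothetical, and the vint helpers re-implement Hadoop's logic rather than calling it.

```java
// Hypothetical class: a standalone sketch, not Hadoop's RawComparator.
class RawTextCompare {
    // Size in bytes of the vint whose first byte is `value`.
    static int decodeVIntSize(byte value) {
        if (value >= -112) {
            return 1;
        }
        return (value < -120) ? -119 - value : -111 - value;
    }

    // Value of the vint at bytes[start] (here: the Text's byte count).
    static int readVInt(byte[] bytes, int start) {
        int len = bytes[start];
        if (len >= -112) {
            return len;
        }
        boolean isNegative = len < -120;
        len = isNegative ? -(len + 120) : -(len + 112);
        long i = 0;
        for (int idx = 0; idx < len; idx++) {
            i = (i << 8) | (bytes[start + 1 + idx] & 0xFF);
        }
        return (int) (isNegative ? (i ^ -1L) : i);
    }

    // Unsigned, lexicographic byte comparison; a shorter prefix sorts first.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int k = 0; k < n; k++) {
            int a = b1[s1 + k] & 0xFF;
            int b = b2[s2 + k] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;
    }

    // Compare serialized Text values at s1 and s2: start after the vint
    // header (s + vIntLen) and compare exactly vint-many payload bytes.
    static int compareText(byte[] b1, int s1, byte[] b2, int s2) {
        int vIntLen1 = decodeVIntSize(b1[s1]);
        int vIntLen2 = decodeVIntSize(b2[s2]);
        int len1 = readVInt(b1, s1);
        int len2 = readVInt(b2, s2);
        return compareBytes(b1, s1 + vIntLen1, len1, b2, s2 + vIntLen2, len2);
    }

    public static void main(String[] args) {
        byte[] apple = {5, 'a', 'p', 'p', 'l', 'e'};
        byte[] apricot = {7, 'a', 'p', 'r', 'i', 'c', 'o', 't'};
        System.out.println(compareText(apple, 0, apricot, 0) < 0); // true: 'p' < 'r'
        System.out.println(compareText(apple, 0, apple, 0));       // 0
    }
}
```

Skipping the header matters. If the comparison started at `s1`, the length bytes would be compared first, so "b" (header 1) would wrongly sort before "aa" (header 2).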
> On Mar 31, 2012 12:38 AM, "Jane Wayne" <[EMAIL PROTECTED]> wrote:
>
> > In Tom White's book, Hadoop: The Definitive Guide (second edition),
> > on page 99, he shows how to compare the raw bytes of a key with Text
> > fields. He shows an example like the following.
> >
> > int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
> > int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
> >
> > His explanation is that firstL1 is the length of the first String/Text in
> > b1, and firstL2 is the length of the first String/Text in b2. But I'm
> > unsure of what the code is actually doing.
> >
> > What is WritableUtils.decodeVIntSize(...) doing?
> > What is WritableComparator.readVInt(...) doing?
> > Why do we have to add the outputs of these two methods to get the length of
> > the String/Text?
> >
> > Could someone please explain in plain terms what's happening here? It seems
> > WritableComparator.readVInt(...) is already getting the length of the
> > byte[] corresponding to the string. It seems
> > WritableUtils.decodeVIntSize(...) is doing the same thing too (from
> > reading the javadoc).
> >
> > When I look at WritableUtils.writeString(...), two things happen: the
> > length of the byte[] is written, followed by the byte[] itself. Why
> > can't we simply do something like the following to get the length?
> >
> > int firstL1 = readInt(b1[s1]);
> > int firstL2 = readInt(b2[s2]);
> >
>
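A plain int read cannot replace the vint logic because it always consumes exactly 4 big-endian bytes. On Text bytes, it swallows the vint header together with the first characters. Hadoop's `WritableComparator.readInt` takes a byte array and a start offset, so the call would be `readInt(b1, s1)` rather than `readInt(b1[s1])`. Fixing the call still does not help with Text data. The sketch below performs such a 4-byte read with `java.nio.ByteBuffer` on both layouts of the string "hadoop". The class name is hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical class illustrating a fixed 4-byte read on two byte layouts.
class ReadIntPitfall {
    // Reads a 4-byte big-endian int at `start`, the byte order DataInput.readInt uses.
    static int readInt(byte[] bytes, int start) {
        return ByteBuffer.wrap(bytes).getInt(start);
    }

    public static void main(String[] args) {
        // Text layout: one vint byte (6), then the 6 data bytes.
        byte[] textLayout = {6, 'h', 'a', 'd', 'o', 'o', 'p'};
        // writeString layout: 4-byte int length, then the 6 data bytes.
        byte[] writeStringLayout = {0, 0, 0, 6, 'h', 'a', 'd', 'o', 'o', 'p'};

        // The 4 bytes read are the vint plus "had": 0x06686164.
        System.out.println(readInt(textLayout, 0));        // 107503972
        // Here the 4-byte prefix really is an int, so the length comes back.
        System.out.println(readInt(writeStringLayout, 0)); // 6
    }
}
```

Only the writeString layout gives back the length from a 4-byte read. Data written by Text.write needs decodeVIntSize and readVInt.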