Hadoop, mail # user - what is the code for WritableComparator.readVInt and WritableUtils.decodeVIntSize doing?


Re: what is the code for WritableComparator.readVInt and WritableUtils.decodeVIntSize doing?
Jane Wayne 2012-04-01, 03:19
Chris,

Thanks, I see now.

Internally, I use String instead of Text, so I use
WritableUtils.writeString(...) and not Text.write(...). In the latter
method, I see that it calls WritableUtils.writeVInt(...) before
out.write(byte[], start, length), whereas writeString(...) writes the
length as a plain 4-byte int.
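A minimal plain-Java sketch of the two layouts follows. It does not use Hadoop; it assumes, from reading the Hadoop source, that WritableUtils.writeString writes the length with DataOutput.writeInt (4 bytes, big-endian) and that Text.write writes it with WritableUtils.writeVInt, whose zero-compressed encoding is re-implemented here. The class and method names (LengthPrefixDemo, stringStyle, textStyle, writeVInt) are hypothetical stand-ins, not Hadoop API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch (no Hadoop dependency): reproduces the byte layouts
// assumed for WritableUtils.writeString and Text.write.
final class LengthPrefixDemo {

    // writeString-style: 4-byte big-endian int length, then the UTF-8 bytes.
    static byte[] stringStyle(String s) {
        return serialize(s, false);
    }

    // Text.write-style: vint length (1-5 bytes for an int), then the UTF-8 bytes.
    static byte[] textStyle(String s) {
        return serialize(s, true);
    }

    private static byte[] serialize(String s, boolean vintLength) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            if (vintLength) {
                writeVInt(out, utf8.length);
            } else {
                out.writeInt(utf8.length);
            }
            out.write(utf8, 0, utf8.length);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            // An in-memory stream does not really fail; the API just declares IOException.
            throw new UncheckedIOException(e);
        }
    }

    // Mirrors the logic of Hadoop's WritableUtils.writeVLong ("zero-compressed" encoding).
    static void writeVInt(DataOutput out, long i) throws IOException {
        if (i >= -112 && i <= 127) {          // small values: the single byte is the value
            out.writeByte((int) i);
            return;
        }
        int len = -112;                       // marker base for positive values
        if (i < 0) {
            i ^= -1L;                         // negative values store the one's complement
            len = -120;                       // marker base for negative values
        }
        for (long tmp = i; tmp != 0; tmp >>= 8) {
            len--;                            // one marker step per payload byte
        }
        out.writeByte(len);                   // first byte encodes sign and payload size
        int size = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = size; idx != 0; idx--) {
            out.writeByte((int) (i >> ((idx - 1) * 8)));  // big-endian payload bytes
        }
    }

    public static void main(String[] args) {
        System.out.println(stringStyle("hello").length); // prints 9 (4 + 5)
        System.out.println(textStyle("hello").length);   // prints 6 (1 + 5)
    }
}
```

For "hello" the writeString-style layout starts with the bytes 0 0 0 5, which a 4-byte readInt decodes correctly; the Text-style layout starts with the single byte 5, so a 4-byte read would run into the string itself.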

Tom White uses Text internally to represent strings (which is maybe what I
should do), so his example is correct and works. I think I was just
confusing myself.
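Why the sum in the book's example gives a length can be checked with the plain-Java sketch below. The two helpers are hypothetical stand-ins that follow the logic of WritableUtils.decodeVIntSize and WritableComparator.readVInt as found in the Hadoop source: decodeVIntSize looks only at the first byte and returns how many bytes the vint header occupies, while readVInt decodes the header's value, i.e. the number of string bytes that follow. Their sum is the total number of bytes the serialized Text occupies, which is also the offset at which the next field of the key begins. The name serializedTextLength is hypothetical.

```java
// Hypothetical stand-ins that follow the logic of Hadoop's
// WritableUtils.decodeVIntSize and WritableComparator.readVInt.
final class VIntDecodeDemo {

    // Number of bytes the vint occupies, judged from its first byte alone.
    static int decodeVIntSize(byte value) {
        if (value >= -112) {
            return 1;                 // the value itself fits in this one byte
        } else if (value < -120) {
            return -119 - value;      // negative value: marker byte + payload bytes
        }
        return -111 - value;          // positive value: marker byte + payload bytes
    }

    // The integer value stored in the vint starting at bytes[start].
    static int readVInt(byte[] bytes, int start) {
        int first = bytes[start];
        if (first >= -112) {
            return first;             // single-byte case
        }
        boolean negative = first < -120;
        int payload = decodeVIntSize(bytes[start]) - 1;
        long value = 0;
        for (int idx = 0; idx < payload; idx++) {
            value = (value << 8) | (bytes[start + 1 + idx] & 0xFF); // big-endian
        }
        return (int) (negative ? ~value : value);
    }

    // Total serialized size of a Text at bytes[start]: vint header + string bytes.
    static int serializedTextLength(byte[] bytes, int start) {
        return decodeVIntSize(bytes[start]) + readVInt(bytes, start);
    }

    public static void main(String[] args) {
        // "hello" as Text.write would lay it out: vint 5, then 5 UTF-8 bytes.
        byte[] shortText = {5, 'h', 'e', 'l', 'l', 'o'};
        System.out.println(serializedTextLength(shortText, 0)); // prints 6

        // Header of a 300-byte Text: marker -114 (2 payload bytes), then 0x01 0x2C.
        byte[] longHeader = {(byte) -114, 0x01, 0x2C};
        System.out.println(decodeVIntSize(longHeader[0]));      // prints 3
        System.out.println(readVInt(longHeader, 0));            // prints 300
    }
}
```

So the two calls are not doing the same thing: one measures the header, the other reads what the header says.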

Thanks for the last paragraph too; that really helped a lot.

On Sat, Mar 31, 2012 at 1:17 PM, Chris White <[EMAIL PROTECTED]> wrote:

> A Text object is written out as a vint representing the number of bytes,
> followed by the byte array contents of the Text object.
>
> Because a vint can be between 1 and 5 bytes in length, the decodeVIntSize
> method examines the first byte of the vint to work out how many bytes to
> skip over before the text bytes start.
>
> readVInt then actually reads the vint bytes to get the length of the
> following byte array.
>
> So when you call the compareBytes method, you need to pass in where the
> actual bytes start (s1 + vIntLen) and how many bytes to compare (the vint
> value).
> On Mar 31, 2012 12:38 AM, "Jane Wayne" <[EMAIL PROTECTED]> wrote:
>
> > In Tom White's book, Hadoop: The Definitive Guide (second edition), on
> > page 99, he shows how to compare the raw bytes of a key with Text
> > fields. He shows an example like the following.
> >
> > int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
> > int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
> >
> > His explanation is that firstL1 is the length of the first String/Text in
> > b1, and firstL2 is the length of the first String/Text in b2, but I'm
> > unsure what the code is actually doing.
> >
> > What is WritableUtils.decodeVIntSize(...) doing?
> > What is WritableComparator.readVInt(...) doing?
> > Why do we have to add the outputs of these two methods to get the length
> > of the String/Text?
> >
> > Could someone please explain in plain terms what's happening here? It
> > seems WritableComparator.readVInt(...) is already getting the length of
> > the byte[] corresponding to the string, and from reading the javadoc,
> > WritableUtils.decodeVIntSize(...) seems to be doing the same thing.
> >
> > When I look at WritableUtils.writeString(...), two things happen: the
> > length of the byte[] is written, followed by the byte[] itself. Why
> > can't we simply do something like the following to get the length?
> >
> > int firstL1 = readInt(b1, s1);
> > int firstL2 = readInt(b2, s2);
> >
>
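Putting the explanation together: a raw comparison skips each vint header and compares only the string bytes, and the same bytes show what goes wrong with a fixed 4-byte readInt on Text data. This is a hypothetical plain-Java sketch; the helper names are invented, and compareBytes mimics WritableComparator.compareBytes under the assumption that it compares bytes as unsigned values and, when one array is a prefix of the other, orders the shorter one first.

```java
// Hypothetical sketch: raw comparison of two serialized Text values,
// following the approach described above (skip the vint, compare the bytes).
final class RawTextCompareDemo {

    // Size of the vint from its first byte (same rule as WritableUtils.decodeVIntSize).
    static int decodeVIntSize(byte value) {
        if (value >= -112) return 1;
        return (value < -120) ? -119 - value : -111 - value;
    }

    // Value of the vint at bytes[start] (same rule as WritableComparator.readVInt).
    static int readVInt(byte[] bytes, int start) {
        int size = decodeVIntSize(bytes[start]);
        if (size == 1) return bytes[start];
        long value = 0;
        for (int idx = 1; idx < size; idx++) {
            value = (value << 8) | (bytes[start + idx] & 0xFF);
        }
        return (int) (bytes[start] < -120 ? ~value : value);
    }

    // Unsigned lexicographic byte comparison; on a common prefix, shorter sorts first.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xFF;
            int b = b2[s2 + i] & 0xFF;
            if (a != b) return a - b;
        }
        return l1 - l2;
    }

    // Compare Text payloads: start after the header (s + vIntLen), length = vint value.
    static int compareText(byte[] b1, int s1, byte[] b2, int s2) {
        int vIntLen1 = decodeVIntSize(b1[s1]);
        int vIntLen2 = decodeVIntSize(b2[s2]);
        return compareBytes(b1, s1 + vIntLen1, readVInt(b1, s1),
                            b2, s2 + vIntLen2, readVInt(b2, s2));
    }

    // What a fixed 4-byte, big-endian readInt sees at the same offset.
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xFF) << 24) | ((bytes[start + 1] & 0xFF) << 16)
             | ((bytes[start + 2] & 0xFF) << 8) | (bytes[start + 3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] apple = {5, 'a', 'p', 'p', 'l', 'e'};
        byte[] hello = {5, 'h', 'e', 'l', 'l', 'o'};
        System.out.println(compareText(apple, 0, hello, 0) < 0); // prints true
        // The 1-byte vint 5 plus three string bytes are read as one int.
        System.out.println(readInt(hello, 0));                   // prints 90727788, not 5
    }
}
```

Because the Text header for "hello" is the single byte 5, readInt consumes the header plus 'h', 'e' and 'l' and returns 90727788 instead of 5. A fixed 4-byte read only works for data written with WritableUtils.writeString; for Text the vint has to be decoded with decodeVIntSize and readVInt.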