Avro, mail # user - Re: BigInt / longlong

Tatu Saloranta 2012-03-29, 16:54
Re: BigInt / longlong
Scott Carey 2012-03-28, 18:43
On 3/28/12 11:01 AM, "Meyer, Dennis" <[EMAIL PROTECTED]> wrote:

> Hi,
> Which type corresponds to a Java BigInt or a C long long? Or is there any
> other type in Avro that maps to a 64-bit unsigned int?
> I unfortunately could only find smaller types in the docs:
> Primitive Types
> The set of primitive type names is:
> * string: unicode character sequence
> * bytes: sequence of 8-bit bytes
> * int: 32-bit signed integer
> * long: 64-bit signed integer
> * float: single precision (32-bit) IEEE 754 floating-point number
> * double: double precision (64-bit) IEEE 754 floating-point number
> * boolean: a binary value
> * null: no value
> Anyway, the encoding section mentions some 64-bit unsigned values. Can I use
> them somehow through a type?

An unsigned value fits in a signed one.  They are both 64 bits.  Each
language that supports a long unsigned type has its own way to convert from
one to the other without loss of data.
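
For Java, the round trip described above can be sketched as follows. This is
only an illustration, not something from the Avro API: the class and method
names are made up, and the unsigned helpers on java.lang.Long were added in
Java 8, so they are newer than this thread. The long is what you would store in
an Avro "long" field; the unsigned reading is applied only at the edges.

```java
import java.math.BigInteger;

// Hypothetical helper class (not part of Avro): reinterprets the same 64 bits
// as either a signed long or an unsigned value.
final class UnsignedLongConversion {

    // Parse a decimal uint64 (0 .. 2^64-1) into the signed long with the same bits.
    static long fromUnsignedDecimal(String s) {
        return Long.parseUnsignedLong(s);
    }

    // Render the bits of a signed long as the unsigned decimal value.
    static String toUnsignedDecimal(long bits) {
        return Long.toUnsignedString(bits);
    }

    // Widen to BigInteger when the unsigned magnitude itself is needed.
    static BigInteger toBigInteger(long bits) {
        BigInteger b = BigInteger.valueOf(bits);
        return bits >= 0 ? b : b.add(BigInteger.ONE.shiftLeft(64));
    }

    public static void main(String[] args) {
        long bits = fromUnsignedDecimal("18446744073709551615"); // 2^64 - 1
        System.out.println(bits);                    // signed view of the bits
        System.out.println(toUnsignedDecimal(bits)); // unsigned view of the same bits
        System.out.println(toBigInteger(bits));      // unsigned value as a BigInteger
    }
}
```

No bits change in either direction; only the interpretation does.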

> A workaround might be to use the 52 significant bits of a double, but that
> seems like a hack and of course loses some more number space compared to
> uint64. I'd like to avoid any other self-encoding hacks, as I'd also like to
> use Hadoop/Pig/Hive on top of Avro, so I would like to keep functionality on
> numbers if possible.

Java does not have an unsigned 64 bit type.  Hadoop/Pig/Hive all only have
signed 64 bit integer quantities.

Luckily, multiplication and addition on two's complement signed values is
identical to the operations on unsigned ints, so for many operations there
is no loss in fidelity as long as you pass the raw bits on to something that
interprets the number as an unsigned quantity.

That is, if you take the raw bits of a set of unsigned 64 bit numbers, and
treat those bits as if they are signed 64 bit quantities, then do
addition, subtraction, and multiplication on them, then treat the raw bit
result as an unsigned 64 bit value, it is as if you did the whole thing
with unsigned arithmetic.
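
A small Java check of this claim, as a sketch only. The class and method names
are invented for illustration, and BigInteger is used purely as an independent
reference for unsigned arithmetic modulo 2^64. Operations that depend on
ordering or magnitude (comparison, division, remainder, printing) are not
covered by the claim; for those, Java 8 and later provide Long.compareUnsigned,
Long.divideUnsigned, Long.remainderUnsigned and Long.toUnsignedString.

```java
import java.math.BigInteger;

// Hypothetical demo class: compares signed long arithmetic on raw bits with
// unsigned arithmetic modulo 2^64 computed via BigInteger.
final class UnsignedArithmeticCheck {
    private static final BigInteger TWO_64 = BigInteger.ONE.shiftLeft(64);

    // Unsigned value represented by the 64 raw bits of a signed long.
    static BigInteger unsignedValue(long bits) {
        BigInteger b = BigInteger.valueOf(bits);
        return bits >= 0 ? b : b.add(TWO_64);
    }

    // True if signed +, -, * on the raw bits match unsigned arithmetic mod 2^64.
    static boolean signedOpsMatchUnsigned(long a, long b) {
        BigInteger ua = unsignedValue(a);
        BigInteger ub = unsignedValue(b);
        return unsignedValue(a + b).equals(ua.add(ub).mod(TWO_64))
            && unsignedValue(a - b).equals(ua.subtract(ub).mod(TWO_64))
            && unsignedValue(a * b).equals(ua.multiply(ub).mod(TWO_64));
    }

    public static void main(String[] args) {
        long a = Long.parseUnsignedLong("18446744073709551000"); // above Long.MAX_VALUE
        long b = 12345L;
        System.out.println(signedOpsMatchUnsigned(a, b));

        // Order-sensitive operations need the unsigned helpers instead.
        System.out.println(a < b);                          // signed comparison
        System.out.println(Long.compareUnsigned(a, b) > 0); // unsigned comparison
        System.out.println(Long.toUnsignedString(Long.divideUnsigned(a, 1000L)));
    }
}
```

In the example, a is negative when read as a signed long, so the plain `<`
comparison and the unsigned comparison disagree, while the sums, differences
and products still carry the same bits.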


Avro only has signed 32 and 64 bit integer quantities because they can be
mapped to unsigned ones in most cases without a problem and many (actually,
most) languages do not support unsigned integers.

If you want various precision quantities you can use an Avro Fixed type to
map to any type you choose.  For example you can use a 16 byte fixed to map
to 128 bit unsigned ints.
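
As a rough sketch of the 16 byte case in Java: the schema name "UInt128", the
class name, and the choice of big-endian byte order are all assumptions made
for this example. Avro only stores the 16 bytes; how they are interpreted is up
to the application. The code uses plain byte arrays and java.math.BigInteger
and leaves out the Avro library calls for wrapping the bytes in a fixed value.

```java
import java.math.BigInteger;

// Hypothetical codec for a fixed schema such as
//   {"type": "fixed", "name": "UInt128", "size": 16}
// The name and the big-endian layout are assumptions of this example.
final class UInt128Fixed {
    static final int SIZE = 16; // bytes in the fixed type

    // Encode 0 <= value < 2^128 as exactly 16 big-endian bytes.
    static byte[] encode(BigInteger value) {
        if (value.signum() < 0 || value.bitLength() > SIZE * 8) {
            throw new IllegalArgumentException("value out of uint128 range");
        }
        // toByteArray() is minimal two's complement and may carry a leading 0x00.
        byte[] raw = value.toByteArray();
        byte[] out = new byte[SIZE];
        int n = Math.min(raw.length, SIZE);
        // Copy the low-order n bytes right-aligned; this drops the extra sign byte.
        System.arraycopy(raw, raw.length - n, out, SIZE - n, n);
        return out;
    }

    // Decode 16 big-endian bytes as a non-negative integer.
    static BigInteger decode(byte[] fixed) {
        if (fixed.length != SIZE) {
            throw new IllegalArgumentException("expected " + SIZE + " bytes");
        }
        return new BigInteger(1, fixed); // signum 1: treat the bytes as a magnitude
    }

    public static void main(String[] args) {
        BigInteger max = BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);
        byte[] bytes = encode(max);
        System.out.println(bytes.length);
        System.out.println(decode(bytes).equals(max));
    }
}
```

The trade-off is that Pig and Hive would see such a field as opaque bytes
rather than as a number, so arithmetic on it has to happen in your own code.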

> Thanks,
> Dennis
Miki Tebeka 2012-03-28, 23:38