I am debating whether a lookup table would help my situation.
I have a bunch of codes, each of which maps to a timestamp (unsigned int). The codes look like this:
AA4
AAA5
A21
A4
...
Z435
The sizes range from 1 to 4 characters (1 to 4 bytes, respectively). Would adding a lookup table for all my codes help reduce space? If so, what would be the best way to hash something like this?
--
--- Get your facts first, then you can distort them as you please.--
On Sat, Sep 15, 2012 at 8:09 AM, Rita <[EMAIL PROTECTED]> wrote:
> I am debating if a lookup table would help my situation.
>
> I have a bunch of codes which map with timestamp (unsigned int). The codes
> look like this
>
> AA4
> AAA5
> A21
> A4
> ...
> Z435
>
> The size range from 1 character to 4 characters (1 to 4 bytes,
> respectively).
>
> Would adding a lookup table for all my codes help in reducing space? If so,
> what would be the best way to hash something like this?
You are trying to save on disk space? You could make your keys binary: four bytes max, null-prefixed if shorter than 4 characters. Why are you trying to save disk space? Do you want a lookup table so you can have a code that is smaller than the 1-4 character codes?
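One way to read the null-prefix suggestion above, as a minimal sketch: pad every code on the left with zero bytes to a fixed 4-byte key. The function names `pad_code` and `unpad_code` are hypothetical, and the sketch assumes the codes are plain ASCII and never contain a zero byte themselves.

```python
def pad_code(code: str, width: int = 4) -> bytes:
    """Left-pad an ASCII code with null bytes to a fixed-width binary key."""
    raw = code.encode("ascii")
    if not 1 <= len(raw) <= width:
        raise ValueError(f"code must be 1 to {width} bytes, got {len(raw)}")
    return raw.rjust(width, b"\x00")


def unpad_code(key: bytes) -> str:
    """Strip the null prefix to recover the original code."""
    return key.lstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    for code in ["AA4", "AAA5", "A21", "A4", "Z435"]:
        key = pad_code(code)
        print(code, key, unpad_code(key))
```

Note that fixed-width padding by itself does not shrink anything: a 1-character code grows to 4 bytes. What it buys is a uniform key size.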
St.Ack
Stack 2012-09-16, 22:16
Yes, I am trying to save on disk space because of limited resources, and the table will be around 30 billion rows.
The lookup table itself will be around 9k rows, so it's not too bad. A code's length will range from 1 to 4 characters.
I suppose I really shouldn't worry about it too much.
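For concreteness, a lookup table of the kind being discussed could be sketched as a plain dictionary encoding: each distinct code gets a fixed 2-byte ID. This is only an illustration under assumptions; `build_lookup` is a hypothetical helper, the IDs are assigned in sorted order purely for determinism, and nothing here is specific to any storage system.

```python
import struct


def build_lookup(codes):
    """Assign each distinct code a 2-byte big-endian ID; return both directions."""
    unique = sorted(set(codes))
    if len(unique) > 0x10000:
        raise ValueError("more than 65536 distinct codes; 2 bytes are not enough")
    code_to_id = {code: struct.pack(">H", i) for i, code in enumerate(unique)}
    id_to_code = {ident: code for code, ident in code_to_id.items()}
    return code_to_id, id_to_code


if __name__ == "__main__":
    code_to_id, id_to_code = build_lookup(["AA4", "AAA5", "A21", "A4", "Z435", "A4"])
    for code, ident in code_to_id.items():
        print(code, ident, id_to_code[ident])
```

About 9k distinct codes fit comfortably in 2 bytes, which can hold 65,536 values. Every read and write would then need the extra encode or decode step.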
Tom Brown 2012-09-17, 01:57
If there are 9k possible entries in the lookup table, the keys will need to be 1 or 2 bytes in order to achieve any space savings. For simplicity, let's say you go with the 2-byte version. For 30 billion cells you will save 2 bytes per cell at best (from 4 bytes to 2), for a total savings of 60 GB. At worst it will take more space, because the lookup keys will be longer than the actual value being looked up.
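The arithmetic above can be checked with a short sketch. The row count and table size are the figures quoted in this thread; the helper names are hypothetical, and the sketch assumes fixed-width IDs and counts only the code bytes, ignoring any other per-cell storage overhead.

```python
ROWS = 30_000_000_000   # table size quoted in the thread
LOOKUP_ENTRIES = 9_000  # approximate number of distinct codes


def fixed_id_bytes(entries: int) -> int:
    """Smallest whole number of bytes that can give each entry a distinct ID."""
    bits = max(1, (entries - 1).bit_length())
    return (bits + 7) // 8


def savings_bytes(rows: int, code_len: int, id_len: int) -> int:
    """Bytes saved by storing an id_len-byte ID instead of a code_len-byte code."""
    return rows * (code_len - id_len)


if __name__ == "__main__":
    id_len = fixed_id_bytes(LOOKUP_ENTRIES)
    best = savings_bytes(ROWS, 4, id_len)   # every code is 4 characters
    worst = savings_bytes(ROWS, 1, id_len)  # every code is 1 character
    print(f"ID width: {id_len} bytes")
    print(f"best case:  {best / 1e9:+.0f} GB")
    print(f"worst case: {worst / 1e9:+.0f} GB")
```

With 9,000 entries a fixed-width ID needs 14 bits, so 2 bytes is the practical minimum. The result then ranges from 60 GB saved (all codes 4 characters long) to 30 GB lost (all codes 1 character long), depending on the actual mix of code lengths.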
The added complexity of a lookup table would not make that savings worth it to me, but you know your data best.
Just my $0.02
--Tom
On Sun, Sep 16, 2012 at 4:27 PM, Rita <[EMAIL PROTECTED]> wrote:
> Yes, I am trying to save on disk space because of limited resouces and the
> table will be around 30 billion rows.
>
> The lookup table itself will be around 9k rows so its not too bad. A
> character's range will be from 1 to 4.
>
> I suppose I really should worry about it too much.
I'd agree (see Tom Brown's comment in the previous mail on this thread).
St.Ack
Stack 2012-09-17, 16:23