Re: Tables vs CFs vs Cs
Jean-Marc Spaggiari 2013-01-27, 17:37
What I would like is to have a faster (direct?) access to the number
of entries starting with "058".
For IPv4 the range is only 0 to 255, so that works fine. For IPv6, it can take a
while to scan the full range and aggregate.
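
To make the read-time aggregation being discussed concrete, here is a minimal sketch in plain Python. It does not use the HBase client: a sorted in-memory list stands in for a table ordered by row key. The sample IPs and counts are made up, and the helper names (prefix_stop, sum_prefix) are hypothetical. As an assumption of this sketch, the stop key is the prefix with its last character incremented, so that it sorts after every key sharing the prefix.

```python
from bisect import bisect_left

def prefix_stop(prefix):
    """Smallest key that sorts after every key starting with `prefix`.

    Assumption: the last character of the prefix is not the maximum code point.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

def sum_prefix(rows, prefix):
    """Sum the counters of all rows whose key starts with `prefix`.

    `rows` is a list of (key, count) tuples sorted by key, standing in
    for a range scan from the start key (inclusive) to the stop key (exclusive).
    """
    keys = [key for key, _ in rows]
    lo = bisect_left(keys, prefix)               # first key >= start row
    hi = bisect_left(keys, prefix_stop(prefix))  # first key >= stop row
    return sum(count for _, count in rows[lo:hi])

# Hypothetical zero-padded IPv4 keys with made-up counters.
rows = sorted([
    ("058.010.001.007", 3),
    ("058.200.014.001", 5),
    ("059.001.001.001", 2),
    ("109.169.020.003", 4),
    ("109.170.001.001", 1),
])

total_058 = sum_prefix(rows, "058")          # sums the two 058.* rows
total_109_169 = sum_prefix(rows, "109.169")  # sums the single 109.169.* row
```

The cost of this approach grows with the number of rows under the prefix, which is the concern raised above for IPv6: every matching row has to be read before the total is known.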
2013/1/27, lars hofhansl <[EMAIL PROTECTED]>:
> I might be missing something. Why not just have a counter per IP and then
> aggregate at read time?
> If you wanted the total of the 058 group you'd start a scanner with "058" as
> start row and "058\0" as stop row. On the client you sum up the counters.
> Similarly for the 109.169 group. Start with "109.169" and stop "109.169\0".
> -- Lars
> From: Jean-Marc Spaggiari <[EMAIL PROTECTED]>
> To: user <[EMAIL PROTECTED]>
> Sent: Sunday, January 27, 2013 8:51 AM
> Subject: Tables vs CFs vs Cs
> Let's imagine this scenario.
> I want to store IPs with counters. And I want to have counters by
> groups of IPs. All of that will be calculated with MR jobs and stored
> in HBase.
> Let's take some IPs and make sure they are ordered by adding some "0"
> when required.
> I want to have counters for all "levels" of those IPs, which means for
> those groups:
> Group 1:
> Group 2:
> Group 3:
> And group 4 is the complete IPs list.
> Each time I see an IP, I will increment the required values into the 4
> What's the best way to store that, knowing that I want to be able to
> easily list all the entries (range based) from one group?
> Option 1 is to have one table per group, each with 1 CF and 1 C.
> Pros: Very easy to access, retrieve, etc.
> Cons: Will generate 4 tables
> Option 2 is to have one table, but 1 CF per group.
> Pros: Only one table, easy access.
> Cons: Heard that we should try to keep CFs under 3. Might have a bad
> performance impact.
> Option 3 is to have one table, one CF and one C per group.
> Pros: Only one table, only one CF.
> Cons: Access is less easy than with options 1 and 2.
> I think Option 2 is the worst one. Option 1 is very easy to implement.
> And for option 3, I don't see any benefit compared to option 1.
> So I'm tempted to go with option 1, but I don't like the idea of
> multiplying the tables.
> Does anyone have any comment on which option might be the best one,
> or even propose another option?
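
The scenario in the original message (zero-padding IPs so they sort correctly, then incrementing one counter per group level each time an IP is seen) can be sketched as follows. This is an illustration only, in plain Python rather than MapReduce or the HBase client. It assumes IPv4 dotted-quad input, and the function names and sample IPs are hypothetical. One Counter per group loosely mirrors "one table per group"; the same key layout would apply if the groups were mapped to column families or columns instead.

```python
from collections import Counter

def pad_ip(ip):
    """Zero-pad each octet to 3 digits so string order matches numeric order."""
    return ".".join(part.zfill(3) for part in ip.split("."))

def group_keys(ip):
    """Return the 4 keys for an IP: 1, 2, 3 and 4 leading octets."""
    parts = pad_ip(ip).split(".")
    return [".".join(parts[:n]) for n in range(1, 5)]

# One counter store per group level (index 0 is group 1, index 3 is the full IP).
groups = [Counter() for _ in range(4)]

def record(ip):
    """Increment the counter for this IP at every group level."""
    for level, key in enumerate(group_keys(ip)):
        groups[level][key] += 1

# Made-up sample traffic.
for seen in ["58.10.1.7", "58.200.14.1", "109.169.20.3", "58.10.1.7"]:
    record(seen)

count_058 = groups[0]["058"]              # direct lookup, no range scan
count_109_169 = groups[1]["109.169"]
```

With counters maintained per level, the total for a group such as "058" is a single lookup rather than a scan, at the cost of four increments per observed IP.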