HBase >> mail # user >> Tables vs CFs vs Cs


Re: Tables vs CFs vs Cs
What I would like is faster (direct?) access to the number
of entries starting with "058".

For IPv4 the range is 0 to 255, so that works fine. For IPv6, it can take a
while to scan the full range and aggregate.

JM

2013/1/27, lars hofhansl <[EMAIL PROTECTED]>:
> I might be missing something. Why not just have a counter per IP and then
> aggregate at read time?
> If you wanted the total of the 058 group you'd start a scanner with "058" as
> start row and "058\0" as stop row. On the client you sum up the counter
> values.
> Similarly for the 109.169 group. Start with "109.169" and stop "109.169\0".
>
> -- Lars
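
To make the read-time aggregation suggested above concrete, here is a minimal sketch in Python. It does not use the HBase client; a sorted list of (row key, counter) pairs stands in for the table, and the counter values are invented for illustration. The helper names (prefix_stop_row, sum_prefix) are hypothetical. One assumption worth flagging: under plain byte-wise ordering, "058\0" sorts before "058.022.018.176", so this sketch uses the prefix with its last byte incremented ("059") as the exclusive stop row instead. Treat that as one reading of the idea, not as what the original message specified.

```python
from bisect import bisect_left


def prefix_stop_row(prefix: bytes) -> bytes:
    """Return the smallest byte string greater than every key starting with prefix.

    An empty result means "no upper bound" (the prefix was empty or all 0xff).
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""
    # Increment the last byte: b"058" -> b"059".
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def sum_prefix(rows, prefix: bytes) -> int:
    """Sum the counters of all rows whose key starts with prefix.

    rows must be a list of (row_key, counter) sorted by row_key,
    standing in for a table ordered by row key.
    """
    keys = [key for key, _ in rows]
    start = bisect_left(keys, prefix)            # inclusive start row
    stop_row = prefix_stop_row(prefix)
    stop = bisect_left(keys, stop_row) if stop_row else len(keys)  # exclusive stop row
    return sum(count for _, count in rows[start:stop])


# Invented counter values, keyed by the zero-padded IPs from the thread.
rows = sorted([
    (b"037.113.031.119", 4),
    (b"058.022.018.176", 7),
    (b"058.022.159.151", 2),
    (b"109.169.201.076", 1),
    (b"109.169.201.150", 5),
    (b"109.254.019.140", 3),
    (b"122.031.039.016", 6),
    (b"122.224.005.210", 8),
    (b"178.137.167.041", 9),
])

total_058 = sum_prefix(rows, b"058")
total_109_169 = sum_prefix(rows, b"109.169")
```

The cost of this approach grows with the number of rows under the prefix, which is the concern raised in the reply about IPv6 ranges.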
>
>
>
> ________________________________
>  From: Jean-Marc Spaggiari <[EMAIL PROTECTED]>
> To: user <[EMAIL PROTECTED]>
> Sent: Sunday, January 27, 2013 8:51 AM
> Subject: Tables vs CFs vs Cs
>
> Hi,
>
> Let's imagine this scenario.
>
> I want to store IPs with counters. And I want to have counters by
> groups of IPs. All of that will be calculated with MR jobs and stored
> in HBase.
>
> Let's take some IPs and make sure they are ordered by adding some "0"
> when required.
>
> 037.113.031.119
> 058.022.018.176
> 058.022.159.151
> 109.169.201.076
> 109.169.201.150
> 109.254.019.140
> 122.031.039.016
> 122.224.005.210
> 178.137.167.041
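
A small sketch of the zero-padding step described above, so that string order matches numeric order. The function name pad_ipv4 is hypothetical, and the sketch assumes plain dotted-decimal IPv4 input.

```python
def pad_ipv4(ip: str) -> str:
    """Left-pad each octet to three digits: '58.22.18.176' -> '058.022.018.176'."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected 4 octets, got {len(octets)}: {ip!r}")
    padded = []
    for octet in octets:
        value = int(octet)  # raises ValueError on non-numeric input
        if not 0 <= value <= 255:
            raise ValueError(f"octet out of range in {ip!r}: {octet}")
        padded.append(f"{value:03d}")
    return ".".join(padded)


raw_ips = ["109.169.201.76", "58.22.18.176", "37.113.31.119"]
padded_sorted = sorted(pad_ipv4(ip) for ip in raw_ips)
```

Without the padding, "109..." would sort before "37..." as a string, which would break range scans over a group.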
>
> I want to have counters for all "levels" of those IPs, which means for
> these groups.
>
> Group 1:
> 037
> 058
> 109
> 122
> 178
>
> Group 2:
>
> 037.113
> 058.022
> 109.169
> 109.254
> 122.031
> 122.224
> 178.137
>
> Group 3:
>
> 037.113.031
> 058.022.018
> 058.022.159
> 109.169.201
> 109.254.019
> 122.031.039
> 122.224.005
> 178.137.167
>
> And group 4 is the complete IPs list.
>
> Each time I see an IP, I will increment the corresponding values in the 4
> groups.
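
The per-IP increment across the four groups could be sketched as follows. This is an in-memory illustration only: one Counter per group level stands in for whatever storage layout is chosen, and the names group_prefixes and record are hypothetical.

```python
from collections import Counter


def group_prefixes(padded_ip: str) -> list:
    """Return the prefix for each level: ['058', '058.022', '058.022.018', full IP]."""
    octets = padded_ip.split(".")
    return [".".join(octets[:level]) for level in range(1, len(octets) + 1)]


def record(counters: dict, padded_ip: str) -> None:
    """Increment the counter of every group the IP belongs to."""
    for level, prefix in enumerate(group_prefixes(padded_ip), start=1):
        counters[level][prefix] += 1


# One counter set per group level (1 to 4).
counters = {level: Counter() for level in range(1, 5)}

seen_ips = [
    "037.113.031.119", "058.022.018.176", "058.022.159.151",
    "109.169.201.076", "109.169.201.150", "109.254.019.140",
    "122.031.039.016", "122.224.005.210", "178.137.167.041",
]
for ip in seen_ips:
    record(counters, ip)

# Direct lookup of a group total, with no scan over the full IPs.
entries_under_058 = counters[1]["058"]
```

With the counters maintained at write time, the total for "058" is a single lookup, at the price of four increments per observed IP.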
>
> What's the best way to store that, knowing that I want to be able to
> easily list all the entries (range based) from one group?
>
> Option 1 is to have one table per group, each with 1 CF and 1 C.
> Pros: Very easy to access, retrieve, etc.
> Cons: Will generate 4 tables.
>
> Option 2 is to have one table, but 1 CF per group.
> Pros: Only one table, easy access.
> Cons: I have heard that we should try to keep CFs under 3. Might have a
> bad performance impact.
>
> Option 3 is to have one table, one CF and one C per group.
> Pros: Only one table, only one CF.
> Cons: Access is less easy than with options 1 and 2.
>
> I think Option 2 is the worst one. Option 1 is very easy to implement.
> And for option 3, I don't see any benefit compared to option 1.
>
> So I'm tempted to go with option 1, but I don't like the idea of
> multiplying the tables.
>
> Does anyone have any comments on which option might be the best one,
> or want to propose another option?
>
> JM