OpenTSDB uses the "natural" availability of the metric ID to bucket the metrics by type. After that it relies on scanner batching and block loads.
For your use case you could bin by time frame: for example, hash the start of each hour with MD5 and concatenate the hash with the actual epoch.
That way you can range scan a bin while the bins are distributed evenly across the servers.
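A minimal sketch of that idea in Python, not a definitive implementation. The function names, the 8-byte big-endian encoding, and putting the hash first as a fixed 16-byte prefix are my assumptions for illustration; the only parts taken from the suggestion above are the hourly bins and the MD5 of the hour start.

```python
import hashlib
import struct

SECONDS_PER_HOUR = 3600


def hour_start(epoch_seconds):
    """Round an epoch timestamp down to the start of its hour."""
    return epoch_seconds - (epoch_seconds % SECONDS_PER_HOUR)


def bin_prefix(epoch_seconds):
    """16-byte MD5 of the hour start; identical for every event in that hour."""
    return hashlib.md5(struct.pack(">Q", hour_start(epoch_seconds))).digest()


def make_row_key(epoch_seconds):
    """Row key = MD5(hour start) + big-endian epoch (assumed layout)."""
    return bin_prefix(epoch_seconds) + struct.pack(">Q", epoch_seconds)


def scan_range_for_hour(epoch_seconds):
    """Start (inclusive) and stop (exclusive) keys covering one hourly bin."""
    start = hour_start(epoch_seconds)
    prefix = bin_prefix(epoch_seconds)
    return (prefix + struct.pack(">Q", start),
            prefix + struct.pack(">Q", start + SECONDS_PER_HOUR))
```

Keys within one hour share a prefix and sort by time, so a single range scan covers the bin, while consecutive hours get unrelated prefixes and land in different parts of the key space. Note that two events in the same second would collide on this key, so in practice you would probably append something unique (a sequence number or host id, say).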
On May 21, 2012, at 7:56 AM, mete wrote:
> Hello folks,
> I am trying to come up with a nice key design for storing logs in the
> company. I am planning to index them and store the row key in the index for
> random reads.
> I need to balance the writes equally between the R.S. and I could not
> understand how OpenTSDB does that by prefixing the metric id. (I related
> metric id with the log type.) In my log storage case a log line just has a
> type and a date, and the rest of it is not really very useful information.
> So I think that I can create a table for every distinct log type, and I need
> a random salt to route to a different R.S., similar to this:
> But with this approach I believe I will lose the ability to do effective
> partial scans to a specific date (if for some reason I need that). What do
> you think? And for the salt approach, do you use randomly generated salts or
> hashes that actually mean something (like the hash of the date)?
> I am using random UUIDs at the moment but I am trying to find a better
> approach; any feedback is welcome.