I know this is a question that has been asked a lot... Here is our use
case, where neither wide nor tall seems to be the obvious winner.
We have a table. Here are the two schema designs, in simplified form:
Wide table -- rowKey: domainName. Column1: date (which expands to
multiple years). Column2: action (8 actions)
Tall table -- rowKey: domainName+date. Column1: action
The value is just a counter.
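
To make the two layouts concrete, here is a minimal sketch in plain Python
(no HBase client involved). It assumes the wide table's column qualifier
combines date and action, which is my reading of the schema above. The "|"
separator, the YYYYMMDD date format, and the action names "click" and "view"
are made up for illustration.

```python
from collections import defaultdict

SEP = "|"  # hypothetical separator, not part of the real schema


def wide_cell(domain, date, action):
    """Wide layout: one row per domain, one column per (date, action)."""
    return domain, f"{date}{SEP}{action}"


def tall_cell(domain, date, action):
    """Tall layout: one row per (domain, date), one column per action."""
    return f"{domain}{SEP}{date}", action


def increment(table, cell, amount=1):
    """Bump the counter stored at (row key, column qualifier)."""
    row_key, qualifier = cell
    table[row_key][qualifier] += amount


# In-memory stand-ins for the two tables: row key -> qualifier -> counter.
wide = defaultdict(lambda: defaultdict(int))
tall = defaultdict(lambda: defaultdict(int))

# Hypothetical events: (date, action) pairs for a single domain.
events = [("20130101", "click"), ("20130101", "click"), ("20130102", "view")]
for date, action in events:
    increment(wide, wide_cell("example.com", date, action))
    increment(tall, tall_cell("example.com", date, action))

# The wide table holds one row with two columns; the tall table holds two rows.
print(len(wide), len(tall))
```

Both layouts store the same counters; only the split between row key and
column qualifier differs.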
We have considered several factors, but it seems either way works fine.
1. The column count is only in the thousands, not millions. So this does
not kill the Wide option.
2. Row atomicity. We don't really need atomicity across different dates. So
this does not kill the Tall option.
3. Bloom filter size. There seems to be no big difference. So this does not
kill the Tall option.
4. Write performance. Our writes are mostly batched. So the faster writes of
the tall table are not enough to kill the Wide table.
5. Scan performance. Even for a wide row, a scan/get does not load the whole
row; specifying the column returns the value for the needed date. So
tall and wide are the same here.
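
As a rough check on points 1 and 5, the sketch below estimates the column
count of one wide row and reads a single date from each layout. It is plain
Python over in-memory dicts, not HBase client code. The "|" separator, the
placeholder action names, the sample counters, and the 3-year span are all
assumptions of mine.

```python
SEP = "|"  # hypothetical separator between key parts
ACTIONS = [f"action{i}" for i in range(8)]  # placeholder names for the 8 actions


def wide_columns_per_row(years, actions=8, days_per_year=365):
    """Upper bound on columns in one wide row: one per (date, action)."""
    return years * days_per_year * actions


def read_date_wide(table, domain, date):
    """Wide: fetch one row, but ask only for the columns of the wanted date."""
    row = table.get(domain, {})
    wanted = [f"{date}{SEP}{action}" for action in ACTIONS]
    return {q.split(SEP, 1)[1]: row[q] for q in wanted if q in row}


def read_date_tall(table, domain, date):
    """Tall: the row key already names the date, so fetch that one row."""
    return dict(table.get(f"{domain}{SEP}{date}", {}))


# The same hypothetical counters stored in both layouts.
wide = {"example.com": {"20130101|action0": 5, "20130101|action3": 2,
                        "20130102|action0": 7}}
tall = {"example.com|20130101": {"action0": 5, "action3": 2},
        "example.com|20130102": {"action0": 7}}

one_day_wide = read_date_wide(wide, "example.com", "20130101")
one_day_tall = read_date_tall(tall, "example.com", "20130101")

print(wide_columns_per_row(3))       # 3 * 365 * 8 = 8760
print(one_day_wide == one_day_tall)  # both reads return the same counters
```

In this toy model both reads touch only the cells for the requested date,
which is the behaviour point 5 relies on.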
Based on the considerations I can think of, there seems to be not much
performance difference between tall and wide. When neither side clearly
wins, is the guideline to prefer wide or tall?