HBase >> mail # user >> old question regarding wide vs tall schema design


old question regarding wide vs tall schema design
Hi, All

I know this is a question that has been asked a lot... Here is our use
case, where neither wide nor tall seems to win clearly.

We have a table.  Here are the two schema designs, in simplified form:
      Wide table: rowKey: domainName. Column1: date (which expands to
multiple years). Column2: action (8 actions).

      Tall table: rowKey: domainName+date. Column1: action.
In both designs the value is just a counter.
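
To make the two layouts concrete, here is a minimal sketch in plain Python (no HBase client involved) of how one counter cell would be addressed in each design. It rests on assumptions that are not in the post: the wide design is read as packing date and action into a single column name, the `:` and `|` separators are arbitrary, and the domain, dates, and the `click` action are made-up examples.

```python
# Plain-Python model of the two layouts; tables are dicts of row_key -> {column: counter}.
# All names, separators, and sample values are illustrative assumptions.

def wide_cell(domain, date, action):
    """Wide layout: one row per domain; date and action both live in the column."""
    row_key = domain
    column = f"{date}:{action}"  # assumed encoding of the (date, action) pair
    return row_key, column

def tall_cell(domain, date, action):
    """Tall layout: one row per (domain, date); the action is the column."""
    row_key = f"{domain}|{date}"  # assumed separator between domain and date
    column = action
    return row_key, column

def increment(table, cell, amount=1):
    """Add `amount` to the counter stored at (row_key, column) and return it."""
    row_key, column = cell
    row = table.setdefault(row_key, {})
    row[column] = row.get(column, 0) + amount
    return row[column]

wide, tall = {}, {}
for date in ("20130101", "20130102"):
    increment(wide, wide_cell("example.com", date, "click"))
    increment(tall, tall_cell("example.com", date, "click"))

print(len(wide), len(tall))  # prints "1 2": one wide row vs. one tall row per date
```

The same counters end up in both tables; the only difference is whether the date is part of the row key or part of the column.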

We have considered several factors, but either way seems to work fine.
1. Column count. It is only in the thousands, not millions, so this does
not kill the Wide option.
2. Row atomicity. We don't really need atomicity across different dates, so
this does not kill the Tall option.
3. Bloom filter size. There seems to be no big difference, so this does not
kill the Tall option.
4. Write performance. We mostly do batched writes, so faster writes to the
tall table are not enough to kill the Wide table.
5. Scan performance. Even for a wide row, a scan/get does not load the whole
row; specifying the column returns the value for the needed date. So
tall and wide are the same here.
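
As a rough illustration of point 5, the sketch below reads the same date range from both layouts using plain Python dicts. It is only a model of the access pattern (column selection inside one row versus a range over sorted row keys) and says nothing about actual HBase performance. The `:` and `|` separators, the sample data, and the function names are assumptions made for the example.

```python
from bisect import bisect_left

# Sample data in both layouts (made-up values).
wide = {"example.com": {"20130101:click": 3, "20130102:click": 5, "20130103:click": 7}}
tall = {
    "example.com|20130101": {"click": 3},
    "example.com|20130102": {"click": 5},
    "example.com|20130103": {"click": 7},
}

def read_wide(table, domain, start, stop):
    """Single-row read: keep columns whose date part falls in [start, stop)."""
    row = table.get(domain, {})
    return {col: val for col, val in sorted(row.items())
            if start <= col.split(":")[0] < stop}

def read_tall(table, domain, start, stop):
    """Range read: walk sorted row keys from domain|start up to domain|stop."""
    keys = sorted(table)
    lo = bisect_left(keys, f"{domain}|{start}")
    hi = bisect_left(keys, f"{domain}|{stop}")
    result = {}
    for key in keys[lo:hi]:
        date = key.split("|")[1]
        for action, val in table[key].items():
            result[f"{date}:{action}"] = val
    return result

from_wide = read_wide(wide, "example.com", "20130101", "20130103")
from_tall = read_tall(tall, "example.com", "20130101", "20130103")
print(from_wide == from_tall)  # both layouts return the same two counters
```

Either layout can answer the date-range question; the wide one does it by selecting columns within a row, the tall one by bounding the row-key range.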

Based on the considerations I can think of, there seems to be not much
performance difference between tall and wide. When neither side clearly
wins, is the guideline to prefer wide or tall?

Thanks
Tian-Ying