If the row key is just the customer ID, then a simple MD5 or SHA-1 hash of the ID would suffice.
That would clear up any risk of hot spotting once you have done your initial load of data.
And that's probably a key point... hot spotting when you're first loading a very large table is really a moot point. It may be painful, but the pain lasts for less than an hour.
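As a rough sketch of what I mean by hashing the key (Python only for illustration; the function name `hashed_row_key` and the optional `tag` argument are made up here and are not part of any HBase API, and the customer IDs are invented sample values):

```python
import hashlib


def hashed_row_key(customer_id: str, tag: str = "") -> bytes:
    """Build a row key as the MD5 hex digest of the customer ID plus an optional tag.

    The digest is always 32 hex characters, so sequential customer IDs no
    longer produce sequential row keys. The optional tag (e.g. 'AD' for
    address, 'PR' for profile, as suggested elsewhere in this thread) is
    appended after the digest, so all rows for one customer share a prefix.
    """
    digest = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    return (digest + tag).encode("ascii")


if __name__ == "__main__":
    # Hypothetical sequential customer IDs; the printed keys are not sequential.
    for cid in ("1000001", "1000002", "1000003"):
        print(cid, hashed_row_key(cid).decode())
```

The trade-off is that you can no longer scan customers in ID order; a lookup has to recompute the hash from the customer ID and do a get on that key.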
On Nov 26, 2012, at 4:28 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> Hello sir,
> You might become a victim of RS hotspotting, since the customerIDs will
> be sequential (I assume). To keep things simple, HBase puts all the rows with
> similar keys on the same RS. But it becomes a bottleneck in the long run,
> as all the data keeps going to the same region.
> Mohammad Tariq
> On Mon, Nov 26, 2012 at 3:53 PM, Ramasubramanian Narayanan <
> [EMAIL PROTECTED]> wrote:
>> Thanks! Can we have the customer number as the RowKey for the customer
>> (client) master table? Please help in educating me on the advantages and
>> disadvantages of having customer number as the row key...
>> We may also need to implement SCD2 in that table. Will it work if I have
>> the key like that?
>> Or is SCD2 not needed, since we can achieve the same by increasing the
>> number of versions that the table will hold?
>> Please suggest...
>> On Mon, Nov 26, 2012 at 1:10 PM, Li, Min <[EMAIL PROTECTED]> wrote:
>>> When one CF needs to split, the other 599 CFs will split at the same time. So
>>> many fragments will be produced when you use so many column families.
>>> Actually, many CFs can be merged into a single CF with specific tags in the
>>> rowkey. For example, the rowkey of customer address can be uid+'AD', and
>>> that of customer profile can be uid+'PR'.
>>> -----Original Message-----
>>> From: Ramasubramanian Narayanan [mailto:
>>> [EMAIL PROTECTED]]
>>> Sent: Monday, November 26, 2012 3:05 PM
>>> To: [EMAIL PROTECTED]
>>> Subject: Expert suggestion needed to create table in Hbase - Banking
>>> I have a requirement of physicalising the logical model... I have a
>>> client model which has 600+ entities...
>>> Need suggestion how to go about physicalising it...
>>> I have a few other doubts:
>>> 1) Whether it is good to create a single table for all the 600+
>>> 2) Should there be different column families for different groups, or can
>>> everything be under a single column family? For example, can customer
>>> address have its own column family?
>>> Please help on this..