HBase, mail # user - Number of tables


Re: Number of tables
Sonal Goyal 2011-08-21, 17:49
If your data is large enough to warrant three tables, go for it. That would
be the case when there are a great many entries for each user#type combination.

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

On Sun, Aug 21, 2011 at 11:09 PM, Mark <[EMAIL PROTECTED]> wrote:

> Almost all use cases require type, i.e.:
>
> Retrieve all searches performed by user 'foo':
>   scan "history", {STARTROW => "search/foo"}
> Retrieve all product views performed by user 'foo':
>   scan "history", {STARTROW => "view/foo"}
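
One caveat on the scans above: in the HBase shell, STARTROW only sets where a scan begins, so without a STOPROW (or a LIMIT) it keeps going to the end of the table. The prefix "search/foo" would also match keys for a user such as "foobar". Below is a minimal Ruby sketch of one way to bound the scan to a single user, assuming the "#{type}/#{user}/#{time}" key layout from the original message. prefix_stop_row is a hypothetical helper, not part of HBase.

```ruby
# Hypothetical helper (not an HBase API): returns the smallest row key that
# sorts after every key beginning with `prefix`, or nil if no such key exists.
def prefix_stop_row(prefix)
  bytes = prefix.bytes
  # Trailing 0xFF bytes cannot be incremented, so drop them first.
  bytes.pop while !bytes.empty? && bytes.last == 0xFF
  return nil if bytes.empty? # no upper bound: the scan must run to the end
  bytes[-1] += 1
  bytes.pack("C*")
end

# The trailing "/" keeps user "foo" from also matching "foobar".
start_row = "search/foo/"
stop_row  = prefix_stop_row(start_row)

# Print the shell command rather than running it.
puts %(scan "history", {STARTROW => "#{start_row}", STOPROW => "#{stop_row}"})
# prints: scan "history", {STARTROW => "search/foo/", STOPROW => "search/foo0"}
```

Because "/" is byte 0x2F and "0" is 0x30, every key of the form "search/foo/<time>" sorts before "search/foo0", while "search/foobar/..." sorts after it and is excluded.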
>
>
> On 8/21/11 10:25 AM, Sonal Goyal wrote:
>
>> Hi Mark,
>>
>> When you say that your use case does not require searching across multiple
>> types, what do you mean? Do you have cases when you search with type?
>>
>> Best Regards,
>> Sonal
>> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
>> Nube Technologies <http://www.nubetech.co>
>>
>> <http://in.linkedin.com/in/sonalgoyal>
>>
>> On Sun, Aug 21, 2011 at 9:29 PM, Mark <[EMAIL PROTECTED]> wrote:
>>
>>> We are logging all user actions into HBase. These actions include searches,
>>> product views and clicks.
>>>
>>> We are currently storing them in one table with row keys like so:
>>> "#{type}/#{user}/#{time}", where type is one of click, search, or view, and
>>> user is the currently logged-in user. Obviously this method leads to region
>>> hot spotting, as the start of each key is fairly static. This got me
>>> thinking about what alternative ways I could model this type of data, and I
>>> was hoping I could get some suggestions from the community.
>>>
>>> Which would be more advisable?
>>>
>>> 1) Keep the current pattern, described above, where all logs go to one
>>> table.
>>> 2) Keep the current one-table pattern described above, but switch the type
>>> and user fields, which would lead to more randomized keys and thus reduce
>>> hot spots.
>>> 3) Create separate tables for each type of log we are saving, i.e. have a
>>> search table, a click table, and a view table.
>>>
>>> Our use case does not require searching across multiple types, so I'm
>>> leaning towards #3 now, but I was wondering if there are any cons to using
>>> this method. Is it worse to have more tables than fewer?
>>>
>>> Thanks for help
>>>
>>> -M
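
For comparison, the three options in the original question differ mainly in how the table and row key are derived from an event. Here is a rough Ruby sketch with hypothetical helper names and made-up sample events. The zero-padded timestamp is an assumption, added so that keys sort chronologically as strings.

```ruby
# Made-up sample events: [type, user, epoch seconds].
EVENTS = [
  ["search", "alice", 1313946000],
  ["view",   "bob",   1313946060],
  ["search", "bob",   1313946120],
  ["click",  "carol", 1313946180]
]

# Fixed-width timestamp so lexicographic order matches time order (assumption).
def ts(time)
  format("%010d", time)
end

# Option 1: the current layout, one table, type first.
def type_first_key(type, user, time)
  "#{type}/#{user}/#{ts(time)}"
end

# Option 2: one table, user and type swapped.
def user_first_key(type, user, time)
  "#{user}/#{type}/#{ts(time)}"
end

# Option 3: one table per type; returns [table_name, row_key].
def per_type_table(type, user, time)
  [type, "#{user}/#{ts(time)}"]
end

EVENTS.each do |type, user, time|
  table, key = per_type_table(type, user, time)
  puts "1) history  #{type_first_key(type, user, time)}"
  puts "2) history  #{user_first_key(type, user, time)}"
  puts "3) #{table.ljust(8)} #{key}"
end
```

Under option 2, "all searches by user foo" becomes a scan over keys starting with "foo/search/". Under option 3 it becomes a scan of the search table over keys starting with "foo/". How evenly either layout spreads writes depends on how activity is distributed across users.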