Mohit Anchlia 2012-06-26, 17:34
Dhaval Shah 2012-06-26, 17:52
Amandeep Khurana 2012-06-27, 18:01
Re: HBase Schema Design for clickstream data
The analysis includes:

Visitor level
Session level - visitors could have multiple sessions
Page hits, conversions - popular pages, sequence of pages hit in one session
Orders purchased - mostly determined by URL and query parameters

How should I go about designing the schema?

Thanks
Sent from my iPad

On Jun 27, 2012, at 2:01 PM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:

> Mohit,
>
> What would be your read patterns later on? Are you going to read per
> session, or for a time period, or for a set of users, or process through
> the entire dataset every time? That would play an important role in
> defining your keys and columns.
>
> -Amandeep
>
> On Tue, Jun 26, 2012 at 1:34 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
>> I am starting out with a new application where I need to store users'
>> clickstream data. I'll have a visitor ID and a session ID along with other
>> page-related data. I am wondering if I should just key off a randomly
>> generated session ID and store all the page-related data as columns inside
>> that row, assuming that this would also give good distribution across
>> region servers. In a session, a user could send hundreds of HTML requests
>> and get responses. If someone is already doing this in HBase, I would like
>> to learn more about how they have designed the schema.
>>
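
For illustration, the session-keyed layout described in the original question could be modelled roughly as below. This is only a sketch in plain Python with no HBase client involved; the function names, the timestamp-plus-sequence qualifier encoding, and the sample URLs are assumptions made for the example, not something proposed in the thread. It relies on the fact that HBase orders column qualifiers within a row by byte value, so a big-endian timestamp keeps page hits in time order.

```python
import struct


def session_row_key(session_id: str) -> bytes:
    """Row key for the session-keyed layout: the session id itself (hypothetical helper)."""
    return session_id.encode("utf-8")


def hit_qualifier(timestamp_ms: int, seq: int) -> bytes:
    """Column qualifier for one page hit (assumed encoding).

    Big-endian unsigned integers compare the same way as raw bytes,
    so qualifiers sort chronologically, with seq breaking ties.
    """
    return struct.pack(">QI", timestamp_ms, seq)


def build_session_row(session_id, hits):
    """Model one wide row: hits is an iterable of (timestamp_ms, url) pairs.

    Returns (row_key, cells), where cells maps qualifier bytes to value bytes.
    """
    cells = {}
    for seq, (timestamp_ms, url) in enumerate(hits):
        cells[hit_qualifier(timestamp_ms, seq)] = url.encode("utf-8")
    return session_row_key(session_id), cells


def pages_in_order(cells):
    """Read the hits back in byte-sorted qualifier order, as a scan of the row would."""
    return [cells[q].decode("utf-8") for q in sorted(cells)]


if __name__ == "__main__":
    key, cells = build_session_row(
        "s-123",
        [(2000, "/cart"), (1000, "/home"), (3000, "/checkout?order=1")],
    )
    print(key, pages_in_order(cells))
```

A layout like this makes "sequence of pages hit in one session" a single-row read, but visitor-level or time-range reads would need a different key or a second table, which is why the read patterns asked about above matter.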
Amandeep Khurana 2012-06-27, 18:20