HBase >> mail # user >> HBase Schema Design for clickstream data


Mohit Anchlia 2012-06-26, 17:34
Dhaval Shah 2012-06-26, 17:52
Amandeep Khurana 2012-06-27, 18:01
Re: HBase Schema Design for clickstream data
The analysis would include:

Visitor level
Session level - a visitor could have multiple sessions
Page hits, conversions - popular pages, sequence of pages hit in one session
Orders purchased - mostly determined by URL and query parameters

How should I go about designing the schema?

Thanks

On Jun 27, 2012, at 2:01 PM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:

> Mohit,
>
> What would be your read patterns later on? Are you going to read per
> session, or for a time period, or for a set of users, or process through
> the entire dataset every time? That would play an important role in
> defining your keys and columns.
>
> -Amandeep
>
> On Tue, Jun 26, 2012 at 1:34 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
>> I am starting out with a new application where I need to store users'
>> clickstream data. I'll have a visitor id and a session id along with other
>> page-related data. I am wondering if I should just key off a randomly
>> generated session id and store all the page-related data as columns inside
>> that row, assuming that this would also give good distribution across region
>> servers. In a session, a user could send hundreds of HTML requests and get
>> responses. If someone is already doing this in HBase, I would like to learn
>> more about how they have designed the schema.
>>
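The layout proposed in the original post (one row per randomly generated session id, one column per page hit) can be sketched as follows. This is an illustration under assumptions, not a design confirmed anywhere in the thread: a nested dict stands in for the HBase table, the `hit:NNNNNN` qualifier format and the four-way split on the first hex digit are hypothetical, and all function names are made up. It relies on HBase sorting column qualifiers within a row, so zero-padded sequence numbers keep a session's hits in order.

```python
import uuid
from collections import Counter

def new_session_id() -> str:
    """Random session id; uuid4 hex starts with random digits."""
    return uuid.uuid4().hex

def qualifier(seq: int) -> str:
    """Hypothetical column qualifier; zero padding keeps sort order == hit order."""
    return f"hit:{seq:06d}"

def record_hit(table: dict, session_id: str, seq: int, url: str) -> None:
    """Store one page hit as a column in the session's row."""
    table.setdefault(session_id, {})[qualifier(seq)] = url

def pages_in_order(table: dict, session_id: str) -> list:
    """Read the whole session row and return URLs in hit order."""
    row = table.get(session_id, {})
    return [row[q] for q in sorted(row)]

def bucket(session_id: str, n_buckets: int = 4) -> int:
    """Stand-in for regions pre-split evenly on the first hex digit of the key."""
    return int(session_id[0], 16) * n_buckets // 16

table = {}
sid = new_session_id()
# Hits written out of order; the qualifier restores the sequence on read.
record_hit(table, sid, 2, "/product/17")
record_hit(table, sid, 10, "/checkout")
record_hit(table, sid, 1, "/home")
ordered = pages_in_order(table, sid)

# Rough check that random ids spread over the assumed key ranges.
spread = Counter(bucket(new_session_id()) for _ in range(4000))
```

With this layout a single get returns a whole session, and random ids avoid concentrating writes on one key range, but visitor-level questions need either the visitor id stored in the row or a second table keyed by visitor.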
Amandeep Khurana 2012-06-27, 18:20