
Re: HBase Schema Design for clickstream data

What will your read patterns be later on? Will you read per session, for
a time period, or for a set of users, or will you process the entire
dataset every time? The answer plays an important role in defining your
keys and columns.
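
To make that concrete, here is a rough sketch of how each read pattern
might push you toward a different row key. It only builds the key bytes and
does not touch HBase itself. The function names, the 16-bucket salt, and the
zero-byte separator are my own assumptions for illustration, not anything
from your application.

```python
import struct
import zlib


def session_key(session_id: str) -> bytes:
    """Per-session reads: one row per session, keyed by the session id."""
    return session_id.encode("utf-8")


def visitor_time_key(visitor_id: str, epoch_ms: int) -> bytes:
    """Per-visitor reads: visitor id first, then a fixed-width big-endian
    timestamp so one visitor's rows sort chronologically.
    Assumes visitor ids never contain a zero byte."""
    return visitor_id.encode("utf-8") + b"\x00" + struct.pack(">Q", epoch_ms)


def salted_time_key(epoch_ms: int, session_id: str, buckets: int = 16) -> bytes:
    """Time-range reads: timestamp-leading key with a one-byte bucket prefix
    (hypothetical salt derived from the session id) to spread writes."""
    bucket = zlib.crc32(session_id.encode("utf-8")) % buckets
    return bytes([bucket]) + struct.pack(">Q", epoch_ms) + session_id.encode("utf-8")


if __name__ == "__main__":
    print(session_key("a1b2c3"))
    print(visitor_time_key("visitor42", 1340742840000))
    print(salted_time_key(1340742840000, "a1b2c3"))
```

The trade-off with the salted variant is that a time-range read has to scan
every bucket and merge the results, so it only pays off if time-range reads
really are your main access path.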


On Tue, Jun 26, 2012 at 1:34 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> I am starting out with a new application where I need to store users'
> clickstream data. I'll have Visitor Id, session id along with other page
> related data. I am wondering if I should just key off randomly generated
> session id and store all the page related data as columns inside that row
> assuming that this would also give good distribution across region
> servers. In a session, a user could send 100s of HTML requests and get
> responses. If someone is already doing this in HBase I would like to learn
> more about it as to how they have designed the schema.