Re: What is the best hbase table schema for following json data?
bq. 1) Suppose I want to search on a key of a click; it will be a full scan

You can use MultipleColumnPrefixFilter or ColumnPrefixFilter to speed
up the scan.
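
To illustrate why a prefix filter helps, here is a minimal plain-Java sketch.
It does not use any HBase classes: a TreeMap stands in for the sorted columns
of one row in the "clicks" family. The qualifier layout
"<clickKey>:<clickIndex>" and all names in the code are assumptions made for
the example, not part of the original schema. Since columns within a row are
kept sorted by qualifier, a prefix lookup can seek to the first match and stop
at the first non-match instead of inspecting every column.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Plain-Java model of column-prefix filtering; no HBase classes are used.
class ClickColumnPrefixDemo {

    // Hypothetical qualifier layout: "<clickKey>:<clickIndex>", e.g. "key1:0001".
    static NavigableMap<String, String> sampleRow() {
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("key1:0001", "value1");
        row.put("key1:0002", "value1b");
        row.put("key2:0001", "value2");
        row.put("key3:0002", "value3");
        return row;
    }

    // Models a single-prefix filter: seek to the prefix, stop at the first non-match.
    static NavigableMap<String, String> withPrefix(NavigableMap<String, String> row,
                                                   String prefix) {
        NavigableMap<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : row.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted order: no later qualifier can match
            }
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    // Models a multi-prefix filter: the union of several single-prefix lookups.
    static NavigableMap<String, String> withAnyPrefix(NavigableMap<String, String> row,
                                                      List<String> prefixes) {
        NavigableMap<String, String> out = new TreeMap<>();
        for (String prefix : prefixes) {
            out.putAll(withPrefix(row, prefix));
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = sampleRow();
        System.out.println(withPrefix(row, "key1:"));
        System.out.println(withAnyPrefix(row, Arrays.asList("key1:", "key3:")));
    }
}
```

With the real client, the corresponding step would be to set a
ColumnPrefixFilter (one prefix) or a MultipleColumnPrefixFilter (several
prefixes) on the Scan. Either way, the filter only helps if the click key is
encoded at the start of the column qualifier.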

How many key/value pairs does each 'click' have? Among these pairs, are
you going to search on only a subset of the keys?

Cheers

On Wed, May 29, 2013 at 8:47 PM, AnilKumar B <[EMAIL PROTECTED]> wrote:

> Hi,
>
> What is the best hbase table schema for following json data?
> I need to store following JSON data in hbase.
> {"Session":{"Header" :
> {"key1":"value1","key2":"value2","key3":"value3","key4":"value4",....},
> "clicks" : [{"click" : {"key1":"value1","key2":"value2",
> "key3":"value3"....}}, {"click" : {"key1":"value1", "key2":"value2",
> ....}}]}}
>
> I have created the schema as below, but there seem to be some issues.
> rowkey -> compositeKey of session fields
> ColumnFamily 1 -> "Header" which consists of following columns
> 1) Header:HeaderFields which stores "{"Header" :
> {"key1":"value1","key2":"value2","key3":"value3","key4":"value4",....}}" in
> one cell
> 2) other columns
>
> ColumnFamily 2 -> "clicks" and each "click" will be one column
>
> The problems here are:
> 1) Suppose I want to search on a key of a click; it will be a full scan. How
> can I optimize my schema for such a search requirement?
> 2) If I want to provide a secondary index on the keys of clicks, how can I
> implement it?
>
> Thanks,
> B Anil Kumar.
>