HBase >> mail # user >> What is the best hbase table schema for following json data?


AnilKumar B 2013-05-30, 03:47
Ted Yu 2013-05-30, 04:12
AnilKumar B 2013-05-30, 06:13
Ted Yu 2013-05-30, 16:48
Re: What is the best hbase table schema for following json data?
But you should be able to write a custom column filter that handles JSON records within a cell.
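
As a rough illustration of what such a filter would have to do for each cell, here is a minimal sketch in plain Python. It is only a model of the per-cell check, not HBase code: a real filter would be a server-side Java class, and the names `cell_matches` and `filter_row` are made up for this example. It assumes each cell value is a UTF-8 encoded JSON object, as in the "clicks:event1" layout described in the thread.

```python
import json


def cell_matches(cell_value: bytes, wanted: dict) -> bool:
    """Return True if the JSON object in one cell has every wanted key/value pair."""
    try:
        record = json.loads(cell_value.decode("utf-8"))
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON; such cells never match.
        return False
    if not isinstance(record, dict):
        return False
    return all(k in record and record[k] == v for k, v in wanted.items())


def filter_row(row: dict, wanted: dict) -> dict:
    """Keep only the cells of a row (qualifier -> raw bytes) whose JSON matches."""
    return {q: v for q, v in row.items() if cell_matches(v, wanted)}


# Hypothetical row: one JSON document per click cell.
row = {
    b"clicks:event1": b'{"key1": "value1", "key2": "value2"}',
    b"clicks:event2": b'{"key1": "other"}',
}
matching = filter_row(row, {"key1": "value1"})
```

Note that a check like this still has to read and parse every cell it is given, so it reduces what is returned to the client rather than what is scanned.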

On May 30, 2013, at 11:48 AM, Ted Yu <[EMAIL PROTECTED]> wrote:

> bq. Still these ColumnPrefixFilter will work in this case?
>
> Probably not. Can you group the subset of keys at the beginning of the
> column (assuming the subset of keys is known and doesn't change) ?
>
> bq. I am storing each click(set of key value pairs) in one cell say
> "clicks:event1". Is this OK?
>
> This should be Okay.
>
> On Wed, May 29, 2013 at 11:13 PM, AnilKumar B <[EMAIL PROTECTED]> wrote:
>
>> Hi Ted,
>>
>> @You can utilize MultipleColumnPrefixFilter or ColumnPrefixFilter to speed
>> up scan.
>> [Anil] Thanks for the info. But I am storing all the key value pairs
>> corresponding to one click in one column. Still these ColumnPrefixFilter
>> will work in this case?
>>
>> @How many key / value pairs does each 'click' have ?
>> [Anil] The number of key-value pairs is not fixed. It can vary from 20 to 200.
>>
>> @Among these pairs, are you going to search for a subset of keys ?
>> [Anil] Yes.
>>
>>
>>
>> In my schema, I am storing each click(set of key value pairs) in one cell
>> say "clicks:event1". Is this OK? Or do I need to change the schema design
>> so that each key-value pair is stored as one column? What is the better
>> way to store JSON data?
>>
>>
>> Thanks,
>> B Anil Kumar.
>>
>>
>> On Thu, May 30, 2013 at 9:42 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>
>>> bq. 1) Suppose If I want search on key of click, It will be full scan
>>>
>>> You can utilize MultipleColumnPrefixFilter or ColumnPrefixFilter to speed
>>> up scan.
>>>
>>> How many key / value pairs does each 'click' have ? Among these pairs,
>>> are you going to search for a subset of keys ?
>>>
>>> Cheers
>>>
>>> On Wed, May 29, 2013 at 8:47 PM, AnilKumar B <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> What is the best hbase table schema for following json data?
>>>> I need to store following JSON data in hbase.
>>>> {"Session":{"Header" :
>>>> {"key1":"value1","key2":"value2","key3":"value3","key4":"value4",....},
>>>> "clicks" : [{"click" : {"key1":"value1","key2":"value2",
>>>> "key3":"value3",....}}, {"click" : {"key1":"value1", "key2":"value2",
>>>> ....}}]}}
>>>>
>>>> I have created the schema as below, but there seem to be some issues.
>>>> rowkey -> compositeKey of session fields
>>>> ColumnFamily 1 -> "Header" which consists of following columns
>>>> 1) Header:HeaderFields which stores {"Header" :
>>>> {"key1":"value1","key2":"value2","key3":"value3","key4":"value4",....}}
>>>> in one cell
>>>> 2) other columns
>>>>
>>>> ColumnFamily 2 -> "clicks" and each "click" will be one column
>>>>
>>>> The problem here is
>>>> 1) Suppose If I want search on key of click, It will be full scan, how
>>>> can I optimize my schema for such a search requirement?
>>>> 2) If I want to provide some secondary index for keys of clicks, how
>>>> can I implement it?
>>>>
>>>> Thanks,
>>>> B Anil Kumar.
>>>>
>>>
>>
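
To make the two options discussed in the thread concrete (one column per key so that a column-prefix match can work, and a secondary index on click keys), here is a small sketch in plain Python. It models the layout only and does not call any HBase API. The qualifier format `<key>|event<NNNN>`, the index row-key format `<key>=<value>|<session row key>`, and all function names are assumptions made for this example, not something proposed in the thread. It also assumes keys and values are strings that do not contain `|`.

```python
from bisect import bisect_left


def flatten_clicks(clicks: list) -> dict:
    """Turn [{"click": {...}}, ...] into {qualifier: value}, one column per key.

    The key comes first in the qualifier so that all columns for one key
    sort together and can be selected by prefix.
    """
    cols = {}
    for i, item in enumerate(clicks, start=1):
        for key, value in item["click"].items():
            cols[f"{key}|event{i:04d}"] = value
    return cols


def prefix_columns(cols: dict, prefix: str) -> dict:
    """Select columns whose qualifier starts with prefix, walking sorted names."""
    names = sorted(cols)
    out = {}
    for name in names[bisect_left(names, prefix):]:
        if not name.startswith(prefix):
            break  # sorted order: no later name can match
        out[name] = cols[name]
    return out


def index_rows(session_rowkey: str, clicks: list) -> set:
    """Row keys for a hypothetical index table, pointing back to the session row."""
    return {
        f"{key}={value}|{session_rowkey}"
        for item in clicks
        for key, value in item["click"].items()
    }


clicks = [
    {"click": {"key1": "value1", "key2": "value2"}},
    {"click": {"key1": "value3", "key10": "value4"}},
]
cols = flatten_clicks(clicks)
key1_cols = prefix_columns(cols, "key1|")  # does not pick up "key10|..."
idx = index_rows("session42", clicks)
```

Including the delimiter in the prefix (`"key1|"` rather than `"key1"`) is what keeps `key10` out of the result. In the index sketch, a lookup for one key/value pair becomes a row-key prefix scan on the index table, at the cost of writing extra rows for every click.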
AnilKumar B 2013-06-01, 14:36