HBase, mail # user - observer coprocessor question regarding puts


Re: observer coprocessor question regarding puts
Michael Segel 2013-06-14, 14:45
Not to beat a dead horse...

I did want to touch a bit more on the schema design issues and considerations.

If you have a really wide composite key and you're only storing a single cell, you will end up with a very long (tall) table.

Does this make sense?

Would it make more sense to use a smaller key and then store multiple cells, with part of the rowkey as a column qualifier?

Using your example... you have [A,B,C] as your rowkey and then Column1 with a value.

You could make the row key [A, B] with the column qualifier [C] storing the value there.

Does that make sense?
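
To make the two layouts concrete, here is a minimal sketch that uses plain in-memory maps as a stand-in for an HBase table (row key -> column qualifier -> value). It is not HBase client code; the class name, the "|" separator, and the method names are assumptions made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for a table: row key -> (column qualifier -> value). */
final class KeyLayoutSketch {
    // Hypothetical separator for the composite key parts; not from the thread.
    static final String SEP = "|";

    /** Tall layout: the whole tuple [A,B,C] is the row key, one cell per row. */
    static void putTall(Map<String, Map<String, String>> table,
                        String a, String b, String c, String value) {
        String rowKey = a + SEP + b + SEP + c;
        table.computeIfAbsent(rowKey, k -> new TreeMap<>()).put("Column1", value);
    }

    /** Wide layout: [A,B] is the row key and C becomes the column qualifier. */
    static void putWide(Map<String, Map<String, String>> table,
                        String a, String b, String c, String value) {
        String rowKey = a + SEP + b;
        table.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(c, value);
    }
}
```

With two values that share A and B but differ in C, the tall layout ends up with two rows of one cell each, while the wide layout ends up with one row holding two cells.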

-Mike
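
For the secondary index table described in the quoted messages below (original row [A,B,C], index row [C,B,A], same cf:cq and value), the per-put work could look roughly like this sketch. It models the Put's family map as a plain Map of family to a list of cells, in the spirit of the getFamilyMap suggestion. The Cell and IndexEntry classes, the "|" separator, and the method names are hypothetical stand-ins, not HBase types, and the real family map uses byte arrays and version-specific cell classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class InvertedIndexSketch {
    /** Hypothetical stand-in for one KV carried by a Put: qualifier plus value. */
    static final class Cell {
        final String qualifier;
        final String value;
        Cell(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** One cell to be written to the index table. */
    static final class IndexEntry {
        final String rowKey;
        final String family;
        final String qualifier;
        final String value;
        IndexEntry(String rowKey, String family, String qualifier, String value) {
            this.rowKey = rowKey;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Reverses the parts of a "|"-separated composite key: A|B|C -> C|B|A. */
    static String invert(String rowKey) {
        String[] parts = rowKey.split("\\|");
        StringBuilder sb = new StringBuilder();
        for (int i = parts.length - 1; i >= 0; i--) {
            sb.append(parts[i]);
            if (i > 0) sb.append('|');
        }
        return sb.toString();
    }

    /** Walks every family and every cell, so no qualifier has to be known up front. */
    static List<IndexEntry> indexEntries(String rowKey, Map<String, List<Cell>> familyMap) {
        String indexRow = invert(rowKey);
        List<IndexEntry> out = new ArrayList<>();
        for (Map.Entry<String, List<Cell>> family : familyMap.entrySet()) {
            for (Cell cell : family.getValue()) {
                out.add(new IndexEntry(indexRow, family.getKey(), cell.qualifier, cell.value));
            }
        }
        return out;
    }
}
```

In a real observer the returned entries would be turned into Puts against the index table. As noted in the quoted reply, the two tables then hold duplicate data and can drift out of sync.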

On Jun 13, 2013, at 9:51 PM, Michel Segel <[EMAIL PROTECTED]> wrote:

> Ok...
>
> But then you are duplicating the data, so you will have to reconcile the two sets and there is a possibility that the data sets are out of sync.
>
> I don't know your entire Schema, but if the row key is larger than the value, you may want to think about changing the Schema.
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jun 13, 2013, at 9:34 PM, rob mancuso <[EMAIL PROTECTED]> wrote:
>
>> Thx Mike, for the most part.
>>
>> My key is substantially larger than my value, so I was thinking of leaving
>> the cq->value stuff as is and just inverting the rowkey.
>>
>> So the original table would have
>>
>> [A, B, C] cf1:cq1 val1
>>
>> And the secondary table would have
>>
>> [C, B, A] cf1:cq1 val1
>> On Jun 10, 2013 3:42 PM, "Michael Segel" <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> If I understand you ...
>>>
>>> You have the row key = [A,B,C]
>>> You want to create an inverted mapping of  Key [C] => {[A,B,C]}
>>>
>>> That is to say that your inverted index would be all of the rows where the
>>> value of C = x  .
>>> And x is some value.
>>>
>>> You shouldn't have to worry about column qualifiers, just the values of A, B
>>> and C.
>>>
>>> In this case, the columns in your index will also be the values of the
>>> tuples.
>>> You really don't need C because you already have it, but then you'd need
>>> to remember to add it to the pair (A, B) that you are storing.
>>> I'd say waste the space and store (A,B,C) but that's just me.
>>>
>>>
>>> Is that what you want to do?
>>>
>>> -Mike
>>>
>>> On Jun 9, 2013, at 12:16 PM, rob mancuso <[EMAIL PROTECTED]> wrote:
>>>
>>>> Thx Anoop, I believe this is what I'm looking for.
>>>>
>>>> Regarding my use case,  my rowkey is [A,B,C], but i also have a requirement
>>>> to access data by [C] only.  So I'm looking to use a post-put coprocessor
>>>> to maintain one secondary index table where the rowkey starts with [C]. My
>>>> cqs are numerics representing time and can be any number btw 1 and 3600 (ie
>>>> seconds within an hour). Because I won't know the cq value for each
>>>> incoming put (just the cf), I need something to deconstruct the put into a
>>>> list of cqs ...which I believe you've provided with getFamilyMap.
>>>>
>>>> Thx again!
>>>> On Jun 9, 2013 12:47 AM, "Anoop John" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> You want to have an index per every CF+CQ right?  You want to maintain diff
>>>>> tables for diff columns?
>>>>>
>>>>> Put has a getFamilyMap method that returns a Map of CF to List of KVs.  From this List of
>>>>> KVs you can get all the CQ names, values, etc.
>>>>>
>>>>> -Anoop-
>>>>>
>>>>> On Sat, Jun 8, 2013 at 11:24 PM, rob mancuso <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm looking to write a post-put observer coprocessor to maintain a
>>>>>> secondary index.  Basically, my current rowkey design is a composite of
>>>>>> A,B,C and I want to be able to also access data by C.  So all i'm looking
>>>>>> to do is invert the rowkey and apply it for all cf:cq values that come in.
>>>>>>
>>>>>> My problem (i think), is that in all the good examples i've seen, they all
>>>>>> deconstruct the Put by calling put.get(<cf>,<cq>)...implying they know the
>>>>>> qualifier ahead of time.  I'm looking to specify the family and