Accumulo, mail # user - Re: Reverse Index Timestamp


Re: Reverse Index Timestamp
Roshan Punnoose 2012-11-27, 18:22
Thanks!

The fact that you are using a binary tree behind the scenes makes perfect
sense. By the way, what do you use in the standalone (non-native) implementation?
Does it use a TreeMap?
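
As a rough illustration of the point about balanced trees, the sketch below fills a java.util.TreeMap (a red-black tree) once with keys arriving in descending order, so every insert lands at the front, and once in ascending order, so every insert lands at the end. Both orders produce the same sorted map, and each insert costs O(log n) either way. This only demonstrates the data-structure property; it is not Accumulo's in-memory map, and the class and method names are made up for the example.

```java
import java.util.TreeMap;

public class SortedInsertSketch {
    // Hypothetical helper: inserts n keys into a red-black tree, either
    // always at the front (descending order) or always at the end (ascending).
    static TreeMap<Long, String> fill(int n, boolean descending) {
        TreeMap<Long, String> map = new TreeMap<>();
        for (long i = 0; i < n; i++) {
            long key = descending ? (n - 1 - i) : i;
            map.put(key, "v" + key);
        }
        return map;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> front = fill(1000, true);
        TreeMap<Long, String> back = fill(1000, false);
        // Same contents and same sorted order, regardless of insertion order.
        System.out.println(front.equals(back));
        System.out.println(front.firstKey());
    }
}
```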
On Tue, Nov 27, 2012 at 12:57 PM, Keith Turner <[EMAIL PROTECTED]> wrote:

>
>
> On Tue, Nov 27, 2012 at 12:21 PM, Roshan Punnoose <[EMAIL PROTECTED]> wrote:
>
>> The <string> would most likely be a fixed set of strings that do not
>> change over time.
>>
>> My question is if it is bad to use a reverse index timestamp in the row
>> id? Will it cause problems with the tablet splitting, compaction, and
>> performance if the data is always being sent to the top of the tablet? If I
>> define a split as everything prefixed with <string>, then the ingest will
>> go to one tablet, but then I add a reverse timestamp in the row, and that
>> would mean I am always copying data to the top of the tablet. Will this
>> cause performance issues? Or is it better to append to a tablet?
>>
>
> I do not think it should matter. Inserts go into a C++ STL map on the
> tablet server if using the native map. I think the implementation of that
> is a balanced binary tree, so I do not think inserting at the beginning vs.
> the end would make a difference. That being said, I do not think I have
> tried this, so I do not know if there would be any surprises. I would be
> interested in hearing about your experiences.
>
>
>>
>>
>> On Tue, Nov 27, 2012 at 11:51 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
>>
>>>
>>>
>>> Keith
>>>
>>> On Tue, Nov 27, 2012 at 10:41 AM, Roshan Punnoose <[EMAIL PROTECTED]> wrote:
>>>
>>>> I want to have a table where the row will consist of "<string>-<reverse
>>>> index timestamp>". But this means that the data is always being prefixed to
>>>> the beginning of the row (or tablet if the row is large). Will this be a
>>>> problem for compaction or performance?
>>>
>>>
>>> Can you tell me more about what <string> is?  For example is it a hash
>>> or does it come from the set "foo1","foo2","foo3".   How does it change
>>> over time?  I think the answer to your question depends on what <string> is.
>>>
>>>
>>>>
>>>> I don't know if I heard this correctly, but someone once mentioned that
>>>> making the row id the direct timestamp could cause performance issues
>>>> because data is always going to one tablet, but also because there is
>>>> trouble splitting since it always appends to the tablet. Is this true, is
>>>> it similar to what could happen if I am always prefixing to a tablet?
>>>>
>>>
>>> Yes, using a timestamp as the row could cause data from many clients to
>>> always go to the same tablet, which would be bad for performance on a
>>> cluster.
>>>
>>>
>>>>
>>>> Thanks!
>>>> Roshan
>>>>
>>>
>>>
>>
>
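
For readers following the thread, here is a minimal sketch of one way to build the "<string>-<reverse index timestamp>" row discussed above. It assumes the reverse timestamp is Long.MAX_VALUE minus a non-negative millisecond timestamp, zero-padded to 19 digits so that lexicographic order matches numeric order. The thread does not specify the encoding, so the helper name rowId and the padding width are assumptions for illustration only.

```java
import java.util.Locale;

public class ReverseTimestampRow {
    // Hypothetical helper: newer timestamps yield smaller reversed values,
    // so the newest entry for a given prefix sorts first.
    // Assumes timestampMillis >= 0.
    static String rowId(String prefix, long timestampMillis) {
        long reversed = Long.MAX_VALUE - timestampMillis;
        // Long.MAX_VALUE has 19 decimal digits; pad so all rows have equal width.
        return String.format(Locale.ROOT, "%s-%019d", prefix, reversed);
    }

    public static void main(String[] args) {
        String older = rowId("foo1", 1_000L);
        String newer = rowId("foo1", 2_000L);
        // The newer row compares lower, i.e. it lands at the top of the
        // "foo1" range in a lexicographically sorted table.
        System.out.println(newer.compareTo(older) < 0);
    }
}
```

With a fixed set of prefixes such as "foo1", "foo2", "foo3", each prefix keeps its own contiguous range, and new data for a prefix always sorts to the start of that range.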