Accumulo >> mail # dev >> Re: [jira] [Commented] (ACCUMULO-452) Generalize locality groups


Re: [jira] [Commented] (ACCUMULO-452) Generalize locality groups
If column families aren't a good fit, then look at the rows.

If neither rows nor column families work, then one can always create another table that has time embedded in the rows.
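
As a rough illustration of "time embedded in the rows", here is a minimal sketch of a row ID layout for such a second table. The `YYYYMMDD_<original row>` layout and the helper names `time_row_key` and `day_range` are assumptions made for this example, not anything Accumulo prescribes; the only property relied on is that rows sort lexicographically.

```python
from datetime import datetime, timezone
from typing import Tuple


def time_row_key(ts: datetime, original_row: str) -> str:
    """Build a row ID for a hypothetical time-ordered copy of a table.

    The UTC day comes first, so entries from the same day sort next to
    each other. `ts` is expected to be a timezone-aware datetime.
    """
    day = ts.astimezone(timezone.utc).strftime("%Y%m%d")
    return f"{day}_{original_row}"


def day_range(start: datetime, end: datetime) -> Tuple[str, str]:
    """Return (inclusive start row, exclusive end row) for a span of days.

    "~" sorts after the "_" separator, so the end bound lies past every
    row ID written for the last day.
    """
    lo = start.astimezone(timezone.utc).strftime("%Y%m%d")
    hi = end.astimezone(timezone.utc).strftime("%Y%m%d") + "~"
    return lo, hi
```

With a layout like this, "recent data" becomes a contiguous range of rows, and the original table can stay organized however it already is.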

Something that a lot of users of this technology forget is that space is no longer something to optimize; it's cheap. Keeping several different organizations of the same data around is totally fine.

If users can somehow avoid work by getting the infrastructure to do it for them, they will. The right thing for the infrastructure to do is to avoid trying to be everything to everyone and instead do the 70% that everyone needs really well. The original Bigtable design pointed this out: their choices were based on what would satisfy a majority of user needs well, without introducing too much complexity.

On Mar 8, 2012, at 4:19 PM, Keith Turner (Commented) (JIRA) wrote:

>
>    [ https://issues.apache.org/jira/browse/ACCUMULO-452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225531#comment-13225531 ]
>
> Keith Turner commented on ACCUMULO-452:
> ---------------------------------------
>
> Users do want this capability.  They keep asking for it.  We do turn around and tell them to sort their data differently.  They don't always like that answer.  The intent of this is to meet a user need.
>
> Storing temporal information in the column family is a possibility. It would work well for some cases, like having two locality groups, one that's the current month and another that's everything else.  You put the month in the column family and reconfigure the locality groups every month.
>
> However, if you would like something like LG1 = < day old, LG2 = < month old, LG3 = < year old, this would not be possible w/ the current locality group implementation. However, ACCUMULO-164 may make this possible.  Store time to the day in the column family.  John pointed out one problem w/ this: it's hard to automatically determine that patterns match disjoint sets.  I need to think through ACCUMULO-164 some more and see what the possible gotchas are.
>
> If you have to duplicate the data in the timestamp into your column family to accomplish your goals, does this indicate a problem with the model?  I do not think it's clean, but it's ok w/ me.
>
>
>
>
>
>> Generalize locality groups
>> --------------------------
>>
>>                Key: ACCUMULO-452
>>                URL: https://issues.apache.org/jira/browse/ACCUMULO-452
>>            Project: Accumulo
>>         Issue Type: New Feature
>>           Reporter: Keith Turner
>>            Fix For: 1.5.0
>>
>>        Attachments: PartitionerDesign.txt
>>
>>
>> Locality groups are a neat feature, but there is no reason to limit partitioning to column families.  Data could be partitioned based on any criteria.  For example, if a user is interested in querying recent data and ageing off old data, partitioning locality groups based on timestamp would be useful.  This could be accomplished by letting users specify a partitioner plugin that is used at compaction and scan time.  Scans would need an ability to pass options to the partitioner.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
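
For readers following the quoted proposal, the age-based split Keith describes (LG1 = < day old, LG2 = < month old, LG3 = < year old) can be sketched as a plain function from timestamp to group name. This is only an illustration: the actual interface is described in PartitionerDesign.txt, which is not reproduced here, so the function name `age_group`, the 30-day month, the 365-day year, and the `"default"` fallback for older data are all assumptions.

```python
DAY_MS = 24 * 60 * 60 * 1000

# (group name, exclusive upper bound on age in milliseconds), youngest first.
# A "month" of 30 days and a "year" of 365 days are simplifying assumptions.
AGE_GROUPS = [
    ("LG1", DAY_MS),
    ("LG2", 30 * DAY_MS),
    ("LG3", 365 * DAY_MS),
]


def age_group(timestamp_ms, now_ms, default="default"):
    """Return the name of the first group whose age bound the entry is under.

    Entries older than every bound fall through to `default`. Entries with
    a timestamp in the future have a negative age and land in LG1.
    """
    age = now_ms - timestamp_ms
    for name, limit in AGE_GROUPS:
        if age < limit:
            return name
    return default
```

Note that the answer depends on `now_ms`: the same entry maps to a different group as it ages, unlike a column family, which is fixed when the entry is written.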