Accumulo >> mail # user >> Filtering on column qualifier


Marc Reichman 2013-08-21, 14:00
John Vines 2013-08-21, 14:49
Slater, David M. 2013-08-21, 23:58
John Vines 2013-08-22, 00:38
Marc Reichman 2013-08-22, 14:19
David Medinets 2013-08-22, 16:16
Marc Reichman 2013-08-22, 16:33
David Medinets 2013-08-22, 17:10
Re: Filtering on column qualifier
We've done something similar with Clojure as a lark, passing in custom map
or filter functions. But we never deployed it because of the security risk
of running arbitrary, unsandboxed user code on the tservers.
On Thu, Aug 22, 2013 at 1:10 PM, David Medinets <[EMAIL PROTECTED]> wrote:

> The advantage is that you'd write the iterator only once and deploy it to
> the cluster. After that, changing the Groovy snippet changes its behavior.
> You'd avoid passing the data to your client code, but the Accumulo cluster
> would do more of the work.
>
>
> On Thu, Aug 22, 2013 at 12:33 PM, Marc Reichman <
> [EMAIL PROTECTED]> wrote:
>
>> I hadn't considered that. Would that allow me to specify it in the
>> client-side code and not worry about spreading JARs around? It is a very
>> basic need; what I have in my scan iterator loop right now is:
>>
>>             String matchScoreString = key.getColumnQualifier().toString();
>>             Double score = Double.parseDouble(matchScoreString);
>>
>>             if (threshold != null && threshold > score) {
>>                 // TODO: figure out if this is possible to do via a data-local scan iterator
>>                 continue;
>>             }
>>
>> What is the pattern for including a Groovy snippet in a scan iterator?
>>
>>
>> On Thu, Aug 22, 2013 at 11:16 AM, David Medinets <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Have you thought of writing a filter class that takes some bit of Groovy
>>> for execution inside the accept method? Whether it fits depends on how
>>> efficient you need to be and how changeable your constraints are.
>>>
>>>
>>> On Thu, Aug 22, 2013 at 10:19 AM, Marc Reichman <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> Extending looked like a bit of a boondoggle, because all of the useful
>>>> fields in the class are private, not protected. I also ran into another
>>>> architectural question: how does one pass a value (à la constructor) into
>>>> one of these classes? If I'm going to use this to filter based on a
>>>> threshold, I'd need to pass that threshold in somehow.
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Aug 21, 2013 at 9:49 AM, John Vines <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> There's no way to extend the ColumnQualifierFilter via configuration,
>>>>> but it sounds like you are on top of it. You just need to extend the class,
>>>>> possibly copy a bit of code, and change the equality check to a compareTo
>>>>> after converting the Strings to Doubles.
>>>>>
>>>>>
>>>>> On Wed, Aug 21, 2013 at 10:00 AM, Marc Reichman <
>>>>> [EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> I have some data stored in Accumulo with some scores stored as column
>>>>>> qualifiers (there was an older thread about this). I would like to find a
>>>>>> way to do thresholding when retrieving the data without retrieving it all
>>>>>> and then manually filtering out items below my threshold.
>>>>>>
>>>>>> I know I can "fetch" column qualifiers which are exact.
>>>>>>
>>>>>> I've seen the ColumnQualifierFilter, which I assume is what's in play
>>>>>> when I fetch qualifiers. Is there a reasonable pattern to extend this and
>>>>>> try to use it as a scan iterator so I can do things like "greater than" a
>>>>>> value which will be interpreted as a Double vs. the string equality going
>>>>>> on now?
>>>>>>
>>>>>> Thanks,
>>>>>> Marc
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
--
John Stoneham
[EMAIL PROTECTED]
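
For reference, the threshold check quoted above, together with Marc's
question about passing a value in without a constructor, can be sketched
in plain Java. This is only an illustration under stated assumptions: the
class ScoreThresholdFilter, its method names, and the "threshold" option
key are hypothetical, and the class deliberately does not extend any
Accumulo class, so it runs on its own. The idea is that the threshold
travels as a string option and is parsed when the filter is initialized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Stand-alone sketch of a score-threshold filter.
 * Hypothetical class: it mimics the init/accept shape of a server-side
 * filter but does NOT use or extend any Accumulo class.
 */
final class ScoreThresholdFilter {
    // Hypothetical option key; any agreed-upon name would do.
    static final String THRESHOLD_OPTION = "threshold";

    // With no option supplied, every numeric qualifier passes.
    private double threshold = Double.NEGATIVE_INFINITY;

    /** Options arrive as strings, so the threshold is parsed here instead of in a constructor. */
    public void init(Map<String, String> options) {
        String raw = options.get(THRESHOLD_OPTION);
        if (raw != null) {
            threshold = Double.parseDouble(raw);
        }
    }

    /** Keeps an entry when its qualifier parses as a double that is >= the threshold. */
    public boolean accept(String columnQualifier) {
        if (columnQualifier == null) {
            return false;
        }
        try {
            return Double.parseDouble(columnQualifier) >= threshold;
        } catch (NumberFormatException e) {
            // Assumption: qualifiers that are not numeric are dropped.
            return false;
        }
    }

    /** Convenience helper: returns the qualifiers that pass accept(), in order. */
    public List<String> keep(List<String> qualifiers) {
        List<String> kept = new ArrayList<>();
        for (String q : qualifiers) {
            if (accept(q)) {
                kept.add(q);
            }
        }
        return kept;
    }
}
```

In Accumulo itself, the same idea would presumably live in a subclass of
org.apache.accumulo.core.iterators.Filter, with the option set on the
client through IteratorSetting.addOption and the setting attached with
Scanner.addScanIterator. The iterator class would still have to be
available on the tablet servers, so this avoids shipping unwanted rows to
the client but not the one-time JAR deployment.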
John Vines 2013-08-22, 23:16
Marc Reichman 2013-08-22, 17:35