Accumulo >> mail # user >> egrep usage - 1.3.4


Re: egrep usage - 1.3.4
On Mon, Aug 6, 2012 at 3:13 PM, John Vines <[EMAIL PROTECTED]> wrote:
> Yeah, that was the case I thought of as well. However, I think it would be
> worthwhile to support the improved behavior. Unfortunately, I'm stuck on
> trying to think of a better command for it, since egrep itself is the
> appropriate command and we just have a bit of a misnomer.
>
> I hate this convention, but one option is to introduce egrep2 which is the
> improved behavior, and then put in warning messages informing users that the
> egrep command will be superseded by the egrep2 functionality in the
> following release. Or we could just stick with the two egrep commands in
> perpetuity.

I was mainly thinking of the iterator when thinking of preserving
behavior, because it's used by code. An option could be added to the
RegExFilter to support find().
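
To illustrate the difference discussed in this thread, here is a minimal
sketch using only java.util.regex. The class MatchVsFind and the helper
accept() with its useFind parameter are hypothetical names made up for
this example; they are not Accumulo's RegExFilter API. The sketch only
shows how a boolean option could switch between Matcher#matches and
Matcher#find.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Accumulo.
class MatchVsFind {
    // useFind == false: the whole value must match the regex (matches()).
    // useFind == true: any substring of the value may match (find()).
    static boolean accept(String regex, String value, boolean useFind) {
        Matcher m = Pattern.compile(regex).matcher(value);
        return useFind ? m.find() : m.matches();
    }

    public static void main(String[] args) {
        // The example from the thread: ".*foo" against "cfooa".
        System.out.println(accept(".*foo", "cfooa", false)); // false
        System.out.println(accept(".*foo", "cfooa", true));  // true

        // Without wildcards, only find() selects the value.
        System.out.println(accept("foo", "cfooa", false));   // false
        System.out.println(accept("foo", "cfooa", true));    // true

        // Wildcards on both ends make matches() behave like find("foo").
        System.out.println(accept(".*foo.*", "cfooa", false)); // true
    }
}
```

Keeping the full-match behavior as the default in a sketch like this means
existing callers see no change unless they opt in.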

If you assume that only people use the egrep command in the shell,
then it may be OK to change its behavior, because a person could adapt.
However, this is probably a poor assumption. I try to think of the
shell as part of the public API. Scripts could call the egrep
command, and scripts would not automatically adapt to a change in
behavior. Also, this would make it hard to use the same script that
uses egrep against both Accumulo 1.4 and 1.5.

Instead of a new command, we could add an option to the egrep command,
like -f. When the -f option is present, it would set the option on the
RegExFilter to use find().
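
A rough sketch of the opt-in idea, in plain Java: the class
EgrepFlagSketch and its select() method are hypothetical and do not
reflect how the Accumulo shell actually parses arguments. The only
assumption carried over from the proposal is that a -f flag switches
from full matching to find(), while the default stays unchanged so that
existing scripts keep working.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the Accumulo shell's real argument handling.
class EgrepFlagSketch {
    // Returns true if the value would be selected for the given arguments.
    // Arguments are an optional "-f" flag plus one regex pattern.
    static boolean select(String value, String... args) {
        boolean useFind = false;
        String regex = null;
        for (String arg : args) {
            if (arg.equals("-f")) {
                useFind = true;   // opt in to substring matching
            } else {
                regex = arg;      // last non-flag argument is the pattern
            }
        }
        if (regex == null) {
            throw new IllegalArgumentException("no pattern given");
        }
        Matcher m = Pattern.compile(regex).matcher(value);
        // Default remains a full match, so old scripts behave as before.
        return useFind ? m.find() : m.matches();
    }

    public static void main(String[] args) {
        System.out.println(select("cfooa", "foo"));       // false
        System.out.println(select("cfooa", "-f", "foo")); // true
        System.out.println(select("cfooa", ".*foo.*"));   // true
    }
}
```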

>
> John
>
>
> On Mon, Aug 6, 2012 at 3:01 PM, Michael Flester <[EMAIL PROTECTED]> wrote:
>>
>>
>>
>> You are right. I had inadvertently constrained my thinking
>> to patterns of the form match(".*{x}.*") == find(".*{x}.*") == find("{x}")
>> but that isn't everything someone
>> might be using it for.
>>
>>
>>
>> On Mon, Aug 6, 2012 at 9:26 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
>>>
>>> I was thinking find() will select everything that match() does and
>>> more.  So it may return data that someone used to the current behavior
>>> is not expecting, which could break existing code that uses it.   For
>>> example ".*foo" would select "cfooa" with find() but not with match().
>>>
>>> On Sun, Aug 5, 2012 at 7:16 PM, Michael Flester <[EMAIL PROTECTED]>
>>> wrote:
>>> > Keith --
>>> >
>>> > Switching from match to find should be no change for anyone that is
>>> > currently using it.
>>> > All patterns that "match" will equally "find". But new users would be
>>> > able
>>> > to take advantage
>>> > of not adding the wildcards on both ends.
>>> >
>>> > Mike
>>> >
>>> >
>>> > On Tue, Jul 31, 2012 at 11:21 AM, Keith Turner <[EMAIL PROTECTED]>
>>> > wrote:
>>> >>
>>> >> On Sun, Jul 29, 2012 at 9:47 PM, Michael Flester <[EMAIL PROTECTED]>
>>> >> wrote:
>>> >> >
>>> >> >
>>> >> > On Sat, Jul 28, 2012 at 7:57 PM, John Vines <[EMAIL PROTECTED]>
>>> >> > wrote:
>>> >> >>
>>> >> >> And when dealing with java, it does full matches, so adding the .*
>>> >> >> to
>>> >> >> start and end is necessary.
>>> >> >>
>>> >> >
>>> >> > Java has both Matcher#matches and Matcher#find. The latter would
>>> >> > operate
>>> >> > more
>>> >> > like the egrep(1) command without requiring the wildcards on both
>>> >> > ends.
>>> >>
>>> >> Ah, it should have used the find() call when it was first written.
>>> >> Changing it now would be tricky because people who expect the current
>>> >> behavior could get unexpected results.  I think we are kinda stuck
>>> >> with the current behavior.   Could possibly add an option to use
>>> >> find() instead of match().
>>> >
>>> >
>>
>>
>