HBase, mail # user - Confirming a Bug

Re: Confirming a Bug
Michel Segel 2012-03-23, 11:55
Peter, that doesn't make sense.
I mean, I believe what you are saying, but I don't see how a VPN would cause this variance in results.

Do you have speculative execution turned on?

Are you counting just the number of rows in the result set, or are you using counters in the map/reduce? (I'm assuming that you are running a map/reduce, and not just a simple connection and single-threaded scan...)

I apologize if this has already been answered; I haven't been following this too closely.

Sent from a remote device. Please excuse any typos...

Mike Segel

On Mar 22, 2012, at 8:01 PM, Peter Wolf <[EMAIL PROTECTED]> wrote:

> Hello again Lars and Lars,
> Here is some additional information that may help you track this down.
> I think this behavior has something to do with my VPN.  My servers are on the Amazon Cloud and I normally run my client on my laptop via a VPN (Tunnelblick: OS X 10.7.3; Tunnelblick 3.2.3 (build 2891.2932)).  This is where I see the buggy behavior I describe.
> However, when my client is running on an EC2 machine, I get different behavior.  I cannot prove that it is always correct, but in at least one case my current code does not work on my laptop, yet gets the correct number of results on an EC2 machine.  Note that my scans are also much faster on the EC2 machine.
> I will do more tests to see if I can localize it further.
> Hope this helps
> Thank you again
> Peter
> On 3/19/12 2:24 PM, Peter Wolf wrote:
>> Hello Lars and Lars,
>> Thank you for your help and attention.
>> I wrote a standalone test that exhibits the bug.
>> http://dl.dropbox.com/u/68001072/HBaseScanCacheBug.java
>> Here is the output.  It shows how the number of results and key-value pairs varies as caching is changed and families are included.  It shows the bug starting with 3 families and 5000 caching.  It also shows a new bug, where the query always fails with an IOException with 4 families.
>> CacheSize FamilyCount ResultCount KeyValueCount
>> 1000 1 10000 10
>> 5000 1 10000 10