HBase user mailing list: Issue reading consistently from an hbase test client app


Re: Issue reading consistently from an hbase test client app
What version of HBase are you on? Did you see anything out of place in the
master or regionserver logs? This shouldn't be happening!
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
On Fri, Apr 16, 2010 at 10:27 AM, Charles Glommen <[EMAIL PROTECTED]> wrote:

> For a slightly unrelated reason, I needed to write a quick app to test some
> code running on our hadoop/hbase cluster. However, I seem to be having
> issues with getting consistent reads.
>
> Here's the scenario:
>
> This application scans some directories in hdfs, and reads lines of text
> from each file. A user ID is extracted from the line, then hbase is checked
> to see that the ID exists. In *all* cases the ID should exist in hbase.
> However, only about the first 100 or so (of about 1000) return valid
> results. After about 100 reads or so, the rest return null for
> Result.getValue(). You can see from the code that it takes a userID as a
> parameter. This is to illustrate that data is in fact in hbase.
> Setting *any*
> of the userIDs that produced null results as a parameter will result in a
> valid hbase read. Here is an abbreviated output that illustrates this
> oddity:
>
> First execution of application:
> ...(many 'good' output lines, like the following 2)
> bytes for user 139|754436243196115533|c: 1920
> bytes for user 139|754436243113796511|c: 1059
> bytes for user 141|754999187733044577|c: 0
> 1/171 FILE MAY HAVE LINE MISSING FROM HBASE!:
>
> hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T00-0700/fiqgvrl.events
> bytes for user *141|754717712663942409|c*: 0
> 2/172 FILE MAY HAVE LINE MISSING FROM HBASE!:
>
> hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T00-0700/fwesvqn.events
> bytes for user 141|755280633926232247|c: 0
> 3/173 FILE MAY HAVE LINE MISSING FROM HBASE!:
> hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T01-0700/wydfvn.events
> bytes for user 141|754436237930862231|c: 0
> 4/174 FILE MAY HAVE LINE MISSING FROM HBASE!:
> hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T01-0700/zpjyod.events
> byte
>
> ...and this continues for the remaining files.
>
> Second execution with *any* of the seemingly missing userIDs yields the
> following sample:
>
> Count bytes for commandline user 141|754717712663942409|c: 855
> ...(many 'good' output lines, like the following 1)
> bytes for user 141|qfbvndelauretis|a: 2907001
> bytes for user 141|754436240987076893|c: 0
> 1/208 FILE MAY HAVE LINE MISSING FROM HBASE!:
> hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T14-0700/hehvln.events
> bytes for user 141|754436241315533944|c: 0
> bytes for user 141|754436241215573999|c: 0
> 2/210 FILE MAY HAVE LINE MISSING FROM HBASE!:
>
> hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T15-0700/fvkeert.events
> ...
>
> Notice that the 'zeros' don't occur until file 208 this time. This is not
> random either, rerunning the above two will produce the exact same results,
> all day long. It's as if selecting the initial user allows its region to be
> read more consistently for the remainder of the run. Three last points: No
> exceptions are ever thrown, all region servers are up throughout the
> execution, and no other reads or writes are occurring on the cluster during
> the execution.
>
> Any thoughts or advice? This is really causing me pain at the moment.
>
> Oh, and here's the quick and dirty class that produces this:
>
> package com.touchcommerce.data.jobs.misc.partitioning_debug;
>
> import java.io.BufferedReader;
> import java.io.IOException;
> import java.io.InputStreamReader;
>
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
>
> import com.touchcommerce.data.Constants;
> import com.touchcommerce.data.services.resources.HDFSService;
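The quoted class is cut off by the archive after its imports. For readers following the scenario described above (scan files, extract a user ID per line, check HBase, count misses), here is a minimal runnable sketch of that control flow. It is an assumption-laden stand-in, not the original code: a plain `Map` replaces the `HTable.get(new Get(...))` lookup, and the rule that the ID is the first tab-delimited field of a line is hypothetical (the real line format is not shown in the thread). The key format `siteID|userID|type` is taken from the sample output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReadCheckSketch {

    // Assumption: the user ID is the first tab-delimited field of the line.
    static String extractUserId(String line) {
        return line.split("\t", 2)[0];
    }

    // Stand-in for HTable.get(new Get(Bytes.toBytes(userId))) followed by
    // Result.getValue(...); returns null when the row/value is absent.
    static byte[] lookup(Map<String, byte[]> table, String userId) {
        return table.get(userId);
    }

    public static void main(String[] args) {
        // Fake "table" seeded with two keys in the siteID|userID|type format.
        Map<String, byte[]> table = new LinkedHashMap<>();
        table.put("139|754436243196115533|c", new byte[1920]);
        table.put("141|754717712663942409|c", new byte[855]);

        String[] lines = {
            "139|754436243196115533|c\tpayload",
            "141|755280633926232247|c\tpayload"  // absent from the table
        };

        int missing = 0;
        for (String line : lines) {
            String id = extractUserId(line);
            byte[] value = lookup(table, id);
            int n = (value == null) ? 0 : value.length;
            System.out.println("bytes for user " + id + ": " + n);
            // Mirrors the "FILE MAY HAVE LINE MISSING FROM HBASE!" counter.
            if (value == null) missing++;
        }
        System.out.println(missing + " missing");
    }
}
```

In the real app the null return from `Result.getValue()` is what produces the `: 0` lines in the output above; the sketch only reproduces the counting logic, not the region/client behavior under investigation.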