HBase user mailing list: schema help


Messages in this thread:
Rita  2011-08-25, 14:03
Rita  2011-08-25, 14:53
Ian Varley  2011-08-25, 15:03
Rita  2011-08-25, 15:12
Jimson K. James  2011-08-26, 03:34
Sonal Goyal  2011-08-26, 05:08
Jimson K. James  2011-08-26, 06:51
Re: schema help
Hi Jimson,

Are you talking about hbase.regionserver.blockCacheHitRatio?

http://hbase.apache.org/book/rs_metrics.html
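In case it helps, here is a minimal sketch of reading that metric over JMX. The host, port, and the exact MBean and attribute names are assumptions; they depend on the HBase version and on how remote JMX is enabled on the region server.

    import java.util.Set;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class BlockCacheHitRatioCheck {
      public static void main(String[] args) throws Exception {
        // Assumed: a region server with remote JMX enabled on port 10102.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://regionserver-host:10102/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
          MBeanServerConnection mbs = connector.getMBeanServerConnection();
          // List all MBeans and look for region-server beans that expose a
          // blockCacheHitRatio attribute (the attribute name is an assumption).
          Set<ObjectName> names = mbs.queryNames(null, null);
          for (ObjectName name : names) {
            if (!name.toString().contains("RegionServer")) {
              continue;
            }
            try {
              Object ratio = mbs.getAttribute(name, "blockCacheHitRatio");
              System.out.println(name + " blockCacheHitRatio=" + ratio);
            } catch (Exception e) {
              // This bean doesn't expose the attribute; skip it.
            }
          }
        } finally {
          connector.close();
        }
      }
    }

As far as I know, the block cache (and therefore this ratio) is tracked per region server rather than per region.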

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

On Fri, Aug 26, 2011 at 12:21 PM, Jimson K. James <[EMAIL PROTECTED]> wrote:

> Hi Sonal,
>
> Nice references, thank you :)
> What I'm currently after is the data distribution in HBase. Is there any
> HBase hit-ratio measuring tool?
> I'm searching for a way to get the hit ratio per region. Is that possible?
>
> Thanks,
>
> -----Original Message-----
> From: Sonal Goyal [mailto:[EMAIL PROTECTED]]
> Sent: Friday, August 26, 2011 10:38 AM
> To: [EMAIL PROTECTED]
> Subject: Re: schema help
>
> Hi Jimson,
>
> Here are a few links that talk about the sorted architecture:
>
> http://wiki.apache.org/hadoop/Hbase/DataModel
> http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable
>
> I think the original BigTable paper ought to have some details too; I'm
> sorry I haven't read it recently enough to quote it with authority.
>
> Best Regards,
> Sonal
> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
>
>
>
> On Fri, Aug 26, 2011 at 9:04 AM, Jimson K. James <[EMAIL PROTECTED]> wrote:
>
> > Hi Ian,
> >
> > Can you point me to some references on the key-sorted architecture in
> > HBase?
> > There doesn't seem to be much documentation out there.
> >
> >
> > -----Original Message-----
> > From: Ian Varley [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, August 25, 2011 8:33 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: schema help
> >
> > The rows don't need to be inserted in order; they're maintained in
> > key-sorted order on the disk based on the architecture of HBase, which
> > stores data sorted in memory and periodically flushes to immutable
> > files in HDFS (which are later compacted to make read access more
> > efficient). HBase keeps track of which physical files might contain a
> > given key range, and only reads the ones it needs to.
> >
> > To do a query through the Java API, you could create a scanner with a
> > startrow that is the concatenation of your value for fieldA and the
> > start time, and an endrow that has the current time.
> >
> > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html
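For concreteness, a minimal sketch of such a range scan with the old HTable client, assuming a hypothetical table "mytable" and a fixed-width fieldA value; the table name and values are illustrative, not from the thread:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FieldATimeScan {
      public static void main(String[] args) throws Exception {
        // Row keys are [fieldA][time]; scan from (fieldA, startTime) up to (fieldA, now).
        byte[] fieldA = Bytes.toBytes("some-fieldA-value");            // illustrative value
        long startTimeMillis = System.currentTimeMillis() - 86400000L; // e.g. last 24 hours
        byte[] startRow = Bytes.add(fieldA, Bytes.toBytes(startTimeMillis));
        byte[] stopRow = Bytes.add(fieldA, Bytes.toBytes(System.currentTimeMillis()));

        HTable table = new HTable(HBaseConfiguration.create(), "mytable");
        ResultScanner scanner = table.getScanner(new Scan(startRow, stopRow));
        try {
          for (Result r : scanner) {
            // Only rows between startRow (inclusive) and stopRow (exclusive) are returned.
            System.out.println(Bytes.toStringBinary(r.getRow()));
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }

If fieldA values vary in length, pad them or add a delimiter so the concatenated keys sort the way you expect.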
> >
> > Ian
> >
> > On Aug 25, 2011, at 9:53 AM, Rita wrote:
> >
> > Thanks for your response.
> >
> > 30 million rows is the best case :-)
> >
> > A couple of questions about using [fieldA][time] as my key:
> >  Would I have to insert in order?
> >  If not, how would HBase know to stop scanning the entire table?
> >  What would a query actually look like if my key was [fieldA time]?
> >
> > As a matter of fact, I can do 100% of my queries this way; I will leave
> > the 5% that don't fit out of my project/schema.
> >
> >
> > On Thu, Aug 25, 2011 at 10:13 AM, Ian Varley <[EMAIL PROTECTED]> wrote:
> > Rita,
> >
> > There's no need to create separate tables here; the table is really
> > just a "namespace" for keys. A better option would probably be having
> > one table with "[fieldA][time]" (the two fields concatenated) as your
> > row key. Then, you can seek directly to the start of your records in
> > constant time, and then scan forward until you get to the end of the
> > data (linear time in the size of data you expect to get back).
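To illustrate that key layout, a rough sketch of writing a row with the concatenated [fieldA][time] key; the table name "events", column family "d", and qualifier "payload" are made up for the example:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CompositeKeyWrite {
      public static void main(String[] args) throws Exception {
        // Hypothetical table "events" with column family "d".
        HTable table = new HTable(HBaseConfiguration.create(), "events");
        try {
          byte[] fieldA = Bytes.toBytes("some-fieldA-value"); // illustrative
          long eventTime = System.currentTimeMillis();
          // Row key = [fieldA][time]; rows for the same fieldA end up adjacent,
          // sorted by time, so a later scan can seek straight to them.
          byte[] rowKey = Bytes.add(fieldA, Bytes.toBytes(eventTime));
          Put put = new Put(rowKey);
          put.add(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes("value"));
          table.put(put);
        } finally {
          table.close();
        }
      }
    }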
> >
> > The downside of this is that for the 5% of your queries that aren't in
> > this form, you may have to do a full table scan. (Alternatively, you
> > could also maintain secondary indexes that help you get the data back
> > with less than a full table scan; that would depend on the nature of
> > the queries.)
> >
> > In general, a good rule of thumb when designing a schema in HBase is to
> > think first about how you'd ideally like to access the data. Then
More messages in this thread:
Jimson K. James  2011-08-26, 07:17
Jimson K. James  2011-08-26, 07:26
Sheng Chen  2011-08-26, 06:08
Buttler, David  2011-08-26, 16:08
lars hofhansl  2011-08-26, 18:50
Doug Meil  2011-08-26, 19:09
Sheng Chen  2011-08-29, 02:45