NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> flushing + compactions after config change

Re: Reply: flushing + compactions after config change
> > On Thu, Jun 27, 2013 at 4:27 PM, Viral Bajaria <[EMAIL PROTECTED]> wrote:
> > It's not random, it picks the region with the most data in its memstores.
>
> That's weird, because I see some of my regions which receive the least
> amount of data in a given time period flushing before the regions that are
> receiving data continuously.
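To make the quoted claim concrete, here is a minimal sketch of memstore-pressure flushing, under the simplifying assumption that the region server always flushes the region with the largest memstore until total usage drops below a lower mark. This is a model, not HBase code: the function name, the region names, and the MB sizes are hypothetical, and the real flusher may apply conditions this sketch ignores.

```python
def flush_for_global_pressure(memstore_sizes, upper_limit, lower_limit):
    """Simplified model of memstore-pressure flushing (not HBase code).

    memstore_sizes maps hypothetical region names to memstore sizes in MB.
    Once total usage reaches upper_limit, the largest memstore is flushed
    repeatedly until the total is at or below lower_limit.
    Returns the flushed regions in order.
    """
    sizes = dict(memstore_sizes)  # copy so the caller's dict is untouched
    flushed = []
    if sum(sizes.values()) < upper_limit:
        return flushed  # no global pressure yet
    while sum(sizes.values()) > lower_limit:
        biggest = max(sizes, key=sizes.get)
        if sizes[biggest] == 0:
            break  # every memstore is already empty
        flushed.append(biggest)
        sizes[biggest] = 0  # a flush empties that region's memstore
    return flushed


regions = {"catchup-1": 120, "catchup-2": 90, "idle-1": 5, "idle-2": 3}
print(flush_for_global_pressure(regions, upper_limit=200, lower_limit=150))
# ['catchup-1']
```

Under this policy alone, the nearly idle regions would never be picked first, which is why frequent flushes of idle regions point to a different trigger.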

I agree with Viral here. When the maximum number of logs is reached, we look
at the oldest WAL and determine which regions must be flushed so that this
first (i.e., oldest) WAL can be archived. In your case, Viral, these could be
regions that receive few edits, whose unflushed edits are still sitting in the
oldest WAL once 32 logs have been rolled.
It may be very specific to your use case, but you could try tuning the
maximum number of logs (hbase.regionserver.maxlogs), e.g. 16 or 40.
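The WAL-count trigger described above can be modeled as follows. This is a sketch, not HBase code: each WAL file is reduced to the set of regions that still have unflushed edits in it, and `flush_for_wal_count`, its `max_logs` parameter, and the region names are hypothetical. It shows why a nearly idle region can be flushed first: it only has to pin the oldest log.

```python
from collections import deque


def flush_for_wal_count(wal_writers, max_logs):
    """Simplified model of flushing forced by too many WAL files.

    wal_writers: list of sets, oldest WAL first; each set names the
    (hypothetical) regions that still have unflushed edits in that file.
    Returns the regions force-flushed, in order.
    """
    if max_logs < 1:
        raise ValueError("max_logs must be at least 1")
    wals = deque(set(w) for w in wal_writers)  # copy so the input is untouched
    forced = []
    while len(wals) > max_logs:
        # Every region pinning the oldest WAL must flush before that file can
        # be archived, no matter how small the region's memstore is.
        for region in sorted(wals[0]):
            forced.append(region)
            # A flush persists all of the region's edits, so it stops
            # pinning every WAL it appears in.
            for w in wals:
                w.discard(region)
        # Archive the run of oldest WALs that no longer pin any region,
        # always keeping the current (newest) one.
        while len(wals) > 1 and not wals[0]:
            wals.popleft()
    return forced


# Busy regions flush often on memstore size, so here the oldest WAL is pinned
# only by a region that wrote a few edits long ago.
wals = [{"idle"}, {"busy"}, {"busy"}, {"busy"}, {"busy"}]
print(flush_for_wal_count(wals, max_logs=4))  # ['idle']
print(flush_for_wal_count(wals, max_logs=8))  # []
```

In this model, raising `max_logs` defers the forced flush and lowering it makes such flushes more frequent, which is the trade-off that trying 16 or 40 logs would probe.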

On Fri, Jun 28, 2013 at 4:53 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> On Fri, Jun 28, 2013 at 2:39 PM, Viral Bajaria <[EMAIL PROTECTED]> wrote:
> > On Fri, Jun 28, 2013 at 9:31 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> >
> >> On Thu, Jun 27, 2013 at 4:27 PM, Viral Bajaria <[EMAIL PROTECTED]> wrote:
> >> It's not random, it picks the region with the most data in its
> >> memstores.
> >>
> >
> > That's weird, because I see some of my regions which receive the least
> > amount of data in a given time period flushing before the regions that
> > are receiving data continuously. I know this because of the write
> > pattern. Some of my tables are in catch-up mode, i.e., I am ingesting
> > data from the past, so they always have something to do, while other
> > tables are not in catch-up mode and sit idle most of the time. Yet I
> > see a high number of flushes for those regions too.
> >
> >
> >>
> >> I doubt that the fact that it's a major compaction is what's making
> >> everything worse. When a minor compaction gets promoted into a major
> >> one, it's because we're already going to compact all the files, so we
> >> might as well get rid of some deletes at the same time. They are all
> >> getting selected because the files are within the selection ratio. I
> >> would not focus on this to resolve your problem.
> >>
> >
> > I meant worse for my writes, not for HBase as a whole.
> >
> >
> >>
> >> I haven't been closely following this thread, but have you posted a
> >> log snippet somewhere? It's usually much more telling, and it eliminates
> >> a few levels of interpretation. Make sure it's at DEBUG, and that you
> >> grab a few hours of activity. Get the GC log for the same period as
> >> well. Drop this on a web server or pastebin if it fits.
> >>
> >
> > The only log snippet that I posted was of the flushing activity, and even
> > that was not everything; I had grep'd a few lines out. Let me collect some
> > more stats here and post them again. I just enabled GC logging on this
> > server; I had initially deployed the wrong config, which had no GC
> > logging. I am not sure how GC logs will help here, given that I am using
> > less than 50% of the heap, so I doubt a stop-the-world GC is happening.
> > Are you looking for some other information?
>
> Just trying to cover all the bases.
>
> J-D
>
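J-D's point that a minor compaction is promoted to a major one when every file falls within the selection ratio can be illustrated with a simplified model of ratio-based selection. The function and its parameters are assumptions for illustration: an old file is skipped while it is larger than `ratio` times the total size of the newer files, the default of 1.2 is meant to mirror `hbase.hstore.compaction.ratio`, and `min_files=3` stands in for the minimum-files threshold. The real algorithm has further limits (e.g. a cap on the number of files per compaction) that are omitted here.

```python
def select_for_compaction(file_sizes, ratio=1.2, min_files=3, min_size=0):
    """Simplified ratio-based compaction selection (not HBase code).

    file_sizes lists store file sizes, oldest first, in any consistent unit.
    Returns (selected_sizes, is_major).
    """
    n = len(file_sizes)
    start = 0
    # Skip an old file while it is bigger than ratio * (total of newer files),
    # as long as enough files remain to form a compaction.
    while (n - start >= min_files
           and file_sizes[start] > max(min_size, ratio * sum(file_sizes[start + 1:]))):
        start += 1
    selected = file_sizes[start:]
    if len(selected) < min_files:
        return [], False  # not enough files: no compaction
    # When every file is selected, the "minor" compaction rewrites the whole
    # store anyway, so it is treated as a major one.
    return selected, len(selected) == n


print(select_for_compaction([100, 90, 80, 70]))   # ([100, 90, 80, 70], True)
print(select_for_compaction([1000, 90, 80, 70]))  # ([90, 80, 70], False)
```

With similarly sized flush files, nothing is skipped, so the selection covers the whole store and gets promoted; one large old file is enough to keep the compaction minor.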