

Re: Uneven write request to regions
First, I forgot to mention that <customerId> in our case is actually
MD5(<customerId>).
We have so much data flowing in that we end up with a region per
<customerId><bucket> pretty quickly, and even that gets split into separate
regions by date range (timestamp).
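For illustration, here is a minimal Java sketch of how such a rowkey could be
assembled (it follows the <customerId><bucket><timestampInMs><uniqueId> layout
quoted below, with the MD5'd customerId as the prefix; the single-byte bucket
and the helper names are assumptions for the sketch, not our actual code):

  import java.security.MessageDigest;
  import org.apache.hadoop.hbase.util.Bytes;

  public class RowKeys {
    // Illustrative layout: <MD5(customerId)><bucket><timestampInMs><uniqueId>
    public static byte[] buildRowKey(String customerId, byte bucket,
                                     long timestampMs, String uniqueId) throws Exception {
      byte[] customerPrefix = MessageDigest.getInstance("MD5")
          .digest(Bytes.toBytes(customerId));               // fixed-width 16-byte prefix
      return Bytes.add(
          Bytes.add(customerPrefix, new byte[] { bucket }),
          Bytes.add(Bytes.toBytes(timestampMs), Bytes.toBytes(uniqueId)));
    }
  }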

We're not seeing a hotspotting issue. I built some scripts in Java and awk,
and saw that 66% of our customers span more than one RS.

We have two serious issues, a primary and a secondary one.

Our primary issue is slow regions vs. fast regions. Recall that, as detailed
above, a region represents a specific <customerId><bucket>. Some customers get
50x more data than other customers in a given time frame (2 hours to 1 day).
So on a single RS we have regions getting 10 write requests per hour alongside
regions getting 50k write requests per hour. The region mapped to the
slow-filling customerId never reaches the 256 MB flush limit and hence isn't
flushed, while the regions mapped to the fast-filling customerId flush very
quickly since they fill very quickly.
Let's say the 1st WAL file contains a put for a slow-filling customerId, and
the fast-filling customerId fills up the rest of that file. After 20-30
seconds the file gets rolled, and another file fills up with the fast-filling
customerId. After a while we reach 32 WAL files. The 1st file was never
deleted, since its region was never flushed. Hitting the 32-file limit puts
HBase into stress mode, and it force-flushes every region with edits in those
32 WAL files. In our case we saw it flush 111 regions. Many of the resulting
store files are only 3 KB - 3 MB, so our compaction queue starts filling up
with store files that need to be compacted.
At the end of the road, the RS dies.
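For reference, a minimal sketch of the two settings at play here (the property
names are the standard HBase ones; the values reflect what I described above
and should be treated as an approximation of our setup, not a recommendation):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;

  Configuration conf = HBaseConfiguration.create();
  // The per-memstore flush limit referred to above (256 MB).
  conf.setLong("hbase.hregion.memstore.flush.size", 256L * 1024 * 1024);
  // The WAL-count cap: once 32 files are live, the RS force-flushes every
  // region whose edits keep the oldest files from being cleaned up.
  conf.setInt("hbase.regionserver.maxlogs", 32);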

Our secondary issue is empty regions: we get to a situation where a region is
mapped to a specific <customerId>, <bucket>, and date range (1/7 - 3/7). Then,
once we are in August (we have a 30-day TTL), those regions become empty and
will never be filled again.
We assume this somehow wreaks havoc on the load balancer, and MSLAB probably
also pins 1-2 GB of memory for those empty regions.
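To make that concrete, a small sketch of the relevant settings (the column
family name is made up, and the 2 MB MSLAB chunk size is the usual default,
which is roughly where the per-region memory overhead estimate comes from):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;

  HColumnDescriptor family = new HColumnDescriptor("d");    // hypothetical family name
  family.setTimeToLive(30 * 24 * 60 * 60);                   // the 30-day TTL, in seconds

  Configuration conf = HBaseConfiguration.create();
  conf.setBoolean("hbase.hregion.memstore.mslab.enabled", true);           // MSLAB on (the default)
  conf.setInt("hbase.hregion.memstore.mslab.chunksize", 2 * 1024 * 1024);  // one 2 MB chunk per memstore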

Thanks!

On Sat, Nov 16, 2013 at 7:25 PM, Mike Axiak <[EMAIL PROTECTED]> wrote:

> Hi,
>
> One new key pattern that we're starting to use is a salt based on a shard.
> For example, let's take your key:
>
>   <customerId><bucket><timestampInMs><uniqueId>
>
> Consider a shard between 0 and 15 inclusive. We determine this with:
>
>  <shard> = abs(hash32(uniqueId) % 16)
>
> We can then define a salt to be based on customerId and the shard:
>
>  <salt> = hash32(<shard><customerId>)
>
> So then the new key becomes:
>
>  <salt><customerId><timestampInMs><uniqueId>
>
> This will distribute the data for a given customer across the N shards that
> you pick, while having a deterministic function for a given row key (so
> long as the # of shards you pick is fixed, otherwise you can migrate the
> data). Placing the bucket after the customerId doesn't help distribute the
> single customer's data at all. Furthermore, by using a separate hash
> (instead of just <shard><customerId>),  you're guaranteeing that new data
> will appear in a somewhat random location (i.e., solving the problem of
> adding a bunch of new data for a new customer).
>
> I have a key simulation script in python that I can start tweaking and
> share with people if they'd like.
>
> Hope this helps,
> Mike
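
For anyone following along, here is a rough Java sketch of the salting scheme
Mike describes above. hash32 here is just CRC32 narrowed to an int, purely as
a stand-in for whatever 32-bit hash you prefer, and 16 shards is the example
value from the mail:

  import java.nio.charset.StandardCharsets;
  import java.util.zip.CRC32;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SaltedKeys {
    static final int NUM_SHARDS = 16;                        // example value from the mail above

    // Stand-in 32-bit hash; any stable 32-bit hash would do.
    static int hash32(String s) {
      CRC32 crc = new CRC32();
      crc.update(s.getBytes(StandardCharsets.UTF_8));
      return (int) crc.getValue();
    }

    // <salt><customerId><timestampInMs><uniqueId>
    public static byte[] buildKey(String customerId, long timestampMs, String uniqueId) {
      int shard = Math.abs(hash32(uniqueId) % NUM_SHARDS);   // <shard> = abs(hash32(uniqueId) % 16)
      int salt = hash32(shard + customerId);                 // <salt>  = hash32(<shard><customerId>)
      return Bytes.add(
          Bytes.add(Bytes.toBytes(salt), Bytes.toBytes(customerId)),
          Bytes.add(Bytes.toBytes(timestampMs), Bytes.toBytes(uniqueId)));
    }
  }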
>
>
> On Sat, Nov 16, 2013 at 1:16 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > bq. all regions of that customer
> >
> > Since the rowkey starts with <customerId>, any single customer would only
> > span a few regions (normally 1 region), right?
> >
> >
> > On Fri, Nov 15, 2013 at 9:56 PM, Asaf Mesika <[EMAIL PROTECTED]>
> > wrote:
> >
> > > But when you read, you have to approach all regions of that customer,
> > > instead of pinpointing just one which contains that hour you want for
> > > example.
> > >
> > > On Friday, November 15, 2013, Ted Yu wrote:
> > >
> > > > bq. you must have your customerId, timestamp in the rowkey since you