Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Accumulo, mail # user - Number of partitions for sharded table


+
Krishmin Rai 2012-10-30, 19:06
+
Krishmin Rai 2012-10-30, 19:28
+
dlmarion@... 2012-10-30, 19:43
Copy link to this message
-
Re: Number of partitions for sharded table
Krishmin Rai 2012-10-30, 19:57
Hi Dave,
  Our example is simpler than the wiki-search one, and I had forgotten exactly how wiki-search is structured: We're using a simple, single-table layout, everything sharded. We also have some other stuff in the column family, but simplified, it's just:

row: ShardID
colFam: Term
colQual: Document

And then searches will use an iterator extending IntersectingIterator to find results matching specified terms etc.

Thanks,
Krishmin

On Oct 30, 2012, at 3:43 PM, [EMAIL PROTECTED] wrote:

> Krishmin,
>
>   In the wikisearch example there is a non-sharded index table and a sharded document table. The index table is used to reduce the number of tablets that need to be searched for a given set of terms. Is your setup similar? I'm a little confused since you mention using a sharded index table and that all queries will have an infinite range.
>
> Dave Marion
>
> From: "Krishmin Rai" <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Tuesday, October 30, 2012 3:28:15 PM
> Subject: Re: Number of partitions for sharded table
>
> I should clarify that I've been pre-splitting tables at each shard so that each tablet consists of a single row.
>
> On Oct 30, 2012, at 3:06 PM, Krishmin Rai wrote:
>
> > Hi All,
> >  We're working with an index table whose row is a shardId (an integer, like the wiki-search or IndexedDoc examples). I was just wondering what the right strategy is for choosing a number of partitions, particularly given a cluster that could potentially grow.
> >
> >  If I simply set the number of shards equal to the number of slave nodes, additional nodes would not improve query performance (at least over the data already ingested). But starting with more partitions than slave nodes would result in multiple tablets per tablet server… I'm not really sure how that would impact performance, particularly given that all queries against the table will be batchscanners with an infinite range.
> >
> >  Just wondering how others have addressed this problem, and if there are any performance rules of thumb regarding the ratio of tablets to tablet servers.
> >
> > Thanks!
> > Krishmin
>

+
Adam Fuchs 2012-10-30, 20:08
+
Krishmin Rai 2012-10-30, 20:38