-
querying the tablet server for given row (to get locality)?
Sukant Hajra 2012-07-01, 02:23
I've been considering using a distributed messaging service (Akka in my case). To get some throughput on ingesting data, I was going to shard computation across multiple servers, but the backend is still Accumulo.
What bothers me is that I don't know the mapping from row IDs to tablet servers, so every one of my nodes potentially ends up talking to every tablet server, which is a lot of needless network traffic.
What I'd really like to do is collocate my computation on the relevant tablet server to get the same benefits of locality Accumulo gets with HDFS.
I feel Accumulo has to have this information internally, but I haven't dug deeply into the source to see if it's exposed to Accumulo clients. Is it there? If it is exposed, is it supported?
Thanks for the help, Sukant
-
Re: querying the tablet server for given row (to get locality)?
William Slacum 2012-07-01, 18:19
A tablet will contain at minimum one row. So, if you shard/partition, eventually your data will grow to the point that each tablet will essentially be one row.
-
Re: querying the tablet server for given row (to get locality)?
John Vines 2012-07-01, 18:37
The tablet location is stored in the !METADATA table under the column family loc. You can use that information to get locality for your external processes. Keep in mind that the master will migrate tablets around, so you will have to recheck periodically to make sure your locality still holds.
John
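To use the loc entries, a client first has to work out which tablet each metadata row describes. The sketch below assumes the Accumulo 1.x metadata row layout, where a tablet's row key is `tableId;endRow` and the last tablet of a table is `tableId<`. `MetadataRow` is a hypothetical helper for illustration, not an Accumulo class, and it treats row keys as plain text.

```java
import java.util.Objects;

// Hypothetical helper, not part of the Accumulo API.
final class MetadataRow {
    final String tableId;
    final String endRow; // null means the last tablet, which has no end row

    private MetadataRow(String tableId, String endRow) {
        this.tableId = Objects.requireNonNull(tableId);
        this.endRow = endRow;
    }

    // Splits a metadata row key into its table ID and tablet end row.
    // Assumes the key is plain text and that table IDs never contain ';'.
    static MetadataRow parse(String row) {
        int semi = row.indexOf(';');
        if (semi >= 0) {
            // "tableId;endRow": everything after the first ';' is the end row.
            return new MetadataRow(row.substring(0, semi), row.substring(semi + 1));
        }
        if (row.endsWith("<")) {
            // "tableId<": the last tablet of the table.
            return new MetadataRow(row.substring(0, row.length() - 1), null);
        }
        throw new IllegalArgumentException("not a tablet metadata row: " + row);
    }

    public static void main(String[] args) {
        MetadataRow m = MetadataRow.parse("3;row_0500");
        System.out.println(m.tableId + " -> " + m.endRow);
    }
}
```

Checking for `;` before `<` matters because an end row may itself end in `<`, while a table ID is assumed never to contain `;`.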
-
Re: querying the tablet server for given row (to get locality)?
Eric Newton 2012-07-01, 18:40
The class you can use to find the location of a tablet is TabletLocator.
You can get the table name to table ID mapping from TableOperations (TabletLocator takes a table ID, not a table name).
You might want to try just ingesting with the BatchWriter... even without locality, it's pretty fast. If you need to go faster, think about using BulkImport.
-Eric
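Conceptually, locating the tablet server for a row is a sorted-map lookup over tablet end rows. The following is only a sketch of that idea, not how TabletLocator is implemented: `TabletLocationSnapshot` and the host names are hypothetical, and the snapshot would have to be filled from whatever location data the client has fetched.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the Accumulo API: a point-in-time view of
// which tablet server hosts each tablet of one table.
final class TabletLocationSnapshot {
    // Tablet end row -> "host:port" of the server hosting that tablet.
    private final TreeMap<String, String> byEndRow = new TreeMap<>();
    // Server hosting the last tablet, which has no end row (null if unknown).
    private String lastTabletLocation;

    // Records one tablet; pass a null endRow for the last tablet of the table.
    void put(String endRow, String location) {
        if (endRow == null) {
            lastTabletLocation = location;
        } else {
            byEndRow.put(endRow, location);
        }
    }

    // A tablet covers (previous end row, end row], so a row belongs to the
    // tablet with the smallest end row that is >= the row.
    // Assumes ASCII row IDs: Java compares Strings by UTF-16 code unit,
    // while Accumulo orders rows by raw bytes.
    String locate(String row) {
        Map.Entry<String, String> tablet = byEndRow.ceilingEntry(row);
        return tablet != null ? tablet.getValue() : lastTabletLocation;
    }

    public static void main(String[] args) {
        TabletLocationSnapshot snapshot = new TabletLocationSnapshot();
        snapshot.put("g", "ts1.example.com:9997");
        snapshot.put("p", "ts2.example.com:9997");
        snapshot.put(null, "ts3.example.com:9997");
        System.out.println(snapshot.locate("kiwi")); // prints ts2.example.com:9997
    }
}
```

A snapshot like this goes stale whenever tablets split or are migrated, so a sharded worker would need to rebuild it periodically and still tolerate writes that land on a non-local server.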