Re: ISAM file location vs. read performance
William Slacum 2014-01-12, 22:42
Some data on short circuit reads would be great to have.
I'm unsure of how correct the "compaction leading to eventual locality"
postulation is. It seems, to me at least, that in the case of a multi-block
file, the file system would eventually try to distribute those blocks
rather than leave them all on a single host.
One quick correction: "not splittable" means that the file can't be
processed (i.e., MapReduce'd over) in chunks, not that the file won't be
split into blocks.
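To make the distinction concrete, here is a minimal Python sketch (not HDFS or Accumulo code). It assumes a 128 MiB block size and a made-up block-to-host mapping; the helper names block_count and local_fraction and the node names are hypothetical. It only illustrates that a file is stored as several blocks purely as a function of its size, and that locality for a given tserver host is a per-block question.

```python
def block_count(file_size_bytes, block_size_bytes):
    """Blocks needed to store a file; depends only on size, not on file format."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    # Ceiling division without floating point.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def local_fraction(block_hosts, host):
    """Fraction of a file's blocks that have at least one replica on `host`."""
    if not block_hosts:
        return 1.0  # an empty file has nothing to read remotely
    local = sum(1 for hosts in block_hosts if host in hosts)
    return local / len(block_hosts)


MIB = 2 ** 20
ASSUMED_BLOCK_SIZE = 128 * MIB  # assumption; the real value is cluster configuration

# Hypothetical replica locations for a 300 MiB RFile (three blocks, three replicas each).
EXAMPLE_BLOCK_HOSTS = [
    ["node1", "node4", "node7"],
    ["node1", "node5", "node8"],
    ["node2", "node5", "node9"],
]

if __name__ == "__main__":
    print(block_count(300 * MIB, ASSUMED_BLOCK_SIZE))
    print(local_fraction(EXAMPLE_BLOCK_HOSTS, "node1"))
```

On a real cluster the block-to-host mapping would come from HDFS itself (for example from the output of `hdfs fsck <path> -files -blocks -locations`) rather than from a hard-coded list.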
On Sun, Jan 12, 2014 at 1:58 PM, Arshak Navruzyan <[EMAIL PROTECTED]> wrote:
> Thanks for the explanation. I had to look up the HDFS block distribution
> documentation and it now makes complete sense.
> "the 1st replica is placed on the local machine"
> So since the compacted RFile is not splittable by HDFS, this ensures that
> the whole thing will be available where the Accumulo tablet is running.
> Maybe I can test out the short-circuit reads and report back.
> On Sun, Jan 12, 2014 at 9:36 AM, John Vines <[EMAIL PROTECTED]> wrote:
>> So I'm not certain on our performance with short circuit reads, aside
>> from them being better.
>> But because of the way HDFS writes get distributed, a tablet server's reads
>> have a strong probability of being local, so that is there. This is because
>> a tserver will ultimately end up major compacting its files, ensuring
>> locality. So simply constantly ingesting will lead to eventual locality if
>> it wasn't there before. It just so happens those reads go through a
>> datanode, but not necessarily through the network.
>> Sent from my phone, please pardon the typos and brevity.
>> On Jan 12, 2014 12:29 PM, "Arshak Navruzyan" <[EMAIL PROTECTED]> wrote:
>>> One aspect of Accumulo architecture is still unclear to me. Would you
>>> achieve better scan performance if you could guarantee that the tablet and
>>> its ISAM file lived on the same node? Guessing ISAM files are not
>>> splittable so they pretty much stay on one HDFS data node (plus the replica
>>> copy). Or is the theory that SATA and a 10 Gbps network provide more or less
>>> the same throughput?
>>> I generally understand that as the table grows and Accumulo creates more
>>> splits (tablets) you get better distribution over the cluster, but it seems
>>> like data location would still be important. HBase folks seem to think
>>> that you can approximately double your throughput if you let the region server
>>> read the file directly (dfs.client.read.shortcircuit=true) as opposed to
>>> going through the data node
>>> (http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf). Perhaps this
>>> is due more to HDFS overhead?
>>> I do get that one really nice thing about Accumulo's architecture is
>>> that it costs almost nothing to reassign a tablet to a different tserver and
>>> this is a huge problem for other systems.
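
The SATA versus 10 Gbps question in the original message can be put in rough numbers. The Python sketch below is back-of-envelope arithmetic only: the 120 MB/s sequential read rate is an assumed figure for a single SATA drive, not a measurement from this thread, and the function names are made up for illustration.

```python
def gbps_to_mb_per_s(gbps):
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbps * 1000 / 8


def transfer_seconds(size_mb, rate_mb_per_s):
    """Time to move size_mb megabytes at a sustained rate, ignoring all overhead."""
    return size_mb / rate_mb_per_s


# Assumption: one spinning SATA disk doing a sequential read.
ASSUMED_SATA_MB_PER_S = 120.0
# A 10 Gbps link carries 1250 MB/s before any protocol overhead.
NETWORK_MB_PER_S = gbps_to_mb_per_s(10)

if __name__ == "__main__":
    file_mb = 1000  # a 1 GB file
    print(transfer_seconds(file_mb, ASSUMED_SATA_MB_PER_S))
    print(transfer_seconds(file_mb, NETWORK_MB_PER_S))
```

Under these assumptions the raw link is not slower than a single disk, which would be consistent with the suggestion that any short-circuit gain comes more from avoiding datanode overhead than from avoiding the wire. A shared or congested network, or faster local storage, would change the picture.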