MapReduce, mail # user - Re: why my test result on dfs short circuit read is slower?


Re: why my test result on dfs short circuit read is slower?
Arpit Gupta 2013-02-16, 06:05
Here is another way to check whether short circuit read is configured correctly.

As the user who is configured for short circuit read, issue the following command on a node where you expect the data to be read locally:

export HADOOP_ROOT_LOGGER=debug,console; hadoop dfs -cat /path/to/file_on_hdfs

On the console you should see something like "hdfs.DFSClient: New BlockReaderLocal for file...."

This would confirm that short circuit read is happening.
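
If you would rather not scan the debug output by eye, the check can be wrapped in a small filter. This is only a sketch: check_short_circuit is a made-up helper name, it assumes the debug line contains the text "New BlockReaderLocal" as quoted above, and the usage line merges stderr into stdout because I am assuming the console logger may write to either stream.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): reads client debug output on
# stdin and reports whether a local block reader was created.
check_short_circuit() {
  # Assumes the log line contains "New BlockReaderLocal", as in the
  # sample console message quoted above.
  if grep -q "New BlockReaderLocal"; then
    echo "short circuit read: yes"
  else
    echo "short circuit read: not seen"
  fi
}

# Usage, run as the user configured for short circuit read
# (/path/to/file_on_hdfs is a placeholder):
#   HADOOP_ROOT_LOGGER=debug,console hadoop dfs -cat /path/to/file_on_hdfs 2>&1 | check_short_circuit
```

A "not seen" result only means the line did not show up in that run; it is worth looking at the full debug output before concluding anything.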

--
Arpit Gupta
Hortonworks Inc.
http://hortonworks.com/

On Feb 15, 2013, at 9:53 PM, "Liu, Raymond" <[EMAIL PROTECTED]> wrote:

> Hi Harsh
>
> Yes, I did set both of these, though in hdfs-site.xml rather than hbase-site.xml.
>
> And I have double-checked that local reads are performed, since there are no errors in the datanode logs, and by watching network IO on the lo interface.
>
>>
>> If you want HBase to leverage the shortcircuit, the DN config
>> "dfs.block.local-path-access.user" should be set to the user running HBase
>> (hbase, for example), and the hbase-site.xml should have
>> "dfs.client.read.shortcircuit" defined in all its RegionServers. Doing this wrong
>> could result in a performance penalty and some warn-logging, as local reads will
>> be attempted but will begin to fail.
>>
>> On Sat, Feb 16, 2013 at 8:40 AM, Liu, Raymond <[EMAIL PROTECTED]>
>> wrote:
>>> Hi
>>>
>>>        I tried to use short circuit read to improve my HBase cluster's MR scan performance.
>>>
>>>        I have the following settings in hdfs-site.xml:
>>>
>>>        dfs.client.read.shortcircuit set to true
>>>        dfs.block.local-path-access.user set to the MR job runner.
>>>
>>>        The cluster is 1+4 nodes, and each data node has 16 CPUs and 4 HDDs. All HBase tables have been major compacted, so all data is local.
>>>        I had hoped that short circuit read would improve the performance.
>>>
>>>        However, the test result is that with short circuit read enabled, the performance actually dropped by 10-15%. For example, scanning a 50G table takes around 100s instead of 90s.
>>>
>>>        My Hadoop version is 1.1.1. Any idea on this? Thx!
>>>
>>> Best Regards,
>>> Raymond Liu
>>>
>>>
>>
>>
>>
>> --
>> Harsh J
