|
Weihua JIANG
2012-01-05, 06:06
praveenesh kumar
2012-01-05, 06:19
Arun C Murthy
2012-01-06, 05:24
Yves Langisch
2012-01-06, 06:25
Harsh J
2012-01-06, 06:44
zizon
2012-01-06, 06:48
Stack
2012-01-06, 15:27
Yves Langisch
2012-01-08, 15:45
Zizon Qiu
2012-01-09, 02:35
Bryan Keller
2012-02-17, 21:21
Jean-Daniel Cryans
2012-02-17, 21:28
Bryan Keller
2012-02-17, 21:48
Jean-Daniel Cryans
2012-02-17, 22:12
|
-
Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster? Weihua JIANG 2012-01-05, 06:06
Hi all,
Hadoop 1.0.0 and HBase 0.90.5 have been released. I am curious whether Hadoop 1.0.0 is compatible with HBase 0.90.5, and whether this combination is the best choice for a production cluster now.
Thanks
Weihua
-
Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster? praveenesh kumar 2012-01-05, 06:19
I don't know about Hadoop 1.0.0, but Hadoop 0.20.205 and HBase 0.90.5 are working together fine.
Thanks,
Praveenesh
-
Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster? Arun C Murthy 2012-01-06, 05:24
I know we've done integration testing with hadoop-1.0.0 and hbase-0.90.4 and things work well, not sure about hbase-0.90.5 (I don't imagine there are issues, but ymmv).
In fact, you get a nice perf boost with hadoop-1.0.0 for hbase if you enable the local-read-optimization.
Arun
-
Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster? Yves Langisch 2012-01-06, 06:25
How can you enable the mentioned local-read-optimization for hadoop-1.0.0? I could not find any related information.
- Yves
-
Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster? Harsh J 2012-01-06, 06:44
Yves,
Take a look at the release notes section of https://issues.apache.org/jira/browse/HDFS-2246 for the configs that are relevant to that feature addition.
-
Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster? zizon 2012-01-06, 06:48
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
  <description>set this to true to enable DFSClient short circuit read</description>
</property>
<property>
  <name>dfs.block.local-path-access.user</name>
  <value>hadoop</value>
  <description>add users that need to perform short circuit read here; datanode will do security check before the read</description>
</property>
-
Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster? Stack 2012-01-06, 15:27
On Thu, Jan 5, 2012 at 10:48 PM, zizon <[EMAIL PROTECTED]> wrote:
> <property>
>   <name>dfs.client.read.shortcircuit</name>
>   <value>true</value>
>   <description>set this to true to enable DFSClient short circuit read</description>
> </property>

You must set the above on both client and server side; i.e. in hbase-site.xml and in hdfs-site.xml.

> <property>
>   <name>dfs.block.local-path-access.user</name>
>   <value>hadoop</value>
>   <description>add users that need perform short circuit read here,datanode will do security check before the read</description>
> </property>

This you set server-side only. Adjust the value 'hadoop' accordingly.

I believe the only way to tell local reads are working is jstacking your hbase and looking for local block accesses.

We'll fix our documentation to include the above.

St.Ack
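A minimal sketch of checking that dfs.client.read.shortcircuit is set to true in both site files mentioned above. It uses only the JDK's XML parser and does not go through Hadoop's own configuration loader, so it ignores includes, defaults, and overrides; the class and method names (ShortCircuitConfigCheck, propertyValue, shortCircuitEnabled) are hypothetical, and the file contents in main are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop or HBase.
final class ShortCircuitConfigCheck {

    /** Returns the value of the first property with the given name, or null if it is absent. */
    static String propertyValue(String siteXml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                NodeList names = p.getElementsByTagName("name");
                NodeList values = p.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0
                        && name.equals(names.item(0).getTextContent().trim())) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse site file contents", e);
        }
    }

    /** True only if dfs.client.read.shortcircuit is present and set to "true". */
    static boolean shortCircuitEnabled(String siteXml) {
        return "true".equalsIgnoreCase(propertyValue(siteXml, "dfs.client.read.shortcircuit"));
    }

    public static void main(String[] args) {
        // Made-up file contents: the client side has the flag, the server side does not.
        String hbaseSite = "<configuration><property>"
                + "<name>dfs.client.read.shortcircuit</name><value>true</value>"
                + "</property></configuration>";
        String hdfsSite = "<configuration></configuration>";

        System.out.println("hbase-site.xml: " + shortCircuitEnabled(hbaseSite));
        System.out.println("hdfs-site.xml:  " + shortCircuitEnabled(hdfsSite));
    }
}
```

In practice you would read the two files from your own conf directories and pass their contents in; a flag that is present in only one of them is the case this check is meant to catch.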
-
Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster? Yves Langisch 2012-01-08, 15:45
I have no special settings for the hadoop security. Is it necessary to specify dfs.block.local-path-access.user then? If yes, is it the same user my hadoop daemon is running under?
Thanks
Yves
-
Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster? Zizon Qiu 2012-01-09, 02:35
It should be the same as the HBase daemon user.
The check performed by the datanode is implemented as follows, inside an RPC call. The "current user" refers to the remote user, which in this case should be the same as your HBase user.
  private void checkBlockLocalPathAccess() throws IOException {
    checkKerberosAuthMethod("getBlockLocalPathInfo()");
    // Short name of the remote (calling) user.
    String currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
    // Only the single configured user may ask for local block paths.
    if (!currentUser.equals(this.userWithLocalPathAccess)) {
      throw new AccessControlException(
          "Can't continue with getBlockLocalPathInfo() "
          + "authorization. The user " + currentUser
          + " is not allowed to call getBlockLocalPathInfo");
    }
  }
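A minimal standalone sketch of the comparison in the datanode check quoted above, with no Hadoop classes involved. The class and method names (LocalPathAccessCheck, isAllowed) and the user names in main are hypothetical; the one deliberate difference from the quoted code is that a null caller is treated as "not allowed" instead of raising an exception.

```java
// Hypothetical model of the check; the real one lives in the datanode and uses UserGroupInformation.
final class LocalPathAccessCheck {

    /**
     * Models the quoted check: the caller's short user name must equal the value
     * configured in dfs.block.local-path-access.user exactly (case-sensitive).
     */
    static boolean isAllowed(String callerShortUserName, String configuredUser) {
        return callerShortUserName != null && callerShortUserName.equals(configuredUser);
    }

    public static void main(String[] args) {
        // Assumed setup: the region server runs as "hbase", the datanode as "hadoop".
        String regionServerUser = "hbase";

        // Configured with the datanode's user: the region server is rejected.
        System.out.println(isAllowed(regionServerUser, "hadoop"));

        // Configured with the region server's user: the region server is accepted.
        System.out.println(isAllowed(regionServerUser, "hbase"));
    }
}
```

The sketch is only meant to make the answer above concrete: the value that matters is the user the HBase daemon runs as, because that is the remote user the datanode sees.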
-
Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster? Bryan Keller 2012-02-17, 21:21
I have been experimenting with local reads. For me, enabling them did not improve read performance at all; I get the same performance either way. I can see in the data node logs that it is passing back the local path, so it is enabled properly.
Perhaps the benefits of local reads depend on the type of data and the workload? In my test I'm scanning through the entire table via a MapReduce job. It's a wide table with maybe 20k columns per row on average. I have scanner caching set to 10.
My read performance is about 10% of the disk's maximum read throughput, i.e. my disks can get 100 MB/sec when tested with hdparm, and scan performance is about 10 MB/sec. Not too bad I suppose.
-
Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster? Jean-Daniel Cryans 2012-02-17, 21:28
On Fri, Feb 17, 2012 at 1:21 PM, Bryan Keller <[EMAIL PROTECTED]> wrote:
> I have been experimenting with local reads. For me, enabling did not help improve read performance at all, I get the same performance either way. I can see in the data node logs it is passing back the local path, so it is enabled properly.

I was surprised when I read this until I saw this:

> Perhaps the benefits of local reads are dependent on the type of data and the workload? In my test I'm scanning through the entire table via a map reduce job. It's a wide table with maybe 20k columns per row on average. I have scanner caching set to 10.

It's definitely not going to help make sequential reads faster.

> My read performance is about 10% of the disk max read throughput, i.e. my disks can get 100 mb/sec tested with hdparm and scan performance is about 10 mb/sec. Not too bad I suppose.

Maybe you're not pushing it enough?

J-D
-
Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster? Bryan Keller 2012-02-17, 21:48
I was thinking (wrongly it seems) that having the region server read directly from the local file system would be faster than going through the data node, even with sequential access.
-
Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster? Jean-Daniel Cryans 2012-02-17, 22:12
The gist of the answer is that, unlike random reads, the blocks we read sequentially from the fs are wholly consumed, so you end up doing fewer fs calls and the proportion of total time spent talking to datanodes is smaller (that proportion is what local reads help with). Also, the dfs client keeps a block reader open, so that every time you read from the same hdfs block it doesn't have to set up the socket to the datanode again (which is what random reading does if you don't set up local reads).
J-D
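A toy cost model of the argument above, as a sketch only. It assumes each fs call pays a fixed overhead for talking to the datanode and then transfers the bytes that are actually consumed. The 100 MB/sec transfer rate is the hdparm figure from earlier in the thread; the 1 ms per-call overhead, the 1 KB random read, and the 64 MB block are made-up numbers for illustration, not measurements, and the class name ReadOverheadModel is hypothetical.

```java
// Hypothetical toy model; none of these numbers were measured on a real cluster.
final class ReadOverheadModel {

    /**
     * Fraction of the time for one read call that goes to fixed per-call overhead
     * (the part local reads can reduce) rather than to moving the consumed bytes.
     */
    static double overheadFraction(double perCallOverheadMs,
                                   double bytesConsumedPerCall,
                                   double transferMsPerByte) {
        double transferMs = bytesConsumedPerCall * transferMsPerByte;
        return perCallOverheadMs / (perCallOverheadMs + transferMs);
    }

    public static void main(String[] args) {
        double transferMsPerByte = 1e-5;       // 100 MB/sec, taken as 1e8 bytes per second
        double perCallOverheadMs = 1.0;        // assumed datanode round-trip and setup cost

        double randomRead = overheadFraction(perCallOverheadMs, 1024, transferMsPerByte);
        double sequentialRead = overheadFraction(perCallOverheadMs, 64.0 * 1024 * 1024, transferMsPerByte);

        System.out.printf("random read overhead share:     %.4f%n", randomRead);
        System.out.printf("sequential read overhead share: %.4f%n", sequentialRead);
    }
}
```

Under these assumptions nearly all of a small random read is overhead, while a wholly consumed block is dominated by transfer time, which is consistent with local reads not showing up in a full-table scan.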