-
Integrating Lustre and HDFS
Vikas Ashok Patil, 2010-06-10, 09:37

Hello All,

I would like to try out a Hadoop configuration involving both Lustre and HDFS, and I would welcome any thoughts or criticisms on the idea.

In my cluster I have the Lustre parallel file system, which exposes storage over the network. There is also some local space on each node of the cluster, which is not part of the Lustre file system. My Hadoop installation currently uses this local storage. However, I would also like to use the space available through Lustre. I was therefore thinking of a way to integrate HDFS and Lustre, where HDFS would manage the local storage and Lustre would provide the storage over the network.

Please let me know your thoughts on this.

Thanks,
Vikas A Patil
-
Re: Integrating Lustre and HDFS
Allen Wittenauer, 2010-06-10, 13:57

On Jun 10, 2010, at 2:37 AM, Vikas Ashok Patil wrote:
> In my cluster I have the lustre parallel file system which mainly exposes
> storage over a network. Also there is some local space on each node of the
> cluster. This space is not part of the lustre file system. My hadoop
> installation currently makes use of this local file system. However, I would
> like to make use of the space available through lustre(exposed over the
> network). Hence I was thinking of a way to integrate HDFS and lustre, where
> HDFS would manage the local storage and lustre would provide the storage
> over the network.
>
> Please let me know your thoughts on this.

Your local storage should get used for MR. Use Lustre via file:// (LocalFileSystem, iirc) instead of HDFS via hdfs:// (DistributedFileSystem, iirc) as the default file system type.
-
Re: Integrating Lustre and HDFS
Owen O'Malley, 2010-06-10, 14:39

> Your local storage should get used for MR. Use Lustre via file:// (LocalFileSystem, iirc)
> instead of HDFS via hdfs:// (DistributedFileSystem, iirc) as the default file system type.

If Lustre has integrated checksums, you'll want to use RawLocalFileSystem instead of LocalFileSystem. You'll want to make it accessible via:

fs.raw.impl = org.apache.hadoop.fs.RawLocalFileSystem

so that URLs like raw:///my/path won't go through the Hadoop checksum code.

-- Owen
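
For readers who want to see the setting above in Hadoop's XML configuration format, a minimal sketch follows. The property name and value are taken from the message. Placing it in core-site.xml is an assumption that depends on the Hadoop release in use.

```xml
<!-- Sketch only: maps the raw:// scheme to RawLocalFileSystem, as suggested
     in the message above. The target file (core-site.xml) is an assumption. -->
<property>
  <name>fs.raw.impl</name>
  <value>org.apache.hadoop.fs.RawLocalFileSystem</value>
</property>
```

With a mapping like this in place, a path written as raw:///my/path would be served by RawLocalFileSystem and so would skip the Hadoop checksum layer described above.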
-
Re: Integrating Lustre and HDFS
Vikas Ashok Patil, 2010-06-11, 03:27

Thanks for the replies.

If I have fs.default.name = file://my_lustre_mount_point , then only the Lustre file system will be used. I would like to have something like

fs.default.name = file://my_lustre_mount_point , hdfs://localhost:9123

so that both the local file system and Lustre are in use.

Kindly correct me if I am missing something here.

Thanks,
Vikas
-
Re: Integrating Lustre and HDFS
Allen Wittenauer, 2010-06-11, 19:02

On Jun 10, 2010, at 8:27 PM, Vikas Ashok Patil wrote:
> Thanks for the replies.
>
> If I have fs.default.name = file://my_lustre_mount_point , then only the
> lustre filesystem will be used. I would like to have something like
>
> fs.default.name=file://my_lustre_mount_point , hdfs://localhost:9123
>
> so that both local filesystem and lustre are in use.
>
> Kindly correct me if I am missing something here.

I guess we're all confused as to your use case. Why do you want to run two distributed file systems on the same nodes? Why can't you use Lustre for all your needs?

As to fs.default.name, you can only have one. [That's why it is a default. *smile*] If you want to access more than one file system from within MapReduce, you'll need to specify it explicitly.
-
Re: Integrating Lustre and HDFS
Vikas Ashok Patil, 2010-06-12, 03:35

Hello Allen,

Thanks for the reply.

You are right that I am trying to run two distributed file systems. The reason is that certain restrictions in our cluster environment prevent us from including the local storage in Lustre. Please tell me how I would make MapReduce access more than one file system. At least the configs don't seem to allow it.

Thanks,
Vikas A Patil
-
Re: Integrating Lustre and HDFS
Vikas Ashok Patil, 2010-06-15, 10:31

Hello Allen,

Sorry for bugging you about the same problem again. When you say that we need to be explicit about multiple file systems for MapReduce jobs, are you hinting at code changes to be made to Hadoop? Please provide more details on this if possible.

Thanks,
Vikas
-
Re: Integrating Lustre and HDFS
Allen Wittenauer, 2010-06-15, 17:24

No, I'm saying your MapReduce code needs to explicitly reference every file system that it needs to access. You can't rely upon fs.default.name*. The distcp code could provide some guidance on how to do this.

* Maybe it isn't clear why this is, so let me spell it out a bit: fs.default.name is just that: a default. When you run hadoop dfs -ls with no qualifying file system URL, it uses fs.default.name to figure out where that file system actually is. Since you need to access two different file systems, you cannot make any such assumptions safely.

This is also why you can't list two file systems in fs.default.name. When you run 'hadoop dfs -ls', it wouldn't be clear what exactly Hadoop should do, especially if the paths requested *conflict*.
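
The fallback rule described in this last reply can be illustrated with a small toy model. The sketch below uses only java.net.URI and does not call any Hadoop API. The class name DefaultFsSketch, the helper qualify, and the /mnt/lustre mount point are hypothetical names introduced for illustration. hdfs://localhost:9123 is the address used earlier in the thread.

```java
import java.net.URI;

// Toy model only: mimics the rule "a path with no scheme falls back to the
// single default file system". It is not Hadoop code and calls no Hadoop API.
final class DefaultFsSketch {

    // Hypothetical helper: returns a fully qualified URI for the given path.
    // A path that already names a scheme is returned unchanged; a bare path
    // is resolved against the one default file system URI.
    static URI qualify(String path, URI defaultFs) {
        URI u = URI.create(path);
        if (u.getScheme() != null) {
            return u;                    // explicit scheme: the default is ignored
        }
        return defaultFs.resolve(u);     // no scheme: the default fills the gap
    }

    public static void main(String[] args) {
        // Stand-in for the single value allowed in fs.default.name.
        URI defaultFs = URI.create("hdfs://localhost:9123/");

        // Bare path: ends up on the default file system.
        System.out.println(qualify("/data/in", defaultFs));

        // Qualified path (hypothetical Lustre mount): stays where it points.
        System.out.println(qualify("file:///mnt/lustre/data/in", defaultFs));
    }
}
```

The model also suggests why a list of two defaults would be ambiguous: a bare path such as /data/in could be resolved against either entry, and nothing in the path itself says which one was meant.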