Keith Wiley 2010-05-21, 19:09
My Java mapper hands its processing off to C++ through JNI. On the C++ side I need to access a file. I have already implemented a version of this interface in which the file is read entirely into RAM on the Java side and is handed through JNI as a byte[] (received as a char[] of course). However, it would simplify things if on the C++ side my code had access to a conventional FILE* or file path, not a char[] in memory. The reason for this is that I will be relying on an existing set of C and C++ code which assumes it will be handed a filename (or perhaps a FILE*). Handing it a char[] is not ideal for my use.
...so, can I take a file from HDFS and reference it via a conventional path for fopen() or ifstream() usage?
If I can't do this directly because HDFS is too unconventional (what with the distributed blocks and all), can I at least do this from the distributed cache? Could I load the file into the distributed cache on the Java side and then tell the C/C++ side where it is in the distributed cache? What would that look like? It would have to be a "path" of some sort. I admit, I'm a bit vague on the details.
Thank you.
________________________________________________________________________________ Keith Wiley [EMAIL PROTECTED] www.keithwiley.com
"Yet mark his perfect self-contentment, and hence learn his lesson, that to be self-contented is to be vile and ignorant, and that to aspire is better than to be blindly and impotently happy." -- Edwin A. Abbott, Flatland ________________________________________________________________________________
-
Re: Ordinary file pointer?
Mikhail Yakshin 2010-05-21, 23:07
On Fri, May 21, 2010 at 11:09 PM, Keith Wiley wrote:
> My Java mapper hands its processing off to C++ through JNI. On the C++ side I need to access a file. I have already implemented a version of this interface in which the file is read entirely into RAM on the Java side and is handed through JNI as a byte[] (received as a char[] of course). However, it would simplify things if on the C++ side my code had access to a conventional FILE* or file path, not a char[] in memory. The reason for this is that I will be relying on an existing set of C and C++ code which assumes it will be handed a filename (or perhaps a FILE*). Handing it a char[] is not ideal for my use.
>
> ...so, can I take a file from HDFS and reference it via a conventional path for fopen() or ifstream() usage?
>
> If I can't do this directly because HDFS is too unconventional (what with the distributed blocks and all), can I at least do this from the distributed cache? Could I load the file into the distributed cache on the Java side and then tell the C/C++ side where it is in the distributed cache? What would that look like? It would have to be a "path" of some sort. I admit, I'm a bit vague on the details.
Try using the distributed cache: your HDFS file gets pre-distributed to the local file system of every node that executes your job. You can then get the full local file name from the DistributedCache Java object and open it with a normal fopen().
-- WBR, Mikhail Yakshin
-
Re: Ordinary file pointer?
Keith Wiley 2010-05-21, 23:31
On May 21, 2010, at 16:07 , Mikhail Yakshin wrote:
> On Fri, May 21, 2010 at 11:09 PM, Keith Wiley wrote:
>> My Java mapper hands its processing off to C++ through JNI. On the C++ side I need to access a file. I have already implemented a version of this interface in which the file is read entirely into RAM on the Java side and is handed through JNI as a byte[] (received as a char[] of course). However, it would simplify things if on the C++ side my code had access to a conventional FILE* or file path, not a char[] in memory. The reason for this is that I will be relying on an existing set of C and C++ code which assumes it will be handed a filename (or perhaps a FILE*). Handing it a char[] is not ideal for my use.
>>
>> ...so, can I take a file from HDFS and reference it via a conventional path for fopen() or ifstream() usage?
>>
>> If I can't do this directly because HDFS is too unconventional (what with the distributed blocks and all) can I at least do this from the distributed cache perhaps? Could I load the file into the distributed cache on the Java side and then tell the C/C++ side where it is in the distributed cache? What would that look like? It would have to be a "path" or some sort. I admit, I a bit vague on the details.
>
> Try using distributed cache: this way you'll get your HDFS file
> pre-distributed to local file system of all nodes that would be
> executing your job. This way you can get full local file name from
> using DistributedCache java object and open it normally using normal
> fopen().

Ah, excellent. The only question that remains is how to get a local path to a file in the distributed cache.
________________________________________________________________________________ Keith Wiley [EMAIL PROTECTED] www.keithwiley.com
"I do not feel obliged to believe that the same God who has endowed us with sense, reason, and intellect has intended us to forgo their use." -- Galileo Galilei ________________________________________________________________________________
-
Re: Ordinary file pointer?
Hemanth Yamijala 2010-05-22, 10:37
Keith,
On Sat, May 22, 2010 at 5:01 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
> On May 21, 2010, at 16:07 , Mikhail Yakshin wrote:
>
>> On Fri, May 21, 2010 at 11:09 PM, Keith Wiley wrote:
>>> My Java mapper hands its processing off to C++ through JNI. On the C++ side I need to access a file. I have already implemented a version of this interface in which the file is read entirely into RAM on the Java side and is handed through JNI as a byte[] (received as a char[] of course). However, it would simplify things if on the C++ side my code had access to a conventional FILE* or file path, not a char[] in memory. The reason for this is that I will be relying on an existing set of C and C++ code which assumes it will be handed a filename (or perhaps a FILE*). Handing it a char[] is not ideal for my use.
>>>
>>> ...so, can I take a file from HDFS and reference it via a conventional path for fopen() or ifstream() usage?
>>>
>>> If I can't do this directly because HDFS is too unconventional (what with the distributed blocks and all) can I at least do this from the distributed cache perhaps? Could I load the file into the distributed cache on the Java side and then tell the C/C++ side where it is in the distributed cache? What would that look like? It would have to be a "path" or some sort. I admit, I a bit vague on the details.
>>
>> Try using distributed cache: this way you'll get your HDFS file
>> pre-distributed to local file system of all nodes that would be
>> executing your job. This way you can get full local file name from
>> using DistributedCache java object and open it normally using normal
>> fopen().
>
> Ah, excellent. The only question that remains is how to get a local path to a file in the distributed cache.
You can use DistributedCache.getLocalCacheFiles, or JobContext#getLocalCacheFiles in newer versions. Also, would libhdfs help in reading directly from DFS?
Thanks Hemanth
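
For illustration, here is a rough sketch of what the C++ side might look like once the Java side has obtained a local path from the distributed cache and passed it through JNI as a string. It only assumes that the path arrives as an ordinary std::string; the JNI string conversion is left out. The names legacy_count_bytes, legacy_count_lines and process_local_file are hypothetical stand-ins for the existing C and C++ code mentioned in the thread, not part of Hadoop or JNI.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical stand-in for existing C code that expects a FILE*.
long legacy_count_bytes(std::FILE* fp) {
    long count = 0;
    while (std::fgetc(fp) != EOF) {
        ++count;
    }
    return count;
}

// Hypothetical stand-in for existing C++ code that expects a filename.
// Returns -1 if the file cannot be opened.
long legacy_count_lines(const std::string& path) {
    std::ifstream in(path);
    if (!in) {
        return -1;
    }
    long lines = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++lines;
    }
    return lines;
}

// Hypothetical entry point that a JNI wrapper could call after converting
// the Java string holding the local cache path into a std::string.
// Returns false if the file cannot be opened by either routine.
bool process_local_file(const std::string& local_path, long& bytes, long& lines) {
    std::FILE* fp = std::fopen(local_path.c_str(), "rb");
    if (fp == nullptr) {
        return false;  // the path was not a readable local file
    }
    bytes = legacy_count_bytes(fp);
    std::fclose(fp);

    lines = legacy_count_lines(local_path);
    return lines >= 0;
}
```

Because the file is already on the node's local disk, nothing in this code needs to know about HDFS; the same functions work on any ordinary path, which also makes them easy to test outside a Hadoop job.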