On Sat, May 22, 2010 at 5:01 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
> On May 21, 2010, at 16:07 , Mikhail Yakshin wrote:
>> On Fri, May 21, 2010 at 11:09 PM, Keith Wiley wrote:
>>> My Java mapper hands its processing off to C++ through JNI. On the C++ side I need to access a file. I have already implemented a version of this interface in which the file is read entirely into RAM on the Java side and is handed through JNI as a byte[] (received as a char*, of course). However, it would simplify things if on the C++ side my code had access to a conventional FILE* or file path, not a char* in memory. The reason for this is that I will be relying on an existing set of C and C++ code which assumes it will be handed a filename (or perhaps a FILE*). Handing it a char* is not ideal for my use.
>>> ...so, can I take a file from HDFS and reference it via a conventional path for fopen() or ifstream() usage?
>>> If I can't do this directly because HDFS is too unconventional (what with the distributed blocks and all), can I at least do this from the distributed cache perhaps? Could I load the file into the distributed cache on the Java side and then tell the C/C++ side where it is in the distributed cache? What would that look like? It would have to be a "path" of some sort. I admit, I'm a bit vague on the details.
>> Try using the distributed cache: that way your HDFS file gets
>> pre-distributed to the local file system of every node that will be
>> executing your job. You can then get the full local file name from
>> the DistributedCache Java object and open it normally with fopen()
>> or ifstream().
> Ah, excellent. The only question that remains is how to get a local path to a file in the distributed cache.
You can use DistributedCache.getLocalCacheFiles, or
JobContext#getLocalCacheFiles in newer versions. Would libhdfs also
help, for reading directly from HDFS?
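To make that concrete, here is a minimal sketch of the pattern being discussed: the driver registers an HDFS file in the distributed cache, and the mapper resolves it to an ordinary local path that can be handed through JNI to fopen(). The HDFS URI (`hdfs:///data/model.bin`) and class name are hypothetical, and the code assumes the older org.apache.hadoop.filecache.DistributedCache API from the 0.20.x line; newer releases expose the same lookup via JobContext.

```java
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class CacheFileSketch {

    // Driver side, before job submission: register the HDFS file so the
    // framework copies it to each task node's local disk.
    static void registerCacheFile(Configuration conf) {
        // Hypothetical HDFS path for illustration.
        DistributedCache.addCacheFile(URI.create("hdfs:///data/model.bin"), conf);
    }

    // Mapper side, in configure()/setup(): resolve the localized copy.
    static String localCachePath(Configuration conf) throws IOException {
        Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
        // localFiles[0] is now a plain path on the node's local file system,
        // so its string form can be passed through JNI and opened with
        // fopen() / ifstream on the C++ side.
        return localFiles[0].toString();
    }
}
```

This requires a Hadoop cluster (or at least the Hadoop jars and a running job context) to actually execute, so treat it as a shape to follow rather than a drop-in snippet.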