Hadoop, mail # user - Ordinary file pointer?

Re: Ordinary file pointer?
Hemanth Yamijala 2010-05-22, 10:37

On Sat, May 22, 2010 at 5:01 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
> On May 21, 2010, at 16:07 , Mikhail Yakshin wrote:
>> On Fri, May 21, 2010 at 11:09 PM, Keith Wiley wrote:
>>> My Java mapper hands its processing off to C++ through JNI.  On the C++ side I need to access a file.  I have already implemented a version of this interface in which the file is read entirely into RAM on the Java side and is handed through JNI as a byte[] (received as a char[] of course).  However, it would simplify things if on the C++ side my code had access to a conventional FILE* or file path, not a char[] in memory.  The reason for this is that I will be relying on an existing set of C and C++ code which assumes it will be handed a filename (or perhaps a FILE*).  Handing it a char[] is not ideal for my use.
>>> ...so, can I take a file from HDFS and reference it via a conventional path for fopen() or ifstream() usage?
>>> If I can't do this directly because HDFS is too unconventional (what with the distributed blocks and all) can I at least do this from the distributed cache perhaps?  Could I load the file into the distributed cache on the Java side and then tell the C/C++ side where it is in the distributed cache?  What would that look like?  It would have to be a "path" of some sort.  I admit, I am a bit vague on the details.
>> Try using the distributed cache: this way your HDFS file gets
>> pre-distributed to the local file system of every node that will be
>> executing your job. You can then get the full local file name from
>> the DistributedCache Java object and open it with a normal
>> fopen().
> Ah, excellent.  The only question that remains is how to get a local path to a file in the distributed cache.

You can use DistributedCache.getLocalCacheFiles, or
JobContext#getLocalCacheFiles in newer versions. Also, would libhdfs
help in reading directly from DFS?
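
For what it's worth, here is a rough sketch of the Java side. The Hadoop call appears only in a comment so that the snippet compiles without Hadoop on the classpath; the LocalCachePaths class, its find method, and the nativeProcess name are hypothetical and are not part of Hadoop. The idea is that getLocalCacheFiles returns one local path per cached file, so you pick the entry whose file name matches the one you want and hand that string across JNI for fopen().

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a Hadoop class): picks one local path out of the
// list of localized distributed-cache files.
//
// In a mapper's setup code the input would come from something like:
//   org.apache.hadoop.fs.Path[] cached = DistributedCache.getLocalCacheFiles(conf);
// with each entry converted to a String via toString(). The array can be
// null if nothing was added to the cache.
final class LocalCachePaths {
    private LocalCachePaths() {}

    // Returns the first entry whose final path component equals fileName,
    // or null if there is no such entry.
    static String find(String[] localPaths, String fileName) {
        if (localPaths == null || fileName == null) {
            return null;
        }
        for (String candidate : localPaths) {
            if (candidate == null) {
                continue;
            }
            Path name = Paths.get(candidate).getFileName();
            if (name != null && name.toString().equals(fileName)) {
                return candidate;
            }
        }
        return null;
    }
}
```

The returned string is an ordinary local file system path, so it could be passed to a native method (say, a hypothetical nativeProcess(String path)) and opened on the C++ side with fopen() or ifstream. Matching on the file name assumes the cached file keeps its original name when it is localized.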