Re: Ordinary file pointer?
Keith,

On Sat, May 22, 2010 at 5:01 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
> On May 21, 2010, at 16:07, Mikhail Yakshin wrote:
>
>> On Fri, May 21, 2010 at 11:09 PM, Keith Wiley wrote:
>>> My Java mapper hands its processing off to C++ through JNI.  On the C++ side I need to access a file.  I have already implemented a version of this interface in which the file is read entirely into RAM on the Java side and is handed through JNI as a byte[] (received as a char[] of course).  However, it would simplify things if on the C++ side my code had access to a conventional FILE* or file path, not a char[] in memory.  The reason for this is that I will be relying on an existing set of C and C++ code which assumes it will be handed a filename (or perhaps a FILE*).  Handing it a char[] is not ideal for my use.
>>>
>>> ...so, can I take a file from HDFS and reference it via a conventional path for fopen() or ifstream() usage?
>>>
>>> If I can't do this directly because HDFS is too unconventional (what with the distributed blocks and all), can I at least do this from the distributed cache perhaps?  Could I load the file into the distributed cache on the Java side and then tell the C/C++ side where it is in the distributed cache?  What would that look like?  It would have to be a "path" of some sort.  I admit, I'm a bit vague on the details.
>>
>> Try using the distributed cache: that way your HDFS file gets
>> pre-distributed to the local file system of every node that will be
>> executing your job. You can then get the full local file name from
>> the DistributedCache Java object and open it with an ordinary
>> fopen().
>
>
> Ah, excellent.  The only question that remains is how to get a local path to a file in the distributed cache.

You can use DistributedCache.getLocalCacheFiles, or
JobContext#getLocalCacheFiles in newer versions. Also, would libhdfs
help in reading directly from HDFS?
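
Something along these lines should work (a rough sketch against the
old mapred API; the cache path, class name, and the nativeInit JNI
hook below are placeholders, not names from your code):

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class NativeFileMapper extends MapReduceBase {

    // Hypothetical JNI entry point; on the C++ side this path can be
    // handed straight to fopen() / ifstream.
    private native void nativeInit(String localFilePath);

    // Driver side: register the HDFS file so the framework copies it
    // to the local disk of every task node before tasks start.
    public static void addInputToCache(JobConf conf) {
        DistributedCache.addCacheFile(URI.create("/data/model.dat"), conf);
    }

    // Mapper side: recover the local path of the cached copy and pass
    // it down to the native code.
    @Override
    public void configure(JobConf job) {
        try {
            Path[] cached = DistributedCache.getLocalCacheFiles(job);
            if (cached != null && cached.length > 0) {
                nativeInit(cached[0].toString()); // plain local FS path
            }
        } catch (IOException e) {
            throw new RuntimeException("Failed to read cache file list", e);
        }
    }
}

The path returned by getLocalCacheFiles points at the task node's
local disk, so the C/C++ side can open it directly instead of taking
a char[] through JNI.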

Thanks
Hemanth