Re: DistributedCache - why not read directly from HDFS?
Arun C Murthy 2013-03-25, 22:30
More importantly, the second and subsequent accesses of a file in the DistributedCache (DC) are guaranteed to be local disk I/O.
On Mar 24, 2013, at 3:00 AM, Alberto Cordioli wrote:
> Thanks for your reply Harsh.
> So if I just want to read a simple text file, choosing between the
> DistributedCache and HDFS becomes just a matter of performance.
> On 23 March 2013 16:17, Harsh J <[EMAIL PROTECTED]> wrote:
>> A DistributedCache is not used just to distribute simple files but
>> also native libraries and such, which certain loaders cannot load if
>> they are on HDFS.
>> Also, keeping the file on HDFS could prove less performant, as non-local
>> reads could happen (depending on the file's replication factor).
>> On Sat, Mar 23, 2013 at 8:23 PM, Alberto Cordioli
>> <[EMAIL PROTECTED]> wrote:
>>> Hi all,
>>> I was not able to find an answer to the following question. If the
>>> question has already been answered, please point me to the
>>> right thread.
>>> What are the actual differences between reading a file from HDFS in a
>>> mapper and using the DistributedCache?
>>> I saw that with the DistributedCache you can give an HDFS path and the
>>> task nodes will get the data on their local file system. But what
>>> advantages does that have compared with a simple HDFS read using
>>> FileSystem.open(), which returns an FSDataInputStream?
>>> Thank you very much,
>>> Alberto Cordioli
>> Harsh J
> Alberto Cordioli
Arun C. Murthy
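Arun's point, that only the first access touches HDFS while later accesses on the same node are local disk I/O, can be illustrated with a small, Hadoop-free toy model. This is a sketch under stated assumptions, not Hadoop's implementation: one temporary directory stands in for HDFS, another for a task node's local disk, and the `LocalizedCache` class and its `localize` method are hypothetical names chosen for illustration. The first request copies the file down; every later request on that node is served from the local copy.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

/** Toy model (not Hadoop code) of localizing a shared file once per node. */
public class LocalizedCacheDemo {

    /** Hypothetical node-local cache: remoteRoot plays HDFS, localRoot plays the node's disk. */
    static final class LocalizedCache {
        private final Path remoteRoot;
        private final Path localRoot;
        private int remoteFetches = 0;

        LocalizedCache(Path remoteRoot, Path localRoot) {
            this.remoteRoot = remoteRoot;
            this.localRoot = localRoot;
        }

        /** Returns a local path; copies from the "remote" store only if no local copy exists yet. */
        Path localize(String name) throws IOException {
            Path local = localRoot.resolve(name);
            if (!Files.exists(local)) {
                Files.copy(remoteRoot.resolve(name), local, StandardCopyOption.REPLACE_EXISTING);
                remoteFetches++; // count how often we had to go to the "remote" store
            }
            return local;
        }

        int remoteFetches() {
            return remoteFetches;
        }
    }

    public static void main(String[] args) throws IOException {
        Path remote = Files.createTempDirectory("remote");
        Path local = Files.createTempDirectory("local");
        Files.write(remote.resolve("lookup.txt"), Arrays.asList("a=1", "b=2"), StandardCharsets.UTF_8);

        LocalizedCache cache = new LocalizedCache(remote, local);
        // Three "tasks" on the same node read the same side file.
        for (int task = 0; task < 3; task++) {
            List<String> lines = Files.readAllLines(cache.localize("lookup.txt"), StandardCharsets.UTF_8);
            System.out.println("task " + task + " read " + lines.size() + " lines");
        }
        System.out.println("remote fetches: " + cache.remoteFetches()); // prints "remote fetches: 1"
    }
}
```

In a real Hadoop 2.x job, the counterpart would be registering the file with `Job.addCacheFile(URI)` and reading the localized copy in the mapper's `setup()`, whereas calling `FileSystem.open(Path)` in each task reads from HDFS every time and, without a local replica, from a remote DataNode.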