NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

MapReduce >> mail # user >> DistributedCache - why not read directly from HDFS?


Re: DistributedCache - why not read directly from HDFS?
More importantly, the second and subsequent accesses of a file in the DistributedCache (DC) are guaranteed to be local disk I/O.
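
A toy Java model of that point, offered as a sketch only: `NodeCacheModel`, its methods and the task schedule are hypothetical, not part of Hadoop's API. It counts how many reads must leave a node when every task opens the side file straight from HDFS, versus when the file is copied to a node's local disk on first use, as the DistributedCache does. It assumes each task reads the file once and that a read is local only if a replica (or an already localized copy) sits on that node; rack locality and the OS page cache are ignored.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model, NOT Hadoop code: NodeCacheModel and its methods are
// hypothetical names used only to count reads that must leave a node.
final class NodeCacheModel {

    // Each task opens the side file straight from HDFS. The read is local
    // only if that task's node holds a replica; otherwise it is remote.
    static int directRemoteReads(int[] taskNodes, Set<Integer> replicaNodes) {
        int remote = 0;
        for (int node : taskNodes) {
            if (!replicaNodes.contains(node)) {
                remote++;
            }
        }
        return remote;
    }

    // DistributedCache-style: the first task on a node localizes the file
    // (remote only if no replica is on that node); every later task on the
    // same node reads the local copy from disk.
    static int cachedRemoteReads(int[] taskNodes, Set<Integer> replicaNodes) {
        Set<Integer> localized = new HashSet<>();
        int remote = 0;
        for (int node : taskNodes) {
            boolean firstOnNode = localized.add(node);
            if (firstOnNode && !replicaNodes.contains(node)) {
                remote++;
            }
        }
        return remote;
    }

    public static void main(String[] args) {
        // Assumed schedule: 8 map tasks over nodes 0..2, one replica on node 2.
        int[] taskNodes = {0, 1, 2, 0, 1, 2, 0, 1};
        Set<Integer> replicaNodes = Set.of(2);
        System.out.println("direct HDFS reads, remote: " + directRemoteReads(taskNodes, replicaNodes));
        System.out.println("DistributedCache, remote:  " + cachedRemoteReads(taskNodes, replicaNodes));
    }
}
```

With the assumed schedule, direct reads go off-node 6 times, while localization leaves only 2 remote copies (the first task on nodes 0 and 1); every later read on those nodes is local disk I/O. The gap grows with the number of tasks per node and shrinks as the replication factor approaches the cluster size.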

On Mar 24, 2013, at 3:00 AM, Alberto Cordioli wrote:

> Thanks for your reply Harsh.
> So if I want to read a simple text file, choosing between the
> DistributedCache and HDFS becomes just a matter of performance.
>
>
> Alberto
>
> On 23 March 2013 16:17, Harsh J <[EMAIL PROTECTED]> wrote:
>> A DistributedCache is not used just to distribute simple files but
>> also native libraries and the like, which certain loaders cannot load
>> if they are on HDFS.
>>
>> Also, keeping the file on HDFS could be less performant, as non-local
>> reads could happen (depending on the file's replication factor).
>>
>> On Sat, Mar 23, 2013 at 8:23 PM, Alberto Cordioli
>> <[EMAIL PROTECTED]> wrote:
>>> Hi all,
>>>
>>> I was not able to find an answer to the following question. If the
>>> question has already been answered please give me the pointer to the
>>> right thread.
>>>
>>> What are the actual differences between reading a file from HDFS in
>>> a mapper and using the DistributedCache?
>>>
>>> I saw that with the DistributedCache you can give an HDFS path and the
>>> task nodes will get the data on their local file system. But what
>>> advantages does that have compared with a simple HDFS read via
>>> FileSystem.open(), which returns an FSDataInputStream?
>>>
>>> Thank you very much,
>>> Alberto
>>>
>>>
>>> --
>>> Alberto Cordioli
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Alberto Cordioli

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/