Hadoop >> mail # user >> FUSE HDFS significantly slower


Re: FUSE HDFS significantly slower

On Oct 26, 2010, at 11:25 AM, Hazem Mahmoud wrote:

> That raises a question that I am currently looking into and would appreciate any and all advice people have.
>
> We are replacing our current NetApp solution, which has served us well but we have outgrown it.
>
> I am looking at either upgrading to a bigger and meaner NetApp or possibly going with Hadoop (HDFS and FUSE).

You'd probably be better off looking at something like Ceph or Lustre, which are meant to be fully POSIX compliant.

> I need to mount the "storage solution" (HDFS or SAN) to about 5 or 6 systems. I'm a little concerned about utilizing HDFS/Fuse for a couple of reasons:
> 1. Performance of FUSE (how does it compare to an iSCSI SAN solution, for example?). I know it probably depends on a lot of things, but I'm asking generally, or for any experiences anyone has had.

FUSE in general (regardless of what you're using with it) is going to be significantly slower than a kernel-level file system.

> 2. Security/permissions (the owner of all files shows up as "nobody")

I doubt anyone has spent any time adding security to the HDFS FUSE port.  So even though NetApp's Kerberos stack is pretty crappy (3DES only... seriously?), you're going to get a better security model with it.

> Another question: Are there other options for mounting HDFS on these 5 or 6 systems for pure filesystem access (using NFS, etc.)?

No.  I keep hoping someone builds a pNFS/NFSv4.1 server on top of Hadoop, but alas not yet.

>
> Thanks everyone!
>
> -Hazem
>
> On Oct 26, 2010, at 5:43 AM, Brian Bockelman wrote:
>
>> In general, unless you run newer kernels and versions of FUSE as that ticket suggests, it is significantly slower in raw throughput.
>>
>> However, we generally don't have a day go by at my site where we don't push FUSE over 30Gbps, as the bandwidth is spread throughout nodes.  Additionally, as we are limited by the latency of spinning disk and random reads, we aren't particularly hurt by going "only" 60MB/s on our nodes.  If we wanted to go faster, we'd use the native clients.
>>
>> Of course, if anyone wants to donate a lowly university 1.5PB of SSDs, I'm all ears :)
>>
>> Brian
>>
>> On Oct 26, 2010, at 12:40 AM, Ted Yu wrote:
>>
>>> https://issues.apache.org/jira/browse/HADOOP-3805 tried to mitigate this
>>> problem.
>>>
>>> On Mon, Oct 25, 2010 at 10:17 PM, aniket ray <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm seeing in my experiments that Fuse-HDFS is significantly slower (around
>>>> 3x slower) than using the Java hdfs API directly.
>>>> I wanted to ask whether this slowness is the norm, or whether there is
>>>> something wrong with my configuration.
>>>> Also is this purely JNI slowness or is there something deeper to it?
>>>>
>>>>
>>>> My experiment is basically opening a file in write mode and calling write
>>>> multiple times (close to 5GB of data) to write to that file.
>>>>
>>>> Thanks for the help,
>>>> aniket ray
>>>>
>>
>
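
For anyone who wants to reproduce the kind of measurement aniket describes (open a file for writing, call write repeatedly, compare throughput), here is a minimal sketch in Python. It is not the code used in the thread. The function name `time_sequential_write`, the `BENCH_DIR` environment variable, and the 1 MiB chunk size are all assumptions made for illustration. Pointing `BENCH_DIR` at a FUSE mount of HDFS and then at a local disk gives a rough comparison only; it does not exercise the Java HDFS API.

```python
import os
import tempfile
import time


def time_sequential_write(path, total_bytes, chunk_bytes=1 << 20):
    """Write total_bytes to path in chunk_bytes pieces; return throughput in MB/s.

    The timer covers the writes, the flush, and the close, since on some
    file systems data is only finalized when the file is closed.
    """
    chunk = b"\0" * chunk_bytes  # zero-filled payload; contents are arbitrary
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        try:
            os.fsync(f.fileno())  # ask the OS to push data out of its cache
        except OSError:
            pass  # assumption: tolerate mounts that reject fsync
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / 1e6 / elapsed  # decimal megabytes per second


if __name__ == "__main__":
    # BENCH_DIR is a hypothetical knob: set it to the mount point to test.
    target_dir = os.environ.get("BENCH_DIR", tempfile.gettempdir())
    target = os.path.join(target_dir, "write_bench.tmp")
    try:
        rate = time_sequential_write(target, 64 * (1 << 20))  # 64 MiB sample
        print(f"{target}: {rate:.1f} MB/s")
    finally:
        if os.path.exists(target):
            os.remove(target)
```

The sample size is kept small here; a run closer to the 5GB mentioned above would reduce the influence of caching on the result.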