I need to copy log files from our web servers (only 14 servers) to HDFS.
Before deploying Kafka or Flume, I just want to provide a simple solution
to our TechOp guys using NFS or HTTP, with which they are familiar.
My questions are:
1. Which one is faster, NFS or HttpFS?
2. If I start an NFS or HttpFS gateway on each DataNode, can I set up a
load balancer for those gateways? Does NFS work with a load balancer?
3. If I use NFS and a load balancer doesn't work with it, can "automounter
+ DNS round robin" help?
Some details about our use case:
1. Different servers will write different files into HDFS.
2. A cron job will run every hour to copy the archived log
file of the previous hour to HDFS.
3. If possible, I expect it to work like this:
   1. The cron job tries to copy a file into the NFS-mounted directory.
   2. autofs gets the IP address of one of the NFS gateways and mounts it.
   3. After the log file is copied, the NFS directory becomes idle and
      autofs unmounts it.
   4. The next hour, another NFS gateway is mounted.
   5. Different servers mount different NFS gateways at the
      same time, so that throughput is better.
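To make the workflow above concrete, here is a minimal sketch of what I imagine the hourly cron job doing on each web server. It is only an illustration: the directory paths, the archive file naming scheme, and the per-host target directory are all assumptions of mine, not something we have in place yet. Touching the target directory is what I would expect to trigger the autofs mount.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path

# Hypothetical locations; the real paths would come from the TechOp setup.
LOG_DIR = Path("/var/log/web/archive")  # where hourly archives land (assumed)
NFS_MOUNT = Path("/mnt/hdfs/logs")      # autofs-managed HDFS NFS mount (assumed)


def previous_hour_name(now: datetime, host: str) -> str:
    """Return the (assumed) archive file name for the hour before `now`."""
    prev = now.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)
    return f"access-{host}-{prev:%Y%m%d%H}.log.gz"


def copy_previous_hour(now: datetime, host: str,
                       log_dir: Path = LOG_DIR,
                       mount: Path = NFS_MOUNT) -> Path:
    """Copy the previous hour's archive into a per-host directory on the mount."""
    src = log_dir / previous_hour_name(now, host)
    # One directory per server, so different servers write different files.
    dest_dir = mount / host
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    # Plain sequential copy of the whole file.
    shutil.copyfile(src, dest)
    return dest
```

The cron entry would then call something like copy_previous_hour(datetime.now(), socket.gethostname()) a few minutes past each hour.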
If you have set up a system like this, using either NFS or HttpFS, could you
share what needs to be done by the TechOp guys? I can set up the gateways on
the Hadoop cluster, but I am not familiar with load balancers and DNS.
Thanks a lot.