Hi all, I have around 20-30 geographically distant clients that need to write data to a centralized HBase server. I have a dedicated VPN for the communication, so bandwidth won't be an issue. Is it a good idea to make the clients send data directly to the centralized server? Or would a geographically distributed Hadoop cluster be more efficient for this scenario? Has anybody come across such a use case? I need suggestions on the viability and efficiency of the setup to follow.
I mean directly interacting with the remote ZooKeeper. Say I'm able to access the ZooKeeper server and HBase server externally. On 3 April 2014 18:54, Ted Yu <[EMAIL PROTECTED]> wrote: Cheers, Manthosh Kumar. T
Efficient? Probably not ;) But it's as if you connected your client app to a webserver local to the cluster, and the webserver then connects to the cluster.
I don't like the idea of having the cluster accessible from the outside and usually prefer to have some kind of gateway, but that's your call.
Your efficiency will mainly depend on the RPC calls you are doing. If you send or retrieve a big bunch of data at a time, it should not be that bad. But if you get cells one by one and send edits one by one, it might not be very good.
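To see why one-edit-per-RPC hurts over a WAN, here is a minimal back-of-the-envelope model (my own illustration, not from the thread; the 80 ms round trip and per-row cost are hypothetical numbers) comparing single-row RPCs against batched writes:

```python
import math

def total_write_time_ms(num_rows, rows_per_rpc, rtt_ms, per_row_ms=0.01):
    """Rough wall-clock estimate: every RPC pays one network round trip,
    plus a small per-row serialization/processing cost."""
    rpcs = math.ceil(num_rows / rows_per_rpc)
    return rpcs * rtt_ms + num_rows * per_row_ms

rtt = 80  # ms, hypothetical VPN round trip between client and cluster
one_by_one = total_write_time_ms(10_000, 1, rtt)      # 10,000 RPCs
batched    = total_write_time_ms(10_000, 1_000, rtt)  # 10 RPCs
print(f"one row per RPC : {one_by_one:,.0f} ms")   # 800,100 ms
print(f"1000 rows per RPC: {batched:,.0f} ms")     # 900 ms
```

The round-trip term dominates, which is why batching puts (e.g. via the HBase client's buffered writes) matters far more than raw bandwidth on a high-latency link.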
JM 2014-04-03 9:57 GMT-04:00 Manthosh Kumar T <[EMAIL PROTECTED]>:
Hi Jean, thanks. I might sound a bit lame, but can you elaborate on the gateway part? What is the best practice? On 3 April 2014 19:30, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote: Cheers, Manthosh Kumar. T