Sensors may send TCP/IP data to a server. Each sensor streams its data over TCP/IP to the server, and both the number of sensors and the data rate are high.
Firstly, how can the data arriving over TCP/IP be put into Hadoop? It needs some processing and then storage in HBase. Does it have to be saved to data files first and then put into Hadoop, or can it be done in some direct way from TCP/IP? Is there a software module that can take care of this? Searching suggested that Ganglia, Nagios, and Flume might do it, but on closer inspection Ganglia and Nagios are more for monitoring the Hadoop cluster itself, and Flume is for log files.
Secondly, if the total network traffic from the sensors exceeds the capacity of one LAN port, how can the load be shared? Is there any component in Hadoop that handles this automatically?
I would suggest not piping the sensor data to HDFS directly. Instead, you can run a program (in Java, Python, etc.) on the server itself that processes the incoming sensor data and writes it to a text/binary file (I don't know the data format you are currently receiving). You can then put the data file on HDFS, or alternatively process the data directly and save it to HBase managed on HDFS.
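A minimal sketch of that server-side staging idea, assuming newline-delimited text records and a hypothetical staging file path (the real record format and paths depend on your sensors):

```python
import socket
import threading

def handle_sensor(conn, out_path):
    """Append raw bytes from one sensor connection to a local staging file.

    The staged file can later be uploaded in bulk (e.g. with `hdfs dfs -put`);
    out_path and the record format here are assumptions about your setup.
    """
    with conn, open(out_path, "ab") as f:
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # sensor closed the connection
                break
            f.write(chunk)

def serve(host, port, out_path):
    """Accept sensor connections and stage their data locally, one thread each."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()
    while True:
        conn, _addr = srv.accept()
        threading.Thread(target=handle_sensor, args=(conn, out_path)).start()
```

In practice you would also rotate the staging file (by size or time) so that closed files can be shipped to HDFS while new data keeps arriving.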
If your sensor data is log data, then you can use Flume to load it into HDFS directly.
If I were you, I would ask the following questions to get to the answer:
- How is the data being stored - in a filesystem or an RDBMS?
- Do you need a real-time streaming solution? There is Storm (open-sourced by Twitter), which goes well with Kafka (a messaging queue), or Spark Streaming (in-memory map-reduce), which takes real-time streams. It has a built-in Twitter API, but you would need to write your own service to poll the sensor data every few seconds and send it in RDD format. It allows you to do both offline and real-time data analytics.

On Tue, May 6, 2014 at 10:48 PM, Alex Lee <[EMAIL PROTECTED]> wrote:
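As a toy illustration of what such a streaming solution computes, assuming readings arrive as (sensor_id, timestamp, value) tuples (an assumed format), grouping a micro-batch into fixed time windows might look like:

```python
from collections import defaultdict

def window_averages(readings, window_secs=10):
    """Average (sensor_id, timestamp, value) readings per sensor per fixed
    time window -- a toy stand-in for what a Storm topology or a Spark
    Streaming micro-batch would compute continuously over the live stream."""
    buckets = defaultdict(list)
    for sensor_id, ts, value in readings:
        buckets[(sensor_id, ts // window_secs)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}
```

The real systems do the same kind of keyed aggregation, but partitioned across machines and fed from a queue such as Kafka rather than an in-memory list.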
Flume is not just for log files; you can wire up a Flume source for this purpose. There are also alternative open-source solutions for data streaming, e.g. Apache Storm or Kafka. On Tue, May 6, 2014 at 10:48 PM, Alex Lee <[EMAIL PROTECTED]> wrote:
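For example, a sketch of a Flume agent configuration that takes newline-terminated TCP events and writes them to HBase - the agent name, port, table, and column family here are all assumptions to adapt to your setup:

```properties
# Hypothetical agent "a1": TCP source -> memory channel -> HBase sink
a1.sources = tcp-in
a1.channels = mem
a1.sinks = hbase-out

# Netcat-style TCP source: each newline-terminated line becomes one event
a1.sources.tcp-in.type = netcat
a1.sources.tcp-in.bind = 0.0.0.0
a1.sources.tcp-in.port = 6000
a1.sources.tcp-in.channels = mem

a1.channels.mem.type = memory
a1.channels.mem.capacity = 10000

# HBase sink (needs the Flume HBase sink on the classpath)
a1.sinks.hbase-out.type = hbase
a1.sinks.hbase-out.table = sensor_data
a1.sinks.hbase-out.columnFamily = d
a1.sinks.hbase-out.channel = mem
```

If the sensors emit binary frames rather than lines, you would need a custom Flume source instead of the netcat one.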
Whether or not you use Storm/Kafka or any other real-time processing, you may still need to persist the data, which can be done directly to HBase either from any of these real-time systems or from the source. On Thu, May 8, 2014 at 9:25 PM, Hardik Pandya <[EMAIL PROTECTED]> wrote:
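When persisting to HBase, the row-key design matters for high-rate time-series writes. One common pattern (an illustrative choice, not a fixed HBase requirement) prefixes the sensor id and reverses the timestamp, so writes spread across regions instead of hot-spotting and the newest reading per sensor sorts first:

```python
import struct

LONG_MAX = 2**63 - 1

def sensor_row_key(sensor_id: str, ts_millis: int) -> bytes:
    """Build an HBase row key of the form <sensor_id>:<reversed timestamp>.

    Reversing the timestamp (LONG_MAX - ts) makes newer readings sort
    first for a sensor; the leading sensor id avoids a pure-time key,
    which would funnel all writes into one region.
    """
    return sensor_id.encode() + b":" + struct.pack(">q", LONG_MAX - ts_millis)
```

A scan prefixed with a sensor id then returns that sensor's readings newest-first.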
Use Flume. On Wed, May 7, 2014 at 8:18 AM, Alex Lee <[EMAIL PROTECTED]> wrote: