I would use a data ingestion tool like Apache Flume to make the task
easier without much human intervention. Create sources for your different
systems and the rest will be taken care of by Flume. However, it is not a
must to use something like Flume, but it will definitely make your life
easier and help you develop a more sophisticated system, IMHO.
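To give you an idea, a minimal Flume agent config might look something like the sketch below. The agent name, log path, and HDFS path are just placeholders for your setup; it tails an application log with an exec source, buffers events in a memory channel, and writes them to HDFS:

```properties
# Hypothetical agent name and paths; adjust for your environment
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Exec source: tail the application log
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app/app.log
agent1.sources.src1.channels = ch1

# In-memory channel buffering events between source and sink
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# HDFS sink: land events in date-partitioned directories
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.channel = ch1
```

You would add one source per system you are ingesting from, all feeding channels on the same (or separate) agents.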
You need HBase when you need real-time random read/write access to your
data. Basically, when you intend to have low-latency access to small
amounts of data from within a large data set, and you have a flexible
schema.
And for the last part of your question, use Apache Hive. It provides
warehousing capabilities on top of an existing Hadoop cluster, with an
SQL-like interface to query the stored data. Also, it will be of help while
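For instance, you could point an external Hive table at the directory where your ingested data lands and query it with plain SQL. The table name, columns, and delimiter below are assumptions for illustration:

```sql
-- Hypothetical external table over ingested data in HDFS
CREATE EXTERNAL TABLE IF NOT EXISTS app_events (
  event_time STRING,
  user_id    STRING,
  action     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/flume/events';

-- Warehouse-style aggregation over the raw files
SELECT action, COUNT(*) AS cnt
FROM app_events
GROUP BY action;
```

Since the table is EXTERNAL, dropping it leaves the underlying HDFS files untouched.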
On Tue, Mar 25, 2014 at 1:41 AM, Geoffry Roberts <[EMAIL PROTECTED]> wrote: