First of all, that may not be the right way to choose the underlying
storage. You should choose between HDFS and HBase based on how the data
will be accessed: plain HDFS if it is going to be used for batch processing,
HBase if you need random access on top of it. HBase is just another layer on
top of HDFS, so batch queries (such as Hive full-table scans) running over
HBase are going to be less efficient than the same queries over files in
HDFS. So if you can get away with using HDFS, I would say that is the best
and simplest approach.
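To make the access-pattern rule of thumb concrete, here is a toy in-memory sketch. It is only an analogy and does not use the real HDFS or HBase client APIs. The class names `FlatFileStore` and `KeyedStore` and the `reads` counter are hypothetical illustrations. A flat, append-only list of records stands in for files on HDFS, which suit full scans. A dict addressed by row key stands in for an HBase table, which is what you pay for when you need point lookups.

```python
# Toy model only: neither class uses the real HDFS or HBase client APIs.


class FlatFileStore:
    """Hypothetical stand-in for plain files on HDFS: append, then scan."""

    def __init__(self):
        self.records = []  # (row_key, value) pairs in write order
        self.reads = 0     # records touched by reads, a rough cost proxy

    def append(self, key, value):
        self.records.append((key, value))

    def scan(self):
        # Batch access (e.g. a Hive query): every record is read once, in order.
        self.reads += len(self.records)
        return list(self.records)

    def get(self, key):
        # A point lookup has no index to use, so it scans until it finds the key.
        for k, v in self.records:
            self.reads += 1
            if k == key:
                return v
        return None


class KeyedStore:
    """Hypothetical stand-in for an HBase table: values addressed by row key."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        # Random access: one keyed lookup, independent of table size.
        self.reads += 1
        return self.rows.get(key)


def choose_storage(needs_random_access):
    """Rule of thumb from the reply: use HBase only if random access is required."""
    return "HBase" if needs_random_access else "HDFS"


flat, keyed = FlatFileStore(), KeyedStore()
for i in range(1000):
    flat.append(f"row{i:04d}", i)
    keyed.put(f"row{i:04d}", i)

flat.get("row0999")   # scans all 1000 records to reach the last key
keyed.get("row0999")  # one keyed lookup
print(flat.reads, keyed.reads)  # 1000 1
```

If your workload is mostly whole-dataset queries, the keyed layout buys nothing and, in the real system, adds overhead. The extra layer only pays off when point lookups dominate.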
On Wed, Jul 17, 2013 at 12:40 PM, Hamza Asad <[EMAIL PROTECTED]> wrote:
> Please let me know which approach is better: either I save my data directly
> to HDFS and run Hive (Shark) queries over it, or I store my data in HBase and
> then query it. I want to ensure efficient data retrieval, and that the data
> remains safe and can easily be recovered if Hadoop crashes.
> *Muhammad Hamza Asad*