Sai Sai 2013-03-04, 08:30
Re: hive newb questions
A plain SELECT * FROM table with no WHERE condition will never run an MR
job; Hive just fetches the rows directly from the table's files.
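
One way to check this yourself is to ask Hive for the query plan before running anything. A minimal sketch, assuming the table is called Table1 as in the question; the exact plan text varies by Hive version and settings, so treat the comments as what one would normally expect rather than guaranteed output:

```sql
-- Plain scan: the plan should normally show only a fetch stage.
EXPLAIN SELECT * FROM Table1;

-- Aggregation: the plan should normally include a map-reduce stage.
EXPLAIN SELECT COUNT(*) FROM Table1;
```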

Hive does not cache your query results. If you rerun a query, all of the
work is repeated on every run.
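
If rerunning an expensive query is a concern, one common workaround is to store the result in a table yourself and read from that afterwards. A rough sketch, where Table1 comes from the question and table1_summary and col1 are hypothetical names used only for illustration:

```sql
-- Run the expensive query once and keep its result as a table.
CREATE TABLE table1_summary AS
SELECT col1, COUNT(*) AS cnt
FROM Table1
GROUP BY col1;

-- Later reads are a plain scan of the stored result.
SELECT * FROM table1_summary;
```

The stored table is only a snapshot, so it has to be rebuilt whenever Table1 changes.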

How many days should you keep the data? As long as it means something to
you or has some value in being stored for the future. If your tables are
created in the Hadoop tmp dir, the data will be removed by Hadoop's tmp
cleanup policy.

Hive with Hadoop is a data warehousing system, so ideally it holds
historical data going back years. If the data no longer has any meaning to
the business, then remove it.
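
If the data is loaded by date, partitioning the table makes removing old data straightforward. A sketch under assumptions: the table events, its columns, and the partition column dt are hypothetical names, not something from this thread:

```sql
-- Hypothetical table, partitioned by load date.
CREATE TABLE events (id STRING, payload STRING)
PARTITIONED BY (dt STRING);

-- Remove one day's data once it no longer has business value.
ALTER TABLE events DROP PARTITION (dt = '2012-01-01');
```

For a managed (non-external) table, dropping a partition removes its data files as well, so this frees the space in HDFS.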
 On Mar 4, 2013 2:01 PM, "Sai Sai" <[EMAIL PROTECTED]> wrote:

> Hi
> I was wondering if it is right to assume:
> 1. The first time we create a table in hive and load it followed by
> running the first query like
> Select * from Table1
> will result in a MR job running and will get the data to us.
> If we run the same query a second time, the MR job will not run and Hive
> will just fetch the data.
> 2. If the above assumption is not right, is it possible to cache the data
> in Hive so the MR job will not run
> again for the subsequent queries and just fetch it right away.
> 3. Once we load the data in hive table how many days should we keep it.
> 4. Is it a good practise to remove the data in a certain period of time as
> it may take a large space.
> 5. Should this really be a concern or not as the memory today is not that
> expensive.
> Any inputs will be appreciated.
> Thanks
> Sai