Using Hive's sampling feature (the TABLESAMPLE clause) would also help. Running a query against a subset of a large table is exactly what that feature is designed for.
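As a rough sketch of what that could look like (the table `page_views` and its columns `user_id` and `url` are made up for illustration, and which sampling forms are available depends on your Hive version):

```sql
-- Hypothetical table and column names, for illustration only.

-- Block sampling: read roughly 1 percent of the table's input data.
SELECT pv.user_id, COUNT(*) AS views
FROM page_views TABLESAMPLE(1 PERCENT) pv
GROUP BY pv.user_id;

-- Bucket sampling: keep about 1 row in 100, chosen at random.
SELECT pv.user_id, pv.url
FROM page_views TABLESAMPLE(BUCKET 1 OUT OF 100 ON rand()) pv;
```

Sampling on rand() generally still scans the whole table unless the table is bucketed on the sampled column, while block sampling cuts down the input that is read. Either way the query will usually still run as a MapReduce job, just over far less data, so this is a way to check the logic cheaply rather than a way to avoid MapReduce altogether.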
From: Kyle B [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 05, 2013 1:45 PM
To: [EMAIL PROTECTED]
Subject: Hive sample test
I was wondering if there is a way to quickly verify a Hive query before it is run against a big dataset. The tables I am querying have millions of records, and I'd like to verify my Hive query before I run it against all of them.
Is there a way to test the query against a small subset of the data, without going into full MapReduce? As silly as this sounds, is there a way to MapReduce without the overhead of MapReduce? That way I can check that my query is doing what I want before I run it against all records.