Here is one example. I want to make sure I don't have negative prices in my
data. I would like to write something like:
assert_empty(select * from Prices where price <= 0);
as part of my Hive script. I expect my job to fail if there are negative
prices in the data.
I understand there is high cost to doing this but some of our tables are not
that big and any kind of data checks would help.
On Thu, May 26, 2011 at 2:18 PM, Alex Kozlov <[EMAIL PROTECTED]> wrote:
> 1) Would `select count(1) from (query)` do the same thing? I am a bit
> confused what is the semantic of assert: is it just no rows or some kind of
> syntax error check?
> 2) Hive is not an OLTP and is not optimized for single row inserts (or
> updates for this matter). In a trivial implementation one would just do
> copy-on-write, i.e. overwrite the whole data file, or add a small file
> containing one or a few rows. I do not think indices have been implemented
> yet either.
> 3) You can probably do what you want with HBase (if you explain what you
> want to do in more detail)
> On Thu, May 26, 2011 at 2:00 PM, Igor Tatarinov <[EMAIL PROTECTED]> wrote:
>> I would like to implement some kind of assert functionality in Hive QL.
>> Here is how I do it in MySQL. I can assert that a given query returns no
>> (bad) rows by creating a table with one row containing '1' and a unique
>> index. Then, I try to insert into that table select 1 from (query). If the
>> query returns something, I have an assert failure. I've found such asserts
>> very helpful in catching data bugs early.
>> Can I do something similar in Hive? I suppose I could implement a UDF that
>> fails if it gets executed. This way I can ensure there is no records
>> matching a given query.
>> That doesn't sound too bad but perhaps there is a cleaner solution.