Pig >> mail # user >> Verifying unordered output with PigUnit


Johannes Schwenk 2012-05-29, 12:35
Jonathan Coveney 2012-05-29, 17:42
Re: Verifying unordered output with PigUnit
Hello again!

My script does not need sorted output in normal operation, so I would
rather not sort, since that lengthens the running time unnecessarily.

So I still have the problem that I cannot compare the unsorted output of
the script with the expected output. I am doing this in PigUnit, so I had
a look at org.apache.pig.pigunit.PigTest. The only option I could see is
to override assertOutput and write a new version of readFile so that both
return sorted records, which does not strike me as very elegant...
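
To make the idea concrete, here is a minimal sketch of an order-insensitive comparison done on the test side only, so the Pig script itself stays unsorted. It assumes the actual and expected records have already been collected as lists of strings; how they are read out of PigTest is left open. The class and method names (UnorderedOutput, sortedCopy, sameRecords) are made up for illustration and are not part of PigUnit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of PigUnit: compares two record lists
// while ignoring the order in which the records appear.
final class UnorderedOutput {

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<String> sortedCopy(List<String> records) {
        List<String> copy = new ArrayList<>(records);
        Collections.sort(copy);
        return copy;
    }

    /** True if both lists hold the same records with the same multiplicities. */
    static boolean sameRecords(List<String> expected, List<String> actual) {
        return sortedCopy(expected).equals(sortedCopy(actual));
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("(a,1)", "(b,2)", "(b,2)");
        List<String> shuffled = Arrays.asList("(b,2)", "(a,1)", "(b,2)");
        List<String> shorter = Arrays.asList("(a,1)", "(b,2)");

        System.out.println(sameRecords(expected, shuffled)); // same records, other order
        System.out.println(sameRecords(expected, shorter));  // one duplicate is missing
    }
}
```

Sorting happens only on the small test fixture, so the cost in production runs is zero. Duplicates are respected because the sorted lists are compared element by element.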

Has nobody run into this problem with PigUnit before?

Thanks!

On 29.05.2012 19:42, Jonathan Coveney wrote:
> Generally, sorting is the way to go. It's going to be difficult to get
> around doing some sort of processing in order to make it easier to evaluate
> equality.
>
> If you want something generally O(n) instead of O(n log n), you could
> calculate the hashCode for every tuple then SUM it (which is algebraic),
> and only in the case that these are not equal (exceedingly rare) would you
> sort and directly do the comparison.
>
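
The hash-sum check described in the quoted paragraph could look roughly like the sketch below. It is plain Java over lists of strings rather than Pig's SUM over tuples, and the names (HashSumCheck, hashSum, definitelyDifferent) are invented for illustration. Note the direction of the guarantee: different sums prove that the outputs differ, while equal sums only make equality likely, because distinct records can share a hash code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating an order-independent O(n) fingerprint.
final class HashSumCheck {

    /** Sums the hash codes of all records; addition makes the result order-independent. */
    static long hashSum(List<String> records) {
        long sum = 0L;
        for (String record : records) {
            sum += record.hashCode();
        }
        return sum;
    }

    /**
     * True if the two outputs certainly differ. A result of false does not
     * prove equality, so an exact comparison is still needed to confirm it.
     */
    static boolean definitelyDifferent(List<String> expected, List<String> actual) {
        return expected.size() != actual.size()
                || hashSum(expected) != hashSum(actual);
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("(a,1)", "(b,2)");

        // Same records in another order: the sums match.
        System.out.println(definitelyDifferent(expected, Arrays.asList("(b,2)", "(a,1)")));

        // One record changed: the sums differ.
        System.out.println(definitelyDifferent(expected, Arrays.asList("(a,1)", "(b,3)")));

        // "Aa" and "BB" share a String hash code, so this collision goes undetected.
        System.out.println(definitelyDifferent(Arrays.asList("Aa"), Arrays.asList("BB")));
    }
}
```

The last call shows why the sum can only serve as a cheap first pass: a sort-based or multiset comparison remains the authoritative check.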
> 2012/5/29 Johannes Schwenk <[EMAIL PROTECTED]>
>
>> Hello all,
>>
>> I'd like to verify output from a pig script that does not sort its
>> results prior to output. Thus the order of the tuples in the output is
>> non-deterministic. I would rather not add sorting to my script, because
>> I am potentially dealing with a lot of data here. As far as I can tell,
>> Pig Latin does not support conditional statements like "if PIG_UNIT_TEST
>> do stepsA else do stepsB fi", so this is not an option either (quite
>> apart from the duplicated and diverging logic it would create for test
>> and non-test runs!).
>>
>> So how could I do this?
>>
>> Greetings,
>> Johannes Schwenk
>

Johannes Schwenk

--
Software Developer (Reporting)
________________________________________________________

ADITION technologies AG
Schwarzwaldstraße 78b
79117 Freiburg

http://www.adition.com

T +49 / (0)761 / 88147 - 30
F +49 / (0)761 / 88147 - 77
SUPPORT +49  / (0)1805 - ADITION

(Landline rate 14 ct/min; mobile rates at most 42 ct/min)

Registered at the Amtsgericht Düsseldorf (district court) under HRB 54076
Executive Board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
Chairman of the Supervisory Board: Daniel Raimer, attorney at law
VAT ID No.: DE 218 858 434
