Pig >> mail # user >> question about pig commands implementation procedure and unit test result


question about pig commands implementation procedure and unit test result
Hello,
I have a question about how Pig processes commands.
For example, take these Pig commands (from TestNewPlanLogToPhyTranslationVisitor.java):
        a = load 'd1.txt' as (id, c);
        b = load 'd2.txt' as (id, c);
        c = load 'd3.txt' as (id, c);
        d = join a by id, b by c;      
        e = filter d by a::id==NULL AND b::c==NULL;
        f = join e by b::c, c by id;
        g = filter f by b::id==NULL AND c::c==NULL;
        store g into 'empty2';
Pig uses the buildPlan method to produce a LogicalPlan like this:
|
|---g: Filter scope-24 Schema: {e::a::id: bytearray,e::a::c: bytearray,e::b::id: bytearray,e::b::c: bytearray,c::id: bytearray,c::c: bytearray} Type: bag
    |   |
    |   And scope-23 FieldSchema: boolean Type: boolean
    |   |
    |   |---Equal scope-19 FieldSchema: boolean Type: boolean
    |   |   |
    |   |   |---Project scope-17 Projections: [2] Overloaded: false FieldSchema: e::b::id: bytearray Type: bytearray
    |   |   |   Input: f: LOJoin scope-16
    |   |   |
    |   |   |---Const scope-18( null ) FieldSchema: bytearray Type: bytearray
    |   |
    |   |---Equal scope-22 FieldSchema: boolean Type: boolean
    |       |
    |       |---Project scope-20 Projections: [5] Overloaded: false FieldSchema: c::c: bytearray Type: bytearray
    |       |   Input: f: LOJoin scope-16
    |       |
    |       |---Const scope-21( null ) FieldSchema: bytearray Type: bytearray
    |
    |---f: LOJoin scope-16 Schema: {e::a::id: bytearray,e::a::c: bytearray,e::b::id: bytearray,e::b::c: bytearray,c::id: bytearray,c::c: bytearray} Type: bag
        |   |
        |   Project scope-14 Projections: [3] Overloaded: false FieldSchema: b::c: bytearray Type: bytearray
        |   Input: e: Filter scope-13
        |   |
        |   Project scope-15 Projections: [0] Overloaded: false FieldSchema: id: bytearray Type: bytearray
        |   Input: c: Load scope-2
        |
        |---c: Load scope-2 Schema: {id: bytearray,c: bytearray} Type: bag
        |
        |---e: Filter scope-13 Schema: {a::id: bytearray,a::c: bytearray,b::id: bytearray,b::c: bytearray} Type: bag
            |   |
            |   And scope-12 FieldSchema: boolean Type: boolean
            |   |
            |   |---Equal scope-8 FieldSchema: boolean Type: boolean
            |   |   |
            |   |   |---Project scope-6 Projections: [0] Overloaded: false FieldSchema: a::id: bytearray Type: bytearray
            |   |   |   Input: d: LOJoin scope-5
            |   |   |
            |   |   |---Const scope-7( null ) FieldSchema: bytearray Type: bytearray
            |   |
            |   |---Equal scope-11 FieldSchema: boolean Type: boolean
            |       |
            |       |---Project scope-9 Projections: [3] Overloaded: false FieldSchema: b::c: bytearray Type: bytearray
            |       |   Input: d: LOJoin scope-5
            |       |
            |       |---Const scope-10( null ) FieldSchema: bytearray Type: bytearray
            |
            |---d: LOJoin scope-5 Schema: {a::id: bytearray,a::c: bytearray,b::id: bytearray,b::c: bytearray} Type: bag
                |   |
                |   Project scope-3 Projections: [0] Overloaded: false FieldSchema: id: bytearray Type: bytearray
                |   Input: a: Load scope-0
                |   |
                |   Project scope-4 Projections: [1] Overloaded: false FieldSchema: c: bytearray Type: bytearray
                |   Input: b: Load scope-1
                |
                |---a: Load scope-0 Schema: {id: bytearray,c: bytearray} Type: bag
                |
                |---b: Load scope-1 Schema: {id: bytearray,c: bytearray} Type: bag

I assume that command analysis and intermediate data storage are both based on HashMap structures. Is this correct?
I found that the expected results of some test cases depend on the output order of a HashMap. In my opinion, such test cases should not have a single fixed expected output, because, as we know, the iteration order of a HashMap is not guaranteed to be stable. Please let me know what you think. Thank you.
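
To illustrate the concern, here is a minimal Java sketch of one way a test could compare map-derived output without depending on HashMap iteration order. It is not taken from the Pig code base: the class name OrderIndependentCheck and the helper canonical are hypothetical, and the field names are only borrowed from the schema printed above.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not part of Pig.
class OrderIndependentCheck {

    // Render a map with its keys sorted, so the string does not depend on
    // the iteration order of the underlying map implementation.
    static String canonical(Map<String, String> m) {
        return new TreeMap<>(m).toString();
    }

    public static void main(String[] args) {
        // Same entries, inserted in different orders into different map types.
        Map<String, String> first = new HashMap<>();
        first.put("a::id", "bytearray");
        first.put("b::c", "bytearray");
        first.put("c::id", "bytearray");

        Map<String, String> second = new LinkedHashMap<>();
        second.put("c::id", "bytearray");
        second.put("a::id", "bytearray");
        second.put("b::c", "bytearray");

        // Comparing first.toString() with second.toString() may or may not
        // match, because HashMap does not specify an iteration order.
        // Comparing the canonical (sorted) forms is deterministic.
        System.out.println(canonical(first).equals(canonical(second)));
    }
}
```

A test written this way compares a sorted rendering rather than the raw printed output, so it keeps a single expected result even if the map's internal ordering changes between JVM versions.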
Daniel Dai 2011-08-23, 20:04