

Re: Python UDF got problems converting Strings to Integers
Hi,

First of all, why not pass a tuple of integers to your UDF in the first
place? Then you don't have to cast strings to integers inside the UDF.

Here is how I got your udf working.

cheolsoo@localhost:~/workspace/pig-trunk $cat 1.txt
1,2,3
4,5,6

cheolsoo@localhost:~/workspace/pig-trunk $cat test.pig
register 'test.py' using jython as myfuncs;
a = load '1.txt' using PigStorage(',') as (i:int, j:int, k:int); -- declare as integers
b = group a all;
c = foreach b generate myfuncs.aggHisto(a);
dump c;

# test.py
@outputSchema("res_histo:tuple()")
def aggHisto(aHistogramSet):
    if aHistogramSet is None:
        return None

    hist_len = len(aHistogramSet[0])
    result = [0] * hist_len
    print(aHistogramSet)  # debug output

    for aHistogram in aHistogramSet:
        for i in range(0, hist_len):
            result[i] = result[i] + aHistogram[i]  # vector addition
    return tuple(result)

I get the following result:
((5,7,9))
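
A likely explanation for the 5051 in the original message, as a minimal sketch: Pig treats fields loaded without a declared type as bytearray, and if such a field reaches the UDF as a sequence of raw bytes (an assumption here, since the original load statement is not shown), iterating over it yields character codes rather than digits. The variable names below are made up for illustration.

```python
# Hypothetical stand-in for a field that arrives as raw bytes instead of an int.
raw = bytearray(b"23")

# Iterating over a byte sequence yields integer character codes, not digits:
# '2' has code 50 and '3' has code 51.
codes = list(raw)

# Joining the string forms of those codes reproduces the reported value.
garbled = int(''.join(map(str, raw)))

# Decoding the bytes as text first gives the intended number.
value = int(raw.decode('ascii'))

print(codes, garbled, value)
```

Declaring the fields as int in the load statement, as above, avoids the conversion altogether.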

Thanks,
Cheolsoo

On Tue, Oct 30, 2012 at 10:22 AM, Björn-Elmar Macek <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I have a UDF that sums up histograms in the form of tuples. The function I
> wrote looks like this:
>
> @outputSchema("res_histo:tuple()")
> def aggHisto(aHistogramSet):
>     if aHistogramSet is None: return None;
>     hist_len = len(aHistogramSet[0])
>     result=[0]*hist_len
>
>     for aHistogram in aHistogramSet:
>         for i in range(0,hist_len):
>             value = int(''.join(map(str,aHistogram[i])));
>             result[i] = result[i] + (value)
>     return tuple(result)
>
> So for the input {(1,23,45),(0,0,0)} I SHOULD get the output (1,23,45),
> but instead I get (49,5051,52,5353).
> I played around with this for some time and found that the line
> "value = int(''.join(map(str,aHistogram[i])));" does not convert "23" to 23.
> Instead, it takes every single digit, starting with the most significant
> one, and adds 48 to it: 2+48=50 and 3+48=51, resulting in 5051.
>
> Why does this happen? Can anybody help me here?
>
> Best regards,
> Elmar
>