Björn-Elmar Macek 2012-10-30, 17:22
Re: Python UDF got problems converting Strings to Integers
Hi,

First of all, why can't you pass a tuple of integers to your UDF in the
first place? That way you don't have to cast strings to integers inside
your UDF.
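
As background on the numbers reported in the quoted message below, here is a
minimal plain-Python sketch of one possible explanation. It assumes (this is
not confirmed in the thread) that a field loaded without a declared type
reaches the UDF as raw bytes rather than as a string. The name `raw` is
hypothetical and only stands in for such a field. Iterating over bytes yields
character codes, and the ASCII code of a digit d is d + 48.

```python
# Hypothetical stand-in for an untyped field holding the text "23".
raw = bytearray(b"23")

# Iterating over bytes yields ASCII codes, not digit characters.
codes = list(raw)

# The conversion from the quoted UDF glues those codes together.
wrong = int(''.join(map(str, raw)))

# Decoding the bytes to text first gives the intended number.
right = int(raw.decode('ascii'))

print(codes, wrong, right)
```

Declaring the fields as int in the load statement sidesteps the conversion
entirely.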

Here is how I got your udf working.

cheolsoo@localhost:~/workspace/pig-trunk $cat 1.txt
1,2,3
4,5,6

cheolsoo@localhost:~/workspace/pig-trunk $cat test.pig
register 'test.py' using jython as myfuncs;
a = load '1.txt' using PigStorage(',') as (i:int, j:int, k:int); -- declare as integers
b = group a all;
c = foreach b generate myfuncs.aggHisto(a);
dump c;

# test.py (the Jython UDF registered by test.pig)
@outputSchema("res_histo:tuple()")
def aggHisto(aHistogramSet):
    if aHistogramSet is None:
        return None

    # All histograms are expected to have the same length as the first one.
    hist_len = len(aHistogramSet[0])
    result = [0] * hist_len
    print(aHistogramSet)

    for aHistogram in aHistogramSet:
        for i in range(0, hist_len):
            result[i] = result[i] + aHistogram[i]  # vector addition
    return tuple(result)

I get the following result:
((5,7,9))
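
The vector addition can also be checked outside Pig. The following is a
plain-Python sketch, not the UDF itself: `agg_histo` and `rows` are
hypothetical names, and the bag of tuples is modeled as a list of tuples.

```python
def agg_histo(histogram_set):
    """Element-wise sum of equal-length integer tuples."""
    if not histogram_set:
        return None
    hist_len = len(histogram_set[0])
    result = [0] * hist_len
    for histogram in histogram_set:
        for i in range(hist_len):
            result[i] += histogram[i]  # vector addition
    return tuple(result)

# Same rows as 1.txt above.
rows = [(1, 2, 3), (4, 5, 6)]
print(agg_histo(rows))  # (5, 7, 9)
```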

Thanks,
Cheolsoo

On Tue, Oct 30, 2012 at 10:22 AM, Björn-Elmar Macek <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I have a UDF that sums up histograms in the form of tuples. The function I
> wrote looks like this:
>
> @outputSchema("res_histo:tuple()")
> def aggHisto(aHistogramSet):
>     if aHistogramSet is None: return None;
>     hist_len = len(aHistogramSet[0])
>     result=[0]*hist_len
>
>     for aHistogram in aHistogramSet:
>         for i in range(0,hist_len):
>             value = int(''.join(map(str,aHistogram[i])));
>             result[i] = result[i] + (value)
>     return tuple(result)
>
> So for the input {(1,23,45),(0,0,0)} I should get the output (1,23,45),
> but instead I get (49,5051,52,5353).
> I played around with this for some time and found that the line
> "value = int(''.join(map(str,aHistogram[i])));" does not convert "23"
> to 23. Instead, it takes every single digit, starting with the most
> significant one, and adds 48 to it: 2+48=50 and 3+48=51, resulting in 5051.
>
> Why does this happen? Can anybody help me here?
>
> Best regards,
> Elmar
>
Björn-Elmar Macek 2012-10-31, 09:36
Björn-Elmar Macek 2012-10-31, 10:49