Pig >> mail # user >> Python UDF got problems converting Strings to Integers

Python UDF got problems converting Strings to Integers
Hi all,

I have a UDF that sums up histograms given as tuples. The function I
wrote looks like this:

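def aggHisto(aHistogramSet):
    if aHistogramSet is None:
        return None
    hist_len = len(aHistogramSet[0])
    # Running totals, one slot per histogram bin. This initialisation is
    # assumed; it is missing from the snippet as posted.
    result = [0] * hist_len

    for aHistogram in aHistogramSet:
        for i in range(0, hist_len):
            # The conversion in question: join the field's items, parse as int.
            value = int(''.join(map(str, aHistogram[i])))
            result[i] = result[i] + value
    return tuple(result)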

So for the input {(1,23,45),(0,0,0)} I SHOULD get the
following output: (1,23,45)
But instead I get: (49,5051,52,5353)
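
The result I expect can be sketched in plain Python, outside Pig. This is only an illustration: `agg_histo_expected` is a made-up name, and the bag is modelled as a list of tuples that already contain ints.

```python
def agg_histo_expected(histograms):
    """Element-wise sum of equally long tuples of ints (hypothetical helper)."""
    if not histograms:
        return None
    # zip(*...) groups the i-th entries of every histogram together.
    return tuple(sum(column) for column in zip(*histograms))


print(agg_histo_expected([(1, 23, 45), (0, 0, 0)]))  # (1, 23, 45)
```
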
I played around with this for some time and found that the program does
the following:
The line "value = int(''.join(map(str,aHistogram[i])));" does not
convert the "23" to 23. Instead,
it takes every single digit, starting with the most significant one, and
adds 48 to it: 2+48=50 and 3+48=51, resulting in 5051.
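
The +48 pattern is what one would see if the field reached the function as a sequence of byte values (ASCII codes) rather than as a string, since the ASCII code of '0' is 48. That is only an assumption about what Pig hands to the UDF; nothing here confirms it. A plain-Python sketch that reproduces the numbers, using a `bytearray` as a stand-in for the field:

```python
# Stand-in for the field value: the text "23" held as raw bytes (assumption).
field = bytearray(b"23")

# Iterating a bytearray yields integer byte values, not characters.
codes = list(field)  # [50, 51], i.e. ord('2') and ord('3')

# str() on each code and joining gives "5051", which int() parses as 5051.
wrong = int(''.join(map(str, field)))

# Decoding the bytes to text first gives the intended number.
right = int(field.decode("ascii"))

print(codes, wrong, right)
```

If that assumption holds, converting the field to text before calling int() (or casting it to an integer type before it reaches the UDF) would avoid the problem.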

Why does this happen? Can anybody help me here?

Best regards,