Pig >> mail # dev >> Bytes to Long/Integer conversions

Bytes to Long/Integer conversions
Daniel and I were discussing how Pig currently does these conversions
and whether we could simplify/optimize it further.

        Long ret = null;
        if (sanityCheckIntegerLong(s)) {
            try {
                ret = Long.valueOf(s);
            } catch (NumberFormatException nfe) {
                // not parseable as a long; ret stays null
            }
        }

The code checks that every character is a digit (a leading '-' is also
allowed) and then does a conversion to Long.

    private static boolean sanityCheckIntegerLong(String number){
        for (int i=0; i < number.length(); i++){
            if (number.charAt(i) >= '0' && number.charAt(i) <='9' || i == 0 && number.charAt(i) == '-'){
                // valid one
            }
            else{
                // contains invalid characters, must not be an integer or long
                return false;
            }
        }
        return true;
    }

If the input is not numeric (1234abcd), the code still calls
Double.valueOf(String) before finally returning null. Any script that
casts an alphanumeric column to int or long (whether by the user's
mistake or not) would trigger that wasteful call on every such value.
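
To make the cost concrete, here is a minimal sketch of the path described
above. It is not Pig's actual code: the class name CurrentPathSketch, the
doubleAttempts counter, and the shape of the fallback (a plain try/catch
around Double.valueOf) are assumptions for illustration, since the fallback
itself is not quoted in this message.

```java
// Hypothetical sketch, not Pig's code: counts how often the Double.valueOf
// fallback runs, including for input that can never be a number.
class CurrentPathSketch {
    static int doubleAttempts = 0; // illustration-only counter

    static boolean sanityCheckIntegerLong(String number) {
        for (int i = 0; i < number.length(); i++) {
            char c = number.charAt(i);
            boolean valid = (c >= '0' && c <= '9') || (i == 0 && c == '-');
            if (!valid) {
                return false;
            }
        }
        return true;
    }

    static Long toLong(String s) {
        Long ret = null;
        if (sanityCheckIntegerLong(s)) {
            try {
                ret = Long.valueOf(s);
            } catch (NumberFormatException nfe) {
                // e.g. "" or "-": ret stays null
            }
        }
        if (ret == null) {
            // Assumed fallback: attempted for every value that reaches here,
            // even one like "1234abcd" that cannot parse.
            doubleAttempts++;
            try {
                ret = Double.valueOf(s).longValue();
            } catch (NumberFormatException nfe) {
                // not a number at all; ret stays null
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        String[] column = {"1234", "1234abcd", "abcd", "1234.56"};
        for (String s : column) {
            System.out.println(s + " -> " + toLong(s));
        }
        System.out.println("Double.valueOf attempts: " + doubleAttempts);
    }
}
```

In this sketch every non-integer value pays for a Double.valueOf call, and
the alphanumeric ones additionally pay for constructing and catching a
NumberFormatException.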

I think we can avoid this by handling only the cases where we find the
input to be a decimal number (1234.56) and returning null otherwise, even
before trying
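
One possible shape for this idea, as a rough sketch only and not a patch
against Pig: the class ProposedPathSketch and the helpers isIntegerLike and
isDecimalLike are hypothetical names introduced here. The decimal check
accepts an optional leading '-', digits, and at most one '.', which is one
reading of "a decimal number (1234.56)".

```java
// Hypothetical sketch of the proposal: pre-check the string so that
// Double.valueOf is only called on input that looks like a decimal.
class ProposedPathSketch {
    // Shared scanner: digits, an optional leading '-', and (optionally)
    // a single '.'. Requires at least one digit.
    private static boolean looksNumeric(String number, boolean allowDot) {
        boolean seenDot = false;
        boolean seenDigit = false;
        for (int i = 0; i < number.length(); i++) {
            char c = number.charAt(i);
            if (c >= '0' && c <= '9') {
                seenDigit = true;
            } else if (allowDot && c == '.' && !seenDot) {
                seenDot = true;
            } else if (!(i == 0 && c == '-')) {
                return false; // invalid character, or a second '.'
            }
        }
        return seenDigit;
    }

    static boolean isIntegerLike(String s) {
        return looksNumeric(s, false);
    }

    static boolean isDecimalLike(String s) {
        return looksNumeric(s, true);
    }

    static Long toLong(String s) {
        if (isIntegerLike(s)) {
            try {
                return Long.valueOf(s);
            } catch (NumberFormatException nfe) {
                // e.g. out of long range; fall through to the double path
            }
        }
        if (!isDecimalLike(s)) {
            return null; // e.g. "1234abcd": no Double.valueOf call at all
        }
        try {
            return Double.valueOf(s).longValue();
        } catch (NumberFormatException nfe) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] column = {"1234", "1234abcd", "abcd", "1234.56", "-42"};
        for (String s : column) {
            System.out.println(s + " -> " + toLong(s));
        }
    }
}
```

A check this strict would also reject strings that Double.valueOf accepts
today, such as exponent notation ("1e3") or "NaN", so whether that counts as
a behavior change would need to be decided separately.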

Thoughts/concerns? Just want to make sure such a change does not break