Pig >> mail # dev >> Bytes to Long/Integer conversions


Bytes to Long/Integer conversions
Daniel and I were discussing how Pig currently does these conversions
and whether we could simplify or optimize it further.

        Long ret = null;
        if (sanityCheckIntegerLong(s)) {
            try {
                ret = Long.valueOf(s);
            } catch (NumberFormatException nfe) {
            }
        }
The code looks to see if all characters are numeric and then does a
conversion to Long.

    private static boolean sanityCheckIntegerLong(String number){
        for (int i=0; i < number.length(); i++){
            if (number.charAt(i) >= '0' && number.charAt(i) <='9' || i == 0 && number.charAt(i) == '-'){
                // valid one
            }
            else{
                // contains invalid characters, must not be an integer or long.
                return false;
            }
        }
        return true;
    }
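
To make the interaction between the check and the parse concrete, here is a small self-contained sketch. The class name SanityCheckDemo and the toLong wrapper are made up for illustration; the character test follows the method quoted above. It suggests why the try/catch is still needed: inputs such as "-", the empty string, or a digit string too large for a long pass the character test but still make Long.valueOf throw.

```java
class SanityCheckDemo {
    // Same character test as the quoted sanityCheckIntegerLong:
    // digits anywhere, '-' only in the first position.
    static boolean sanityCheckIntegerLong(String number) {
        for (int i = 0; i < number.length(); i++) {
            char c = number.charAt(i);
            boolean digit = c >= '0' && c <= '9';
            boolean leadingMinus = i == 0 && c == '-';
            if (!digit && !leadingMinus) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical wrapper mirroring the quoted snippet: check, then parse,
    // and fall back to null if the parse still fails.
    static Long toLong(String s) {
        Long ret = null;
        if (sanityCheckIntegerLong(s)) {
            try {
                ret = Long.valueOf(s);
            } catch (NumberFormatException nfe) {
                // Passed the character test but is not a valid long,
                // e.g. "-", "" or a value outside the long range.
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        String[] inputs = {"1234", "-42", "-", "", "99999999999999999999", "1234abcd"};
        for (String in : inputs) {
            System.out.println("\"" + in + "\" check=" + sanityCheckIntegerLong(in)
                    + " long=" + toLong(in));
        }
    }
}
```

Only the last input, "1234abcd", is rejected by the character test itself; the three before it are rejected by Long.valueOf.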

If the input is not numeric (for example, 1234abcd), the code still calls
Double.valueOf(String) before finally returning null. Any script that casts
an alphanumeric column to int or long, whether by user mistake or not,
would trigger many wasteful calls.

I think we can avoid this by handling only the cases where the input is
a decimal number (1234.56) and returning null otherwise, before ever trying
Double.valueOf(String).
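
A rough sketch of what that could look like follows. This is not the actual Pig code: the class LongCastSketch and the helper looksLikeDecimal are hypothetical names, and truncating the fractional part with longValue() is an assumption about the desired result for input like 1234.56.

```java
class LongCastSketch {
    // Hypothetical pre-check: optional leading '-', digits, at most one '.',
    // and at least one digit overall.
    static boolean looksLikeDecimal(String s) {
        boolean seenDot = false;
        int digits = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                digits++;
            } else if (c == '.' && !seenDot) {
                seenDot = true;
            } else if (!(i == 0 && c == '-')) {
                return false; // any other character rules the input out
            }
        }
        return digits > 0;
    }

    static Long toLong(String s) {
        if (s == null || !looksLikeDecimal(s)) {
            return null; // e.g. "1234abcd": no parse attempt at all
        }
        try {
            if (s.indexOf('.') < 0) {
                return Long.valueOf(s);
            }
            // Assumption: a decimal such as "1234.56" is truncated to 1234.
            return Double.valueOf(s).longValue();
        } catch (NumberFormatException nfe) {
            return null; // e.g. an integer string outside the long range
        }
    }
}
```

One thing to check for backward compatibility: Double.valueOf accepts more than plain decimals (for example "1e3"), and a pre-check this strict would turn those into null.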

Thoughts/concerns? Just want to make sure such a change does not break
backward-compatibility.

-Prashant