Pig, mail # dev - Java 7 and Pig, hijacked from PIG-2643


Re: Java 7 and Pig, hijacked from PIG-2643
Alan Gates 2012-04-16, 16:58
There are some exciting new features in Java 7.  However, realistically we can't start using it until Hadoop does.  I don't recall any discussion on it on their list, though I may have missed it.  But AFAIK they have no migration plans at this time.

Alan.

On Apr 12, 2012, at 11:55 AM, Jonathan Coveney wrote:

> Scott Carey brought Java 7 up in PIG-2643, and I think it's something we
> need to think about. When do we want to start taking advantage of new
> features that may not exist on Java 6? Do we ever?
>
> 2012/4/12 Scott Carey (Commented) (JIRA) <[EMAIL PROTECTED]>
>
>>
>>   [
>> https://issues.apache.org/jira/browse/PIG-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252706#comment-13252706]
>>
>> Scott Carey commented on PIG-2643:
>> ----------------------------------
>>
>> Another thought for this sort of thing:
>>
>> This might be achievable without bytecode generation, and with good
>> performance, using Java 7 MethodHandles [1][2].  Of course, that would
>> require Java 7, but Java 6 support ends later this year [3], about the
>> time Pig 0.11 would be out anyway.
>>
>>
>> [1]
>> http://docs.oracle.com/javase/7/docs/api/java/lang/invoke/MethodHandle.html
>> [2]
>> http://stackoverflow.com/questions/8823793/methodhandle-what-is-it-all-about
>> [3] https://blogs.oracle.com/henrik/entry/updated_java_6_eol_date
>>
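To make the MethodHandle suggestion above concrete, here is a minimal sketch of resolving and invoking the same JDK methods that appear in the PIG-2643 example (String.concat and the two Integer.valueOf overloads) without generating any bytecode. The class and helper names (MethodHandleInvokerSketch, staticHandle, virtualHandle, call) are hypothetical and are not taken from the patch; the sketch only assumes Java 7 or later.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical illustration only; none of these names come from PIG-2643.
final class MethodHandleInvokerSketch {
    // publicLookup() can resolve public methods of public classes.
    private static final MethodHandles.Lookup LOOKUP = MethodHandles.publicLookup();

    /** Resolves a public static method, e.g. Integer.valueOf(String). */
    static MethodHandle staticHandle(Class<?> owner, String name,
                                     Class<?> returnType, Class<?>... argTypes) {
        try {
            return LOOKUP.findStatic(owner, name, MethodType.methodType(returnType, argTypes));
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Resolves a public instance method; the receiver becomes the handle's first argument. */
    static MethodHandle virtualHandle(Class<?> owner, String name,
                                      Class<?> returnType, Class<?>... argTypes) {
        try {
            return LOOKUP.findVirtual(owner, name, MethodType.methodType(returnType, argTypes));
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Invokes a handle with boxed arguments; primitives are unboxed to fit the handle's type. */
    static Object call(MethodHandle handle, Object... args) {
        try {
            return handle.invokeWithArguments(args);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] unused) {
        MethodHandle valueOf =
                staticHandle(Integer.class, "valueOf", Integer.class, String.class);
        MethodHandle valueOfRadix =
                staticHandle(Integer.class, "valueOf", Integer.class, String.class, int.class);
        MethodHandle concat =
                virtualHandle(String.class, "concat", String.class, String.class);

        System.out.println(call(valueOf, "42"));
        System.out.println(call(valueOfRadix, "ff", 16));
        System.out.println(call(concat, "a", "b"));
    }
}
```

A real UDF would presumably resolve each handle once and cache it, since the lookup is the expensive part. Whether this matches generated bytecode in speed would still need benchmarking.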
>>> Use bytecode generation to make a performance replacement for
>> InvokeForLong, InvokeForString, etc
>>>
>> -------------------------------------------------------------------------------------------------
>>>
>>>                Key: PIG-2643
>>>                URL: https://issues.apache.org/jira/browse/PIG-2643
>>>            Project: Pig
>>>         Issue Type: Improvement
>>>           Reporter: Jonathan Coveney
>>>           Assignee: Jonathan Coveney
>>>           Priority: Minor
>>>             Labels: codegen
>>>            Fix For: 0.11, 0.10.1
>>>
>>>        Attachments: PIG-2643-0.patch
>>>
>>>
>>> This is basically to cut my teeth for much more ambitious code generation down the line, but I think it could be performant and useful.
>>> The new syntax is:
>>> {code}a = load 'thing' as (x:chararray);
>>> define concat InvokerGenerator('java.lang.String','concat','String');
>>> define valueOf InvokerGenerator('java.lang.Integer','valueOf','String');
>>> define valueOfRadix InvokerGenerator('java.lang.Integer','valueOf','String,int');
>>> b = foreach a generate x, valueOf(x) as vOf;
>>> c = foreach b generate x, vOf, valueOfRadix(x, 16) as vOfR;
>>> d = foreach c generate x, vOf, vOfR, concat(concat(x, (chararray)vOf), (chararray)vOfR);
>>> dump d;
>>> {code}
>>> There are some differences between this version and Dmitriy's implementation:
>>> - It is no longer necessary to declare whether the method is static or not. This is gleaned via reflection.
>>> - As per the above, it is no longer necessary to make the first argument be the type of the object to invoke the method on. If it is not a static method, then that type is implied by the class being invoked. So in the case of concat, it would need to be passed a tuple of two inputs: one for the object the method is called against (as it is not static), and then the 'String' that was specified. In the case of valueOf, because it IS static, the 'String' is the only value.
>>> - The arguments are type sensitive. Integer means the Object Integer, whereas int (or long, or float, or boolean, etc.) refers to the primitive. This is necessary to properly reflect the arguments. Values passed in WILL, however, be properly unboxed as necessary.
>>> - The return type will be reflected.
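The points in the list above (static detection, the implicit receiver, primitive versus boxed argument names, and the reflected return type) can be sketched with plain java.lang.reflect. This is only an illustration of the idea, not the patch's code: the class SignatureResolverSketch, its methods, and the small table of recognized type names are all assumptions made for the example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from PIG-2643.
final class SignatureResolverSketch {
    // Assumed mapping from type names in the define statement to Java classes.
    // "int" and "Integer" map to different classes, which is why the names are type sensitive.
    private static final Map<String, Class<?>> KNOWN = new HashMap<>();
    static {
        KNOWN.put("int", int.class);
        KNOWN.put("long", long.class);
        KNOWN.put("float", float.class);
        KNOWN.put("double", double.class);
        KNOWN.put("boolean", boolean.class);
        KNOWN.put("Integer", Integer.class);
        KNOWN.put("Long", Long.class);
        KNOWN.put("String", String.class);
    }

    /** Turns a spec such as "String,int" into {String.class, int.class}. */
    static Class<?>[] parseArgTypes(String spec) {
        if (spec.trim().isEmpty()) {
            return new Class<?>[0];
        }
        String[] names = spec.split(",");
        Class<?>[] types = new Class<?>[names.length];
        for (int i = 0; i < names.length; i++) {
            Class<?> type = KNOWN.get(names[i].trim());
            if (type == null) {
                throw new IllegalArgumentException("Unknown type name: " + names[i]);
            }
            types[i] = type;
        }
        return types;
    }

    /** Looks up the public method matching the exact parameter types. */
    static Method resolve(String className, String methodName, String argSpec) {
        try {
            return Class.forName(className).getMethod(methodName, parseArgTypes(argSpec));
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Static-ness is read from the method itself, so the caller never declares it. */
    static boolean isStatic(Method method) {
        return Modifier.isStatic(method.getModifiers());
    }

    /** Instance methods need one extra input: the object the method is called on. */
    static int expectedInputs(Method method) {
        int declared = method.getParameterTypes().length;
        return isStatic(method) ? declared : declared + 1;
    }

    public static void main(String[] unused) {
        Method concat = resolve("java.lang.String", "concat", "String");
        Method valueOf = resolve("java.lang.Integer", "valueOf", "String,int");
        System.out.println(concat.getReturnType() + " inputs=" + expectedInputs(concat));
        System.out.println(valueOf.getReturnType() + " inputs=" + expectedInputs(valueOf));
    }
}
```

With this lookup, asking for valueOf with 'String,Integer' instead of 'String,int' fails, because Integer declares no such overload.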
>>> This uses the ASM API to generate the bytecode, and then a custom
>> classloader to load it in. I will add caching of the generated code based
>> on the input strings, etc, but I wanted to get eyes and opinions on this. I
>> also need to benchmark, but it should be native speed (excluding a little