

Zebeljan, Nebojsa 2012-10-10, 12:23
Bill Graham 2012-10-11, 00:59
Cheolsoo Park 2012-10-11, 04:30
Zebeljan, Nebojsa 2012-10-11, 07:29
Re: Hadoop Job History Loader with PIG
Hi Cheolsoo,
I've found the reason why "HadoopJobHistoryLoader" is not available.

In Cloudera's distro the class is excluded when building the piggybank:
-> ./contrib/piggybank/java/build.xml
-> ./cloudera/patches/0001-CLOUDERA-BUILD.-CDHifying-Pig-0.9.1-build.patch

---
<!-- JobHistoryLoader currently does not support 0.23 -->
<condition property="build.classes.excludes" value="**/HadoopJobHistoryLoader.java" else="">
    <equals arg1="${hadoopversion}" arg2="23"/>
</condition>
<condition property="test.classes.excludes" value="**/TestHadoopJobHistoryLoader.java" else="">
    <equals arg1="${hadoopversion}" arg2="23"/>
</condition>
---
Do you know if this "exclude" is still needed for hadoop-2.x?
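
For anyone who wants to check whether their own piggybank.jar was built with this exclude, here is a minimal sketch that looks for the class entry inside the jar. The function name `jar_has_class` is just something I made up for illustration, and the class name is the one from the ERROR 1070 message; the jar path in the usage note is the CDH 4.0.1 location mentioned in this thread and may differ on other installs.

```python
import zipfile

# Fully qualified class name, taken from the ERROR 1070 message in this thread.
LOADER_CLASS = "org.apache.pig.piggybank.storage.HadoopJobHistoryLoader"


def jar_has_class(jar, class_name=LOADER_CLASS):
    """Return True if the jar (a path or a file-like object) contains class_name.

    Hypothetical helper for illustration; it is not part of Pig or piggybank.
    """
    # A jar is a zip archive; a class is stored under its package path
    # with slashes instead of dots and a ".class" suffix.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as zf:
        return entry in zf.namelist()
```

Calling `jar_has_class("/usr/lib/pig/contrib/piggybank/java/piggybank.jar")` should return False on a jar built with the exclude above, and True on one built without it.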

Thanks in advance!
Nebo

On 11.10.12 09:29, "Zebeljan, Nebojsa" <[EMAIL PROTECTED]> wrote:

>Hi Cheolsoo,
>Yes, I've registered the piggybank jar in the pig script - see script
>below.
>
>---
>REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;
>
>a = load '/some_dir/some_aggregation/_logs/history' using
>         HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
>b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user,
>         j#'JOBNAME' as script_name,
>         (Long) j#'SUBMIT_TIME' as start, (Long) j#'FINISH_TIME' as end;
>c = group b by (id, user, script_name);
>d = foreach c generate group.user, group.script_name,
>         (MAX(b.end) - MIN(b.start))/1000;
>dump d;
>---
>
>I've also downloaded Pig from Cloudera, version 4.0.1, again and grepped
>the piggybank.jar for the "HadoopJobHistoryLoader" class, but I still
>can't find the class.
>
>I also grepped /usr/lib/pig/contrib/piggybank/java/piggybank.jar, with the
>same result ...
>
>
>What am I doing wrong here?
>
>Thanks for any help!
>Nebo
>
>
>
>On 11.10.12 06:30, "Cheolsoo Park" <[EMAIL PROTECTED]> wrote:
>
>>Hi Nebojsa,
>>
>>Did you register piggybank.jar in your Pig script?
>>
>>REGISTER <path_to_piggybank.jar>;
>>
>>In CDH4.0.1, piggybank.jar can be found at
>>/usr/lib/pig/contrib/piggybank/java/piggybank.jar.
>>
>>Thanks,
>>Cheolsoo
>>
>>On Wed, Oct 10, 2012 at 5:23 AM, Zebeljan, Nebojsa <
>>[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>> I'm using CDH 4.0.1 with pig-0.9.2+26.
>>>
>>> I've tried to gather some information about my result files aggregated by
>>> Pig with the HadoopJobHistoryLoader() as described here:
>>>
>>> http://archive.cloudera.com/cdh/3/pig/piglatin_ref1.html#Hadoop+Job+History+Loader
>>>
>>> Running a simple Pig script returns "ERROR 1070: Could not resolve
>>> org.apache.pig.piggybank.storage.HadoopJobHistoryLoader using imports: [,
>>> org.apache.pig.builtin., org.apache.pig.impl.builtin.]"
>>>
>>> With this information, I found that the HadoopJobHistoryLoader class
>>> does not exist in the piggybank!
>>>
>>> According to the API docs, this class should exist:
>>>
>>> http://pig.apache.org/docs/r0.9.2/api/org/apache/pig/piggybank/storage/HadoopJobHistoryLoader.html
>>>
>>> Can someone please enlighten me ...
>>>
>>> Thanks!
>>>
>>> Regards,
>>> Nebo
>>>
>>>
>