Re: Hadoop Job History Loader with PIG
Hi Cheolsoo,
I've found the reason why the "HadoopJobHistoryLoader" is not available.

In Cloudera's distro, the class is excluded when building the piggybank:
-> ./contrib/piggybank/java/build.xml
-> ./cloudera/patches/0001-CLOUDERA-BUILD.-CDHifying-Pig-0.9.1-build.patch

---
<!-- JobHistoryLoader currently does not support 0.23 -->
    <condition property="build.classes.excludes"
value="**/HadoopJobHistoryLoader.java" else="">
        <equals arg1="${hadoopversion}" arg2="23"/>
    </condition>
    <condition property="test.classes.excludes"
value="**/TestHadoopJobHistoryLoader.java" else="">
        <equals arg1="${hadoopversion}" arg2="23"/>
    </condition>
---
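
To make the effect of those two <condition> elements concrete, here is a minimal Python sketch that evaluates them for a given hadoopversion. It is only an illustration of the Ant semantics (a property gets "value" when the nested <equals> matches and "else" otherwise), not part of the Cloudera build. The helper names expand and resolve_conditions are made up for this example, and the assumption that the CDH build runs with hadoopversion=23 is inferred from the missing class rather than confirmed.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the patch quoted above, wrapped in a <project>
# root element so that it parses as a standalone document.
BUILD_FRAGMENT = """
<project>
    <condition property="build.classes.excludes"
               value="**/HadoopJobHistoryLoader.java" else="">
        <equals arg1="${hadoopversion}" arg2="23"/>
    </condition>
    <condition property="test.classes.excludes"
               value="**/TestHadoopJobHistoryLoader.java" else="">
        <equals arg1="${hadoopversion}" arg2="23"/>
    </condition>
</project>
"""


def expand(text, props):
    """Replace ${name} references with values from props (simplified Ant expansion)."""
    for name, value in props.items():
        text = text.replace("${" + name + "}", value)
    return text


def resolve_conditions(xml_text, props):
    """Return the property values each <condition> would set.

    Hypothetical helper: it handles only a single nested <equals>,
    which is all the quoted fragment uses.
    """
    result = {}
    for cond in ET.fromstring(xml_text).iter("condition"):
        eq = cond.find("equals")
        matched = expand(eq.get("arg1"), props) == expand(eq.get("arg2"), props)
        result[cond.get("property")] = cond.get("value") if matched else cond.get("else")
    return result


if __name__ == "__main__":
    for version in ("20", "23"):
        print(version, resolve_conditions(BUILD_FRAGMENT, {"hadoopversion": version}))
```

With hadoopversion set to 23 both exclude patterns are set, so neither the loader nor its test is compiled; with any other value both properties are empty and nothing is excluded.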
Do you know if this "exclude" is still needed for hadoop-2.x?

Thanks in advance!
Nebo
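
For anyone who wants to check their own piggybank.jar for the class without grepping the binary, a small sketch along these lines should work, since a jar is a zip archive. The function name find_class_entries is made up for this example; the jar path in the usage lines is the CDH4.0.1 location mentioned in this thread and may differ on other installs.

```python
import zipfile


def find_class_entries(jar_path, simple_name):
    """List jar entries for the class <simple_name> and its inner classes.

    Matching is done on the file name only, so the package does not
    need to be known in advance.
    """
    with zipfile.ZipFile(jar_path) as jar:
        matches = []
        for entry in jar.namelist():
            file_name = entry.rsplit("/", 1)[-1]
            if file_name == simple_name + ".class" or file_name.startswith(simple_name + "$"):
                matches.append(entry)
        return matches


if __name__ == "__main__":
    # Path taken from this thread; adjust for your installation.
    jar = "/usr/lib/pig/contrib/piggybank/java/piggybank.jar"
    print(find_class_entries(jar, "HadoopJobHistoryLoader"))
```

An empty list means the class was not packaged into the jar, which is what the build exclude above would produce.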

On 11.10.12 09:29, "Zebeljan, Nebojsa" <[EMAIL PROTECTED]> wrote:

>Hi Cheolsoo,
>Yes, I've registered the piggybank jar in the pig script - see script
>below.
>
>---
>REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar
>
>a = load '/some_dir/some_aggregation/_logs/history' using
>HadoopJobHistoryLoader() as (j:map[], m:map[],
>r:map[]);
>b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user,
>j#'JOBNAME' as script_name,
>         (Long) j#'SUBMIT_TIME' as start, (Long) j#'FINISH_TIME' as end;
>c = group b by (id, user, script_name);
>d = foreach c generate group.user, group.script_name, (MAX(b.end) -
>MIN(b.start))/1000;
>dump d;
>---
>
>I've also downloaded Pig from Cloudera (version 4.0.1) again and grepped
>the piggybank.jar for the "HadoopJobHistoryLoader" class, but I still
>can't find the class.
>
>I also grepped /usr/lib/pig/contrib/piggybank/java/piggybank.jar, with
>the same result...
>
>
>What am I doing wrong here?
>
>Thanks for any help!
>Nebo
>
>
>
>On 11.10.12 06:30, "Cheolsoo Park" <[EMAIL PROTECTED]> wrote:
>
>>Hi Nebojsa,
>>
>>Did you register piggybank.jar in your Pig script?
>>
>>REGISTER <path_to_piggybank.jar>;
>>
>>In CDH4.0.1, piggybank.jar can be found at
>>/usr/lib/pig/contrib/piggybank/java/piggybank.jar.
>>
>>Thanks,
>>Cheolsoo
>>
>>On Wed, Oct 10, 2012 at 5:23 AM, Zebeljan, Nebojsa <
>>[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>> I'm using cdh 4.0.1 with pig-0.9.2+26.
>>>
>>> I've tried to gather some information about my result files aggregated
>>> by Pig with the HadoopJobHistoryLoader(), as described here:
>>>
>>>http://archive.cloudera.com/cdh/3/pig/piglatin_ref1.html#Hadoop+Job+History+Loader
>>>
>>> Running a simple pig script returns "ERROR 1070: Could not resolve
>>> org.apache.pig.piggybank.storage.HadoopJobHistoryLoader using imports: [,
>>> org.apache.pig.builtin., org.apache.pig.impl.builtin.]"
>>>
>>> With this information, I've found that the HadoopJobHistoryLoader
>>> class does not exist in the piggybank!
>>>
>>> According to the API docs, this class should exist:
>>>
>>>http://pig.apache.org/docs/r0.9.2/api/org/apache/pig/piggybank/storage/HadoopJobHistoryLoader.html
>>>
>>> Can someone please enlighten me...
>>>
>>> Thanks!
>>>
>>> Regards,
>>> Nebo
>>>
>>>
>