Pig >> mail # dev >> Fw: problem in Apache Pig Code query

Fw: problem in Apache Pig Code query

Hi Sir,
I have read your blog post about Apache Pig, and its content matches my requirement.
>I want to read and parse an XML file from HDFS using a Pig script, but for now I am working with a local XML file.
>My Pig script is:
>
>
>register /home/mobibiz2/Desktop/PiggyBank/trunk/contrib/piggybank/java/piggybank.jar;
>xml_file = LOAD '/home/mobibiz2/hadoop-1.0.3/pig-0.10.0/register.xml' using org.apache.pig.piggybank.storage.XMLLoader('expense') as (doc:chararray);
>loof_file = FOREACH xml_file GENERATE FLATTEN
>(REGEX_EXTRACT_ALL(register,'\\s*<expense\\s+id="([^<"]*)">\\n\\s*<value>([^>]*)</value>\\n\\s*</expense>\\n\\s*')
>)
>AS
>(
>expense1: chararray,
>value1: chararray,
>expense2: chararray,
>value2: chararray);
>store_file = store loof_file into '/home/mobibiz2/Desktop/Sample/Get_Files/out.txt';
>
>The XML file is:
><register>
><expense id="productId">
><value>12354678</value>
></expense>
><expense id="AckLevel">
><value>LEVEL2</value>
></expense>
></register>
>
>I want to store each XML element's value and attribute, but when I run this script it gives this error:
>2012-11-15 10:33:07,213 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve RegexExtractAll using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
>
>When I run this script instead:
>register /home/mobibiz2/Desktop/PiggyBank/trunk/contrib/piggybank/java/piggybank.jar;
>xml_file = LOAD '/home/mobibiz2/hadoop-1.0.3/pig-0.10.0/register.xml' using org.apache.pig.piggybank.storage.XMLLoader('value') as (doc:chararray);
>loof_file = foreach xml_file generate doc;
>store_file = store loof_file into '/home/mobibiz2/Desktop/Sample/Get_Files/out.txt';
>it gives this output:
><expense id="productId">
><value>12354678</value>
></expense>
><expense id="AckLevel">
><value>LEVEL2</value>
></expense>
>
>It emits only the expense elements, but I want to get the attribute and the element node value.
>
>Please help me with this; I have already spent many days on it.
>Please reply soon.
>
>
>Thanks
>Shikha Tyagi
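
One way to sanity-check the extraction pattern outside Pig is sketched below in Python. It assumes that XMLLoader('expense') hands each <expense> element to the script as a single string (the field declared as doc in the scripts above), and that REGEX_EXTRACT_ALL returns groups only when the pattern matches the whole string, which re.fullmatch imitates here. The names RECORDS, EXPENSE_RE and parse_expense are made up for this illustration. Note that the pattern has two capture groups (the id attribute and the value text), so each record would yield two fields rather than four.

```python
import re

# Sample records, assuming the loader emits one <expense> element per record.
RECORDS = [
    '<expense id="productId">\n<value>12354678</value>\n</expense>',
    '<expense id="AckLevel">\n<value>LEVEL2</value>\n</expense>',
]

# \s* absorbs the newlines between tags.
# Group 1 is the id attribute, group 2 is the text inside <value>.
EXPENSE_RE = re.compile(
    r'\s*<expense\s+id="([^"]*)"\s*>\s*<value>([^<]*)</value>\s*</expense>\s*'
)


def parse_expense(record):
    """Return (id, value) when the whole record matches, else None."""
    match = EXPENSE_RE.fullmatch(record)
    return match.groups() if match else None


rows = [parse_expense(record) for record in RECORDS]
for row in rows:
    print(row)
```

If the pattern behaves as expected, the same expression could be tried in REGEX_EXTRACT_ALL against the doc field, with every backslash doubled inside the Pig string literal.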