Pig, mail # dev - Fw: problem in Apache Pig Code query


Fw: problem in Apache Pig Code query
Shikha Tyagi 2012-11-16, 06:13

Hi Sir,
I have read your blog about Apache Pig, and its content matches my requirement.
>Actually, I want to read and parse an XML file from HDFS using a Pig script, but initially I am working with a local XML file.
>The Pig script is:
>
>
>register /home/mobibiz2/Desktop/PiggyBank/trunk/contrib/piggybank/java/piggybank.jar;
>xml_file = LOAD '/home/mobibiz2/hadoop-1.0.3/pig-0.10.0/register.xml' using org.apache.pig.piggybank.storage.XMLLoader('expense') as (doc:chararray);
>loof_file = FOREACH xml_file GENERATE FLATTEN
>(REGEX_EXTRACT_ALL(register,'\\s*<expense\\s+id="([^<"]*)">\\n\\s*<value>([^>]*)</value>\\n\\s*</expense>\\n\\s*')
>)
>AS
>(
>expense1: chararray,
>value1: chararray,
>expense2: chararray,
>value2: chararray);
>store_file = store loof_file into '/home/mobibiz2/Desktop/Sample/Get_Files/out.txt';
>
>
>
>
>and the XML is:
><register>
><expense id="productId">
><value>12354678</value>
></expense>
><expense id="AckLevel">
><value>LEVEL2</value>
></expense>
></register>
>
>
>I want to store the XML element values and attributes, but when I run this script, it gives this error:
>2012-11-15 10:33:07,213 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve RegexExtractAll using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
>
>and when I run this query:
>register /home/mobibiz2/Desktop/PiggyBank/trunk/contrib/piggybank/java/piggybank.jar;
>xml_file = LOAD '/home/mobibiz2/hadoop-1.0.3/pig-0.10.0/register.xml' using org.apache.pig.piggybank.storage.XMLLoader('value') as (doc:chararray);
>loof_file = foreach xml_file generate doc;
>store_file = store loof_file into '/home/mobibiz2/Desktop/Sample/Get_Files/out.txt';
>it gives this output:
><expense id="productId">
><value>12354678</value>
></expense>
><expense id="AckLevel">
><value>LEVEL2</value>
></expense>
>
>Actually, it splits out only the expense elements, but I want to get the attribute and the element node value.
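
To make the requested result concrete, here is a minimal sketch in Python rather than Pig. It is only an illustration of the extraction being asked for, not a Pig solution. It assumes each record is one complete `<expense>` element, as in the output shown above; the names `records`, `EXPENSE_RE` and `parse_expense` are hypothetical and do not come from the script.

```python
import re

# Sample records, assuming one <expense> element per record
# (copied from the output shown in the message).
records = [
    '<expense id="productId">\n<value>12354678</value>\n</expense>',
    '<expense id="AckLevel">\n<value>LEVEL2</value>\n</expense>',
]

# Two capture groups: the id attribute and the text of <value>.
EXPENSE_RE = re.compile(
    r'\s*<expense\s+id="([^"]*)">\s*<value>([^<]*)</value>\s*</expense>\s*'
)


def parse_expense(record):
    """Return (id, value) if the whole record matches, else None."""
    match = EXPENSE_RE.fullmatch(record)
    return match.groups() if match else None


rows = [parse_expense(record) for record in records]
for row in rows:
    print(row)
```

The pattern has two capture groups, so each record yields two fields (the attribute and the value), whereas the AS clause in the first script declares four. The first script also passes `register` to the regex function, while the loaded field is named `doc`.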
>
>Please help me with this, because I have spent many days on this work.
>Please reply soon.
>
>
>Thanks
>Shikha Tyagi
>
>
>
>