Pig, mail # user - Help with XMLLoader


Re: Help with XMLLoader
Vivek Padmanabhan 2012-02-22, 05:57
Hi Mohit,
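XMLLoader takes a tag name as its argument and returns everything from each start tag to the matching end tag as one record. In the given input there are no end tags (each line ends with a second <abc> instead of </abc>), so it read 0 records. Note also that the example below passes the bare tag name 'abc', not '<abc>'.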

Example:
raw = LOAD 'sample_xml' USING org.apache.pig.piggybank.storage.XMLLoader('abc')
    AS (document:chararray);  -- each <abc>...</abc> element becomes one record
dump raw;

cat sample_xml
<abc><def></def></abc>
<abc><def></def></abc>
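To illustrate why the original file yields 0 records, here is a small Python sketch that emulates the start/end-tag matching described above. It is a simplified model written for illustration, not the piggybank XMLLoader source: the function name extract_records is hypothetical, and it ignores nested elements and input splits.

```python
# Simplified emulation (an assumption, NOT the piggybank source) of a
# tag-delimited loader: a record runs from a start tag for the given name
# to the next matching end tag; a start tag with no end tag yields nothing.
import re
from typing import List


def extract_records(text: str, tag: str) -> List[str]:
    """Return every <tag ...>...</tag> span in text (no nesting support)."""
    # Start tag: "<tag" followed by ">" or whitespace (attributes).
    start = re.compile(r"<" + re.escape(tag) + r"[\s>]")
    end = "</" + tag + ">"
    records = []
    pos = 0
    while True:
        m = start.search(text, pos)
        if m is None:
            break
        close = text.find(end, m.end())
        if close == -1:
            # No end tag after this start tag, so the record is never emitted.
            break
        stop = close + len(end)
        records.append(text[m.start():stop])
        pos = stop
    return records


if __name__ == "__main__":
    good = "<abc><def></def></abc>\n<abc><def></def></abc>\n"
    bad = "<abc><def></def><abc>\n<abc><def></def><abc>\n"
    print(len(extract_records(good, "abc")))    # 2: both records closed by </abc>
    print(len(extract_records(bad, "abc")))     # 0: no </abc> anywhere
    print(len(extract_records(good, "<abc>")))  # 0: looks for "<<abc>", never found
```

With the sample_xml content both records are found; with the testfile.txt content, where each line ends in <abc> instead of </abc>, the search for an end tag fails and nothing is emitted, consistent with the 0 records in the job output.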

Thanks
Vivek
On 2/21/12 11:02 PM, "Mohit Anchlia" <[EMAIL PROTECTED]> wrote:

> I am trying to use XMLLoader to process the files but it doesn't seem to be
> quite working. For the first pass I am just trying to dump all the contents
> but it's saying 0 records found:
>
> bash-3.2$ hadoop fs -cat /examples/testfile.txt
>
> <abc><def></def><abc>
>
> <abc><def></def><abc>
>
> register 'pig-0.8.1-cdh3u3/contrib/piggybank/java/piggybank.jar'
>
> raw = LOAD '/examples/testfile.txt' using
> org.apache.pig.piggybank.storage.XMLLoader('<abc>') as (document:chararray);
>
> dump raw;
>
> 2012-02-21 09:22:18,947 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
>
> 2012-02-21 09:22:24,998 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
>
> 2012-02-21 09:22:24,999 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
>
> HadoopVersion  PigVersion    UserId  StartedAt            FinishedAt           Features
> 0.20.2-cdh3u3  0.8.1-cdh3u3  hadoop  2012-02-21 09:22:12  2012-02-21 09:22:24  UNKNOWN
>
> Success!
>
> Job Stats (time in seconds):
>
> JobId                  Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias  Feature   Outputs
> job_201202201638_0012  1     0        2           2           2           0              0              0              raw    MAP_ONLY  hdfs://dsdb1:54310/tmp/temp1968655187/tmp-358114646,
>
> Input(s):
>
> Successfully read 0 records (402 bytes) from: "/examples/testfile.txt"
>
> Output(s):
>
> Successfully stored 0 records in:
> "hdfs://dsdb1:54310/tmp/temp1968655187/tmp-358114646"
>
> Counters:
>
> Total records written : 0
>
> Total bytes written : 0
>
> Spillable Memory Manager spill count : 0
>
> Total bags proactively spilled: 0
>
> Total records proactively spilled: 0
>
> Job DAG:
>
> job_201202201638_0012
>
>
>
> 2012-02-21 09:22:25,004 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
>
> 2012-02-21 09:22:25,011 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>
> 2012-02-21 09:22:25,011 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
>
> grunt> quit