Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # user >> Help with XMLLoader


Copy link to this message
-
Re: Help with XMLLoader
Hi Mohit,
 XMLLoader looks for the start and end tag for a given string argument. In
the given input there are no end tags and hence it read 0 records.

Example:
raw = LOAD 'sample_xml' using
org.apache.pig.piggybank.storage.XMLLoader('abc') as (document:chararray);
dump raw;

cat sample_xml
<abc><def></def></abc>
<abc><def></def></abc>

Thanks
Vivek
On 2/21/12 11:02 PM, "Mohit Anchlia" <[EMAIL PROTECTED]> wrote:

> I am trying to use XMLLoader to process the files but it doesn't seem to be
> quite working. For the first pass I am just trying to dump all the contents
> but it's saying 0 records found:
>
> bash-3.2$ hadoop fs -cat /examples/testfile.txt
>
> <abc><def></def><abc>
>
> <abc><def></def><abc>
>
> register 'pig-0.8.1-cdh3u3/contrib/piggybank/java/piggybank.jar'
>
> raw = LOAD '/examples/testfile.txt' using
> org.apache.pig.piggybank.storage.XMLLoader('<abc>') as (document:chararray);
>
> dump raw;
>
> 2012-02-21 09:22:18,947 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 50% complete
>
> 2012-02-21 09:22:24,998 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 100% complete
>
> 2012-02-21 09:22:24,999 [main] INFO org.apache.pig.tools.pigstats.PigStats
> - Script Statistics:
>
> HadoopVersion PigVersion UserId StartedAt FinishedAt Features
>
> 0.20.2-cdh3u3 0.8.1-cdh3u3 hadoop 2012-02-21 09:22:12 2012-02-21 09:22:24
> UNKNOWN
>
> Success!
>
> Job Stats (time in seconds):
>
> JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime
> MinReduceTime AvgReduceTime Alias Feature Outputs
>
> job_201202201638_0012 1 0 2 2 2 0 0 0 raw MAP_ONLY
> hdfs://dsdb1:54310/tmp/temp1968655187/tmp-358114646,
>
> Input(s):
>
> Successfully read 0 records (402 bytes) from: "/examples/testfile.txt"
>
> Output(s):
>
> Successfully stored 0 records in:
> "hdfs://dsdb1:54310/tmp/temp1968655187/tmp-358114646"
>
> Counters:
>
> Total records written : 0
>
> Total bytes written : 0
>
> Spillable Memory Manager spill count : 0
>
> Total bags proactively spilled: 0
>
> Total records proactively spilled: 0
>
> Job DAG:
>
> job_201202201638_0012
>
>
>
> 2012-02-21 09:22:25,004 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Success!
>
> 2012-02-21 09:22:25,011 [main] INFO
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
> to process : 1
>
> 2012-02-21 09:22:25,011 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
> paths to process : 1
>
> grunt> quit
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB