Pig >> mail # user >> Help with XMLLoader


Re: Help with XMLLoader
It looks like the records aren't read when the file is big. Could the way
the input is split be causing it to fail?

On Tue, Feb 21, 2012 at 9:32 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> I am trying to use XMLLoader to process the files but it doesn't seem to
> be quite working. For the first pass I am just trying to dump all the
> contents but it's saying 0 records found:
>
> bash-3.2$ hadoop fs -cat /examples/testfile.txt
>
> <abc><def></def><abc>
>
> <abc><def></def><abc>
>
> register 'pig-0.8.1-cdh3u3/contrib/piggybank/java/piggybank.jar'
>
> raw = LOAD '/examples/testfile.txt' using
> org.apache.pig.piggybank.storage.XMLLoader('<abc>') as (document:chararray);
>
> dump raw;
>
> 2012-02-21 09:22:18,947 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 50% complete
>
> 2012-02-21 09:22:24,998 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 100% complete
>
> 2012-02-21 09:22:24,999 [main] INFO org.apache.pig.tools.pigstats.PigStats
> - Script Statistics:
>
> HadoopVersion PigVersion UserId StartedAt FinishedAt Features
>
> 0.20.2-cdh3u3 0.8.1-cdh3u3 hadoop 2012-02-21 09:22:12 2012-02-21 09:22:24
> UNKNOWN
>
> Success!
>
> Job Stats (time in seconds):
>
> JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime
> MinReduceTime AvgReduceTime Alias Feature Outputs
>
> job_201202201638_0012 1 0 2 2 2 0 0 0 raw MAP_ONLY
> hdfs://dsdb1:54310/tmp/temp1968655187/tmp-358114646,
>
> Input(s):
>
> Successfully read 0 records (402 bytes) from: "/examples/testfile.txt"
>
> Output(s):
>
> Successfully stored 0 records in:
> "hdfs://dsdb1:54310/tmp/temp1968655187/tmp-358114646"
>
> Counters:
>
> Total records written : 0
>
> Total bytes written : 0
>
> Spillable Memory Manager spill count : 0
>
> Total bags proactively spilled: 0
>
> Total records proactively spilled: 0
>
> Job DAG:
>
> job_201202201638_0012
>
>
>
> 2012-02-21 09:22:25,004 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Success!
>
> 2012-02-21 09:22:25,011 [main] INFO
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
> to process : 1
>
> 2012-02-21 09:22:25,011 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
> paths to process : 1
>
> grunt> quit
>
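To make the split question concrete, here is a minimal Python sketch of how a tag-delimited record reader might behave. It is not the piggybank XMLLoader code: the function `read_records`, its parameters, and the rule "a record belongs to the split where its start tag begins" are all assumptions made for illustration, and nested tags of the same name are not handled. It takes a bare tag name (`abc`, no angle brackets), which is also an assumption about how such loaders are usually configured.

```python
from typing import List, Optional


def read_records(data: str, tag: str, start: int = 0,
                 end: Optional[int] = None) -> List[str]:
    """Return complete <tag>...</tag> records whose start tag begins in [start, end).

    Hypothetical helper for illustration only; not the XMLLoader implementation.
    """
    if end is None:
        end = len(data)
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    records = []
    pos = start
    while True:
        begin = data.find(open_tag, pos)
        # A record is owned by this split only if its start tag begins inside it.
        if begin == -1 or begin >= end:
            break
        # Search past the split end if necessary to find the closing tag.
        finish = data.find(close_tag, begin + len(open_tag))
        if finish == -1:
            break  # no closing tag anywhere, so no complete record to emit
        finish += len(close_tag)
        records.append(data[begin:finish])
        pos = finish
    return records


# Sample exactly as printed in the quoted message: the last tag is <abc>, not </abc>.
as_printed = "<abc><def></def><abc>\n<abc><def></def><abc>\n"
# The same sample with a proper closing tag, for comparison.
well_formed = "<abc><def></def></abc>\n<abc><def></def></abc>\n"

print(len(read_records(as_printed, "abc")))     # 0: no </abc> is ever found
print(len(read_records(well_formed, "abc")))    # 2
print(len(read_records(well_formed, "<abc>")))  # 0: searches for "<<abc>>"

# Two artificial splits that cut the first record in half.
first = read_records(well_formed, "abc", start=0, end=10)
second = read_records(well_formed, "abc", start=10)
print(len(first), len(second))                  # 1 1
```

Under these assumptions a split boundary in the middle of a record does not lose it, as long as the reader is allowed to read past the end of its split. A sample with no closing `</abc>` tag, or a tag argument that already includes the angle brackets, yields zero records regardless of file size.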