Re: hadoop File loading

Your requirement is that your M/R job uses the full XML file while operating.
(If that is right, then please use the approach below.)
You can put this XML file in the DistributedCache, which is shared
across the M/R tasks, so that each task gets the whole XML instead of a chunk of data.
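
On the split concern in the quoted question below: as far as I know, an input split is only a logical byte range, and a record reader is free to keep reading past the end of its split (HDFS fetches the remaining bytes from whichever DataNode holds them), so a record that straddles two blocks can still be read whole. Here is a minimal sketch of that boundary rule in plain Java, with no Hadoop classes. The class name, the method names, and the <record> element are all made up for illustration; a real custom reader would apply the same rule to an HDFS input stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitAwareXmlReader {
    // Hypothetical record delimiters; a real job would use its own element name.
    static final byte[] START = "<record>".getBytes(StandardCharsets.UTF_8);
    static final byte[] END = "</record>".getBytes(StandardCharsets.UTF_8);

    // Index of the first occurrence of pattern at or after 'from', or -1 if none.
    static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    // Returns every record whose start tag begins inside [splitStart, splitEnd).
    // The scan for the closing tag is allowed to run past splitEnd, which is
    // what lets a record that crosses the boundary come back in one piece.
    static List<String> readSplit(byte[] file, int splitStart, int splitEnd) {
        List<String> records = new ArrayList<>();
        int pos = splitStart;
        while (true) {
            int begin = indexOf(file, START, pos);
            if (begin < 0 || begin >= splitEnd) break; // belongs to a later split, or no more records
            int close = indexOf(file, END, begin + START.length);
            if (close < 0) break;                      // unterminated record: stop
            int stop = close + END.length;
            records.add(new String(file, begin, stop - begin, StandardCharsets.UTF_8));
            pos = stop;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] file = "<record>a</record>\n<record>\n  b\n</record>\n<record>c</record>"
                .getBytes(StandardCharsets.UTF_8);
        int cut = 25; // arbitrary boundary that falls inside the second record
        System.out.println(readSplit(file, 0, cut));
        System.out.println(readSplit(file, cut, file.length));
    }
}
```

Because a record belongs to the split in which its start tag begins, the two calls in main never return the same record twice and never drop one, even though the boundary cuts through the second record.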


On Tue, May 15, 2012 at 11:30 PM, @dataElGrande <[EMAIL PROTECTED]> wrote:

> You should check out Pentaho's how-tos dealing with Hadoop and MapReduce.
> Hope this helps! http://wiki.pentaho.com/display/BAD/How+To%27s
> hari708 wrote:
> >
> > Hi,
> > I have a big file consisting of XML data. The XML is not represented as a
> > single line in the file. If we load this file into a Hadoop directory with
> > the ./hadoop dfs -put command, how does the distribution happen?
> > Basically, in my MapReduce program I am expecting a complete XML as my
> > input. I have a CustomReader (for XML) in my MapReduce job configuration.
> > My main confusion is this: if the NameNode distributes the data to
> > DataNodes, there is a chance that one part of the XML goes to one DataNode
> > and the other half goes to another DataNode. If that is the case, will my
> > custom XMLReader in the MapReduce job be able to combine them (as MapReduce
> > reads data locally only)?
> > Please help me with this.
> >
> --
> View this message in context:
> http://old.nabble.com/hadoop-File-loading-tp32871902p33849683.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.