RE: using sequencefile generated by Sqoop in Mapreduce
Kartashov, Andy 2012-10-09, 20:58
Gents, please ignore my message below. Everything works like a charm.
conf.setInputFormat(SequenceFileInputFormat.class) does indeed work well with the Sqoop-generated class. The reason I was getting only the last line in my output is that I failed to notice I was using fs.create() instead of fs.append(). *blush*

Andy Kartashov
MPAC
Architecture R&D, Co-op
1340 Pickering Parkway, Pickering, L1V 0C4
Phone: (905) 837 6269
Mobile: (416) 722 1787
[EMAIL PROTECTED]

From: Kartashov, Andy
Sent: Tuesday, October 09, 2012 1:19 PM
To: [EMAIL PROTECTED]
Subject: using sequencefile generated by Sqoop in Mapreduce

Guys,

I am having trouble using a sequence file in MapReduce. The only output I get is the very last record.

I create the sequence file while importing a MySQL table into Hadoop using:

    $ sqoop import ...... --as-sequencefile

I then try to read this file in the mapper, creating keys from the objects' ids and values from the actual objects, which carry the attributes of each table record. In the reducer I iterate over those objects and write their attributes to a .txt file.

My MapReduce code:

    public static void main(String[] args) throws Exception {
        ...
        conf.setOutputKeyClass(Text.class);              // one of the fields of the exported table, say the ids
        conf.setOutputValueClass(<Sqoop_class.class>);   // say Sqoop_class.class, the class generated during import
        conf.setInputFormat(SequenceFileInputFormat.class);
        conf.setOutputFormat(NullOutputFormat.class);    // output will go to a .txt file
        ..
    }

    public static class MyMapper extends MapReduceBase
            implements Mapper<LongWritable, Sqoop_class, Text, Sqoop_class> {

        public void map(LongWritable key, Sqoop_class value,
                        OutputCollector<Text, Sqoop_class> output, Reporter reporter)
                throws IOException {
            // key: the record's id; value: the whole Sqoop-generated object
            output.collect(new Text(value.get_foo_id().toString()), value);
        } // end of map()
    } // end of static class MyMapper

    public static class MyReducer extends MapReduceBase
            implements Reducer<Text, Sqoop_class, Text, Sqoop_class> {

        public void reduce(Text key, Iterator<Sqoop_class> values,
                           OutputCollector<Text, Sqoop_class> output, Reporter reporter)
                throws IOException {
            .......
            while (values.hasNext()) {
                output.collect(key, epa); // output is to Null...
                out.writeBytes(values.next().get_foo_name() + "!\n");
            } // end of while loop
            ......
        } // end of reduce()
    } // end of static class MyReducer

Would the Mapper not create keys from each Sqoop_class id value, with each value being an instance of Sqoop_class? We then group those instances in the reducer and iterate through them, retrieving the attribute names from each object. Somehow only the values of the last instance are written.

Should conf.setInputFormat(SequenceFileInputFormat.class) and Sqoop_class.class not work together when reading from a sequence file?

Thanks,
Andy

From: nagarjuna kanamarlapudi [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 09, 2012 11:03 AM
To: [EMAIL PROTECTED]
Subject: Re: Hive-Site XML changing any proprty.

By restarting the Hive server your problem should be solved.
Not sure whether there are any ways of starting the Hive server other than:

1. bin/hive --service hiveserver
2. HIVE_PORT=xxxx ./hive --service hiveserver

Regards,
Nagarjuna

On Tue, Oct 9, 2012 at 8:10 PM, Uddipan Mukherjee <[EMAIL PROTECTED]> wrote:

Hi Hadoop and Hive gurus,

I have a requirement to change the path of Hive's scratch folder. I have therefore added the following property to Hive-Site.xml and changed its value as required:

    <property>
      <name>hive.exec.scratchdir</name>
      <value>/tmp/example</value>
      <description>Scratch space for Hive jobs</description>
    </property>

However, the change is still not taking effect. Do I need to restart the Hive server for it to read the updated value from the file? Also, is there any way other than restarting the Hive server?

Any pointers will be helpful.

Thanks and regards,
Uddipan Mukherjee
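
Returning to the Sqoop question at the top of this thread: the create-versus-append mistake can be reproduced without a cluster. The sketch below is only an analogy on the local filesystem, assuming java.nio.file as a stand-in for Hadoop's FileSystem. Re-creating the file for every record (roughly what calling fs.create() per record does) leaves only the last record, while appending keeps all of them. The class name CreateVersusAppend, the method names, and the sample records are hypothetical and not taken from the original job.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

// Local-filesystem analogy only; no Hadoop classes are used here.
class CreateVersusAppend {

    // Mimics the bug: the file is truncated and rewritten for every record,
    // so each write wipes out the previous one.
    static void writeByRecreating(Path file, List<String> records) throws IOException {
        for (String record : records) {
            Files.write(file, (record + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING,
                    StandardOpenOption.WRITE);
        }
    }

    // Mimics the fix: each record is added after whatever is already in the file.
    static void writeByAppending(Path file, List<String> records) throws IOException {
        for (String record : records) {
            Files.write(file, (record + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE,
                    StandardOpenOption.APPEND);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample records standing in for the reducer's values.
        List<String> records = Arrays.asList("foo_1", "foo_2", "foo_3");

        Path recreated = Files.createTempFile("recreated", ".txt");
        Path appended = Files.createTempFile("appended", ".txt");

        writeByRecreating(recreated, records);
        writeByAppending(appended, records);

        System.out.println("re-created: " + Files.readAllLines(recreated, StandardCharsets.UTF_8));
        System.out.println("appended:   " + Files.readAllLines(appended, StandardCharsets.UTF_8));
    }
}
```

Running main shows a single surviving record for the re-created file and all three records for the appended one. In a real reducer, opening the output stream once and closing it when the task finishes may be a reasonable alternative to appending on every call.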