Re: Need Info
sudha sadhasivam 2009-10-15, 06:39
Dear Shwitzu
The steps are listed below:
Kindly go through the word count and multi-file word count examples for your project. Modify the program to list the files containing the keywords, along with their file names. Use file names as keys.
If needed, store the files in 4 different input directories, one for each file type. Otherwise, you can keep them all in a single input directory.
Use the word count example, with the extensions suggested above, to retrieve the names of the files containing the keywords, and store the result in the output directory or display the links.

Map – parallelized reading of multiple files
Input key-value pair: filename – file contents
Output key-value pair: filename – keyword and count

Reduce – combines the key-value pairs output by the map function
Input key-value pair: filename – keyword and count
Output key-value pair: keyword – names of the files containing the keyword

The answers to your questions are:

1) How should I start with the design?
Identify the files to be saved in the HDFS input directory, and go through the word count example.

2) Should I upload all the files and create Map, Reduce and Driver code, and once I run my application, will it automatically go to the file system and get back the results to me?
Move all the files from the local file system to HDFS, or save them directly to HDFS, using a suitable DFS command such as hadoop fs -copyFromLocal. Go through the DFS commands.

3) How do I handle the binary data? I want to store binary-format data using MTOM in my database.
It can be handled in the same way as a conventional file.

G Sudha Sadasivam

[EMAIL PROTECTED] wrote:
From: shwitzu <[EMAIL PROTECTED]>
Subject: Need Info
To: [EMAIL PROTECTED]
Date: Thursday, October 15, 2009, 7:19 AM

Hello Sir!
I am new to Hadoop. I have a project based on web services. I have my information in 4 databases, with different files in each one of them: for example, images in one, videos in another, documents in another, and so on. My task is to develop a web service that accepts a keyword from the client, processes the request, and sends the actual requested file back to the user.
Now I have to use the Hadoop Distributed File System (HDFS) in this project. I have the following questions:
1) How should I start with the design?
2) Should I upload all the files and create Map, Reduce and Driver code, and once I run my application, will it automatically go to the file system and get back the results to me?
3) How do I handle the binary data? I want to store binary-format data using MTOM in my database.
Please let me know how I should proceed. I don't know much about Hadoop and am searching for some help. It would be great if you could assist me.
Thanks again
--
View this message in context: http://www.nabble.com/Need-Info-tp25901902p25901902.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
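The upload step in Sudha's answer 2 (one input directory per file type, copied into HDFS with hadoop fs -copyFromLocal) can be scripted. Below is a minimal Python sketch, assuming the `hadoop` command-line client is on the PATH. The local folder names, the HDFS path `/user/demo/input` and the four type names are hypothetical placeholders. The helper only builds the commands, so they can be reviewed as a dry run before being executed.

```python
import subprocess
from typing import Dict, List


def plan_uploads(local_dirs: Dict[str, str], hdfs_input: str) -> List[List[str]]:
    """Build one `hadoop fs -copyFromLocal` command per file type.

    Each local folder is copied to <hdfs_input>/<file type>. Assumption:
    hdfs_input already exists and the per-type targets do not, so each
    folder's contents land directly in its own target directory.
    """
    base = hdfs_input.rstrip("/")
    return [
        ["hadoop", "fs", "-copyFromLocal", local_path, f"{base}/{file_type}"]
        for file_type, local_path in sorted(local_dirs.items())
    ]


def run_uploads(commands: List[List[str]]) -> None:
    """Execute the planned commands; raises CalledProcessError on failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical layout: one local folder per file type.
    plan = plan_uploads(
        {
            "images": "/data/images",
            "videos": "/data/videos",
            "documents": "/data/documents",
            "other": "/data/other",
        },
        "/user/demo/input",
    )
    for cmd in plan:
        print(" ".join(cmd))  # dry run: print instead of executing
```

Separating planning from execution keeps the sketch testable without a cluster. Calling `run_uploads(plan)` performs the actual copy. If a single input directory is used instead, a single `-copyFromLocal` of one local folder is enough.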
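The map and reduce phases Sudha describes can be simulated in plain Python to check the logic before writing the Hadoop Map, Reduce and Driver classes. This is a minimal sketch, not the Hadoop API, and the sample files and keywords are hypothetical. One deliberate difference: here the map output is keyed by keyword, with (filename, count) as the value, rather than by filename. With that keying, the shuffle groups every file for a keyword at one reducer, which is one way to produce the "keyword – names of the files" output. The reducer uses the counts to list files with the most occurrences first.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Pair = Tuple[str, Tuple[str, int]]  # (keyword, (filename, count))


def map_file(filename: str, contents: str, keywords: Set[str]) -> Iterator[Pair]:
    """Map: (filename, file contents) -> (keyword, (filename, count)).

    Keywords are expected in lowercase; the text is lowercased before matching.
    """
    counts: Dict[str, int] = defaultdict(int)
    for word in contents.lower().split():
        if word in keywords:
            counts[word] += 1
    for keyword, count in counts.items():
        yield keyword, (filename, count)


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[Tuple[str, int]]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_keyword(keyword: str, values: List[Tuple[str, int]]) -> Tuple[str, List[str]]:
    """Reduce: (keyword, [(filename, count), ...]) -> (keyword, filenames).

    Files are ordered by descending count, ties broken by filename.
    """
    ranked = sorted(values, key=lambda fc: (-fc[1], fc[0]))
    return keyword, [filename for filename, _ in ranked]


def build_index(files: Dict[str, str], keywords: Set[str]) -> Dict[str, List[str]]:
    intermediate: List[Pair] = []
    for name, text in files.items():  # Hadoop would run these map calls in parallel
        intermediate.extend(map_file(name, text, keywords))
    return dict(reduce_keyword(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    sample = {
        "report.txt": "hadoop cluster setup hadoop",
        "notes.txt": "web service design",
        "readme.txt": "hadoop web service",
    }
    print(build_index(sample, {"hadoop", "service"}))
```

In a real job, the web service would look a keyword up in the reducer's output files and return the listed files.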