
MapReduce, mail # user - Hadoop file uploads


Re: Hadoop file uploads
Brock Noland 2011-10-13, 10:59
Hi,

The code is very similar, just create a SequenceFile reader.
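
Roughly something like this (just a rough sketch, not tested; it assumes
the writer used Text keys for the file names and BytesWritable values for
the file bytes, so adjust the types to whatever your writer actually used):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFileReader {

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path seqFile = new Path(args[0]); // the sequence file produced by the writer

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, seqFile, conf);
    try {
      Text fileName = new Text();                   // key: original file name
      BytesWritable contents = new BytesWritable(); // value: raw file bytes
      while (reader.next(fileName, contents)) {
        // contents.getBytes() holds the data; only the first
        // contents.getLength() bytes are valid
        System.out.println(fileName + ": " + contents.getLength() + " bytes");
      }
    } finally {
      reader.close();
    }
  }
}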

Brock

On Thu, Oct 13, 2011 at 4:53 AM, visioner sadak <[EMAIL PROTECTED]> wrote:

> Hello Brock,
>
>                   Thanks a lot for your help. Should I run this code
> after doing the small file uploads? I have a Java API which does the
> small file uploads and reads as well. How will I be able to read the
> files back?
>
>
>
> On Thu, Oct 13, 2011 at 2:26 AM, Brock Noland <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> This:  http://pastebin.com/YFzAh0Nj
>>
>> will convert a directory of small files to a sequence file. The key is the
>> filename and the value is the file contents. This works if each individual
>> file is small enough to fit in memory. If you have some files which are
>> larger and those files can be split up, they can be split over multiple
>> key-value pairs.
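>>
>> The gist is something along these lines (a rough sketch of the idea only,
>> not the exact code from the pastebin; the class name and argument handling
>> are made up):
>>
>> import java.io.File;
>> import java.io.FileInputStream;
>> import java.io.IOException;
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.FileSystem;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.BytesWritable;
>> import org.apache.hadoop.io.SequenceFile;
>> import org.apache.hadoop.io.Text;
>>
>> public class SmallFilesToSequenceFile {
>>
>>   public static void main(String[] args) throws IOException {
>>     Configuration conf = new Configuration();
>>     FileSystem fs = FileSystem.get(conf);
>>     File inputDir = new File(args[0]); // local directory of small files
>>     Path output = new Path(args[1]);   // target sequence file on HDFS
>>
>>     SequenceFile.Writer writer = SequenceFile.createWriter(
>>         fs, conf, output, Text.class, BytesWritable.class);
>>     try {
>>       for (File f : inputDir.listFiles()) {
>>         if (!f.isFile()) {
>>           continue;
>>         }
>>         // read the whole file into memory, so each file must fit in RAM
>>         byte[] data = new byte[(int) f.length()];
>>         FileInputStream in = new FileInputStream(f);
>>         try {
>>           int off = 0;
>>           while (off < data.length) {
>>             int n = in.read(data, off, data.length - off);
>>             if (n < 0) {
>>               break;
>>             }
>>             off += n;
>>           }
>>         } finally {
>>           in.close();
>>         }
>>         // key = file name, value = file contents
>>         writer.append(new Text(f.getName()), new BytesWritable(data));
>>       }
>>     } finally {
>>       writer.close();
>>     }
>>   }
>> }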
>>
>> Brock
>>
>> On Wed, Oct 12, 2011 at 4:50 PM, visioner sadak <[EMAIL PROTECTED]> wrote:
>>
>>> Hello guys,
>>>
>>>             Thanks a lot again for your previous guidance. I tried out
>>> the Java API to do file uploads and it is working fine. Now I need to
>>> modify the code to use sequence files so that I can handle a large
>>> number of small files in Hadoop. For that I came across two links:
>>>
>>> 1. http://stuartsierra.com/2008/04/24/a-million-little-files (tar to
>>> sequence)
>>> 2. http://www.jointhegrid.com/hadoop_filecrush/index.jsp (file crush)
>>>
>>> Could you please tell me which approach is better to follow, or should I
>>> follow the HAR (Hadoop archive) approach? I came to know that with a
>>> sequence file we can combine smaller files into one big one, but I don't
>>> know how to split and retrieve the small files again while reading.
>>> Thanks and Gratitude
>>> On Wed, Oct 5, 2011 at 1:27 AM, visioner sadak <[EMAIL PROTECTED]> wrote:
>>>
>>>> Thanks a lot Wellington and Bejoy for your inputs. I will try out this
>>>> API and the sequence file....
>>>>
>>>>
>>>> On Wed, Oct 5, 2011 at 1:17 AM, Wellington Chevreuil <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Yes, Sadak,
>>>>>
>>>>> With this API, you copy your files into HDFS much as you would when
>>>>> writing to any OutputStream. The data will then be replicated across
>>>>> your cluster's HDFS.
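>>>>>
>>>>> Something minimal like this, for instance (just a sketch; the class name
>>>>> and the command-line arguments are placeholders):
>>>>>
>>>>> import java.io.BufferedInputStream;
>>>>> import java.io.FileInputStream;
>>>>> import java.io.InputStream;
>>>>>
>>>>> import org.apache.hadoop.conf.Configuration;
>>>>> import org.apache.hadoop.fs.FSDataOutputStream;
>>>>> import org.apache.hadoop.fs.FileSystem;
>>>>> import org.apache.hadoop.fs.Path;
>>>>> import org.apache.hadoop.io.IOUtils;
>>>>>
>>>>> public class HdfsUpload {
>>>>>
>>>>>   public static void main(String[] args) throws Exception {
>>>>>     Configuration conf = new Configuration();
>>>>>     FileSystem fs = FileSystem.get(conf);
>>>>>
>>>>>     // args[0] = local source file, args[1] = HDFS destination path
>>>>>     InputStream in = new BufferedInputStream(new FileInputStream(args[0]));
>>>>>     FSDataOutputStream out = fs.create(new Path(args[1]));
>>>>>
>>>>>     // copies the stream and closes both ends when done
>>>>>     IOUtils.copyBytes(in, out, conf);
>>>>>   }
>>>>> }
>>>>>
>>>>> (fs.copyFromLocalFile(...) does the same job in one call when the source
>>>>> is a local file.)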
>>>>>
>>>>> Cheers.
>>>>>
>>>>> 2011/10/4 visioner sadak <[EMAIL PROTECTED]>:
>>>>> > Hey, thanks Wellington. Just a thought: will my data be replicated
>>>>> > as well? I thought that the mapper does the job of breaking data
>>>>> > into pieces and distributing it, and the reducer does the joining
>>>>> > and combining while fetching the data back, which is why I was
>>>>> > confused about using MR. Can I use this API for uploading a large
>>>>> > number of small files as well through my application, or should I
>>>>> > use the sequence file class for that? I also saw the small files
>>>>> > problem in Hadoop, as mentioned in the link below:
>>>>> >
>>>>> > http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>> >
>>>>> > On Wed, Oct 5, 2011 at 12:54 AM, Wellington Chevreuil <[EMAIL PROTECTED]> wrote:
>>>>> >>
>>>>> >> Hey Sadak,
>>>>> >>
>>>>> >> you don't need to write a MR job for that. You can make your Java
>>>>> >> program use the Hadoop Java API instead. You would need to use the
>>>>> >> FileSystem
>>>>> >> (http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html)
>>>>> >> and Path
>>>>> >> (http://hadoop.apache.org/common/docs/current/api/index.html?org/apache/hadoop/fs/Path.html)
>>>>> >> classes for that.
>>>>> >>
>>>>> >> Cheers,
>>>>> >> Wellington.
>>>>> >>
>>>>> >> 2011/10/4 visioner sadak <[EMAIL PROTECTED]>:
>>>>> >> > Hello guys,
>>>>> >> >
>>>>> >> >             I would like to know how to do file uploads to HDFS
>>>>> >> > using Java. Is it to be done using MapReduce? What if I have a
>>>>> >> > large number of small files, should I use a sequence file along
>>>>> >> > with MapReduce? It