Re: Hadoop file uploads
Hi,

The code is very similar; just create a SequenceFile reader instead of a writer.

Brock
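
A minimal sketch of such a reader, assuming the Hadoop 1.x API and the
layout produced by the conversion code discussed below (a Text key holding
the filename, a BytesWritable value holding the file contents); the class
name and argument handling are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SequenceFileRead {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path seqFile = new Path(args[0]); // the sequence file on HDFS

            // Hadoop 1.x-style reader; key/value types must match what was written.
            SequenceFile.Reader reader = new SequenceFile.Reader(fs, seqFile, conf);
            try {
                Text filename = new Text();
                BytesWritable contents = new BytesWritable();
                while (reader.next(filename, contents)) {
                    // getBytes() may be padded; getLength() is the real size.
                    System.out.println(filename + " : " + contents.getLength() + " bytes");
                }
            } finally {
                reader.close();
            }
        }
    }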

On Thu, Oct 13, 2011 at 4:53 AM, visioner sadak <[EMAIL PROTECTED]> wrote:

> Hello Brock,
>
> Thanks a lot for your help. Should I run this code after doing the small
> file uploads? I have a Java API which does the small file uploads and
> reads as well; how will I be able to read the files back?
>
>
>
> On Thu, Oct 13, 2011 at 2:26 AM, Brock Noland <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> This:  http://pastebin.com/YFzAh0Nj
>>
>> will convert a directory of small files into a sequence file. The key is
>> the filename and the value is the file itself. This works as long as each
>> individual file is small enough to fit in memory. Files that are larger
>> and can be split up may be spread over multiple key-value pairs.
>>
>> Brock
>>
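
A minimal sketch of the approach Brock describes (filename as key, file
contents as value), assuming the Hadoop 1.x API and Java 7+ for reading the
local files; the class name and argument handling are illustrative:

    import java.io.File;
    import java.nio.file.Files;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SmallFilesToSequenceFile {
        public static void main(String[] args) throws Exception {
            File inputDir = new File(args[0]);  // local directory of small files
            Path output = new Path(args[1]);    // target sequence file on HDFS

            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, output, Text.class, BytesWritable.class);
            try {
                for (File f : inputDir.listFiles()) {
                    if (!f.isFile()) continue;
                    // Each file must fit in memory, as noted above.
                    byte[] contents = Files.readAllBytes(f.toPath());
                    writer.append(new Text(f.getName()), new BytesWritable(contents));
                }
            } finally {
                writer.close();
            }
        }
    }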
>> On Wed, Oct 12, 2011 at 4:50 PM, visioner sadak <[EMAIL PROTECTED]> wrote:
>>
>>> Hello guys,
>>>
>>> Thanks a lot again for your previous guidance. I tried out the Java API
>>> to do file uploads and it's working fine. Now I need to modify the code
>>> to use sequence files so that I can handle a large number of small files
>>> in Hadoop. For that I came across two links:
>>>
>>> 1. http://stuartsierra.com/2008/04/24/a-million-little-files (tar to
>>> sequence file)
>>> 2. http://www.jointhegrid.com/hadoop_filecrush/index.jsp (file crush)
>>>
>>> Could you please tell me which approach is better to follow, or should I
>>> follow the HAR (Hadoop Archive) approach instead? I came to know that a
>>> sequence file can combine smaller files into one big one, but I don't
>>> know how to split and retrieve the small files again while reading.
>>>
>>> Thanks and Gratitude
>>>
>>> On Wed, Oct 5, 2011 at 1:27 AM, visioner sadak <[EMAIL PROTECTED]> wrote:
>>>
>>>> Thanks a lot Wellington and Bejoy for your inputs; I will try out this
>>>> API and sequence files.
>>>>
>>>>
>>>> On Wed, Oct 5, 2011 at 1:17 AM, Wellington Chevreuil <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Yes, Sadak,
>>>>>
>>>>> With this API you copy your files into Hadoop HDFS just as you would
>>>>> when writing to any OutputStream. The data will then be replicated
>>>>> across your cluster's HDFS.
>>>>>
>>>>> Cheers.
>>>>>
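
A minimal sketch of writing to HDFS through an OutputStream, as Wellington
suggests, assuming the Hadoop 1.x FileSystem API; the class name and target
path are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsStreamWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // reads core-site.xml etc.
            FileSystem fs = FileSystem.get(conf);

            // fs.create() returns an FSDataOutputStream; HDFS replicates the
            // blocks according to dfs.replication, with no MapReduce involved.
            FSDataOutputStream out = fs.create(new Path("/user/sadak/hello.txt"));
            try {
                out.write("hello hdfs".getBytes("UTF-8"));
            } finally {
                out.close();
            }
        }
    }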
>>>>> 2011/10/4 visioner sadak <[EMAIL PROTECTED]>:
>>>>> > Hey, thanks Wellington. Just a thought: will my data be replicated
>>>>> > as well? I thought that the mapper does the job of breaking data
>>>>> > into pieces and distributing it, and the reducer does the joining
>>>>> > and combining while fetching data back; that's why I was confused
>>>>> > about using MapReduce. Can I use this API for uploading a large
>>>>> > number of small files through my application, or should I use the
>>>>> > sequence file class for that? I ask because I saw the small files
>>>>> > problem in Hadoop mentioned in the link below:
>>>>> >
>>>>> > http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>> >
>>>>> > On Wed, Oct 5, 2011 at 12:54 AM, Wellington Chevreuil <[EMAIL PROTECTED]> wrote:
>>>>> >>
>>>>> >> Hey Sadak,
>>>>> >>
>>>>> >> You don't need to write an MR job for that. You can make your Java
>>>>> >> program use the Hadoop Java API instead. You would need the FileSystem
>>>>> >> (http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html)
>>>>> >> and Path
>>>>> >> (http://hadoop.apache.org/common/docs/current/api/index.html?org/apache/hadoop/fs/Path.html)
>>>>> >> classes for that.
>>>>> >>
>>>>> >> Cheers,
>>>>> >> Wellington.
>>>>> >>
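
A minimal sketch of an upload using just the FileSystem and Path classes
linked above; the class name and argument handling are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsUpload {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // copyFromLocalFile(src, dst) uploads a local file into HDFS
            // without any MapReduce job.
            fs.copyFromLocalFile(new Path(args[0]), new Path(args[1]));
            fs.close();
        }
    }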
>>>>> >> 2011/10/4 visioner sadak <[EMAIL PROTECTED]>:
>>>>> >> > Hello guys,
>>>>> >> >
>>>>> >> > I would like to know how to do file uploads to HDFS using Java. Is
>>>>> >> > it to be done using MapReduce? What if I have a large number of
>>>>> >> > small files; should I use a sequence file along with MapReduce? It