Re: Hadoop noob question
@Nitin, writing to hdfs in parallel sounds great, but I could not understand
what you mean by a capable NN. As far as I know, the NN is not part of the
actual data write pipeline, meaning that the data does not travel through
the NN; the dfs client only contacts the NN from time to time to get the
locations of the DNs where the data blocks should be stored.
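
To make that concrete, here is a rough sketch using the standard FileSystem
API (the NameNode address and the paths are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // placeholder NameNode address -- adjust for your cluster
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);

        // create() goes to the NN only for metadata (file entry, block allocation)
        FSDataOutputStream out = fs.create(new Path("/data/sample.txt"));

        // the bytes are streamed by the client directly into the DN pipeline;
        // they never pass through the NN
        out.write("hello hdfs".getBytes("UTF-8"));
        out.close();
        fs.close();
    }
}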

Thanks,
Rahul

On Sat, May 11, 2013 at 4:54 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:

> Is it safe? There is no direct yes or no answer.
>
> When you say you have files worth 10 TB and you want to upload them to
> HDFS, several factors come into the picture:
>
> 1) Is the machine in the same network as your hadoop cluster?
> 2) Is there a guarantee that the network will not go down?
>
> And most importantly, I assume that you have a capable hadoop cluster. By
> that I mean you have a capable namenode.
>
> I would definitely not write the files sequentially to HDFS. I would prefer
> to write them in parallel to utilize the DFS write features and speed up
> the process.
> You can run the hdfs put command in a parallel manner (see the sketch
> below), and in my experience it has not failed when we write a lot of data.
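>
> For example, a rough sketch of what I mean (the local directory, the HDFS
> target path and the thread count are just placeholders):
>
> import java.io.File;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.TimeUnit;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class ParallelPut {
>     public static void main(String[] args) throws Exception {
>         final Configuration conf = new Configuration();
>         // placeholder thread count -- tune it for your network and cluster
>         ExecutorService pool = Executors.newFixedThreadPool(8);
>
>         for (final File f : new File("/data/to-upload").listFiles()) {
>             pool.submit(new Runnable() {
>                 public void run() {
>                     try {
>                         // copy one local file into HDFS; FileSystem.get()
>                         // returns a shared, cached client
>                         FileSystem fs = FileSystem.get(conf);
>                         fs.copyFromLocalFile(new Path(f.getAbsolutePath()),
>                                 new Path("/user/hadoop/incoming/" + f.getName()));
>                     } catch (Exception e) {
>                         e.printStackTrace();
>                     }
>                 }
>             });
>         }
>         pool.shutdown();
>         pool.awaitTermination(1, TimeUnit.DAYS);
>     }
> }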
>
>
> On Sat, May 11, 2013 at 4:38 PM, maisnam ns <[EMAIL PROTECTED]> wrote:
>
>> @Nitin Pawar, thanks for clearing my doubts.
>>
>> But I have one more question: say I have 10 TB of data in the pipeline.
>>
>> Is it perfectly OK to use the hadoop fs put command to upload these files
>> of size 10 TB, and is there any limit on file size when using the hadoop
>> command line? Can the hadoop put command work with huge data?
>>
>> Thanks in advance
>>
>>
>> On Sat, May 11, 2013 at 4:24 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>>
>>> First of all, most companies do not get 100 PB of data in one go. It is
>>> an accumulating process, and most companies have a data pipeline in place
>>> where the data is written to hdfs on a regular frequency, retained on
>>> hdfs for some duration as needed, and from there sent to archival storage
>>> or deleted.
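>>>
>>> Just to make the retention part concrete, a minimal sketch (the landing
>>> directory and the 7-day window are made-up values):
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.fs.FileStatus;
>>> import org.apache.hadoop.fs.FileSystem;
>>> import org.apache.hadoop.fs.Path;
>>>
>>> public class RetentionSweep {
>>>     public static void main(String[] args) throws Exception {
>>>         long cutoff = System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000;
>>>         FileSystem fs = FileSystem.get(new Configuration());
>>>         for (FileStatus st : fs.listStatus(new Path("/data/incoming"))) {
>>>             // delete (or move to an archive) anything older than the window
>>>             if (st.getModificationTime() < cutoff) {
>>>                 fs.delete(st.getPath(), true);  // true = recursive
>>>             }
>>>         }
>>>         fs.close();
>>>     }
>>> }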
>>>
>>> For data management products, you can look at Falcon, which was open
>>> sourced by InMobi along with Hortonworks.
>>>
>>> In any case, if you want to write files to hdfs, there are a few options
>>> available to you:
>>> 1) Write your own dfs client which writes to dfs
>>> 2) Use the hdfs proxy
>>> 3) Use webhdfs (see the sketch after this list)
>>> 4) Use the command line hdfs
>>> 5) Use data collection tools that come with support for writing to hdfs,
>>> like flume etc.
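>>>
>>> As a rough illustration of option 3, webhdfs uses a two-step create: the
>>> namenode answers with a redirect to a datanode, and the data goes there
>>> (the host, port, path and user below are just placeholders):
>>>
>>> import java.io.OutputStream;
>>> import java.net.HttpURLConnection;
>>> import java.net.URL;
>>>
>>> public class WebHdfsPut {
>>>     public static void main(String[] args) throws Exception {
>>>         String create = "http://namenode:50070/webhdfs/v1/user/hadoop/sample.txt"
>>>                 + "?op=CREATE&user.name=hadoop&overwrite=true";
>>>
>>>         // step 1: ask the namenode where to write; it replies with a 307
>>>         // redirect whose Location header points at a datanode
>>>         HttpURLConnection nn = (HttpURLConnection) new URL(create).openConnection();
>>>         nn.setRequestMethod("PUT");
>>>         nn.setInstanceFollowRedirects(false);
>>>         nn.connect();
>>>         String dataNodeUrl = nn.getHeaderField("Location");
>>>         nn.disconnect();
>>>
>>>         // step 2: send the actual bytes to that datanode
>>>         HttpURLConnection dn = (HttpURLConnection) new URL(dataNodeUrl).openConnection();
>>>         dn.setRequestMethod("PUT");
>>>         dn.setDoOutput(true);
>>>         OutputStream out = dn.getOutputStream();
>>>         out.write("hello via webhdfs".getBytes("UTF-8"));
>>>         out.close();
>>>         System.out.println("datanode response: " + dn.getResponseCode()); // expect 201
>>>         dn.disconnect();
>>>     }
>>> }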
>>>
>>>
>>> On Sat, May 11, 2013 at 4:19 PM, Thoihen Maibam <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Can anyone help me understand how companies like Facebook, Yahoo, etc.
>>>> upload bulk files, say to the tune of 100 petabytes, to a Hadoop HDFS
>>>> cluster for processing,
>>>> and after processing, how they download those files from HDFS to the
>>>> local file system?
>>>>
>>>> I don't think they would be using the command line hadoop fs put to
>>>> upload the files, as it would take too long. Or do they divide the data
>>>> into, say, 10 parts of 10 petabytes each, compress them, and use the
>>>> command line hadoop fs put?
>>>>
>>>> Or do they use some tool to upload huge files?
>>>>
>>>> Please help me.
>>>>
>>>> Thanks
>>>> thoihen
>>>>
>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>
>
> --
> Nitin Pawar
>