MapReduce >> mail # user >> One petabyte of data loading into HDFS with in 10 min.


Re: One petabyte of data loading into HDFS with in 10 min.
Hello Shailesh,

      Give distcp a shot. It runs a MapReduce job to copy data from source to
destination, so the data is copied in parallel.
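
A minimal sketch of how such a distcp run might be scripted, assuming the
`hadoop` command is on the PATH. The helper names and the cluster addresses
are hypothetical and only for illustration; `-m` is the distcp option that
caps the number of simultaneous map tasks.

```python
import subprocess
from typing import List, Optional


def build_distcp_command(source: str, destination: str,
                         max_maps: Optional[int] = None) -> List[str]:
    """Build the argument list for a `hadoop distcp` invocation."""
    command = ["hadoop", "distcp"]
    if max_maps is not None:
        # -m caps the number of map tasks, i.e. the number of parallel copiers.
        command += ["-m", str(max_maps)]
    command += [source, destination]
    return command


def run_distcp(source: str, destination: str,
               max_maps: Optional[int] = None) -> int:
    """Run distcp and return its exit code (needs a working Hadoop client)."""
    return subprocess.run(build_distcp_command(source, destination, max_maps)).returncode


if __name__ == "__main__":
    # Hypothetical source and destination URIs; replace with real ones.
    cmd = build_distcp_command("hdfs://nn1:8020/data/in",
                               "hdfs://nn2:8020/data/in", max_maps=20)
    print(" ".join(cmd))
```

Building the command separately from running it makes it easy to check the
arguments before a large copy is started.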

Regards,
    Mohammad Tariq

On Wed, Sep 5, 2012 at 7:44 PM, Shailesh Dargude <
[EMAIL PROTECTED]> wrote:

> Sorry Prabhu for hijacking this discussion a bit. I wonder what the best
> practice is for loading data into HDFS in general. Considering the size of
> the data (often GBs or TBs), how are storage and time constraints
> handled?
>
> If anybody can share their experiences or best practices, it would be great!
>
> -Shailesh.
>
> *From:* Chen He [mailto:[EMAIL PROTECTED]]
> *Sent:* Wednesday, September 05, 2012 7:34 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: One petabyte of data loading into HDFS with in 10 min.
>
> If it is not a single file, you can upload the files to HDFS using
> multiple threads.
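
The multi-threaded upload suggested above could be sketched roughly as
follows. This assumes the `hdfs` command-line client is installed; the
function names, the worker count, and the injectable `runner` parameter are
illustrative choices, not part of any Hadoop API.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def put_command(local_path: str, hdfs_dir: str) -> List[str]:
    """Argument list that copies one local file into an HDFS directory."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


def run_command(command: List[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(command).returncode


def upload_in_parallel(local_paths: Iterable[str], hdfs_dir: str,
                       workers: int = 4,
                       runner: Callable[[List[str]], int] = run_command) -> Dict[str, int]:
    """Upload each file in its own thread; return the exit code per file."""
    paths = list(local_paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as the input paths.
        exit_codes = list(pool.map(lambda p: runner(put_command(p, hdfs_dir)), paths))
    return dict(zip(paths, exit_codes))
```

Threads are enough here because each worker only waits on an external
process; the useful degree of parallelism depends on the client's disk and
network bandwidth.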
>
> On Wed, Sep 5, 2012 at 7:21 AM, prabhu K <[EMAIL PROTECTED]> wrote:
>
> Hi Users,
>
> Please clarify the questions below.
>
> 1. To load one petabyte of data into HDFS/Hive within 10 minutes, how many
> slave machines (DataNodes) are required?
>
> 2. To load one petabyte of data into HDFS/Hive within 10 minutes, what
> configuration setup is needed for cloud computing?
>
> Please suggest and help me on this.
>
> Thanks & Regards,
>
> Prabhu.
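
For the original question (one petabyte in 10 minutes), a back-of-envelope
estimate helps show the scale involved. One petabyte in 600 seconds is about
1.67 TB/s of logical data, before replication. The sketch below assumes a
sustained write rate of 100 MB/s per DataNode and a replication factor of 3
(the HDFS default); the per-node rate is an assumption for illustration, not
a measured figure, and the estimate ignores network and NameNode limits.

```python
import math


def required_throughput(total_bytes: float, seconds: float) -> float:
    """Aggregate logical ingest rate in bytes per second."""
    return total_bytes / seconds


def estimate_datanodes(total_bytes: float, seconds: float,
                       per_node_write_bytes_per_sec: float,
                       replication: int = 3) -> int:
    """Lower bound on DataNodes needed to absorb the writes in time."""
    # Every logical byte is written `replication` times across the cluster.
    physical_bytes = total_bytes * replication
    needed_rate = physical_bytes / seconds
    return math.ceil(needed_rate / per_node_write_bytes_per_sec)


if __name__ == "__main__":
    PETABYTE = 10**15          # decimal petabyte, in bytes
    TEN_MINUTES = 600          # seconds
    ASSUMED_NODE_RATE = 10**8  # assumed 100 MB/s sustained write per DataNode

    print(required_throughput(PETABYTE, TEN_MINUTES))
    print(estimate_datanodes(PETABYTE, TEN_MINUTES, ASSUMED_NODE_RATE))  # 50000
```

Under these assumptions the disks alone would call for roughly 50,000
DataNodes, so the answer depends far more on the achievable per-node and
network throughput than on any single configuration setting.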