Re: TeraSort question.
Ravi,

Please post the figures and graphs. Figures for large clusters
(> 200 nodes) are certainly interesting.

Thanks

On Tue, Jan 11, 2011 at 10:36 AM, Raj V <[EMAIL PROTECTED]> wrote:
> All,
>
> I have been running terasort on a 480 node hadoop cluster. I have also collected CPU, memory, disk, and network statistics during this run. The system stats are quite interesting. I can post them once I have put them together in some presentable format (if there is interest). However, while looking at the data, I noticed something interesting.
>
> I thought, intuitively, that all the systems in the cluster would have more or less similar behaviour (a time translation was possible), but that the overall graph would look the same.
>
> Just to confirm it, I took 5 random nodes and looked at the CPU, disk, network, etc. activity while the sort was running. Strangely enough, it was not so. Two of the 5 systems were seriously busy, with heavy I/O and lots of disk and network activity. On the other three systems, the CPU was more or less 100% idle, with only slight network and I/O activity.
>
> Is that normal and/or expected? Shouldn't all the nodes be utilized in more or less the same manner over the length of the run?
>
> I generated the data for the sort using teragen (128MB block size, replication = 3).
>
> I would also be interested in other people's sort timings. Is there some place where people can post sort numbers (not just the record)?
>
> I will post the actual graphs of the 5 nodes tomorrow, if there is interest. (Some logistical issues with posting them tonight.)
>
> I am using CDH3B3, even though I think this is not specific to CDH3B3.
>
> Sorry for the cross post.
>
> Raj