Re: Estimating the time of my hadoop jobs
Hi Kandoi,
It depends on:
how many cores each VM node has
how complex your analysis application is

But I don't think it's normal to spend 3 hours processing 30 GB of data, even on
your *not great* hardware.
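
As a rough sanity check (just a sketch: the 30 GB and 3-hour figures are from
your message, while the 4-cores-per-node count is only my assumption since you
didn't mention it), the throughput works out like this:

public class ThroughputEstimate {
    public static void main(String[] args) {
        double inputGb = 30.0;       // sample dataset size from the question
        double elapsedHours = 3.0;   // reported wall-clock time
        int nodes = 4;               // 4-node VM cluster from the question
        int coresPerNode = 4;        // assumption: not stated in the question

        double aggregateMbPerSec = inputGb * 1024 / (elapsedHours * 3600);
        double perCoreMbPerSec = aggregateMbPerSec / (nodes * coresPerNode);

        // Prints roughly 2.84 MB/s aggregate and 0.18 MB/s per core, which is
        // far below what even a single modest disk can stream, so the job is
        // more likely CPU-bound or misconfigured than limited by the data size.
        System.out.printf("Aggregate throughput: %.2f MB/s%n", aggregateMbPerSec);
        System.out.printf("Per-core throughput:  %.2f MB/s%n", perCoreMbPerSec);
    }
}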
On Tue, Dec 17, 2013 at 6:39 PM, Kandoi, Nikhil <[EMAIL PROTECTED]> wrote:

> Hello everyone,
>
> I am new to Hadoop and would like to see if I’m on the right track.
>
> Currently I’m developing an application which would ingest logs on the order
> of 60-70 GB of data/day and would then do some analysis on them.
>
> Now the infrastructure that I have is a 4-node cluster (all nodes on
> virtual machines); all nodes have 4 GB RAM.
>
> But when I try to run the dataset (which is a sample dataset at this point)
> of about 30 GB, it takes about 3 hrs to process all of it.
>
> I would like to know: is it normal for this kind of infrastructure to take
> this amount of time?
>
> Thank you,
>
> Nikhil Kandoi
>