MapReduce, mail # user - Estimating the time of my hadoop jobs


Kandoi, Nikhil 2013-12-17, 10:39
Re: Estimating the time of my hadoop jobs
Azuryy Yu 2013-12-17, 10:44
Hi Kandoi,
It depends on:
how many cores each VNode has
how complicated your analysis application is

But I don't think it's normal to spend 3 hours processing 30 GB of data, even on your
*not good* hardware.
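
For a rough sense of scale, the figures quoted in this thread can be turned into an average throughput. The sketch below is only back-of-the-envelope arithmetic: the constant and function names are made up for illustration, it treats 1 GB as 1024 MB, and it assumes run time scales linearly with input size, which is an assumption and not something measured here.

```python
# Rough throughput arithmetic using only the figures quoted in this thread.
SAMPLE_GB = 30        # sample dataset size from the question
SAMPLE_HOURS = 3      # observed run time from the question
NODES = 4             # cluster size from the question
DAILY_GB = (60, 70)   # expected daily ingest range from the question


def throughput_mb_per_s(gigabytes, hours):
    """Average throughput in MB/s, treating 1 GB as 1024 MB."""
    return gigabytes * 1024 / (hours * 3600)


def projected_hours(gigabytes, sample_gb=SAMPLE_GB, sample_hours=SAMPLE_HOURS):
    """Projected run time. Assumption: time scales linearly with input size."""
    return gigabytes * sample_hours / sample_gb


cluster_rate = throughput_mb_per_s(SAMPLE_GB, SAMPLE_HOURS)
per_node_rate = cluster_rate / NODES  # assumes work is spread evenly over nodes

print(f"cluster: {cluster_rate:.2f} MB/s, per node: {per_node_rate:.2f} MB/s")
for gb in DAILY_GB:
    print(f"{gb} GB/day -> about {projected_hours(gb):.1f} h")
```

Under those assumptions the whole cluster averages under 3 MB/s, and a full day of logs would take roughly 6 to 7 hours, so the job is more likely limited by the application logic or the job configuration than by raw data volume.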
On Tue, Dec 17, 2013 at 6:39 PM, Kandoi, Nikhil <[EMAIL PROTECTED]> wrote:

> Hello everyone,
>
>
>
> I am new to Hadoop and would like to see if I’m on the right track.
>
> Currently I’m developing an application which would ingest logs on the order
> of 60-70 GB of data/day and would then do some analysis on them.
>
> Now the infrastructure that I have is a 4-node cluster (all nodes on
> virtual machines); all nodes have 4 GB of RAM.
>
>
>
> But when I try to run the dataset (which is a sample dataset at this point)
> of about 30 GB, it takes about 3 hrs to process all of it.
>
>
>
> I would like to know whether it is normal for this kind of infrastructure to take
> this amount of time.
>
>
>
>
>
> Thank you
>
>
>
> Nikhil Kandoi
>
Kandoi, Nikhil 2013-12-17, 11:26
Kandoi, Nikhil 2013-12-18, 08:33