Queues in the capacity scheduler are logical data structures into which
MapReduce jobs are placed to be picked up by the JobTracker / Scheduler
framework, according to capacity constraints that can be defined for a queue.
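For reference, a queue's share in the (MR1) Capacity Scheduler is set in conf/capacity-scheduler.xml, roughly along these lines (the queue name "ingest" and the 30% figure are just illustrative):

```xml
<!-- Sketch only: declares a capacity share for a queue named "ingest".
     The queue itself would also need to be listed under mapred.queue.names
     in mapred-site.xml. -->
<property>
  <name>mapred.capacity-scheduler.queue.ingest.capacity</name>
  <value>30</value>
</property>
```

Note these queues only govern how MapReduce task slots are shared between jobs; they have nothing to do with buffering incoming data.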
So, given your use case, I don't think the Capacity Scheduler is going to
directly help you (since you only spoke about data-in, not processing).
So, yes, something like Flume or Scribe would be a better fit for the ingestion side.
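As a rough sketch, assuming Flume NG, an agent that spools a local drop directory of incoming JSON files into HDFS could be configured like this (all names, paths, and thresholds below are placeholders, not a tested setup):

```properties
# Flume NG agent sketch: spool local JSON drops into HDFS.
# Agent/source/sink names and all paths are illustrative only.
agent1.sources = spool1
agent1.channels = ch1
agent1.sinks = hdfs1

# Watch a local directory for completed JSON files
agent1.sources.spool1.type = spooldir
agent1.sources.spool1.spoolDir = /var/flume/incoming
agent1.sources.spool1.channels = ch1

# A file channel persists events to disk, so an agent restart
# does not drop data (relevant since the files must not be lost)
agent1.channels.ch1.type = file
agent1.channels.ch1.checkpointDir = /var/flume/checkpoint
agent1.channels.ch1.dataDirs = /var/flume/data

# Roll HDFS output by time/size so 24M tiny inputs do not
# become 24M tiny HDFS files (the "small files problem")
agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.channel = ch1
agent1.sinks.hdfs1.hdfs.path = hdfs://namenode:8020/ingest/%Y-%m-%d
agent1.sinks.hdfs1.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs1.hdfs.fileType = DataStream
agent1.sinks.hdfs1.hdfs.rollInterval = 300
agent1.sinks.hdfs1.hdfs.rollSize = 134217728
agent1.sinks.hdfs1.hdfs.rollCount = 0
```

The file channel plus batched HDFS rolls is the usual way to get both durability and reasonably sized HDFS files out of a high-volume small-record feed.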
On Fri, Jan 11, 2013 at 11:34 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> Your question is unclear: HDFS has no queues for ingesting data (it is
> a simple, distributed FileSystem). The Hadoop M/R and Hadoop YARN
> components have queues, but those are for data-processing purposes.
> On Fri, Jan 11, 2013 at 8:42 AM, Panshul Whisper <[EMAIL PROTECTED]> wrote:
> > Hello,
> > I have a Hadoop cluster setup of 10 nodes and I am in need of
> > queues in the cluster for receiving high volumes of data.
> > Please suggest which will be more efficient to use for receiving
> > about 24 million JSON files, approx. 5 KB each, every 24 hours:
> > 1. Using Capacity Scheduler
> > 2. Implementing RabbitMQ and receiving data from it using Spring
> > Data pipelines.
> > I cannot afford to lose any of the JSON files received.
> > Thanking You,
> > --
> > Regards,
> > Ouch Whisper
> > 010101010101
> Harsh J