Hemanth Yamijala 2013-01-11, 10:30
He's got two different queues.
1) a queue in the capacity scheduler so he can have a set of M/R tasks running in the background to pull data off of...
2) a durable queue that receives the inbound json files to be processed.
You can have a custom-written listener that pulls data from the queue and puts it in either HDFS or HBase, depending on the access patterns and the content of the files.
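To make the listener idea concrete, here is a minimal sketch, not a real RabbitMQ or HDFS client. It assumes the broker can be modelled as a queue of (delivery tag, body) pairs; `drain`, `write`, `ack` and `nack` are hypothetical names I made up for illustration. The only point it shows is the ordering: acknowledge a message after the write to storage succeeds, so a failed write leaves the message with the broker.

```python
import queue


def drain(inbound, write, ack, nack):
    """Pull (tag, body) pairs until the queue is empty.

    write(body) stands in for a put to HDFS or HBase (hypothetical).
    ack(tag) / nack(tag) stand in for the broker's acknowledgement calls.
    Returns the number of messages stored successfully.
    """
    stored = 0
    while True:
        try:
            tag, body = inbound.get_nowait()
        except queue.Empty:
            return stored
        try:
            write(body)
        except Exception:
            # Write failed: do not ack, so the broker can redeliver later.
            nack(tag)
            continue
        # Only acknowledge once the data is safely written.
        ack(tag)
        stored += 1


if __name__ == "__main__":
    inbound = queue.Queue()
    for tag, doc in enumerate(['{"id": 1}', '{"id": 2}']):
        inbound.put((tag, doc))
    store, acked, failed = [], [], []
    count = drain(inbound, store.append, acked.append, failed.append)
    print(count, store, acked, failed)
```

With a real broker the same ordering applies: turn off automatic acknowledgement and ack by hand once the write has returned.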
Then you would write an M/R job that actually processes the data to be used by ancillary processes not mentioned in the OP's question.
This is why he asked about RabbitMQ, which is one option; there are others, such as ActiveMQ.
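For a sense of scale, the figures in the original question (24 million files of about 5 KB each per 24 hours) can be turned into average rates. This is a back-of-the-envelope sketch only: it assumes the load is spread evenly over the day, which real traffic rarely is, and it uses decimal units (1 GB = 1,000,000 KB). The function name is mine.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_load(files_per_day, file_size_kb):
    """Return (messages per second, KB per second, GB per day).

    Assumes a perfectly even arrival rate; peaks will be higher.
    """
    msgs_per_sec = files_per_day / SECONDS_PER_DAY
    kb_per_sec = msgs_per_sec * file_size_kb
    gb_per_day = files_per_day * file_size_kb / 1_000_000
    return msgs_per_sec, kb_per_sec, gb_per_day


if __name__ == "__main__":
    msgs, kb, gb = average_load(24_000_000, 5)
    print(f"{msgs:.0f} msgs/s, {kb:.0f} KB/s, {gb:.0f} GB/day")
```

That works out to roughly 278 messages per second and about 120 GB per day on average, so whichever broker is chosen should be sized for peaks well above that.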
On Jan 11, 2013, at 12:04 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> Your question is unclear: HDFS has no queues for ingesting data (it is
> a simple, distributed FileSystem). The Hadoop M/R and Hadoop YARN
> components have queues for data-processing purposes.
> On Fri, Jan 11, 2013 at 8:42 AM, Panshul Whisper <[EMAIL PROTECTED]> wrote:
>> I have a Hadoop cluster setup of 10 nodes and I am in need of implementing
>> queues in the cluster for receiving high volumes of data.
>> Please suggest what will be more efficient to use in the case of receiving
>> 24 million JSON files, approx. 5 KB each, every 24 hours:
>> 1. Using Capacity Scheduler
>> 2. Implementing RabbitMQ and receiving data from it using Spring Integration
>> data pipelines.
>> I cannot afford to lose any of the JSON files received.
>> Thanking You,
>> Panshul Whisper
> Harsh J