I would suggest working with Flume, in order to collect a certain number
of files and store them in HDFS in larger chunks, or to write them directly
to HBase, which allows random access later on (if needed); otherwise HBase
could be overkill. Alternatively, you can collect the data in a MySQL DB and
then import it regularly via Sqoop.
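
As a rough sketch (agent name, paths and roll size are just placeholders,
adjust for your setup), a minimal Flume agent that spools a local directory
of JSON files and rolls them into ~128 MB HDFS files could look like this:

agent.sources = jsonSpool
agent.channels = fileCh
agent.sinks = hdfsSink

# pick up files dropped into a local directory
agent.sources.jsonSpool.type = spooldir
agent.sources.jsonSpool.spoolDir = /data/incoming/json
agent.sources.jsonSpool.channels = fileCh

# durable file channel so events survive an agent restart
agent.channels.fileCh.type = file
agent.channels.fileCh.checkpointDir = /var/lib/flume/checkpoint
agent.channels.fileCh.dataDirs = /var/lib/flume/data

# write raw event bodies and roll on size (~128 MB) instead of
# producing millions of tiny files
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = fileCh
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/ingest/json
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.rollSize = 134217728
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.rollInterval = 0

And if you go the MySQL route, the periodic import is a one-liner along the
lines of: sqoop import --connect jdbc:mysql://dbhost/ingest --table
json_events --target-dir /ingest/mysql/json_events -m 4
(database, table and target path are made up, of course).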
"Every dat flow goes to Hadoop"
citation from an unkown source
2013/1/11 Hemanth Yamijala <[EMAIL PROTECTED]>
> Queues in the capacity scheduler are logical data structures into which
> MapReduce jobs are placed to be picked up by the JobTracker / Scheduler
> framework, according to some capacity constraints that can be defined for a
> queue. So, given your use case, I don't think the Capacity Scheduler is
> going to directly help you (since you only spoke about data-in, and not
> processing). So, yes, something like Flume or Scribe would be a better fit.
> On Fri, Jan 11, 2013 at 11:34 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>> Your question is unclear: HDFS has no queues for ingesting data (it is
>> a simple, distributed FileSystem). The Hadoop M/R and Hadoop YARN
>> components have queues for data processing purposes.
>> On Fri, Jan 11, 2013 at 8:42 AM, Panshul Whisper <[EMAIL PROTECTED]> wrote:
>> > Hello,
>> > I have a Hadoop cluster setup of 10 nodes and I am in need of
>> > queues in the cluster for receiving high volumes of data.
>> > Please suggest which will be more efficient to use in the case of
>> > 24 million JSON files, approx. 5 KB each, every 24 hours:
>> > 1. Using the Capacity Scheduler
>> > 2. Implementing RabbitMQ and receiving data from it using Spring
>> > Data pipelines.
>> > I cannot afford to lose any of the JSON files received.
>> > Thanking You,
>> > --
>> > Regards,
>> > Ouch Whisper
>> > 010101010101
>> Harsh J