

Mirko Kämpf 2013-01-11, 12:52
Re: queues in Hadoop
There is also Kafka. http://kafka.apache.org
"A high-throughput, distributed, publish-subscribe messaging system."

But it does not push into HDFS; you need to launch a job to pull the data in.
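
As a rough illustration of that pull step, here is a minimal sketch (not from the original thread) of a consumer that reads JSON messages from a Kafka topic and writes each polled batch to a file on HDFS. It uses the present-day Kafka consumer API, which is newer than what was available when this thread was written; the broker address, topic name, group id and output path are all invented for the example.

// Sketch: pull JSON messages from Kafka and write each polled batch to HDFS.
// Broker, topic, group id and target directory are assumptions.
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaToHdfs {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092");   // assumed broker
    props.put("group.id", "json-ingest");             // assumed consumer group
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");

    FileSystem fs = FileSystem.get(new Configuration());
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("json-events"));  // assumed topic
      while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        if (records.isEmpty()) {
          continue;
        }
        // One file per polled batch; a real pipeline would roll by size/time.
        Path out = new Path("/ingest/batch-" + System.currentTimeMillis() + ".json");
        try (FSDataOutputStream os = fs.create(out)) {
          for (ConsumerRecord<String, String> r : records) {
            os.write((r.value() + "\n").getBytes(StandardCharsets.UTF_8));
          }
        }
      }
    }
  }
}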

Regards

Bertrand

On Fri, Jan 11, 2013 at 1:52 PM, Mirko Kämpf <[EMAIL PROTECTED]> wrote:

> I would suggest working with Flume, in order to collect a certain number
> of files and store them to HDFS in larger chunks, or to write them directly to
> HBase, which allows random access later on (if needed); otherwise HBase could
> be overkill. You can also collect data in a MySQL DB and then import it
> regularly via Sqoop.
>
> Best
> Mirko
>
>
> "Every dat flow goes to Hadoop"
> citation from an unkown source
>
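
To make the "larger chunks" idea above concrete, here is a hypothetical sketch (this is not how Flume itself works) that packs many small local JSON files into a single SequenceFile on HDFS, with the file name as key and the file contents as value; the spool directory and the target path are invented for the example.

// Sketch: batch small JSON files into one SequenceFile on HDFS.
// The /data/incoming spool directory and /ingest/packed.seq path are assumptions.
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackJsonFiles {
  public static void main(String[] args) throws Exception {
    File[] inputs = new File("/data/incoming").listFiles();   // assumed local spool dir
    if (inputs == null) {
      return;   // nothing to pack
    }
    Configuration conf = new Configuration();
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(new Path("/ingest/packed.seq")),
        SequenceFile.Writer.keyClass(Text.class),
        SequenceFile.Writer.valueClass(Text.class));
    try {
      for (File f : inputs) {
        String json = new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8);
        writer.append(new Text(f.getName()), new Text(json));  // key = file name, value = contents
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}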
> 2013/1/11 Hemanth Yamijala <[EMAIL PROTECTED]>
>
>> Queues in the Capacity Scheduler are logical structures into which
>> MapReduce jobs are placed, to be picked up by the JobTracker / scheduler
>> framework according to capacity constraints that can be defined per
>> queue.
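
For context, a minimal sketch of how a job targets such a queue: the property name below is the standard MapReduce one, but the queue "ingest" is made up and would have to be defined (with its capacity) by the cluster administrator in capacity-scheduler.xml. As the next point makes clear, this only governs processing, not data ingestion.

// Sketch: submit a MapReduce job to a named Capacity Scheduler queue.
// The "ingest" queue is an assumption; it must already exist on the cluster.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitToQueue {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "json-ingest-example");
    job.getConfiguration().set("mapreduce.job.queuename", "ingest");
    // ... set mapper, reducer, input and output paths as usual, then:
    // job.waitForCompletion(true);
  }
}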
>>
>> So, given your use case, I don't think the Capacity Scheduler is going to
>> directly help you (since you only spoke about data-in, and not processing).
>>
>> So, yes, something like Flume or Scribe.
>>
>> Thanks
>> Hemanth
>>
>> On Fri, Jan 11, 2013 at 11:34 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>>> Your question is unclear: HDFS has no queues for ingesting data (it is
>>> a simple, distributed FileSystem). The Hadoop M/R and Hadoop YARN
>>> components have queues for data-processing purposes.
>>>
>>> On Fri, Jan 11, 2013 at 8:42 AM, Panshul Whisper <[EMAIL PROTECTED]>
>>> wrote:
>>> > Hello,
>>> >
>>> > I have a Hadoop cluster of 10 nodes and I am in need of implementing
>>> > queues in the cluster for receiving high volumes of data.
>>> > Please suggest which will be more efficient to use in the case of
>>> > receiving 24 million JSON files (approx. 5 KB each) every 24 hours:
>>> > 1. Using the Capacity Scheduler
>>> > 2. Implementing RabbitMQ and receiving data from it using Spring
>>> > Integration data pipelines.
>>> >
>>> > I cannot afford to lose any of the JSON files received.
>>> >
>>> > Thanking You,
>>> >
>>> > --
>>> > Regards,
>>> > Ouch Whisper
>>> > 010101010101
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>
--
Bertrand Dechoux