Re: queues in hadoop
You can also use Fluentd: http://fluentd.org/
"Fluentd receives logs as JSON streams, buffers them, and sends them
to other systems like Amazon S3, MongoDB, Hadoop, or other Fluentds."
It has a plugin, fluent-plugin-webhdfs, for pushing data into HDFS:
https://github.com/fluent/fluent-plugin-webhdfs
Since Fluentd handles JSON natively, it fits your use case well.
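
As a rough sketch (hostname, port, and tag pattern below are just
placeholders, and parameter names may vary across plugin versions),
the webhdfs output section of a Fluentd config looks something like:

  <match ingest.json.**>
    type webhdfs
    host namenode.example.com
    port 50070
    path /ingest/json/%Y%m%d/data.log.${hostname}
  </match>

Fluentd buffers incoming events and appends them to that WebHDFS path
in chunks, so millions of small JSON records land in HDFS as a modest
number of larger files.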

Thanks,
Tsuyoshi

On Fri, Jan 11, 2013 at 10:03 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
> There is also Kafka: http://kafka.apache.org
> "A high-throughput, distributed, publish-subscribe messaging system."
>
> But it does not push into HDFS; you need to run a job that pulls the data in.
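>
> As a crude illustration (topic name and target path are made up), even
> the stock console consumer can be piped into HDFS, since "hadoop fs -put"
> accepts "-" for stdin:
>
>   kafka-console-consumer.sh --zookeeper zk1:2181 --topic json-events \
>     | hadoop fs -put - /ingest/kafka/json-events.log
>
> In practice you would use a dedicated consumer job (LinkedIn's Camus,
> for example) to pull from Kafka into HDFS in batches.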
>
> Regards
>
> Bertrand
>
>
> On Fri, Jan 11, 2013 at 1:52 PM, Mirko Kämpf <[EMAIL PROTECTED]> wrote:
>>
>> I would suggest working with Flume in order to collect a certain number
>> of files and store them to HDFS in larger chunks, or to write them
>> directly to HBase. HBase allows random access later on (if needed);
>> otherwise it could be overkill. You could also collect the data in a
>> MySQL DB and then import it regularly via Sqoop.
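>>
>> A minimal Flume NG sketch of the HDFS variant (agent name, directories,
>> and sizes are made up, and a durable channel would be safer than the
>> memory channel shown here):
>>
>>   agent.sources  = jsonDir
>>   agent.channels = mem
>>   agent.sinks    = toHdfs
>>
>>   agent.sources.jsonDir.type     = spooldir
>>   agent.sources.jsonDir.spoolDir = /var/incoming/json
>>   agent.sources.jsonDir.channels = mem
>>
>>   agent.channels.mem.type = memory
>>
>>   agent.sinks.toHdfs.type                   = hdfs
>>   agent.sinks.toHdfs.channel                = mem
>>   agent.sinks.toHdfs.hdfs.path              = /ingest/json/%Y-%m-%d
>>   agent.sinks.toHdfs.hdfs.fileType          = DataStream
>>   agent.sinks.toHdfs.hdfs.useLocalTimeStamp = true
>>   agent.sinks.toHdfs.hdfs.rollSize          = 134217728
>>   agent.sinks.toHdfs.hdfs.rollCount         = 0
>>   agent.sinks.toHdfs.hdfs.rollInterval      = 0
>>
>> The HDFS sink then rolls files at ~128 MB, so the many small inputs
>> become a few large files instead of millions of NameNode objects.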
>>
>> Best
>> Mirko
>>
>>
>> "Every dat flow goes to Hadoop"
>> citation from an unkown source
>>
>> 2013/1/11 Hemanth Yamijala <[EMAIL PROTECTED]>
>>>
>>> Queues in the Capacity Scheduler are logical structures into which
>>> MapReduce jobs are placed, to be picked up by the JobTracker /
>>> scheduler framework according to capacity constraints that can be
>>> defined per queue.
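>>>
>>> For illustration only (the queue name and percentage are hypothetical),
>>> a queue on a Hadoop 1.x cluster is declared in mapred-site.xml and
>>> given capacity in capacity-scheduler.xml roughly like this:
>>>
>>>   <property>
>>>     <name>mapred.queue.names</name>
>>>     <value>default,ingest</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>mapred.capacity-scheduler.queue.ingest.capacity</name>
>>>     <value>25</value>
>>>   </property>
>>>
>>> and jobs target it with -Dmapred.job.queue.name=ingest.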
>>>
>>> So, given your use case, I don't think the Capacity Scheduler is going
>>> to directly help you, since you only spoke about getting data in, not
>>> about processing it.
>>>
>>> So, yes, something like Flume or Scribe would fit.
>>>
>>> Thanks
>>> Hemanth
>>>
>>> On Fri, Jan 11, 2013 at 11:34 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Your question is unclear: HDFS has no queues for ingesting data (it
>>>> is a simple, distributed filesystem). The Hadoop M/R and Hadoop YARN
>>>> components have queues, but those are for processing data.
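>>>>
>>>> (For example, "hadoop queue -list" on a 1.x cluster prints the
>>>> configured scheduler queues and their scheduling info; there is no
>>>> analogous ingest queue on the HDFS side.)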
>>>>
>>>> On Fri, Jan 11, 2013 at 8:42 AM, Panshul Whisper <[EMAIL PROTECTED]>
>>>> wrote:
>>>> > Hello,
>>>> >
>>>> > I have a Hadoop cluster setup of 10 nodes and I am in need of
>>>> > implementing queues in the cluster for receiving high volumes of
>>>> > data. Please suggest which will be more efficient for receiving
>>>> > 24 million JSON files (approx. 5 KB each) every 24 hours:
>>>> > 1. Using the Capacity Scheduler
>>>> > 2. Implementing RabbitMQ and receiving data from it using Spring
>>>> > Integration data pipelines
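>>>> >
>>>> > (That works out to roughly 24,000,000 x 5 KB ≈ 120 GB per day, or
>>>> > about 280 files arriving per second on average.)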
>>>> >
>>>> > I cannot afford to lose any of the JSON files received.
>>>> >
>>>> > Thanking You,
>>>> >
>>>> > --
>>>> > Regards,
>>>> > Ouch Whisper
>>>> > 010101010101
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>
>
>
>
> --
> Bertrand Dechoux

--
OZAWA Tsuyoshi