Re: Is there any way to use a hdfs file as a Circular buffer?
Adam Faris 2013-08-15, 18:16
If every device can send its information as an 'event', you could use a publish-subscribe messaging system like Apache Kafka (http://kafka.apache.org/). Kafka is designed to self-manage its storage by retaining only the last 'n' events of data, acting like a circular buffer. Each device would publish its binary data to Kafka, and Hadoop would act as a subscriber by consuming events. If you need a scheduler to make Hadoop process the Kafka events, look at Azkaban, as it supports both scheduling and job dependencies (http://azkaban.github.io/azkaban2/).
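
Roughly, the producer side could look like this (the 'device-events' topic, the device id key, and the broker address below are placeholders, and the circular-buffer behavior comes from broker-side retention settings such as log.retention.bytes or log.retention.hours, not from the client):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class DeviceEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            byte[] payload = {0x01, 0x02, 0x03}; // the device's binary reading
            // Key by device id so one device's events stay ordered within a partition.
            producer.send(new ProducerRecord<>("device-events", "device-42", payload));
        }
    }
}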

Remember that Hadoop is batch processing, so reports won't happen in real time. If you need to run reports in real time, watch the Samza project, which uses YARN and Kafka to process real-time streaming data (http://incubator.apache.org/projects/samza.html).

On Aug 7, 2013, at 9:59 AM, Wukang Lin <[EMAIL PROTECTED]> wrote:

> Hi Shekhar,
>     Thank you for your reply. As far as I know, Storm is a distributed computing framework, but what we need is a storage system; high throughput and concurrency are what matter. We have thousands of devices, and each device will produce a steady stream of binary data. The space for every device is fixed, so the storage should reuse the space on disk. So, how can Storm or Esper achieve that?
>
> Many Thanks
> Lin Wukang
>
>
> 2013/8/8 Shekhar Sharma <[EMAIL PROTECTED]>
> Use a CEP tool like Esper or Storm and you will be able to achieve that.
> I can give you more input if you provide more details of what you are trying to achieve.
> Regards,
> Som Shekhar Sharma
> +91-8197243810
>
>
> On Wed, Aug 7, 2013 at 9:58 PM, Wukang Lin <[EMAIL PROTECTED]> wrote:
> Hi Niels and Bertrand,
>     Thank you for your great advice.
>     In our scenario, we need to store a steady stream of binary data into circular storage; throughput and concurrency are the most important indicators. The first way seems workable, but as HDFS is not friendly to small files, this approach may not be smooth enough. HBase is good, but not appropriate for us, in terms of both throughput and storage. MongoDB is quite good for web applications, but it is not suitable for our scenario either.
>     We need a distributed storage system with high throughput, HA, LB, and security. Maybe it would act much like HBase, managing a lot of small files (HFiles) as one large region. Perhaps we should develop it ourselves.
>
> Thank you.
> Lin Wukang
>
>
> 2013/7/25 Niels Basjes <[EMAIL PROTECTED]>
> A circular file on HDFS is not possible.
>
> Some of the ways around this limitation:
> - Create a series of files and delete the oldest file when you have too much data (see the sketch after this list).
> - Put the data into an HBase table and do something similar.
> - Use a completely different technology like MongoDB, which has built-in support for a circular buffer (capped collections).
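>
> A minimal sketch of the first option, using the HDFS Java API (the directory path and size budget are placeholders, not from this thread): write each chunk as a new file, then delete the oldest files once the directory exceeds the budget.
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> import java.util.Arrays;
> import java.util.Comparator;
>
> public class RollingHdfsBuffer {
>     private static final Path DIR = new Path("/data/device-buffer"); // placeholder path
>     private static final long MAX_TOTAL_BYTES = 10L * 1024 * 1024 * 1024; // 10 GB budget
>
>     // Append one chunk as a brand-new file; HDFS files are write-once,
>     // so we never overwrite in place, we only add and delete whole files.
>     public static void append(FileSystem fs, byte[] chunk) throws Exception {
>         Path file = new Path(DIR, "part-" + System.currentTimeMillis());
>         try (FSDataOutputStream out = fs.create(file)) {
>             out.write(chunk);
>         }
>         trim(fs);
>     }
>
>     // Delete the oldest files until the directory fits the budget again.
>     private static void trim(FileSystem fs) throws Exception {
>         FileStatus[] parts = fs.listStatus(DIR);
>         Arrays.sort(parts, Comparator.comparingLong(FileStatus::getModificationTime));
>         long total = Arrays.stream(parts).mapToLong(FileStatus::getLen).sum();
>         for (int i = 0; total > MAX_TOTAL_BYTES && i < parts.length; i++) {
>             total -= parts[i].getLen();
>             fs.delete(parts[i].getPath(), false);
>         }
>     }
>
>     public static void main(String[] args) throws Exception {
>         FileSystem fs = FileSystem.get(new Configuration()); // picks up fs.defaultFS
>         append(fs, new byte[]{1, 2, 3});
>     }
> }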
>
> Niels
>
> Hi all,
>    Is there any way to use an HDFS file as a circular buffer? I mean, if I set a quota on a directory in HDFS and write data to a file in that directory continuously, once the quota is exceeded, can I redirect the writer and write the data from the beginning of the file automatically?