Is there any way to use an HDFS file as a circular buffer?


Re: Is there any way to use an HDFS file as a circular buffer?
Hi Lin,

It might be worth checking out Apache Flume, which was built for highly
parallel ingest into HDFS.

-Sandy
On Thu, Aug 15, 2013 at 11:16 AM, Adam Faris <[EMAIL PROTECTED]> wrote:

> If every device can send its information as an 'event', you could use a
> publish-subscribe messaging system like Apache Kafka
> (http://kafka.apache.org/). Kafka is designed to self-manage its storage
> by keeping the last 'n' events of data, acting like a circular buffer. The
> device would publish its binary data to Kafka and Hadoop would act as a
> subscriber to Kafka by consuming events. If you need a scheduler to make
> Hadoop process the Kafka events, look at Azkaban, as it supports both
> scheduling and job dependencies (http://azkaban.github.io/azkaban2/).
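
A minimal sketch of the publish side described above, using the Kafka Java producer client; the broker address, topic name, and device id are illustrative placeholders, and it is the topic's retention.bytes / retention.ms settings that bound how much history Kafka keeps:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class DevicePublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092");  // placeholder broker address
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.ByteArraySerializer");
            // Kafka prunes old log segments according to the topic's
            // retention.bytes / retention.ms settings, which is what gives
            // the circular-buffer behavior on the storage side.
            try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
                byte[] payload = {1, 2, 3};  // placeholder device reading
                producer.send(new ProducerRecord<>("device-events", "device-0001", payload));
            }
        }
    }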
>
> Remember Hadoop is batch processing, so reports won't happen in real time.
> If you need to run reports in real time, watch the Samza project, which
> uses YARN and Kafka to process real-time streaming data
> (http://incubator.apache.org/projects/samza.html).
>
> On Aug 7, 2013, at 9:59 AM, Wukang Lin <[EMAIL PROTECTED]> wrote:
>
> > Hi Shekhar,
> >     Thank you for your replies. As far as I know, Storm is a distributed
> > computing framework, but what we need is a storage system; high throughput
> > and concurrency are what matter. We have thousands of devices, and each
> > device will produce a steady stream of binary data. The space for each
> > device is fixed, so it should reuse the space on disk. So, how can Storm
> > or Esper achieve that?
> >
> > Many Thanks
> > Lin Wukang
> >
> >
> > 2013/8/8 Shekhar Sharma <[EMAIL PROTECTED]>
> > Use a CEP tool like Esper or Storm and you will be able to achieve
> > that. I can give you more input if you can provide me more details of
> > what you are trying to achieve.
> > Regards,
> > Som Shekhar Sharma
> > +91-8197243810
> >
> >
> > On Wed, Aug 7, 2013 at 9:58 PM, Wukang Lin <[EMAIL PROTECTED]> wrote:
> > Hi Niels and Bertrand,
> >     Thank you for your great advice.
> >     In our scenario, we need to store a steady stream of binary data
> > in circular storage; throughput and concurrency are the most important
> > indicators. The first way seems workable, but as HDFS is not friendly to
> > small files, this approach may not be smooth enough. HBase is good, but
> > not appropriate for us, for both throughput and storage. MongoDB is quite
> > good for web applications, but likewise not suitable for our scenario.
> >     We need a distributed storage system with high throughput, HA, load
> > balancing, and security. Maybe it would act much like HBase, managing many
> > small files (HFiles) as one large region. Perhaps we should develop it
> > ourselves.
> >
> > Thank you.
> > Lin Wukang
> >
> >
> > 2013/7/25 Niels Basjes <[EMAIL PROTECTED]>
> > A circular file on HDFS is not possible.
> >
> > Some of the ways around this limitation:
> > - Create a series of files and delete the oldest file when you have
> > too many (see the sketch below).
> > - Put the data into an HBase table and do something similar.
> > - Use a completely different technology like MongoDB, which has built-in
> > support for a circular buffer (capped collections).
> >
> > Niels
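
A minimal sketch of the first option, rolling a directory of chunk files on HDFS with the Hadoop FileSystem API; the directory layout, chunk naming, and MAX_FILES cap are illustrative assumptions:

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.Comparator;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RollingHdfsBuffer {
        private static final int MAX_FILES = 100;  // illustrative cap per device

        // Write one chunk, then prune the oldest chunk if over the cap,
        // so the directory's total size stays roughly bounded.
        public static void append(FileSystem fs, Path deviceDir, byte[] payload)
                throws IOException {
            Path chunk = new Path(deviceDir, "chunk-" + System.currentTimeMillis());
            try (FSDataOutputStream out = fs.create(chunk)) {
                out.write(payload);
            }
            FileStatus[] chunks = fs.listStatus(deviceDir);
            if (chunks.length > MAX_FILES) {
                Arrays.sort(chunks,
                        Comparator.comparingLong(FileStatus::getModificationTime));
                fs.delete(chunks[0].getPath(), false);  // drop the oldest chunk
            }
        }

        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            append(fs, new Path("/data/device-0001"), new byte[]{1, 2, 3});
        }
    }

For the MongoDB route, a capped collection created with a fixed size bound gives the same wrap-around behavior natively, with no pruning logic in the application.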
> >
> > Hi all,
> >    Is there any way to use an HDFS file as a circular buffer? I mean, if
> > I set a quota on a directory in HDFS and write data to a file in that
> > directory continuously, once the quota is exceeded, can I redirect the
> > writer and write the data from the beginning of the file automatically?