It might be worth checking out Apache Flume, which was built for highly
parallel ingest into HDFS.
On Thu, Aug 15, 2013 at 11:16 AM, Adam Faris <[EMAIL PROTECTED]> wrote:
> If every device can send its information as an 'event', you could use a
> publish-subscribe messaging system like Apache Kafka. (
> http://kafka.apache.org/) Kafka is designed to self-manage its storage
> by saving the last 'n-events' of data, acting like a circular buffer. The
> device would publish its binary data to Kafka and Hadoop would act as a
> subscriber to Kafka by consuming events. If you need a scheduler to make
> Hadoop process the Kafka events, look at Azkaban, as it supports both
> scheduling and job dependencies. (http://azkaban.github.io/azkaban2/)
> Remember that Hadoop is batch processing, so reports won't happen in real time.
> If you need to run reports in real-time, watch the Samza project which
> uses Yarn and Kafka to process real-time streaming data. (
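The 'last n events' behaviour described above can be modelled in a few lines. This is only a toy in-memory sketch, not Kafka's API: the class name BoundedEventLog and its methods are made up for illustration, and it assumes retention is counted in events.

```python
from collections import deque


class BoundedEventLog:
    """Toy in-memory model of a log that keeps only the last n events."""

    def __init__(self, max_events):
        # A deque with maxlen silently drops the oldest entry when full.
        self.events = deque(maxlen=max_events)
        self.next_offset = 0  # offset assigned to the next published event

    def publish(self, payload):
        """Append an event and return the offset it was stored under."""
        offset = self.next_offset
        self.events.append((offset, payload))
        self.next_offset += 1
        return offset

    def consume(self, from_offset):
        """Return the retained (offset, payload) pairs at or after from_offset."""
        return [(o, p) for o, p in self.events if o >= from_offset]


log = BoundedEventLog(max_events=3)
for i in range(5):
    log.publish(b"reading-%d" % i)

# Only the three newest events (offsets 2, 3, 4) are still retained.
retained = log.consume(from_offset=0)
```

A subscriber that falls further behind than the retention window simply misses the dropped events, so the batch consumer has to run often enough to keep up.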
> On Aug 7, 2013, at 9:59 AM, Wukang Lin <[EMAIL PROTECTED]> wrote:
> > Hi Shekhar,
> > Thank you for your replies. As far as I know, Storm is a distributed
> > computing framework, but what we need is a storage system where high
> > throughput and concurrency matter. We have thousands of devices, and each
> > device will produce a steady stream of binary data. The space for every
> > device is fixed, so it should reuse the space on the disk. So, how can
> > Storm or Esper achieve that?
> > Many Thanks
> > Lin Wukang
> > 2013/8/8 Shekhar Sharma <[EMAIL PROTECTED]>
> > Use CEP tools like Esper and Storm and you will be able to achieve that.
> > I can give you more input if you can provide more details of what
> > you are trying to achieve.
> > Regards,
> > Som Shekhar Sharma
> > +91-8197243810
> > On Wed, Aug 7, 2013 at 9:58 PM, Wukang Lin <[EMAIL PROTECTED]>
> > Hi Niels and Bertrand,
> > Thank you for your great advice.
> > In our scenario, we need to store a steady stream of binary data
> > into circular storage; throughput and concurrency are the most important
> > indicators. The first way seems workable, but as HDFS is not friendly to
> > small files, this approach may not be smooth enough. HBase is good, but
> > not appropriate for us, both for throughput and storage. MongoDB is quite
> > good for web applications, but likewise not suitable for our scenario.
> > We need a distributed storage system with high throughput, HA, LB,
> > and security. Maybe it would act much like HBase, which manages a lot of
> > small files (HFiles) as a large region; we would manage a lot of small
> > files as a large one. Perhaps we should develop it ourselves.
> > Thank you.
> > Lin Wukang
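The idea of managing a lot of small files as one large one can be sketched roughly as follows. This is a minimal illustration and not how HBase stores HFiles: the PackFile class and the key names are hypothetical, and an in-memory stream stands in for the large file.

```python
import io


class PackFile:
    """Append many small blobs to one large stream and index them by name."""

    def __init__(self, stream):
        self.stream = stream
        self.index = {}  # name -> (offset, length) inside the large stream

    def put(self, name, data):
        """Append data at the end of the stream and remember where it went."""
        offset = self.stream.seek(0, io.SEEK_END)
        self.stream.write(data)
        self.index[name] = (offset, len(data))

    def get(self, name):
        """Read one blob back using its recorded offset and length."""
        offset, length = self.index[name]
        self.stream.seek(offset)
        return self.stream.read(length)


pack = PackFile(io.BytesIO())  # BytesIO stands in for one large file
pack.put("device-1/0001", b"\x01\x02\x03")
pack.put("device-2/0001", b"\xff" * 5)
first = pack.get("device-1/0001")
```

Appending sequentially to one large file suits a store that dislikes small files, at the cost of having to keep the index somewhere durable.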
> > 2013/7/25 Niels Basjes <[EMAIL PROTECTED]>
> > A circular file on HDFS is not possible.
> > Some of the ways around this limitation:
> > - Create a series of files and delete the oldest file when you have too
> > many.
> > - Put the data into an HBase table and do something similar.
> > - Use completely different technology like MongoDB, which has built-in
> > support for a circular buffer (capped collection).
> > Niels
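The first workaround (a series of files, deleting the oldest) might look something like the sketch below. It uses the local filesystem as a stand-in for HDFS, and the class name RollingSegmentWriter, the file naming scheme, and the size limits are all assumptions made for illustration.

```python
import os
import tempfile


class RollingSegmentWriter:
    """Write to numbered segment files; delete the oldest when there are too many."""

    def __init__(self, directory, segment_bytes, max_segments):
        self.directory = directory
        self.segment_bytes = segment_bytes  # size limit of one segment
        self.max_segments = max_segments    # how many segments to keep
        self.index = 0                      # number of the current segment
        self.written = 0                    # bytes in the current segment
        self.current = open(self._path(self.index), "wb")

    def _path(self, index):
        return os.path.join(self.directory, "segment-%08d.bin" % index)

    def write(self, data):
        # Roll to a new segment once the current one would overflow.
        if self.written > 0 and self.written + len(data) > self.segment_bytes:
            self._roll()
        self.current.write(data)
        self.written += len(data)

    def _roll(self):
        self.current.close()
        self.index += 1
        self.written = 0
        self.current = open(self._path(self.index), "wb")
        # Remove the segment that just fell out of the retention window.
        oldest = self.index - self.max_segments
        if oldest >= 0:
            os.remove(self._path(oldest))

    def close(self):
        self.current.close()


workdir = tempfile.mkdtemp()
writer = RollingSegmentWriter(workdir, segment_bytes=10, max_segments=3)
for _ in range(10):
    writer.write(b"12345")  # two writes fill one 10-byte segment
writer.close()

# Five segments were created in total; only the newest three remain on disk.
remaining = sorted(os.listdir(workdir))
```

Disk usage stays bounded by roughly segment_bytes * max_segments, and larger segments keep the file count low, which matters on a store that handles small files poorly.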
> > Hi all,
> > Is there any way to use an HDFS file as a circular buffer? I mean, if
> > I set a quota on a directory on HDFS and write data to a file in that
> > directory continuously, then once the quota is exceeded, the writer would
> > be redirected to write the data from the beginning of the file automatically.