Hadoop >> mail # dev >> Hadoop for unstructured data storage


Hemant kulkarni 2011-10-06, 22:35
Re: Hadoop for unstructured data storage
HDFS does not really meet your needs.  I think that MapR's solution would.
I will contact you off-line to give details.
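To make the small-files concern in the quoted message concrete, here is a back-of-envelope NameNode heap estimate. The ~150 bytes per namespace object is a commonly cited rule of thumb, not an exact per-version figure, and the numbers are only illustrative:

```java
// Back-of-envelope NameNode heap estimate for the small-files problem.
// ASSUMPTION: ~150 bytes of NameNode heap per namespace object (inode or
// block) -- a commonly cited rule of thumb, not an exact per-version figure.
public class NameNodeHeapEstimate {
    static final long BYTES_PER_OBJECT = 150;

    // Each small file costs one inode plus at least one block entry.
    static long heapBytes(long files) {
        return files * 2 * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        long filesPerDay = 100_000;               // from the requirements below
        long filesAfterOneYear = filesPerDay * 365;
        System.out.printf("Files after one year: %,d%n", filesAfterOneYear);
        System.out.printf("Estimated NameNode heap for metadata: %,d bytes%n",
                heapBytes(filesAfterOneYear));
    }
}
```

Under that assumption, a year of daily 100,000-file backups costs on the order of 11 GB of NameNode heap for metadata alone, which is why HDFS deployments typically pack many small files into SequenceFiles or HAR archives.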

On Thu, Oct 6, 2011 at 3:35 PM, Hemant kulkarni <[EMAIL PROTECTED]>wrote:

> Hi all,
> We are a small software development firm working on data backup
> software. We have a backup product that copies data from client
> machines to a data store. Currently we provide specialized hardware to
> store the data (1-3 TB disks and servers). We want to provide a
> solution for some customers (a mining company) with the following
> requirements:
> 1] Huge data storage capacity (initially 100 TB, but it should be easy
> to grow)
> 2] Initially this facility is used only for data storage, but in the
> future the company plans to add data-processing software (MapReduce jobs)
> 3] Most of the data is unstructured (mostly images, text files, and videos)
> 4] Much of the data duplicates some original, so de-duplication is needed
> 5] Data is mostly appended (daily backups) and only occasionally read
> (new data is written every day and read perhaps weekly)
> 6] Data is copied as files (every backup is ~100,000 files; most files
> are a few MB, some a few KB)
> 7] This is backup storage, so latency requirements are not strict
> 8] Some of the data has very high HA requirements and must be copied
> to data centers outside the country on a schedule (weekly; that data
> is small, a few TB)
> 9] Currently we provide a form of HSM (Hierarchical Storage
> Management); the company needs something similar in the new solution
> 10] A single namespace and versioning of files is another requirement
>
> As I understand it, HDFS doesn't directly suit such storage because of
> the following design considerations:
> 1] Large number of small files
> 2] Duplicate data
> 3] Write-many, read-once access pattern
>
> Here are my questions
> 1] Does HDFS support our client's requirements, or can it at least be
> configured to suit them?
> 2] Is there any customization of HDFS (if possible) that would serve
> the purpose?
>
> Is there any other solution that would work?
>
> All thoughts/suggestions are welcome.
>
> Regards,
> Hemant.
>
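On requirement 4, de-duplication is usually handled above the filesystem by content addressing. A minimal, hypothetical sketch (not tied to HDFS or any particular product) that stores each payload once, keyed by its SHA-256 digest:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Minimal content-addressed de-duplication sketch: each payload is stored
// once, keyed by its SHA-256 digest. Illustrative only; a real backup
// product would chunk large files and persist the index durably.
public class DedupStore {
    private final Map<String, byte[]> blobs = new HashMap<>();

    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("SHA-256 is always available", e);
        }
    }

    // Returns the content key; a duplicate payload is not stored twice.
    String put(byte[] data) {
        String key = sha256Hex(data);
        blobs.putIfAbsent(key, data);
        return key;
    }

    int uniqueBlobCount() {
        return blobs.size();
    }

    public static void main(String[] args) {
        DedupStore store = new DedupStore();
        byte[] image = "same image bytes".getBytes(StandardCharsets.UTF_8);
        String k1 = store.put(image);
        String k2 = store.put(image); // second backup of the same file
        store.put("different bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println(k1.equals(k2));           // true: duplicate detected
        System.out.println(store.uniqueBlobCount()); // 2: only unique blobs kept
    }
}
```

Because duplicate backups hash to the same key, repeated daily copies of unchanged files add index entries but no new storage.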