Hadoop, mail # dev - Hadoop for unstructured data storage


Hemant kulkarni 2011-10-06, 22:35
Re: Hadoop for unstructured data storage
Ted Dunning 2011-10-06, 22:50
HDFS does not really meet your needs. I think that MapR's solution would.
I will contact you off-line to give details.

On Thu, Oct 6, 2011 at 3:35 PM, Hemant kulkarni <[EMAIL PROTECTED]> wrote:

> Hi all,
> We are a small software development firm working on data backup
> software. We have a backup product that copies data from client
> machines to a data store. Currently we provide specialized hardware to
> store the data (1-3 TB disks and servers). We want to provide a solution to
> some customers (a mining company) with the following requirements:
> 1] Huge data storage capacity (initially 100 TB, but it
> should be easy to increase)
> 2] Initially the facility will be used for data storage, but in future the
> company plans to add data processing software (some MapReduce jobs)
> 3] Most of the data is unstructured (mostly images, text files and videos)
> 4] Data is often a duplicate of some original, so we need deduplication
> 5] Data is mostly added (daily backup) and only occasionally
> read (new data is written every day and read weekly)
> 6] Data is copied as files (every backup is 100,000 files; each
> file is a few MB, and some are only KB)
> 7] This is data storage, so latency requirements are not very strict
> 8] Some of the data has very high HA requirements and should be copied
> to data centers outside the country on a regular basis (weekly, but the
> data size is small, a few TB)
> 9] Currently we provide some sort of HSM (Hierarchical Storage
> Management). The company needs something similar in the new solution
> 10] A single namespace and versioning of files are further requirements
>
> As I understand it, HDFS is not directly suited to such storage because of
> the following design considerations:
> 1] Large number of small files
> 2] Duplicate data
> 3] Write-many, read-once requirement
>
> Here are my questions:
> 1] Does HDFS support our client's requirements, or at least can it be
> configured to suit these needs?
> 2] Is there any customization of HDFS (if possible) that will serve the
> purpose?
>
> Is there any other solution that will work?
>
> All thoughts/suggestions are welcome.
>
> Regards,
> Hemant.
>
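
For readers following the deduplication and versioning requirements (items 4 and 10 in the quoted message), one common approach is content-addressed storage: each file's bytes are stored under a hash of their content, so identical copies are kept once, and each path keeps a list of the hashes it has pointed to over time. The sketch below is only an in-memory illustration of that idea. The class and method names (`DedupStore`, `put`, `get`) are hypothetical and do not come from HDFS, MapR, or the poster's product.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (hypothetical, in-memory only).

    Identical file contents are stored once, keyed by SHA-256 digest.
    Each path keeps an ordered list of digests, one per backed-up version.
    """

    def __init__(self):
        self.blobs = {}     # digest -> file content (bytes)
        self.versions = {}  # path -> list of digests, oldest first

    def put(self, path, data):
        """Record a new version of `path`; store the bytes only if unseen."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # no-op when content is a duplicate
        self.versions.setdefault(path, []).append(digest)
        return digest

    def get(self, path, version=-1):
        """Return the content of `path` at the given version (default: latest)."""
        return self.blobs[self.versions[path][version]]


store = DedupStore()
store.put("site1/photo.jpg", b"image-bytes")
store.put("site2/photo_copy.jpg", b"image-bytes")   # duplicate content
store.put("site1/photo.jpg", b"image-bytes-edited")  # new version of same path
```

In this example two distinct contents are stored even though three writes were made, and the first path retains both of its versions. A real system would also need persistent storage, chunk-level hashing for large files, and garbage collection of unreferenced blobs, none of which is shown here.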