Hadoop >> mail # user >> is hadoop suitable for us?

Pierre Antoine Du Bois De... 2012-05-17, 20:38
Mathias Herberts 2012-05-17, 20:41
Pierre Antoine Du Bois De... 2012-05-17, 20:46
Abhishek Pratap Singh 2012-05-17, 21:36
Pierre Antoine Du Bois De... 2012-05-17, 22:32
Michael Segel 2012-05-17, 23:17
Pierre Antoine Du Bois De... 2012-05-18, 00:28
Sagar Shukla 2012-05-18, 00:44
Pierre Antoine DuBoDeNa 2012-05-18, 04:10
Re: is hadoop suitable for us?
We're using a multi-user Hadoop MapReduce installation with up to 100
computing nodes, without HDFS.  Since we have a shared cluster and not
all apps use Hadoop, we grow/shrink the Hadoop cluster as the load
changes.  It's working, and because of our hardware setup performance is
quite close to what we had with HDFS.  We're storing everything directly
on the SAN.

The only problem so far has been trying to get the system to work
without running the JT as root (I posted yesterday about that problem).
On 05/18/2012 06:10 AM, Pierre Antoine DuBoDeNa wrote:
> Did you use HDFS too, or did you store everything directly on the SAN?
> I don't have the exact number of GB/TB (it might be about 2 TB, so not really
> that "huge"), but there are more than 100 million documents to be processed.
> On a single machine we can currently process about 200,000 docs/day (several
> parsing, indexing, and metadata extraction steps have to be done). So in the
> worst case we want to use the 50 VMs to distribute the processing.
> 2012/5/17 Sagar Shukla<[EMAIL PROTECTED]>
>> Hi PA,
>>      In my environment, we had SAN storage and I/O was pretty good. So if
>> you have a similar environment, then I don't see any performance issues.
>> Just out of curiosity: what amount of data are you looking to
>> process?
>> Regards,
>> Sagar
>> -----Original Message-----
>> From: Pierre Antoine Du Bois De Naurois [mailto:[EMAIL PROTECTED]]
>> Sent: Thursday, May 17, 2012 8:29 PM
>> Subject: Re: is hadoop suitable for us?
>> Thanks Sagar, Mathias and Michael for your replies.
>> It seems we will have to go with Hadoop even if I/O is slow due to
>> our configuration.
>> I will try to update on how it worked for our case.
>> Best,
>> PA
>> 2012/5/17 Michael Segel<[EMAIL PROTECTED]>
>>> The short answer is yes.
>>> The longer answer is that you will have to account for the latencies.
>>> There is more but you get the idea..
>>> Sent from my iPhone
>>> On May 17, 2012, at 5:33 PM, "Pierre Antoine Du Bois De Naurois"<
>>> [EMAIL PROTECTED]>  wrote:
>>>> We have a large amount of text files that we want to process and index
>>>> (plus applying other algorithms).
>>>> The problem is that our configuration is shared-everything, while
>>>> Hadoop has a shared-nothing configuration.
>>>> We have 50 VMs rather than physical servers, and these share a huge
>>>> central storage. So using HDFS might not be really useful: replication
>>>> will not help, and distributing files has no meaning, as all files
>>>> will again be located on the same HDD. I am afraid that
>>>> I/O will be very slow with or without HDFS. So I am wondering if it
>>>> will really help us to use Hadoop/HBase/Pig etc. to distribute and
>>>> run several parallel tasks, or whether it is "better" to install
>>>> something different (I am not sure what). We heard myHadoop is better
>>>> for this kind of configuration; does anyone have any clue about it?
>>>> For example, we now have a central MySQL database to check whether we
>>>> have already processed a document and to keep several metadata there.
>>>> Soon we will have to distribute it, as there is not enough space in one
>>>> VM. But will Hadoop/HBase be useful? We don't want to do any complex
>>>> join/sort of the data; we just want to run queries to check whether a
>>>> document has already been processed and, if not, to add it with several
>>>> of its metadata.
>>>> We heard sungrid, for example, is another way to go, but it's
>>>> commercial. We are somewhat lost, so any help/ideas/suggestions are
>>>> appreciated.
>>>> Best,
>>>> PA
>>>> 2012/5/17 Abhishek Pratap Singh<[EMAIL PROTECTED]>
>>>>> Hi,
>>>>> To your question of whether Hadoop can be used without HDFS, the
>>>>> answer is yes.
>>>>> Hadoop can be used with any kind of distributed file system.
>>>>> But I am not able to understand the problem statement clearly enough
>>>>> to offer my point of view.

Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
09010 Pula (CA), Italy
Tel: +39 0709250452
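
For readers following the numbers quoted in this thread (more than 100 million documents, about 200,000 docs/day on a single machine, 50 VMs), here is a rough back-of-the-envelope estimate. It is only a sketch: the function name days_to_process and the efficiency parameter are hypothetical and do not come from the thread. The efficiency factor stands in for the I/O contention on shared storage that several posters worry about, and the 0.5 value below is an arbitrary illustration, not a measurement.

```python
import math


def days_to_process(total_docs, docs_per_day_per_worker, workers, efficiency=1.0):
    """Estimate wall-clock days needed to process total_docs.

    efficiency is a hypothetical scaling factor in (0, 1]: 1.0 means perfect
    linear scaling, lower values model contention on the shared storage.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    throughput = docs_per_day_per_worker * workers * efficiency
    return math.ceil(total_docs / throughput)


# Figures taken from the thread (the document count is a lower bound).
TOTAL_DOCS = 100_000_000
DOCS_PER_DAY = 200_000

single = days_to_process(TOTAL_DOCS, DOCS_PER_DAY, workers=1)
ideal = days_to_process(TOTAL_DOCS, DOCS_PER_DAY, workers=50)
# Assumed 50% efficiency, purely for illustration.
half = days_to_process(TOTAL_DOCS, DOCS_PER_DAY, workers=50, efficiency=0.5)

print(f"single machine: {single} days")
print(f"50 VMs, ideal scaling: {ideal} days")
print(f"50 VMs, 50% efficiency: {half} days")
```

Under these assumptions a single machine needs about 500 days, while 50 VMs need about 10 days with perfect scaling and about 20 days if shared-storage contention halves the per-VM throughput. The real figure depends on how the SAN behaves under parallel load, which the thread does not quantify.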
Michael Segel 2012-05-18, 10:10
Sagar Shukla 2012-05-17, 23:01