Hadoop general mailing list


Re: Three questions about Hadoop
Hi Annie,

2010/1/5 qin.wang <[EMAIL PROTECTED]>

> Hi team,
>
> When I tried to do some research on Hadoop, I came up with several
> high-level questions; any comments from you would be a great help:
>
> 1. Hadoop assumes that files are big, but taking Google as an example, the
> search results returned to users seem to be small files. How should I
> understand "big files"? What would such a file contain, for example?
>

I think "big files" means very large files (bigger than 64 MB, the default
HDFS block size). Hadoop uses HDFS as its distributed filesystem; user logs,
web logs, and so on are stored in HDFS, and engineers use Hadoop to analyze
them. That said, I don't know whether Google stores its web pages in a
distributed filesystem like this.
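To make the "big files" point concrete, here is a minimal Python sketch that counts how many HDFS blocks a set of files occupies, assuming the 64 MB default block size. The function names are hypothetical helpers for illustration, not part of any Hadoop API. It shows why one large file is cheaper for HDFS than many small ones: the NameNode keeps every file and block in memory, and each small file usually ends up as its own map task.

```python
# Assumption: the 64 MB default HDFS block size of this Hadoop generation.
BLOCK_SIZE = 64 * 1024 * 1024  # bytes


def hdfs_block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks a file of `file_size` bytes occupies.

    Every non-empty file uses at least one block; the last block may be partial.
    Integer ceiling division avoids float rounding on large sizes.
    """
    if file_size <= 0:
        return 0
    return (file_size + block_size - 1) // block_size


def blocks_for_files(sizes):
    """Total block entries the NameNode must track for a list of file sizes."""
    return sum(hdfs_block_count(s) for s in sizes)


if __name__ == "__main__":
    one_big = [10 * 1024**3]              # one 10 GB log file
    many_small = [100 * 1024] * 100_000   # roughly the same data as 100,000 files of 100 KB
    print(blocks_for_files(one_big))      # 160
    print(blocks_for_files(many_small))   # 100000
```

A 100 KB file does not consume 64 MB of disk, because the last block of a file only uses the space it needs; the cost of many small files is mostly NameNode metadata and per-task overhead.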

>
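> 2. Why are the files write-once, read-many?
>

As mentioned above, the logs are stored in HDFS; each log is written once and
then read many times by engineers. HDFS is designed around this access
pattern: a file that is created, written, and closed does not need to change,
which simplifies data coherency and enables high-throughput reads.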
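Because the raw logs never change, every new question is simply another full pass over them. As a minimal sketch, assuming Apache common log format and Hadoop Streaming (where the mapper and reducer are plain programs that read stdin and write tab-separated key/value lines), a request counter could look like this; the function names and the file name `logcount.py` are hypothetical.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Map step: emit 'path<TAB>1' for every request in the log."""
    for line in lines:
        fields = line.split()
        # Assumption: Apache common log format, where the requested path
        # is the 7th whitespace-separated field.
        if len(fields) > 6:
            yield f"{fields[6]}\t1"


def reducer(sorted_lines):
    """Reduce step: sum the counts for each key.

    Input must be sorted by key, which Hadoop's shuffle/sort phase provides.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_locally(log_lines):
    """Simulate map -> sort -> reduce in one process for quick testing."""
    return list(reducer(sorted(mapper(log_lines))))


if __name__ == "__main__":
    # "map" or "reduce" runs a single stage (stdin -> stdout);
    # anything else runs the whole pipeline locally.
    stage = sys.argv[1] if len(sys.argv) > 1 else "local"
    if stage == "map":
        results = mapper(sys.stdin)
    elif stage == "reduce":
        results = reducer(sys.stdin)
    else:
        results = run_locally(sys.stdin)
    for out in results:
        print(out)
```

With Hadoop Streaming the same file could serve as the mapper command (`python logcount.py map`) and the reducer command (`python logcount.py reduce`); locally, piping a log into `python logcount.py` runs all three stages in one process.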

>
> 3. How do I install other software on Hadoop? Are there any special
> requirements for the software? Does it need to support the MapReduce
> model before it can be installed?
>

I'm not sure what you mean. If you want to add extra JARs that your
application uses, the "distributed cache" in Hadoop will help you.
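The distributed cache ships files a job needs (extra JARs, but also plain data files) to every node that runs one of its tasks. For Java jobs whose driver runs through `ToolRunner`, extra JARs can be passed with the generic `-libjars` option. As a minimal sketch in Hadoop Streaming terms, a Python mapper that reads a side file the cache has placed in the task's working directory might look like this; the file name `page_categories.tsv`, its tab-separated format, and the function names are hypothetical.

```python
import sys

# Hypothetical side file. Assumption: the job ships it through the distributed
# cache (for example with the generic `-files` option, or `-cacheFile` in older
# Hadoop Streaming releases), which makes it available under this name in each
# task's current working directory.
LOOKUP_FILE = "page_categories.tsv"


def parse_lookup(lines):
    """Build a dict from 'key<TAB>value' lines, skipping blank lines."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, value = line.partition("\t")
        table[key] = value
    return table


def load_lookup(path=LOOKUP_FILE):
    """Read the cached side file once per task, before any records are mapped."""
    with open(path, encoding="utf-8") as f:
        return parse_lookup(f)


def mapper(lines, table):
    """Emit 'category<TAB>1' for each request, using the cached lookup table."""
    for line in lines:
        fields = line.split()
        # Assumption: Apache common log format; the request path is the 7th field.
        if len(fields) > 6:
            yield f"{table.get(fields[6], 'unknown')}\t1"


if __name__ == "__main__":
    categories = load_lookup()
    for out in mapper(sys.stdin, categories):
        print(out)
```

Loading the table once at start-up, rather than per record, keeps the per-line work to a single dictionary lookup.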

Good Luck!

>
> Your help would be much appreciated.
>
> 王 琴  Annie.Wang
>
> Building 7, 418 Guilin Road, Xuhui District, Shanghai
> Zip code: 200 233
> Tel:      +86 21 5497 8666-8004
> Fax:     +86 21 5497 7986
> Mobile:  +86 137 6108 8369
>
--
http://anqiang1900.blog.163.com/