

Re: Three questions about Hadoop
Hi Annie,

2010/1/5 qin.wang <[EMAIL PROTECTED]>

> Hi team,
>
>
>
> While doing some research on Hadoop, I came up with several high-level
> questions. Any comments from you would be a great help to me:
>
>
>
> 1. Hadoop assumes the files are big files, but taking Google as an example,
> it seems the Google results returned to a user are small files, so how
> should I understand "big files"? And what is the file content, for example?
>
I think "big files" means very large files (bigger than 64 MB, the default
HDFS block size). Hadoop uses HDFS as its distributed filesystem. User logs,
web logs, and similar data are stored in HDFS, and engineers can use Hadoop
to analyze those logs. That said, I don't know whether Google puts its web
pages in a distributed filesystem like this.
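
To make the 64 MB figure a bit more concrete, here is a rough sketch in plain Python (not Hadoop code). The function name and the example file sizes are made up for illustration, and 64 MB is simply assumed as the block size. It shows how many blocks a file of a given size would be split into:

```python
BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size: 64 MB


def num_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    """Return how many blocks a file of the given size occupies.

    Uses integer ceiling division, so a partly filled last block
    still counts as one block and an empty file counts as zero.
    """
    return (file_size_bytes + block_size - 1) // block_size


one_gb_log = 1024 * 1024 * 1024  # hypothetical 1 GB web log
small_page = 20 * 1024           # hypothetical 20 KB web page

print(num_blocks(one_gb_log))  # 16
print(num_blocks(small_page))  # 1
```

So one big log is only a handful of blocks, while every tiny file still needs a block of its own to keep track of. That is why Hadoop is more comfortable with a few large files than with millions of small ones.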

>
> 2. Why are the files write-once and read-many times?
>
>
As mentioned in the previous answer, the logs are stored in HDFS. These logs
are written once and then read many times by engineers.
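
As a rough illustration of that access pattern, here is a small sketch in plain Python (again not Hadoop code). The log lines and the function names are invented for the example, and an in-memory buffer stands in for a file in HDFS. The log is written once and then scanned by two separate analyses without ever being modified:

```python
import io

# Write the log once (the buffer stands in for a file stored in HDFS).
log = io.StringIO()
log.write("alice GET /index.html\n")
log.write("bob GET /search\n")
log.write("alice GET /search\n")
log_text = log.getvalue()  # from here on the data is only read


def requests_per_user(text):
    """First analysis: count requests made by each user."""
    counts = {}
    for line in text.splitlines():
        user = line.split()[0]
        counts[user] = counts.get(user, 0) + 1
    return counts


def requests_for_path(text, path):
    """Second analysis: count requests for one path."""
    return sum(1 for line in text.splitlines() if line.split()[2] == path)


print(requests_per_user(log_text))
print(requests_for_path(log_text, "/search"))
```

Each new question about the data is just another read pass over the same unchanged file.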

>
> 3. How do I install other software on Hadoop? Are there any special
> requirements for the software? Does it need to support the Map/Reduce
> module before it can be installed?
>
I'm not sure what you mean. Maybe you would like to add additional JAR files
that your application uses. If so, the "distributed cache" in Hadoop will
help you.

Good Luck!

>
> Your help would be very much appreciated.
>
>
>
> 王 琴  Annie.Wang
>
>
>
> No. 7, 418 Guilin Road, Xuhui District, Shanghai
> Zip code: 200 233
> Tel:      +86 21 5497 8666-8004
> Fax:     +86 21 5497 7986
> Mobile:  +86 137 6108 8369
>
>
>
>
--
http://anqiang1900.blog.163.com/