Accumulo >> mail # user >> Wikisearch


I appreciate everyone's help on the file storage question. I was also looking at Josh's response to Thomas Jackson. Do I understand him correctly that the scans of the Index (and likely the ReverseIndex) tables are really the key part of the search query, and that the full table scan isn't useful for much, because all of the tokens should go in the Index tables?
So, if I understand correctly, the partitioned main table is where documents and tokens get written, a combiner then feeds the index tables, and those index tables are what get scanned during a search?
What would I lose if I wanted to avoid Thomas's OOME (OutOfMemoryError) and just skip the full table scan part of the search?
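To make the question concrete, here is a toy sketch of the data flow as I currently picture it. It uses plain Java collections rather than the Accumulo API, and every name in it (ToyWikiSearch, ingest, searchIndex, searchFullScan) is made up for illustration; it is my mental model, not the Wikisearch implementation.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy in-memory model only. Hypothetical names, not Accumulo or Wikisearch code.
class ToyWikiSearch {
    // "Main" table: partition id -> (document id -> document text).
    private final Map<String, Map<String, String>> mainTable = new TreeMap<>();
    // "Index" table: token -> set of "partition/docId" pointers.
    private final Map<String, Set<String>> indexTable = new TreeMap<>();

    // Write the document to its partition and record each token in the index.
    void ingest(String partition, String docId, String text) {
        mainTable.computeIfAbsent(partition, p -> new TreeMap<>()).put(docId, text);
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Merging pointers into one set stands in for the combiner step
            // (an assumption about its role, not its actual behavior).
            indexTable.computeIfAbsent(token, t -> new TreeSet<>())
                      .add(partition + "/" + docId);
        }
    }

    // Index path: a single lookup by exact token, no document text is read.
    Set<String> searchIndex(String token) {
        return indexTable.getOrDefault(token.toLowerCase(), Collections.emptySet());
    }

    // Full-scan path: reads every document in every partition.
    Set<String> searchFullScan(String substring) {
        Set<String> hits = new TreeSet<>();
        String needle = substring.toLowerCase();
        for (Map.Entry<String, Map<String, String>> part : mainTable.entrySet()) {
            for (Map.Entry<String, String> doc : part.getValue().entrySet()) {
                if (doc.getValue().toLowerCase().contains(needle)) {
                    hits.add(part.getKey() + "/" + doc.getKey());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyWikiSearch t = new ToyWikiSearch();
        t.ingest("p0", "d1", "Accumulo stores sorted key value pairs");
        t.ingest("p1", "d2", "Index tables map tokens to documents");
        System.out.println(t.searchIndex("accumulo"));
        System.out.println(t.searchIndex("cumul"));
        System.out.println(t.searchFullScan("cumul"));
    }
}
```

In this toy version, the only thing the full scan can find that the index cannot is a match that is not a whole indexed token (for example, a partial word). Whether that is also all the real full table scan buys me is exactly what I am unsure about.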
Obviously, since I am not searching Wikipedia, I am going to be making some changes; I just want to do it smartly.
Thanks,
Frank    