I will be delivering a set of static HTML pages on CD-ROM; these pages need to be fully viewable with no Internet access whatsoever.
I'd like to provide a full-text search (Lucene-like) for the content of those pages, which should "just work" from the CD-ROM with no software installation on the client machine.
A search engine implemented in JavaScript would be the perfect solution, but I am having trouble finding one that looks solid, current, and popular.
I did find these:
+ jsFind
+ js-search
but both projects seem rather inactive?
Another solution, besides a dedicated search engine in JavaScript, would be the ability to access local Lucene indices from JavaScript: the indices themselves would be built with Lucene and copied to the CD-ROM along with the HTML files.
Edit: built it myself (see below).
Well in fact I built it myself.
The existing solutions (that I could find) were unconvincing.
I wanted to be able to search a very long tree (ul/li/ul...) that is displayed as one page; it contains 5000+ items.
It sounds a little weird to display such a long tree on one page, but with collapse / expand it's much more intuitive than separate pages, and since we're offline, download times are not a problem (parsing times are, though, but Chrome is amazing ;-)
The "search" function provided with modern browsers (FF and Chrome anyway) has two big problems: it only searches visible items on the page, and it can't search for non-consecutive words.
I want to be able to search collapsed items (not visible on the screen); I want to find "one two three" when searching "one three" (just like with Google / Lucene); and I want to open just the branches of the tree containing found items.
So, what I did was:
- create an inverted index of words <-> ids of items from the list (via XSLT) (approx. 4500 unique words in the document)
- convert this index to a bunch of JavaScript arrays (one word = one array, containing ids)
- when searching, intersect the arrays corresponding to the search words
- the intersection is an array of ids of the items that I can then open / highlight
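The steps above can be sketched roughly as follows. This is only an illustration, not the original code: the index here is built in JavaScript rather than via XSLT, and the sample `items` data and the names `buildIndex`, `intersect` and `search` are hypothetical. It also assumes plain ASCII words, since it splits text on `\W+`.

```javascript
// Hypothetical sample data: item id -> text of the list item.
const items = {
  1: "one two three",
  2: "one four",
  3: "three five one",
};

// Build the inverted index: word -> sorted array of item ids.
// (The answer generates this ahead of time with XSLT; doing it in
// JavaScript here is an assumption made to keep the sketch self-contained.)
function buildIndex(items) {
  const index = new Map();
  for (const [id, text] of Object.entries(items)) {
    // A Set ensures each id is recorded once per word.
    const words = new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
    for (const word of words) {
      if (!index.has(word)) index.set(word, []);
      index.get(word).push(Number(id));
    }
  }
  for (const ids of index.values()) ids.sort((a, b) => a - b);
  return index;
}

// Intersect two sorted arrays of ids with a linear merge.
function intersect(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(a[i]);
      i++;
      j++;
    } else if (a[i] < b[j]) {
      i++;
    } else {
      j++;
    }
  }
  return out;
}

// Return the ids of items that contain every word of the query,
// regardless of word order or adjacency.
function search(index, query) {
  const words = query.toLowerCase().split(/\W+/).filter(Boolean);
  if (words.length === 0) return [];
  const lists = words.map((word) => index.get(word) || []);
  // Starting from the shortest list keeps intermediate results small.
  lists.sort((a, b) => a.length - b.length);
  return lists.reduce((acc, ids) => intersect(acc, ids));
}

const index = buildIndex(items);
console.log(search(index, "one three")); // ids 1 and 3
```

Because every word maps to a sorted array of ids, "one three" matches items 1 and 3 even though the words are not adjacent, and a query containing a word that is absent from the index returns an empty array.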
It does exactly what I needed and it's really fast. Better yet, since it searches an independent "index" (arrays of ids), it can search even when the list is not loaded in the browser!
The initial question was asked in '09.
As of '14, there is lunr.js, described as:
Simple full-text search in your browser
See the demo and the GitHub repo.
UPDATE September 2016: Lightweight fuzzy-search, in JavaScript http://fusejs.io/
Zoom Search Engine can do this.
I haven't used the CD version, but I use the PHP version for my website and it works very well.
I know a lot of people use Java to write CD search applets. I have a slightly elderly list of various free and commercial programs at
Search Tools for CD-ROMs and DVDs.
Have a look at CLucene -
http://sourceforge.net/projects/clucene
http://clucene.git.sourceforge.net/git/gitweb.cgi?p=clucene/clucene;a=summary
Compiling the C++ sources into a console or a Win32 executable would make the above possible also using the Lucene technology (which I assume you'd rather want to stick with).
Fullproof is a nifty little JavaScript library that can act as a text search for you. It would be useful in this context, but it's also useful in the "thick-javascript-webpage" model.