Full-text search for static HTML files on CD-Rom

Posted 2020-02-03 04:10

Question:

I will be delivering a set of static HTML pages on CD-Rom; these pages need to be fully viewable with no Internet access whatsoever.

I'd like to provide a full-text search (Lucene-like) for the content of those pages, which should "just work" from the CD-Rom with no software installation on the client machine.

A search engine implementation in javascript would be the perfect solution, but I have trouble finding any that looks solid / current / popular...?

I did find these:

  + jsFind
  + js-search

but both projects seem rather inactive?

Another solution, besides a specific search engine in javascript, would be the ability to access local Lucene indices from javascript: the indices themselves would be built with Lucene and copied to the CD-Rom along with the HTML files.

Edit: built it myself (see below).

Answer 1:

Well in fact I built it myself.

The existing solutions (that I could find) were unconvincing.

I wanted to be able to search a very long tree (ul/li/ul...) that is displayed as one page; it contains 5000+ items.

It sounds a little weird to display such a long tree on one page but in fact with collapse / expand it's much more intuitive than separate pages, and since we're offline, download times are not a problem (parsing times are, though, but Chrome is amazing ;-)

The "search" function provided with modern browsers (FF and Chrome anyway) has two big problems: it only searches visible items on the page, and it can't search for non-consecutive words.

I want to be able to search collapsed items (not visible on the screen); I want to find "one two three" when searching "one three" (just like with Google / Lucene); and I want to open just the branches of the tree containing found items.

So, what I did was:

  1. create an inverted index of words <-> ids of items from the list (via xslt) (approx. 4500 unique words in the document)
  2. convert this index to a bunch of javascript arrays (one word = one array, containing ids)
  3. when searching, intersect the arrays represented by the search words
  4. step 3 returns an array of ids that I can then open / highlight
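The steps above can be sketched roughly as follows. This is only a minimal illustration, not the actual code from this answer: the sample `items` and the function names `buildIndex`, `intersect` and `search` are made up, and the index is built in JavaScript here for convenience, whereas the answer generated it ahead of time via XSLT. Splitting on `\W+` is also an assumption that only suits plain ASCII words.

```javascript
// Hypothetical sample data: item id -> text of the list item.
const items = {
  1: "one two three",
  2: "one four",
  3: "three five one",
  4: "two six",
};

// Build the inverted index: word -> ascending array of item ids.
// (The answer produced this offline via XSLT; doing it here in
// JavaScript is for illustration only.)
function buildIndex(items) {
  const index = new Map();
  const ids = Object.keys(items).map(Number).sort((a, b) => a - b);
  for (const id of ids) {
    const words = new Set(items[id].toLowerCase().split(/\W+/).filter(Boolean));
    for (const word of words) {
      if (!index.has(word)) index.set(word, []);
      index.get(word).push(id);
    }
  }
  return index;
}

// Intersect two ascending arrays of ids with a linear merge.
function intersect(a, b) {
  const out = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { out.push(a[i]); i++; j++; }
    else if (a[i] < b[j]) i++;
    else j++;
  }
  return out;
}

// Return the ids of items containing every query word, in any order.
function search(index, query) {
  const words = query.toLowerCase().split(/\W+/).filter(Boolean);
  if (words.length === 0) return [];
  let result = index.get(words[0]) || [];
  for (const word of words.slice(1)) {
    result = intersect(result, index.get(word) || []);
  }
  return result;
}

const index = buildIndex(items);
console.log(search(index, "one three")); // [ 1, 3 ]
```

Because each array is kept in ascending id order, the intersection is a single pass over both arrays, which is plenty for a few thousand words and items.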

It does exactly what I needed and it's really fast. Better yet, since it searches from an independent "index" (arrays of ids), it can search when the list is not even loaded in the browser!
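The answer does not say how the matching branches get opened in step 4. One possible way, assuming a hypothetical `parentOf` table (item id to parent id) generated alongside the index, is to walk up from each found id and collect its ancestors; the name `branchesToOpen` is likewise made up for this sketch.

```javascript
// Hypothetical parent table: item id -> id of its parent item (null for roots).
const parentOf = { 1: null, 2: 1, 3: 2, 4: 1, 5: 4, 6: null };

// Collect every ancestor of the found items; these are the branches to expand.
function branchesToOpen(foundIds, parentOf) {
  const open = new Set();
  for (const id of foundIds) {
    let p = parentOf[id];
    // Stop at a root, or as soon as we reach a branch already marked open.
    while (p != null && !open.has(p)) {
      open.add(p);
      p = parentOf[p];
    }
  }
  return [...open].sort((a, b) => a - b);
}

console.log(branchesToOpen([3, 5], parentOf)); // [ 1, 2, 4 ]
```

Like the word arrays, such a table is plain data, so the set of branches to expand can be worked out without touching the DOM.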



Answer 2:

The initial question was asked in '09.

As of '14, there is lunr.js, described as:

Simple full-text search in your browser

See the demo and the GitHub repo.


UPDATE September 2016: Fuse.js, a lightweight fuzzy-search library in JavaScript: http://fusejs.io/



Answer 3:

Zoom Search Engine can do this.

I haven't used the CD version, but I use the PHP version for my website and it works very well.



Answer 4:

I know a lot of people use Java to write CD search applets. I have a slightly elderly list of various free and commercial programs at Search Tools for CD-ROMs and DVDs.



Answer 5:

Have a look at CLucene:

http://sourceforge.net/projects/clucene

http://clucene.git.sourceforge.net/git/gitweb.cgi?p=clucene/clucene;a=summary

Compiling the C++ sources into a console or a Win32 executable would make the above possible also using the Lucene technology (which I assume you'd rather want to stick with).



Answer 6:

Fullproof is a nifty little javascript library that can act as a text search for you. It would be useful in this context, but it's also useful in the "thick-javascript-webpage" model.