Need to add a search to static HTML site

Posted 2020-02-24 06:05

Question:

I've got an old static HTML site ( http://www.brownwatson.co.uk/brochure/page1.html ) and need to add a search box to it. The search should cover a folder called /brochure, which contains HTML documents, images, etc., and it needs to find ISBN numbers, book reference numbers, titles and so on. There's no database, but the hosting provider supports PHP. I was trying to create something like this:

<div id="contentsearch">
         <form id="searchForm" name="searchForm" method="post" action="search.php">
           <input name="search" type="text" value="search" maxlength="200" />
           <input name="submit" type="submit" value="Search" />
           </form>
         <?php
$dir = "/brochure/";

// Open a known directory, and proceed to read its contents
if (is_dir($dir)) {
if ($dh = opendir($dir)) {
    while (($file = readdir($dh)) !== false) {
        if($file == $_POST['search']){
            echo('<a href="'.$dir . $file.'">'. $file .'</a>'."\n");
        }
    }
    closedir($dh);
}
}
?>
       </div>

I know, I know, this is pretty bad and doesn't work. Any ideas? I haven't created anything like this in years, and have pretty much just taken bits of code and stuck them together!

Answer 1:

There are quite a few solutions available for this. In no particular order:

Free or Open Source

  1. Google Custom Search Engine
  2. Tapir - hosted service that indexes pages on your RSS feed.
  3. Tipue - self-hosted JavaScript plugin, well documented, includes options for pinned search results.
  4. lunr.js - JavaScript library.
  5. phinde - self-hosted search engine based on PHP and Elasticsearch

See also http://indieweb.org/search#Software

Subscription (aka paid) Services:

  1. Google Site Search
  2. Swiftype - offers a free plan for personal sites/blogs.
  3. Algolia
  4. Amazon CloudSearch
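
The self-hosted JavaScript options above search a prebuilt list of pages rather than the HTML files themselves. As a rough sketch of how such a list could be produced offline, the script below walks a folder and collects each page's title and visible text into records that can be dumped to JSON. The names `TextExtractor` and `build_index` and the record fields are assumptions made for illustration, not part of any of the libraries listed; check the chosen library's documentation for the format it expects.

```python
import os
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the <title> and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def build_index(root_dir):
    """Return a list of {url, title, text} records, one per .html file."""
    records = []
    for folder, _dirs, files in os.walk(root_dir):
        for name in sorted(files):
            if not name.endswith(".html"):
                continue
            path = os.path.join(folder, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                parser = TextExtractor()
                parser.feed(fh.read())
            records.append({
                # path relative to the scanned folder, with forward slashes
                "url": os.path.relpath(path, root_dir).replace(os.sep, "/"),
                "title": parser.title.strip(),
                "text": " ".join(parser.chunks),
            })
    return records
```

Running `json.dump(build_index("brochure"), open("search_index.json", "w"))` after each site update would give a single file that a page script can load and hand to the search library.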


Answer 2:

A very, very lazy option (to avoid setting up a Google Custom Search Engine) is to make a form that points at Google with a hidden query element that limits the search to your own site:

<div id="contentsearch">
  <form id="searchForm" name="searchForm" action="http://google.com/search">
    <input name="q" type="text" value="search" maxlength="200" />
    <input name="q" type="hidden" value="site:mysite.com"/>
    <input name="submit" type="submit" value="Search" />
  </form>
</div>

Aside from the laziness, this method gives you a bit more control over the appearance of the search form, compared to a CSE.
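
To see what that form actually sends, here is a small sketch that builds the same GET request by hand. The function name is hypothetical and `mysite.com` is the placeholder from the form above; the two `q` parameters are emitted in the same order as the two inputs.

```python
from urllib.parse import urlencode


def google_site_search_url(terms, site):
    """Build the URL a GET form with two `q` fields would request."""
    # Two parameters named q, in form order: the user's terms, then the site filter.
    query = urlencode([("q", terms), ("q", "site:" + site)])
    return "http://google.com/search?" + query


print(google_site_search_url("book title", "mysite.com"))
```

Opening the printed URL in a browser is a quick way to check the results before wiring up the form.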



Answer 3:

If your site is well indexed by Google, a quick and ready solution is to use Google CSE.

Other than that, for a static website with hard-coded HTML pages and a directory containing images, yes, it is possible to create a search mechanism. But trust me, it is more hectic and resource-consuming than creating a dynamic website.

Using PHP to search in directories and within files will be very inefficient. Instead of providing complicated PHP workarounds, I would suggest going for a dynamic, CMS-driven website.
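
For reference, this is roughly what searching within files amounts to, sketched here in Python rather than PHP; the same loop in PHP would have the same cost. Every query reopens and rereads every HTML file, which is why the work grows with the size of the site. The function name and the crude tag-stripping regex are assumptions for illustration only.

```python
import os
import re

TAG_RE = re.compile(r"<[^>]+>")  # crude tag stripper, good enough for a sketch


def scan_for_term(root_dir, term):
    """Return relative paths of .html files whose text contains `term`."""
    needle = term.strip().lower()
    if not needle:
        return []  # an empty query would otherwise match every file
    hits = []
    for folder, _dirs, files in os.walk(root_dir):
        for name in sorted(files):
            if not name.endswith(".html"):
                continue
            path = os.path.join(folder, name)
            # The whole file is read and stripped on every single search.
            with open(path, encoding="utf-8", errors="replace") as fh:
                text = TAG_RE.sub(" ", fh.read()).lower()
            if needle in text:
                hits.append(os.path.relpath(path, root_dir).replace(os.sep, "/"))
    return hits
```

For a brochure of a few dozen pages this may be fast enough in practice; for anything larger, building an index once and querying that is the usual alternative.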



Answer 4:

I was looking for a search solution for my blog, which is built with Jekyll, but didn't find a good one. Google Custom Search was showing ads and results from subdomains, so it wasn't suitable either. So I've created my own solution. I've written an article about how to create search for a static site like Jekyll; it's in Polish and translated using Google Translate.

I will probably create a better manual translation or rewrite it on my English blog soon.

The solution is a Python script that creates an SQLite database from the HTML files and a small PHP script that shows the search results. It does require that your static site hosting also supports PHP.

Just in case the article goes down, here is the code. It was created just for my blog (my HTML and file structure), so it needs to be tweaked to work with your site.

Python script:

import os, sys, re, sqlite3
from bs4 import BeautifulSoup
def get_data(html):
    """return dictionary with title url and content of the blog post"""
    tree = BeautifulSoup(html, 'html5lib')
    body = tree.body
    if body is None:
        return None
    for tag in body.select('script'):
        tag.decompose()
    for tag in body.select('style'):
        tag.decompose()
    for tag in body.select('figure'): # ignore code snippets
        tag.decompose()
    text = tree.findAll("div", {"class": "body"})
    if len(text) > 0:
      text = text[0].get_text(separator='\n')
    else:
      text = None
    title = tree.findAll("h2", {"itemprop" : "title"}) # my h2 tags have this attr
    url = tree.findAll("link", {"rel": "canonical"}) # get url
    if len(title) > 0:
      title = title[0].get_text()
    else:
      title = None
    if len(url) > 0:
      url = url[0]['href']
    else:
      url = None
    result = {
      "title": title,
      "url": url,
      "text": text
    }
    return result

if __name__ == '__main__':
  if len(sys.argv) == 2:
    db_file = 'index.db'
    # remove the old database file
    if os.path.exists(db_file):
      os.remove(db_file)
    conn = sqlite3.connect(db_file)
    c = conn.cursor()
    c.execute('CREATE TABLE page(title text, url text, content text)')
    for root, dirs, files in os.walk(sys.argv[1]):
      for name in files:
        # my files are in 20.* directories (eg. 2018) [/\\] is for windows and unix
        if name.endswith(".html") and re.search(r"[/\\]20[0-9]{2}", root):
          fname = os.path.join(root, name)
          f = open(fname, "r")
          data = get_data(f.read())
          f.close()
          if data is not None:
            # keep the dict intact; build a separate tuple for the INSERT
            row = (data['title'], data['url'], data['text'])
            c.execute('INSERT INTO page VALUES(?, ?, ?)', row)
            print("indexed %s" % data['url'])
            sys.stdout.flush()
    conn.commit()
    conn.close()

and PHP search script:

function mark($query, $str) {
    return preg_replace("%(" . $query . ")%i", '<mark>$1</mark>', $str);
}
if (isset($_GET['q'])) {
  $db = new PDO('sqlite:index.db');
  $stmt = $db->prepare('SELECT * FROM page WHERE content LIKE :var OR title LIKE :var');
  $wildcarded = '%'. $_GET['q'] .'%';
  $stmt->bindParam(':var', $wildcarded);
  $stmt->execute();
  $data = $stmt->fetchAll(PDO::FETCH_ASSOC);
  $query = str_replace("%", "\\%", preg_quote($_GET['q']));
  $re = "%(?>\S+\s*){0,10}(" . $query . ")\s*(?>\S+\s*){0,10}%i";
  if (count($data) == 0) {
    echo "<p>No results</p>";
  } else {
    foreach ($data as $row) {
      if (preg_match($re, $row['content'], $match)) {
        echo '<h3><a href="' . $row['url'] . '">' . mark($query, $row['title']) . '</a></h3>';
        $text = trim($match[0], " \t\n\r\0\x0B,.{}()-");
        echo '<p>' . mark($query, $text) . '</p>';
      }
    }
  }
}

In my code and in the article, I've wrapped this PHP script in the same layout as the other pages by adding front matter to the PHP file.

If you can't use PHP on your hosting, you can try sql.js, which is SQLite compiled to JS with Emscripten. Here is an example of how to use Ajax to load a file.
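
Before uploading index.db, it can help to check locally that the index returns what you expect. The sketch below runs the same kind of LIKE query as the PHP script, but from Python; the `search` function is a hypothetical helper, and unlike the PHP version it escapes the LIKE wildcards `%` and `_` so that they match literally.

```python
import sqlite3


def search(conn, term):
    """Return (title, url) rows whose title or content contains `term`."""
    # Escape LIKE wildcards so a literal % or _ in the term matches itself.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = "%" + escaped + "%"
    cur = conn.execute(
        "SELECT title, url FROM page "
        "WHERE content LIKE ? ESCAPE '\\' OR title LIKE ? ESCAPE '\\'",
        (pattern, pattern),
    )
    return cur.fetchall()
```

Typical use would be `search(sqlite3.connect("index.db"), "some title")`, assuming the `page` table created by the indexing script above.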