Updating MathJax CDN

2020-03-26 00:47发布

问题:

MathJax is shutting down its CDN, as officially announced on the MathJax website and on StackExchange's Meta. The official announcement recommends several alternatives, including using an alternative CDN and self-hosting. What is the best way to accomplish this process for a site with a large number of references to the MathJax CDN?

On my tiny little website, for example, I can count the number of files that I have that point to the MathJax CDN using the following unix command:

grep -rl 'cdn\.mathjax\.org/mathjax/latest/MathJax\.js' . | grep '\.html$' | wc -l

I find that I have 382 such files. What's the best (safest) way to go automate the process of going through these files and modifying the source to point to the new CDN, rather than the old CDN?

Note: I can use Python to traverse the directory tree and do search and replace pretty easily. I'm interested in robust techniques. In fact, I've already post a self-answer that uses BeautifulSoup to find and replace MathJax src links. This is the way that I usually use MathJax, but there are other ways, of course.

回答1:

Here's the first approach I whipped up in Python; not sure how safe or inclusive it is. I first write a function called update_mathjax. This uses BeautifulSoup to find the <head>, and then searches for a <script> whose src attribute points to the MathJax CDN. If it finds it, it replaces src with an updated src and rewrites the file.

from bs4 import BeautifulSoup as bs
old_cdn = 'https://cdn.mathjax.org/mathjax/latest/MathJax.js'
new_cdn = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js'
def update_mathjax(filename):
    with open(filename, 'r+') as file_handle:
        file_contents = file_handle.read()
        soup = bs(file_contents, "html5lib")
        scripts = soup.find('head').find_all('script')
        for script in scripts:
            if script.attrs and 'src' in script.attrs:
                if script.attrs['src'][:49] == old_cdn:
                        q = script.attrs['src'][49:]
                        new_src = new_cdn + q
                        script.attrs['src'] = new_src
                        file_handle.seek(0)
                        file_handle.write(soup.prettify())
                        file_handle.truncate()
                        print('udated ' + filename)

With that, it's reasonably easy to traverse the directory tree and apply the function.

import os
for directory_name, subdirectory_list, filename_list in os.walk(os.getcwd()):
    for filename in filename_list:
        if filename[-5:] == ".html":
            update_mathjax(os.path.join(directory_name, filename))