MediaWiki API section names encoding

Posted 2019-05-28 20:04

For [[Test#?]], I get "Test#.3F" from the action=parse module of the MediaWiki API. What is this encoding, and how do I convert it back to a human-readable form using a Perl module from CPAN?

URI::Encode handles ordinary percent-decoding, but not this section-name encoding.

1 answer
Root(大扎)
#2 · 2019-05-28 20:10

It is UTF-8 percent-encoding, but with . in place of % and with spaces replaced by underscores. In addition, runs of consecutive whitespace are collapsed, and : is preserved (not encoded as .3A).

The exact code that handles it is Parser::guessSectionNameFromWikiText(). If you do not want to dig through a lot of code, look at the much simpler anchorencode() implementation in an older MediaWiki version, which is compatible except for a few edge cases:

str_replace( '%', '.', str_replace('+', '_', urlencode( $text ) ) );
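
Going the other way in Perl, a minimal sketch could simply reverse those two substitutions and then decode the bytes as UTF-8. decode_section_anchor is a hypothetical helper name, not a CPAN function; only the core Encode module is used. It assumes the input is the ASCII anchor form shown in the question, and it treats every "." followed by two hex digits as an escape.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from CPAN): undo the ".XX" / "_" anchor encoding.
sub decode_section_anchor {
    my ($anchor) = @_;
    my $bytes = $anchor;
    $bytes =~ tr/_/ /;                                # underscores stood for spaces
    $bytes =~ s/\.([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # ".3F" -> raw byte 0x3F
    return decode('UTF-8', $bytes);                   # UTF-8 bytes -> character string
}

# Only the part after '#' is anchor-encoded; the page title is left alone.
my ($page, $fragment) = split /#/, 'Test#.3F', 2;
binmode(STDOUT, ':encoding(UTF-8)');
print $page, '#', decode_section_anchor($fragment), "\n";   # prints "Test#?"
```

The decoding cannot be fully lossless. The encoder leaves a literal . untouched, so a heading that really contains something like ".3F" will be misread as an escape, and an underscore in the original heading cannot be told apart from a space.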