Open-source parser code for Mediawiki markup [closed]

I'm interested in selectively parsing Mediawiki XML markup to generate a customized HTML page that's some subset of the HTML produced by the actual PHP Mediawiki render engine.

I want it for BzReader, an offline Mediawiki compressed dump reader written in C#. So a C# parser would be ideal, but any good code would help.

Of course, if no one has done it before, I guess it's time to start a project maintaining a free and separate Mediawiki parser, based on Mediawiki's own parser, but less tightly integrated with Mediawiki itself.

So, does anyone know of any base I could begin with, that would be better than hacking from the Mediawiki PHP code?

Tags: c# php open-source parsing mediawiki

3 Answers

爷的心禁止访问

Reply #2 · 2019-07-24 01:33

There is a list of parsers at http://www.mediawiki.org/wiki/Alternative_parsers, but it does not include a C# parser.


【Aperson】

Reply #3 · 2019-07-24 01:35

I had some words to say about Mediawiki templates here. Interesting that there's a list of alternative parsers now; I'll have to investigate that.


我只想做你的唯一

Reply #4 · 2019-07-24 01:45

Update
Bear in mind that Screwturn doesn't stick to the Mediawiki syntax; it uses its own variation, which differs a bit.

The Mediawiki syntax doesn't lend itself to an LALR parser (or even LL*), as it has a lot of ambiguities in its definition and also allows HTML. There's a discussion on that in this question. You're essentially stuck with writing your own parser and tokenizer rather than simply writing a BNF file for it and then using ANTLR/Gold/Irony.
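To give a rough idea of what "writing your own tokenizer" can look like, here is a minimal hand-written scanner sketch. All names (`WikiTokenizer`, `WikiToken`, `WikiTokenKind`) are made up for illustration and do not come from Mediawiki, Screwturn, or BzReader. It only recognizes four markers and makes no attempt to resolve the ambiguities mentioned above (for example, a run of five apostrophes is simply split greedily into bold followed by italic); a real parser would need a later pass for that.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical token kinds; real Mediawiki markup has many more.
public enum WikiTokenKind { Text, Bold, Italic, LinkOpen, LinkClose }

public readonly struct WikiToken
{
    public WikiToken(WikiTokenKind kind, string value) { Kind = kind; Value = value; }
    public WikiTokenKind Kind { get; }
    public string Value { get; }
}

public static class WikiTokenizer
{
    // Longer markers come first so that ''' is tried before ''.
    private static readonly (string Marker, WikiTokenKind Kind)[] Markers =
    {
        ("'''", WikiTokenKind.Bold),
        ("''", WikiTokenKind.Italic),
        ("[[", WikiTokenKind.LinkOpen),
        ("]]", WikiTokenKind.LinkClose),
    };

    public static List<WikiToken> Tokenize(string input)
    {
        var tokens = new List<WikiToken>();
        var text = new StringBuilder();
        int i = 0;
        while (i < input.Length)
        {
            bool matched = false;
            foreach (var (marker, kind) in Markers)
            {
                // Ordinal comparison of the marker against the input at position i.
                if (string.CompareOrdinal(input, i, marker, 0, marker.Length) == 0)
                {
                    Flush(tokens, text);
                    tokens.Add(new WikiToken(kind, marker));
                    i += marker.Length;
                    matched = true;
                    break;
                }
            }
            // Anything that is not a marker accumulates as plain text.
            if (!matched) text.Append(input[i++]);
        }
        Flush(tokens, text);
        return tokens;
    }

    // Emits any pending plain text as a single Text token.
    private static void Flush(List<WikiToken> tokens, StringBuilder text)
    {
        if (text.Length == 0) return;
        tokens.Add(new WikiToken(WikiTokenKind.Text, text.ToString()));
        text.Clear();
    }
}
```

Calling `WikiTokenizer.Tokenize("''a'' [[B]]")` yields an italic marker, the text `a`, another italic marker, a space, a link opener, the text `B`, and a link closer. Deciding what those markers mean in context is left to the parser, which is where the ambiguity has to be handled by hand.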

Roadkill Wiki uses a Creole parser for its Mediawiki parsing, but with limited support.

Screwturn is released under the GPL license, and has a C# parser:

Screwturn license
Screwturn source download (unfortunately there's no web svn)

The class you are after is Core.Formatter, which uses lots of regexes to do its work:

public static class Formatter {

}

It's not the nicest looking code "but it works".
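For readers who have not seen the regex-driven style, here is a small sketch of the general approach. It is not Screwturn's code: the class name `MiniWikiFormatter`, the patterns, and the `/wiki/` link prefix are all assumptions made for illustration, and it covers only headings, bold, italic, and internal links.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical illustration only; this is not Screwturn's Core.Formatter.
public static class MiniWikiFormatter
{
    // "== Heading ==" alone on a line; the number of '=' gives the level.
    private static readonly Regex Heading =
        new Regex(@"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", RegexOptions.Multiline);
    private static readonly Regex Bold = new Regex(@"'''(.+?)'''");
    private static readonly Regex Italic = new Regex(@"''(.+?)''");
    // [[Target]] or [[Target|label]]
    private static readonly Regex Link =
        new Regex(@"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]");

    public static string Format(string wikiText)
    {
        // Escape HTML first so raw markup in the input is not passed through.
        // Done by hand because apostrophes must survive for the rules below.
        string html = wikiText
            .Replace("&", "&amp;")
            .Replace("<", "&lt;")
            .Replace(">", "&gt;");

        html = Heading.Replace(html, m =>
        {
            int level = m.Groups[1].Length;
            return $"<h{level}>{m.Groups[2].Value}</h{level}>";
        });

        // Bold must run before italic, because ''' contains ''.
        html = Bold.Replace(html, "<b>$1</b>");
        html = Italic.Replace(html, "<i>$1</i>");

        html = Link.Replace(html, m =>
        {
            string target = m.Groups[1].Value.Trim();
            string label = m.Groups[2].Success ? m.Groups[2].Value : target;
            // The "/wiki/" prefix is an assumed URL scheme, not a fixed rule.
            string href = "/wiki/" + Uri.EscapeDataString(target.Replace(' ', '_'));
            return $"<a href=\"{href}\">{label}</a>";
        });

        return html;
    }
}
```

With this sketch, `MiniWikiFormatter.Format("'''bold''' and ''italic''")` returns `<b>bold</b> and <i>italic</i>`. The order of the replacement passes matters, which is a large part of why this style of code gets hard to read as more rules are added.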
