I have a MediaWiki installation that serves as a dictionary of German terms and their translations into a local dialect. Each page holds one term, its translation and some additional information.
Now, for a printable version of the dictionary, I need a full export of all terms and their translations. Since this is an extract of each page's content, I guess I need a complete export of all pages in their latest revision in a parsable format, e.g. XML or CSV.
Has anyone done that, or can you point me to a tool? I should mention that I don't have full access to the server, e.g. no command line, but I am able to add MediaWiki extensions and access the MySQL database.
You can export the page content directly from the database. You will get the raw wiki markup, just as with Special:Export, but the export is easier to script this way, and you don't need to make sure all your pages are in some special category.
Here is an example:
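The following is only a sketch against the classic MediaWiki schema (page, revision and text tables), where the latest revision of a page is found by joining page.page_latest to revision.rev_id and revision.rev_text_id to text.old_id. On MediaWiki 1.35 and later the text lookup goes through the content and slots tables instead of rev_text_id, so the joins need adjusting there.

    -- latest wikitext of every content page; prepend your $wgDBprefix if the wiki uses one
    SELECT p.page_title, t.old_text
    FROM page p
    JOIN revision r ON r.rev_id = p.page_latest
    JOIN `text` t ON t.old_id = r.rev_text_id
    WHERE p.page_namespace = 0;  -- 0 = main namespace; drop this line to export everything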
If your wiki uses PostgreSQL, the table "text" is named "pagecontent", and you may need to specify the schema. In that case, the same query would be:
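(Here the schema is assumed to be mediawiki, the installer's default; adjust it to whatever your installation actually uses.)

    SELECT p.page_title, t.old_text
    FROM mediawiki.page p
    JOIN mediawiki.revision r ON r.rev_id = p.page_latest
    JOIN mediawiki.pagecontent t ON t.old_id = r.rev_text_id
    WHERE p.page_namespace = 0;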
Export
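The commands below are only a sketch that assumes shell access to the server, PHP on the PATH and a standard installation layout; dumpBackup.php is MediaWiki's maintenance script for writing an XML dump to standard output.

    cd /path/to/wiki/maintenance
    # --current dumps only the latest revision of each page; use --full for the complete history
    php dumpBackup.php --current > dump.xml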
Import
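Again a sketch under the same assumptions; importDump.php reads such an XML dump back into a wiki.

    cd /path/to/wiki/maintenance
    php importDump.php < dump.xml
    # rebuild derived data afterwards
    php rebuildrecentchanges.php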
I'm not completely satisfied with the solution, but I ended up putting all pages into a common category; on Special:Export I can then add that category, which pulls all of its page names into the export box. It seems to work, although I'm not sure whether it will still work once I reach a few thousand pages.
You can use the special page Special:Export to export to XML; here is Wikipedia's version.
You might also consider Extension:Collection if you eventually want it in human-readable (e.g. PDF) form.
This worked very well for me. Notice that I redirected the output to the file backup.xml. From a Windows Command Processor (CMD.exe) prompt:
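As a sketch (the exact invocation depends on where PHP and the wiki live), assuming PHP is on the PATH and the prompt is in the wiki's maintenance directory:

    php dumpBackup.php --current > backup.xml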
It looks less than simple. http://meta.wikimedia.org/wiki/Help:Export might help, but probably not.
If the pages are all structured in the same way, you might be able to write a web scraper with something like Scrapy.