How to properly delete all page content in MediaWiki?

3

I found this page that explains how to delete all pages in MediaWiki but I'm not at all convinced that running DELETE FROM page; is a good idea. The MediaWiki schema is pretty complex and just deleting the page table probably leaves a lot of unresolved dependencies in the database that are not going to be resolved.

Nevertheless I want to thoroughly delete all pages on a wiki (actually a copy of a wiki). Is there a somewhat straightforward way to do this?

kqw

Posted 2015-06-22T09:51:58.370

Reputation: 1 781

Answers

4

A slight modification of the great answer provided by Tgr: for some reason, MySQL on our MediaWiki server (and its clone to be wiped) refused the CASE...WHEN...THEN constructs, so I did the equivalent replacements in shell:

:; mysql my_wiki -s -N -e \
     "SELECT CONCAT(page_namespace, ':', page_title) FROM page;" | \
     sed -e 's,^0:,:,' -e 's,^1:,Talk:,' \
         -e 's,^2:,User:,' -e 's,^3:,User talk:,' \
         -e 's,^6:,File:,' -e 's,^7:,File talk:,' \
         -e 's,^[0-9][0-9]*:.*$,,' \
     | egrep -v '^$' > /DUMP/wikipage-list-utf8.txt

If you need to revise the listing, and your terminal is not UTF-8 (as mine was not), you can iconv the text file to the encoding of the terminal, for example:

:; iconv -f utf8 -t cp1251 < /DUMP/wikipage-list-utf8.txt

Also, for what it's worth, the namespaces defined in that particular Wikimedia installation (according to the api.php query) were:

<ns id="-2" case="first-letter" canonical="Media" xml:space="preserve">Медиа</ns>
<ns id="-1" case="first-letter" canonical="Special" xml:space="preserve">Служебная</ns>
<ns id="0" case="first-letter" content="" xml:space="preserve" />
<ns id="1" case="first-letter" subpages="" canonical="Talk" xml:space="preserve">Обсуждение</ns>
<ns id="2" case="first-letter" subpages="" canonical="User" xml:space="preserve">Участник</ns>
<ns id="3" case="first-letter" subpages="" canonical="User talk" xml:space="preserve">Обсуждение участника</ns>
<ns id="4" case="first-letter" subpages="" canonical="Project" xml:space="preserve">Simsla</ns>
<ns id="5" case="first-letter" subpages="" canonical="Project talk" xml:space="preserve">Обсуждение Simsla</ns>
<ns id="6" case="first-letter" canonical="File" xml:space="preserve">Файл</ns>
<ns id="7" case="first-letter" subpages="" canonical="File talk" xml:space="preserve">Обсуждение файла</ns>
<ns id="8" case="first-letter" subpages="" canonical="MediaWiki" xml:space="preserve">MediaWiki</ns>
<ns id="9" case="first-letter" subpages="" canonical="MediaWiki talk" xml:space="preserve">Обсуждение MediaWiki</ns>
<ns id="10" case="first-letter" canonical="Template" xml:space="preserve">Шаблон</ns>
<ns id="11" case="first-letter" subpages="" canonical="Template talk" xml:space="preserve">Обсуждение шаблона</ns>
<ns id="12" case="first-letter" subpages="" canonical="Help" xml:space="preserve">Справка</ns>
<ns id="13" case="first-letter" subpages="" canonical="Help talk" xml:space="preserve">Обсуждение справки</ns>
<ns id="14" case="first-letter" canonical="Category" xml:space="preserve">Категория</ns>
<ns id="15" case="first-letter" subpages="" canonical="Category talk" xml:space="preserve">Обсуждение категории</ns>

Sorry the tags are localized, but the "canonical" names are what we are after and they are descriptive enough.

Beside the items I had converted with sed above, there were also entries in namespaces 8, 10 and 12 which seemed more like "system data" rather than user content, and since the intent was to reuse the existing installation's setup only without the content - so I did not pass them to the delete-script, as in:

:; php deleteBatch.php /DUMP/wikipage-list-utf8.txt

Thanks to Tgr for the excellently detailed and linked post - it was all I needed to get started :)

As far as tools provided with MediaWiki go, this was "proper" deletion of content, relocation in the database so it can be undeleted. For real wiping this has to be cleaned on the database level. In our case we did not care much about the content (we cared that it should be gone from the clone, only the setup was needed to make a new wiki site), so below is roughly what I did, according to MySQL history log (I reviewed the SQL dump of the database to see what tables had most size and content, and did not seem to hold system data like help or localization text snippets):

DELETE FROM text WHERE old_id IN (SELECT ar_text_id FROM archive);
DELETE FROM files WHERE old_id IN (SELECT fa_id FROM filearchive);
delete from logging;
delete from archive;
delete from filearchive;
delete from recentchanges;
delete from searchindex;
delete from uploadstash;
delete from watchlist;
delete from ipblocks;
delete from objectcache;
update user set user_password = 'cV%@4dF.' where user_id not in (1,2,3);

I also wiped the attached files themselves with:

:; rm -f `find mediawiki/images -type f`

Of course, do not blindly copy my examples unless you intend to wipe all of the content and not just some subset, and even then better make a backup of both the database and the directory before doing anything irreversible ;)

Jim Klimov

Posted 2015-06-22T09:51:58.370

Reputation: 56

3

Do you really mean blank (ie. edit it to replace the text with an empty string, leaving the old text and all previous versions publicly visible in the page history) or delete?

The easiest way to delete everything is probably getting a list of all your pages (use api.php?action=query&meta=siteinfo&siprop=namespaces to get a list of all namespaces and their canonical names, then get the list of all pages with SELECT CONCAT(CASE page_namespace WHEN 0 THEN '' WHEN 1 THEN 'Talk' ..., ':', page_title) FROM page, and pass the file containing that page list to deleteBatch.php.

For blanking, you can use the same list with edit.php and some bash scripting.

There are also various tools like pywikibot which can do the deletion for you if you don't have shell access.

Tgr

Posted 2015-06-22T09:51:58.370

Reputation: 2 363