A slight modification of the great answer provided by Tgr: for some reason, MySQL on our MediaWiki server (and its clone to be wiped) refused the CASE...WHEN...THEN
constructs, so I did the equivalent replacements in shell:
:; mysql my_wiki -s -N -e \
"SELECT CONCAT(page_namespace, ':', page_title) FROM page;" | \
sed -e 's,^0:,:,' -e 's,^1:,Talk:,' \
-e 's,^2:,User:,' -e 's,^3:,User talk:,' \
-e 's,^6:,File:,' -e 's,^7:,File talk:,' \
-e 's,^[0-9][0-9]*:.*$,,' \
| egrep -v '^$' > /DUMP/wikipage-list-utf8.txt
If you need to revise the listing, and your terminal is not UTF-8 (as mine was not), you can iconv
the text file to the encoding of the terminal, for example:
:; iconv -f utf8 -t cp1251 < /DUMP/wikipage-list-utf8.txt
Also, for what it's worth, the namespaces defined in that particular Wikimedia installation (according to the api.php query) were:
<ns id="-2" case="first-letter" canonical="Media" xml:space="preserve">Медиа</ns>
<ns id="-1" case="first-letter" canonical="Special" xml:space="preserve">Служебная</ns>
<ns id="0" case="first-letter" content="" xml:space="preserve" />
<ns id="1" case="first-letter" subpages="" canonical="Talk" xml:space="preserve">Обсуждение</ns>
<ns id="2" case="first-letter" subpages="" canonical="User" xml:space="preserve">Участник</ns>
<ns id="3" case="first-letter" subpages="" canonical="User talk" xml:space="preserve">Обсуждение участника</ns>
<ns id="4" case="first-letter" subpages="" canonical="Project" xml:space="preserve">Simsla</ns>
<ns id="5" case="first-letter" subpages="" canonical="Project talk" xml:space="preserve">Обсуждение Simsla</ns>
<ns id="6" case="first-letter" canonical="File" xml:space="preserve">Файл</ns>
<ns id="7" case="first-letter" subpages="" canonical="File talk" xml:space="preserve">Обсуждение файла</ns>
<ns id="8" case="first-letter" subpages="" canonical="MediaWiki" xml:space="preserve">MediaWiki</ns>
<ns id="9" case="first-letter" subpages="" canonical="MediaWiki talk" xml:space="preserve">Обсуждение MediaWiki</ns>
<ns id="10" case="first-letter" canonical="Template" xml:space="preserve">Шаблон</ns>
<ns id="11" case="first-letter" subpages="" canonical="Template talk" xml:space="preserve">Обсуждение шаблона</ns>
<ns id="12" case="first-letter" subpages="" canonical="Help" xml:space="preserve">Справка</ns>
<ns id="13" case="first-letter" subpages="" canonical="Help talk" xml:space="preserve">Обсуждение справки</ns>
<ns id="14" case="first-letter" canonical="Category" xml:space="preserve">Категория</ns>
<ns id="15" case="first-letter" subpages="" canonical="Category talk" xml:space="preserve">Обсуждение категории</ns>
Sorry the tags are localized, but the "canonical" names are what we are after and they are descriptive enough.
Beside the items I had converted with sed
above, there were also entries in namespaces 8, 10 and 12 which seemed more like "system data" rather than user content, and since the intent was to reuse the existing installation's setup only without the content - so I did not pass them to the delete-script, as in:
:; php deleteBatch.php /DUMP/wikipage-list-utf8.txt
Thanks to Tgr for the excellently detailed and linked post - it was all I needed to get started :)
As far as tools provided with MediaWiki go, this was "proper" deletion of content, relocation in the database so it can be undeleted. For real wiping this has to be cleaned on the database level. In our case we did not care much about the content (we cared that it should be gone from the clone, only the setup was needed to make a new wiki site), so below is roughly what I did, according to MySQL history log (I reviewed the SQL dump of the database to see what tables had most size and content, and did not seem to hold system data like help or localization text snippets):
DELETE FROM text WHERE old_id IN (SELECT ar_text_id FROM archive);
DELETE FROM files WHERE old_id IN (SELECT fa_id FROM filearchive);
delete from logging;
delete from archive;
delete from filearchive;
delete from recentchanges;
delete from searchindex;
delete from uploadstash;
delete from watchlist;
delete from ipblocks;
delete from objectcache;
update user set user_password = 'cV%@4dF.' where user_id not in (1,2,3);
I also wiped the attached files themselves with:
:; rm -f `find mediawiki/images -type f`
Of course, do not blindly copy my examples unless you intend to wipe all of the content and not just some subset, and even then better make a backup of both the database and the directory before doing anything irreversible ;)