MediaWiki: Safe way to prevent template edits from overloading the job queue?

1

https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Job_queue#Updating_links_tables_when_a_template_changes

How do I mitigate the flood of job queue entries when a major template is updated? Is it safe to delete the queue and skip updating the links tables?

The idea is that I should be able to edit templates as needed without flooding the queue. Most of the pages using the template would be archives and almost never accessed.

What I would prefer is a dynamic "update on request" approach with minimal overhead and storage footprint:

  • Skip the MediaWiki job queue whenever templates are updated. In particular, the amount of queue storage space must not scale with the number of pages using any template.
  • Each page (both regular and templates) will maintain a timestamp for its last edit or update.
  • When a page is requested, its templates are expanded to produce the end-result HTML for the user. As part of this process, all templates on the requested page have their timestamps checked against the page's timestamp.
  • If the page's timestamp is greater than (after) all templates it includes, the page isn't stale and therefore no update task is required. Otherwise, run a job queue update on the page and update its timestamp to the latest template timestamp encountered in the processing (display) step BEFORE sending the HTML content to the browser.
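The four steps above can be sketched as a toy model in Python. This is only an illustration of the proposed check, not MediaWiki code: `Page`, `newest_template_timestamp`, `serve`, and the `refresh` callback are hypothetical names, and the sketch assumes the list of templates each page includes is already known. It also treats a page whose timestamp equals the newest template timestamp as fresh, since step 4 sets the page's timestamp to exactly that value and a strict "greater than" test would refresh the page on every request.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    touched: int  # timestamp of the last edit or refresh
    includes: list = field(default_factory=list)  # titles transcluded directly (assumed known)


def newest_template_timestamp(title, pages, seen=None):
    """Latest timestamp among everything `title` transcludes, directly or indirectly."""
    seen = set() if seen is None else seen
    newest = 0
    for name in pages[title].includes:
        if name in seen:  # guards against transclusion loops
            continue
        seen.add(name)
        newest = max(
            newest,
            pages[name].touched,
            newest_template_timestamp(name, pages, seen),
        )
    return newest


def serve(title, pages, refresh):
    """Refresh a stale page before serving it; return what happened."""
    page = pages[title]
    newest = newest_template_timestamp(title, pages)
    if page.touched < newest:  # some template changed after the last refresh
        refresh(page)          # hypothetical stand-in for the links update
        page.touched = newest
        return "refreshed"
    return "cached"
```

With a template `T` touched at time 5 and a page `P` touched at time 3 that includes `T`, the first `serve("P", ...)` call refreshes `P` and later calls return the cached result until `T` changes again.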

Advantages of this architecture instead of the existing naive approach:

  • Avoids linear storage cost of the job queue scaling with the number of pages using a given template.
  • Link table updates are only done on pages that are actively used and requested. No resources (time or storage) are expended on archived pages that almost no one views.

Is there any way to do this?

Due diligence: I looked for documentation on the links tables and there is almost none.

Example Google searches:

user1258361

Posted 2019-12-20T16:34:04.410

Reputation: 625

Answers

0

Expanding templates (and then rendering the HTML) is expensive and doing it on every request would probably immediately crash your site. You could probably evict the page from the parser cache but not reparse it when a template on it has been edited. (Except, how do you know a template on it has been edited? By keeping the page <--> template relationship graph accurate by always reparsing pages on template edits.)

If you don't care about your pages being up to date with template changes, and with other functionality related to the templatelinks table (like Special:Whatlinkshere), skipping recursive link updates should not cause any other problems. I don't think there's a nice way to do it, but you can probably do it the hacky way by setting $linksupdate->mRecursive to false in the LinksUpdateConstructed hook.
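To make the trade-off concrete, here is a toy model in Python of what skipping the recursive update costs. It is not MediaWiki's implementation: `ToyWiki` and all of its methods are hypothetical, the real job queue batches its work differently, and the `recursive` argument only stands in for the flag mentioned above.

```python
from collections import deque


class ToyWiki:
    def __init__(self):
        self.source = {}         # title -> titles its current text transcludes directly
        self.templatelinks = {}  # title -> titles recorded when it was last parsed
        self.queue = deque()     # titles waiting to be reparsed

    def expand(self, title, seen=None):
        """All titles reached by transclusion from `title`, directly or indirectly."""
        seen = set() if seen is None else seen
        for name in self.source.get(title, []):
            if name not in seen:
                seen.add(name)
                self.expand(name, seen)
        return seen

    def reparse(self, title):
        self.templatelinks[title] = self.expand(title)

    def what_links_here(self, title):
        return sorted(t for t, links in self.templatelinks.items() if title in links)

    def save(self, title, transcludes, recursive=True):
        self.source[title] = list(transcludes)
        self.reparse(title)  # the edited page itself is always reparsed
        if recursive:
            # One queued reparse per page recorded as using the edited page.
            self.queue.extend(self.what_links_here(title))

    def run_jobs(self):
        while self.queue:
            self.reparse(self.queue.popleft())
```

In this model, if page P uses template A and A is then edited with `recursive=False` so that it starts using X, nothing is queued and `what_links_here("X")` lists only A. P is missing from that list until something reparses it.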

Tgr

Posted 2019-12-20T16:34:04.410

Reputation: 2 363

Template expansion wouldn't be done on every request, only the first request for each page after a template update. – user1258361 – 2019-12-23T19:20:14.787

And how do you know you are on the first request after a template update? – Tgr – 2019-12-24T04:14:19.517

"Otherwise, run a job queue update on the page and update its timestamp to the latest template timestamp encountered in the processing (display) step BEFORE sending the HTML content to the browser." (see point #4 in the update-on-request algorithm) – user1258361 – 2019-12-24T19:42:05.427

But you don't know which templates are being used (possibly indirectly) on the page since you are not keeping the templatelinks table up to date anymore. – Tgr – 2019-12-24T23:57:16.397

Solution: Maintain templatelinks for templates only, so it's always possible to know the timestamps down the template inclusion tree without requiring updates for regular pages. – user1258361 – 2019-12-26T20:34:24.313

For one thing, MediaWiki does not have a concept of template pages internally. Any page can be transcluded as a template, the Template: namespace is just a convenience. But even disregarding that, you'd still have to queue links updates to keep the templatelinks table up to date; the number of jobs would be significantly reduced, but you'd pay with increased architectural complexity (since you'd have two fundamentally different update mechanisms). And rendering would still be way too slow to provide a reasonable reading experience, even if it would only affect a smaller fraction of users. – Tgr – 2019-12-27T00:17:15.580

On second thought, there's a more fundamental problem with it: templatelinks depends on what arguments a template is invoked with (templates can conditionally transclude other templates depending on what parameters they are called with, and that is fairly common in e.g. infoboxes and an important factor in keeping template rendering times manageable), so there is no way to "pre-generate" the part of the templatelinks table that belongs to a specific template transclusion. – Tgr – 2019-12-27T00:21:48.473

Correct. I've actually used that approach in an attempt to "hide" a template transclusion from the parser - make an invoked template an argument. The approach I'm outlining will only apply to proper transclusion (anything between the opening {{ and the first vertical bar in a template call). – user1258361 – 2019-12-27T21:01:35.737

I'm not sure we are talking about the same thing. If template A consists of {{ #switch:{{{1}}} | x = {{ X | {{{2}}} }} | y = {{ Y | {{{2}}} }} }} and it is invoked on page P as {{ A | x | 123 }}, the parser has no way of telling that P depends on X but not on Y without parsing P. So when A is changed, P needs to be reparsed; otherwise the template link graph might become inaccurate. – Tgr – 2019-12-28T10:02:43.547
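The #switch example can be mimicked with a small Python sketch. The functions below are hypothetical stand-ins for templates A, X, and Y, not a wikitext parser. They only show that the set of templates a page ends up using comes out of expanding the call with its actual arguments.

```python
def template_x(args, used):
    used.add("X")
    return f"X({args[0]})"


def template_y(args, used):
    used.add("Y")
    return f"Y({args[0]})"


def template_a(args, used):
    """Models {{ #switch:{{{1}}} | x = {{ X | {{{2}}} }} | y = {{ Y | {{{2}}} }} }}."""
    used.add("A")
    branch = {"x": template_x, "y": template_y}.get(args[0])
    # No matching case and no default: the switch produces nothing.
    return branch([args[1]], used) if branch else ""


def parse_page(calls):
    """Expand every template call on a page and record which templates were used."""
    used = set()
    html = "".join(template(args, used) for template, args in calls)
    return html, used
```

Calling `parse_page([(template_a, ["x", "123"])])` records A and X as used, while the same call with `"y"` records A and Y. Nothing about A on its own says which of the two a given page depends on.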