How do I mitigate the flood of job queue entries when a major template is updated? Is it safe to delete the queue and skip updating the links tables?
The idea is that I should be able to edit templates as needed without flooding the queue. Most of the pages using the template would be archives and almost never accessed.
What I would prefer is a dynamic "update on request" approach with minimal overhead and storage footprint:
- Skip the MediaWiki job queue whenever templates are updated. In particular, the amount of queue storage space must not scale with the number of pages using any template.
- Each page (both regular and templates) will maintain a timestamp for its last edit or update.
- When a page is requested, its templates are expanded to produce the end-result HTML for the user. As part of this process, all templates on the requested page have their timestamps checked against the page's timestamp.
- If the page's timestamp is greater than (after) all templates it includes, the page isn't stale and therefore no update task is required. Otherwise, run a job queue update on the page and update its timestamp to the latest template timestamp encountered in the processing (display) step BEFORE sending the HTML content to the browser.
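The staleness check in the steps above can be sketched as follows. This is a toy in-memory model, not MediaWiki code: `Page`, `touched`, `transcludes`, `newest_template_time` and `on_request` are hypothetical names, timestamps are plain integers, and appending to `refreshed` stands in for the per-page links update. Note that the `transcludes` lists are assumed to be known up front, which is exactly the information the comments below point out you lose if templatelinks is no longer kept current.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    touched: int  # hypothetical integer clock: last edit or last links update
    transcludes: list[str] = field(default_factory=list)  # direct {{...}} targets (assumed known)


def newest_template_time(pages: dict[str, Page], title: str, seen: set[str] | None = None) -> int:
    """Newest `touched` value among all pages transcluded, directly or indirectly, by `title`."""
    if seen is None:
        seen = set()
    newest = 0
    for target in pages[title].transcludes:
        if target in seen:  # skip templates already visited (diamonds, loops)
            continue
        seen.add(target)
        newest = max(newest, pages[target].touched,
                     newest_template_time(pages, target, seen))
    return newest


def on_request(pages: dict[str, Page], title: str, refreshed: list[str]) -> bool:
    """Lazily refresh `title` if any transcluded template is newer; return True if refreshed."""
    page = pages[title]
    newest = newest_template_time(pages, title)
    if newest > page.touched:
        refreshed.append(title)  # stand-in for running the links update for this page
        page.touched = newest    # page is now current as of the newest template seen
        return True
    return False
```

With `P` transcluding `A`, which transcludes `X`, editing `X` causes the next request for `P` (and only that request) to trigger a refresh; pages nobody requests are never touched.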
Advantages of this architecture instead of the existing naive approach:
- Avoids linear storage cost of the job queue scaling with the number of pages using a given template.
- Link table updates are only done on pages that are actively used and requested. No resources (time or storage) are spent on archived pages that, for the most part, no one views.
Is there any way to do this?
Due diligence: I looked for documentation on the links tables and there is almost none.
Example Google searches:
- mediawiki "link table" documentation
- mediawiki skip updating link table on template change
- https://www.mediawiki.org/wiki/Manual:Pagelinks_table provides only a technical table spec that doesn't help with the question
Template expansion wouldn't be done on every request, only the first request for each page after a template update. – user1258361 – 2019-12-23T19:20:14.787
And how do you know you are on the first request after a template update? – Tgr – 2019-12-24T04:14:19.517
"Otherwise, run a job queue update on the page and update its timestamp to the latest template timestamp encountered in the processing (display) step BEFORE sending the HTML content to the browser." (see point #4 in the update-on-request algorithm) – user1258361 – 2019-12-24T19:42:05.427
But you don't know which templates are being used (possibly indirectly) on the page since you are not keeping the templatelinks table up to date anymore. – Tgr – 2019-12-24T23:57:16.397
Solution: Maintain templatelinks for templates only, so it's always possible to know the timestamps down the template inclusion tree without requiring updates for regular pages. – user1258361 – 2019-12-26T20:34:24.313
For one thing, MediaWiki does not have a concept of template pages internally. Any page can be transcluded as a template; the Template: namespace is just a convenience. But even disregarding that, you'd still have to queue links updates to keep the templatelinks table up to date; the number of jobs would be significantly reduced, but you'd pay with increased architectural complexity (since you'd have two fundamentally different update mechanisms). And rendering would still be way too slow to provide a reasonable reading experience, even if it would only affect a smaller fraction of users. – Tgr – 2019-12-27T00:17:15.580
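To make the job-count trade-off concrete, here is a rough sketch that counts how many pages depend on an edited template under eager invalidation versus a "templates only" scheme. It is a toy model, not MediaWiki's job queue: `links` is a hypothetical in-memory stand-in for templatelinks (page to directly transcluded pages), and `looks_like_template` is a hypothetical heuristic based on the Template: prefix, which, as noted above, MediaWiki itself does not treat as meaningful.

```python
from __future__ import annotations

from collections import defaultdict, deque
from typing import Callable


def looks_like_template(title: str) -> bool:
    """Hypothetical heuristic: MediaWiki has no internal notion of a 'template page'."""
    return title.startswith("Template:")


def dependents(links: dict[str, set[str]], edited: str) -> set[str]:
    """All pages transcluding `edited` directly or indirectly (reverse BFS over the links map)."""
    reverse: dict[str, set[str]] = defaultdict(set)
    for page, targets in links.items():
        for target in targets:
            reverse[target].add(page)
    found: set[str] = set()
    queue = deque([edited])
    while queue:
        for user in reverse[queue.popleft()]:
            if user not in found:
                found.add(user)
                queue.append(user)
    return found


def jobs_per_edit(links: dict[str, set[str]], edited: str,
                  is_template: Callable[[str], bool] = looks_like_template) -> tuple[int, int]:
    """(jobs for eager per-page updates, jobs if only template pages are kept current)."""
    deps = dependents(links, edited)
    return len(deps), sum(1 for page in deps if is_template(page))
```

For example, with `Template:Infobox` transcluding `Template:Date` and three archive pages using one or the other, editing `Template:Date` gives `(4, 1)`. The gap grows with the number of archive pages, but the second mechanism still needs its own queue, which is the added complexity mentioned above.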
On second thought, there's a more fundamental problem with it: templatelinks depends on what arguments a template is invoked with (templates can conditionally transclude other templates depending on what parameters they are called with, and that is fairly common in e.g. infoboxes and an important factor in keeping template rendering times manageable), so there is no way to "pre-generate" the part of the templatelinks table that belongs to a specific template transclusion. – Tgr – 2019-12-27T00:21:48.473
Correct. I've actually used that approach in an attempt to "hide" a template transclusion from the parser: make an invoked template an argument. The approach I'm outlining will only apply to proper transclusion (anything between the opening {{ and the first vertical bar in a template call). – user1258361 – 2019-12-27T21:01:35.737
I'm not sure we are talking about the same thing. If template `A` consists of `{{ #switch:{{{1}}} | x = {{ X | {{{2}}} }} | y = {{ Y | {{{2}}} }} }}` and it is invoked on page `P` as `{{ A | x | 123 }}`, the parser has no way of telling that `P` depends on `X` but not on `Y` without parsing `P`. So when `A` is changed, `P` needs to be reparsed; otherwise, the template link graph might become inaccurate. – Tgr – 2019-12-28T10:02:43.547