Per a request, I created a new #Wikipedia database report that sorts featured articles using their "prose size": https://en.wikipedia.org/wiki/Wikipedia:Database_reports/Featured_articles_by_size
Previously there was a report that sorted them by the size of the wikitext markup: https://en.wikipedia.org/wiki/Wikipedia:Featured_articles/By_length
That measure was biased toward articles that cite online sources (longer reference markup) over those that cite books (shorter).
(Yes, I demoted Taylor Swift from first to 254th 😢)
@legoktm poking at the very extreme cases, it looks like these are also picking up heavy use of text in notes or in tables - which I guess is unusual enough not to skew things too much by comparison to reference markup. Least "efficient" is https://en.m.wikipedia.org/wiki/Maya_stelae which definitely goes all-in on the table approach!
The tool sounds fun - will have a think about how it could be used.
@generalising ooh, very cool! I think most of the difference is that references and media are excluded, which take up a sizable amount of markup and are text shown to users, but not considered "prose".
I'm going to package this up as a web API on Toolforge later today to make it easier to use for non-Rust folks if you want to try it on other sets of articles.
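For a rough feel of why the two measures diverge, here is a minimal Python sketch that strips references and simple file links from a wikitext string and compares the lengths. This is not how the report or the Rust tool computes prose size; `approx_prose_size` and the sample text are made up for illustration, and the regexes ignore templates, tables, and nested markup.

```python
import re


def approx_prose_size(wikitext: str) -> int:
    """Very rough estimate: drop references and media, count what is left.

    Hypothetical helper for illustration only; real prose-size tools
    work quite differently and handle far more markup.
    """
    text = wikitext
    # Drop self-closing references such as <ref name="x" />
    text = re.sub(r"<ref\b[^>]*/>", "", text)
    # Drop paired references together with everything between the tags
    text = re.sub(r"<ref\b[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    # Drop simple (non-nested) file/image links, captions included
    text = re.sub(r"\[\[(?:File|Image):[^\[\]]*\]\]", "", text)
    return len(text.strip())


# Made-up sample: two short sentences, one web citation, one image, one named ref
sample = (
    "An example sentence."
    "<ref>{{cite web |url=https://example.org |title=Example}}</ref> "
    "[[File:Example.jpg|thumb|A caption]]"
    'Another sentence.<ref name="b" />'
)

markup_size = len(sample)
prose_size = approx_prose_size(sample)
print(markup_size, prose_size)
```

Even in this toy case the markup is several times longer than the remaining text, and almost all of the gap comes from the web citation and the image link.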