
Since I seem to have accidentally reminded a lot of people about 'outwith' this week scottishpoetrylibrary.org.uk/p

@michaelmeckler I wonder if part of the issue is that these words are not "wrong" but they are (in context) tonally "awkward" - a thing that is harder to spot and edit out for a second language speaker, if they're not looking for it?

(eg in my case I could look at an auto-translated French text and say "yeah, that sounds like what I was trying to get across", but probably not "hmm, that sounds subtly off")

Good news: the results of an intensive international project have let us delay having to implement technically complex "negative leap seconds"...

Bad news: ...that project is humanity melting the icecaps

nature.com/articles/d41586-024

@Wikisteff believe it or not, these extreme cases were what made me think about full-text digging originally. Except of course there's only a handful of these - peer review is pretty good at weeding the most blatant stuff out (and I'd assume a lot of it is editorially desk-rejected even before that step). So it was amusing but all pretty low-level.

Then the adjective list came out and I thought, hey, this might actually show up at scale! :-)

huh, this is neat! someone did an AI-detector-tool-based analysis of preprint platforms, and released it on exactly the same day as mine. Shows evidence for differential effects by discipline & country. biorxiv.org/content/10.1101/20


@Wikisteff similarly I think it's plausible the 2024 data is weird in interesting ways - eg over-representing certain types of paper in certain journals because they publish faster - and that might complicate analysis of it. Which is not to say the 2024 figures *aren't* going to be terrifyingly high whatever corrections we apply!

@Wikisteff this is interesting, thank you!

What I don't have a figure for is "what percentage of papers in any given year have full text", & it may not be constant over time. This was one of the reasons for including control words - they proxy for it and let us know what a reasonable bound for year-to-year change might be (I got ~5%). I'm not sure if that complicates your analysis?
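(For anyone curious what I mean by the control-word bound, here's a rough sketch - the counts are entirely made up, not the real Dimensions data. The idea is just that a "neutral" word's year-to-year change proxies for coverage/indexing drift, and anything a marker word does beyond that is the interesting bit.)

```python
# Hypothetical illustration of the control-word bound; all counts invented.
control_counts = {   # papers per year containing a "neutral" control word
    2021: 100_000,
    2022: 103_000,
    2023: 107_500,
}
marker_counts = {    # papers per year containing a candidate LLM-marker word
    2021: 2_000,
    2022: 2_100,
    2023: 4_800,
}

def yearly_change(counts, year):
    """Fractional change in counts relative to the previous year."""
    return counts[year] / counts[year - 1] - 1

for year in (2022, 2023):
    baseline = yearly_change(control_counts, year)  # proxies full-text coverage drift (~5% in my data)
    observed = yearly_change(marker_counts, year)
    excess = observed - baseline                    # change beyond what coverage drift explains
    print(f"{year}: control {baseline:+.1%}, marker {observed:+.1%}, excess {excess:+.1%}")
```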

@richlitt @rmounce I would agree - I think I originally thought of that as covered by the ref to "copyediting", but V2 will try and tease out all those different (legitimate!) use cases a bit.

It's challenging though - since there's no disclosure of what/how the tools were used, we don't really have any way to tell what's causing these markers without really digging into individual cases.

@joeroe @ArchaeoIain that's my best guess as well (as someone who doesn't understand the technicalities involved very well). And some of it may just be a slight mismatch between the tone of the text & the intended venue - maybe when using an LLM for copyediting papers we should ask "and make it 25% less enthusiastic"...

@mob I wonder a lot about the grammarly thing, but also stuff like "AI assisted search tools". At what point does it cross the line between something we're happy for students to use, and something that's too much? Really hard to say, and very blurry sometimes.

@tdietterich this one is really interesting, thank you! I had wondered if it would be visibly showing up in submitted material.

"Press release" is a good way of describing it - I think the first time I saw it it made me think of people writing travel guides, everywhere "vibrant" and with "stunning natural beauty"...

@sbszine yeah, I think it's very variable between document types - I found that some of the terms that were significant for peer reviews had little to no difference for papers. I guess it's down to the nuance of what terms are expected (by a human) in what contexts!

@sbszine interestingly, not so much in this context! I think this is because it's *such* a common word in academic writing anyway (~1.5m papers/year use it)

@Tom_Drummond yes, I think that's going to account for a lot of it - occasional horror stories aside, I wouldn't expect many pure-LLM papers are escaping into the wild. It's the middling grey area beyond "just polishing" that worries me...

@franco_vazza there's always a baseline of usage, but that pair together has suddenly become a lot more common.

I don't think this approach is great for detecting LLM involvement in any individual paper (there are *much* more sophisticated tools for that) but it works OK for estimation at a much broader scale.
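(A very rough sketch of what I mean by estimation at scale - numbers invented, and this is emphatically not a per-paper detector: take the pair's pre-2023 rate as a baseline and read the excess in 2023 as a ballpark for how many papers are involved.)

```python
# Sketch of broad-scale estimation from co-occurrence counts; all figures invented.
pair_counts = {2019: 310, 2020: 330, 2021: 325, 2022: 340, 2023: 1_250}   # papers using the word pair
total_papers = {2019: 1.00e6, 2020: 1.05e6, 2021: 1.08e6, 2022: 1.10e6, 2023: 1.12e6}

# Baseline rate from the pre-LLM years.
baseline_years = [2019, 2020, 2021, 2022]
baseline_rate = sum(pair_counts[y] for y in baseline_years) / sum(total_papers[y] for y in baseline_years)

# Excess papers in 2023 beyond what the baseline rate predicts.
expected_2023 = baseline_rate * total_papers[2023]
excess_2023 = pair_counts[2023] - expected_2023
print(f"baseline rate: {baseline_rate:.2e} per paper")
print(f"2023 excess: ~{excess_2023:.0f} papers above expectation")
```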

@Jey_snow thanks - yes, I think that's very likely! Dimensions doesn't let me easily test for author affiliation location, but I think you'd be safe placing a small bet on it...

@unchartedworlds thanks - and fair point! I normally try to avoid it, but keep falling into the habit of anthropomorphising them to about the same level as recalcitrant lifts...

@Wikisteff No, these were all done by hand, so I didn't want to spend a full week doing all 100! Might be practical to test them all using the Dimensions API, though?

@Wikisteff Credit where it's due - I took the sample list from an earlier study! arxiv.org/abs/2403.07183 (p 15, 16) I think this is a bit of an idiosyncratic list due to the peer-review context (hence it's all adjectives/adverbs, almost all positive) and there will definitely be other distinctive terms, some unpredictable - it would be quite interesting to do some larger analysis to try and find them.
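(If I did try that larger analysis, it would probably look something like this toy sketch - placeholder strings rather than real corpora, and a real version would want proper log-likelihood statistics rather than a smoothed ratio. The point is just to surface "delve"-type terms by comparing rates in a pre-2023 corpus against a 2023+ one, instead of guessing them in advance.)

```python
# Toy sketch: rank words by how much more frequent they are in a "recent" corpus
# than in an "older" one. Corpora here are placeholder strings, not real papers.
from collections import Counter
import re

older_corpus = "we examine the data and report the results in table two ..."
recent_corpus = "we delve into the intricate data and meticulously report the notable results ..."

def word_rates(text):
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

old_rates = word_rates(older_corpus)
new_rates = word_rates(recent_corpus)

# Simple smoothed ratio; a real analysis would use log-likelihood or similar.
EPS = 1e-6
ratio = {w: (new_rates.get(w, 0) + EPS) / (old_rates.get(w, 0) + EPS) for w in new_rates}
for word, r in sorted(ratio.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{word}: {r:.1f}x more frequent in the recent corpus")
```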

@ncdominie or we could just accept the inevitable triumph of the Leal Leid Makars?
