Today's article comes from the Advances in Simulation journal. The authors are Cheng et al., from the University of Calgary, in Canada. In this paper they're exploring whether (and how) researchers and students can use LLMs without compromising academic integrity.
DOI: 10.1186/s41077-025-00350-6
You have a paper due tomorrow. Or rather, today. It's 2am, and it's due at 9am. You're on your third cup of coffee, looking at what feels like a very (very) rough draft. Your ideas are all there, but the writing is clunky, and you don't like how it flows. You haven't put in any citations or built up your bibliography. You haven't spell-checked or proofread. And this essay is long, so you've got hours to go. You'll be lucky to get any sleep at all. You're staring at the screen, trying not to panic.
It's at this point that you consider doing something you would never normally do: use AI. You fire up an LLM, paste in your rough draft, and ask it to make the writing 'more readable', fact-check everything, and fix the typos and grammatical errors. Then you upload the sources you've been poring over, and ask the LLM to add structured citations and a bibliography, APA style. A few minutes later, it spits out your final version. You've got prose that's cleaner and more polished, everything's fully cited and double-checked, your structure is where you need it to be, all the "i"s are dotted, and the "t"s are crossed.
Here's the question:
Did you just cheat? Do you need to tell anyone? Is what you did even allowed? And if so, where exactly should we draw the line?
If you've found yourself in this situation, you're not alone. And you're definitely not wrong to be unsure of the answers to those questions. Why? Because the world of academia is still figuring out what the rules are.
Today we're looking at a paper that tries to help sort through this mess. These authors are healthcare researchers who found themselves in the exact same boat as everyone else: trying to figure out how to use these powerful new tools without compromising academic integrity.
What makes this paper interesting is that they didn't just write guidelines; they practiced what they preached. They actually used ChatGPT to help them write parts of the article, and then documented exactly how they used it and what they learned from the process. And yes, it's very meta. They used an LLM to help them write a paper about how to use LLMs to help you write papers. Inception.
Let's get into it.
They start by acknowledging that these kinds of tools (LLMs) are fundamentally different from the software that came before them. There's a long history of researchers using technology to perform tasks more efficiently. That is the status quo. From academic search engines that save you from having to dig through the stacks at the library, to statistical packages that make complex analyses faster, to spell-checkers that underline typos, to reference managers that organize citations, to transcription software that converts old interviews, broadcasts, and speeches to text. The consensus is that this kind of stuff is fine. It's not cheating. But generative AI tools don't just help you complete tasks. They produce novel written content on their own, for you. And for many people, that seems like a wholly different thing. Those capabilities introduce a challenge that traditional notions of academic integrity weren't designed to handle. The line between "assistance" and "replacement" gets blurry, fast.
So how do we make sense of this? And is there a structured way to think about it? That's really the point of this paper: to introduce a framework (and guardrails) for a subject that is amorphous, gray, and changing fast. To start, the authors dove into the literature and compared what a number of different papers had to say on the subject. They came away with a framework that organizes AI usage, conceptually, into three ethical tiers, each with different levels of acceptability and safeguards.
Tier 1 represents the most ethically acceptable uses. This is when you use AI primarily to restructure existing text, rather than generating new content. Grammar and spelling correction tools fall here, so this would cover things like Grammarly, but also the AI-powered spell-check and grammar-check that are likely built into your word processor. Readability improvement also belongs in Tier 1, as long as authors ensure the edits preserve their original voice and thought process. In this tier the model should be refining expression, not changing meaning. Translation tools round out this tier, but with some caveats. In many cases, an AI translator provides a starting point, not a finished product.
Tier 2 represents "ethically contingent" uses that require careful handling of auto-generated content. This includes generating outlines from existing content, summarizing material, improving the clarity of existing text, or brainstorming ideas. The distinction is between asking AI to work with substantial input versus asking it to create content from scratch. That is: are you prompting the model with content and asking it to organize the existing concepts? Or are you asking it to create an outline from minimal input, and formulate the core ideas itself? The former leverages the LLM's organizational capabilities; the latter risks introducing concepts that aren't yours and may not be accurate.
Tier 3 covers the "ethically suspect" uses. That is: having AI draft novel text without providing substantial original content. Why? Because this approach circumvents the intellectual engagement that's essential to good research. You're not thinking; you're having it think for you. Data interpretation falls in this tier as well. Using AI to perform primary analysis short-circuits the deep engagement with data that leads to genuine understanding and insight. The authors argue that if you analyze the data yourself first, you gain a fuller understanding that enables you to critique any subsequent AI interpretations. Without that grounding, you're just handing your analysis over to the machine, and missing out on a core part of the experience. Literature reviews are similarly problematic. LLMs are notoriously unreliable for citing references, so at least for now they're simply unsuitable for this task.
To operationalize this framework, they put together a four-question checklist for evaluating AI use.
The authors recommend that even if the first three questions are answered affirmatively, the fourth (disclosure) remains mandatory. Transparency, in their opinion, is not optional, even for AI use that is otherwise ethically sound.
But how does one disclose, and where? Well, while some journals allow AI use to be noted in an 'acknowledgment' section, the authors argue that the methods section is the most transparent place to disclose. They suggest specifying which AI tools were used, for which tasks, how the AI-generated output was handled (reviewed, edited, verified), and how that content was incorporated into the final manuscript. In addition, for translation uses, they recommend having native speakers of the target language review the final manuscript.
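To make that concrete, a disclosure along these lines (my own illustrative wording, not a template from the paper) might read: "We used ChatGPT (OpenAI) to improve the readability of the introduction and discussion sections and to check grammar. All AI-suggested edits were reviewed, revised, and verified by the authors before being incorporated into the manuscript." Swap in whichever tool, tasks, and review process actually applied to your work.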
They also address the question of scholarly development. They're concerned not just with immediate productivity, but with the professional growth of researchers in their field. This includes ensuring scholars can think deeply and creatively about research problems and the resulting data. While novice scholars need these skills most, even experienced researchers benefit from adapting their capabilities to an evolving landscape. The concern is that over-reliance on LLMs (for ideation, primary content generation, and data interpretation) could arrest that development. This is why their framework emphasizes maintaining human intellectual contribution, above all else.
All that being said, this field is changing fast. Some issues (like hallucinations) might naturally get mitigated over time. Others (like bias) might get worse before they get better. So no framework or structure will last too long; the conditions on the ground are just changing too quickly. The authors are firm on one point that (they argue) cannot and should not ever change: at the end of the day, it is the authors of a paper who bear full responsibility for the manuscript's originality, the accuracy of its content, and appropriate referencing. This is true whether you used AI or not. So the widespread adoption of these tools should never be used as an excuse to skirt blame for errors, or to avoid culpability or liability. It's your paper, and it's your work, whether you wrote every word of it or not.
If you want to dive deeper into their analysis, explore the examples they walked through, or get more details on their framework, I'd strongly recommend downloading the paper. It's a useful playbook and their recommendations and guidance go well beyond what we could cover here.