Does Turnitin Detect Proofademic? I Ran the Same Chapter Through Both Before Filing
My dissertation filing deadline is in six weeks. Chapter 3 is the one keeping me up at night, not because the research is weak, but because it’s a literature synthesis. And honestly, literature synthesis chapters are basically designed to trigger AI detection. Neutral language, hedged claims, structured citations, parallel sentence structure throughout. It reads like AI output even when it isn’t.
So last month I did something I probably should have done earlier: I ran a 1,400-word excerpt from Chapter 3 through both Proofademic and Turnitin’s AI detection feature, back to back, before touching a single word. I wanted to know what I was walking into before my committee did.
Here’s what I found, and what it means if you’re in a similar situation.
The direct answer, before I bury it in context
Turnitin doesn’t detect Proofademic specifically. It detects patterns associated with AI-generated writing. Proofademic is a detector, not a writing tool, so the question is really: will Turnitin flag content that Proofademic cleared? On my 1,400-word excerpt, Proofademic flagged 31% of the text as likely AI-generated before I made any edits. Turnitin came back at 29%. After I revised the flagged sentences, Proofademic dropped to 6% and Turnitin landed at 8%. They don’t give identical scores, but the overall numbers are closer than you’d expect.
Why I tested both instead of just one
My university uses Turnitin. That’s what’s going to scan my dissertation when I file. So why bother with Proofademic at all?
Because Turnitin, at least in the report view I had access to, gives you a number and nothing else.
After the scan, all I had was “29% of this document may contain AI-generated writing.” I didn’t know which sentences. I didn’t know whether the flagged percentage was spread across the chapter or concentrated in one section. I had no way to fix it strategically because I didn’t know what was wrong.
Proofademic gave me something different. It highlighted each sentence that triggered detection and assigned it an individual probability score. The three highest-scoring sentences were all in my methodology summary paragraph. That paragraph was a dense block of hedged academic language I’d written at 1am, and looking back, it summarized four sources in the flattest possible academic register.
That’s all I needed. Not a score. A location.
What the scores looked like before and after
To be precise about what I actually ran: I took a continuous 1,400-word block from Chapter 3, the section where I synthesize the existing literature on metacognitive strategy use. I didn’t AI-generate any of it, but I had used ChatGPT to help me outline the structure before I started writing.
Before edits:
Proofademic: 31% AI probability, with 4 specific sentences flagged as high-risk (above 80% AI probability per sentence)
Turnitin: 29% AI-written
By my rough estimate, the two tools agreed on about 60% of what they flagged. Turnitin never showed me which sentences it flagged, so I had to infer its side. The other 40% was split: some sentences Turnitin seemed to flag that Proofademic didn’t, and a couple Proofademic flagged that Turnitin apparently didn’t care about. That disagreement rate matters, because it tells you neither tool is definitive.
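One way to put a number on "agreement" is to treat each tool's flags as a set of sentence IDs and measure the overlap. The sketch below assumes one particular definition: sentences both tools flagged, divided by sentences either tool flagged (Jaccard overlap). I don't know how either tool would define agreement, and the sentence IDs are hypothetical, not my actual flags.

```python
def flag_agreement(flags_a: set[int], flags_b: set[int]) -> float:
    """Share of all flagged sentences that both tools flagged (Jaccard overlap)."""
    union = flags_a | flags_b
    if not union:
        return 1.0  # neither tool flagged anything, so they trivially agree
    return len(flags_a & flags_b) / len(union)


# Hypothetical sentence indices, for illustration only.
proofademic_flags = {3, 7, 8, 12}
turnitin_flags = {3, 7, 8, 19}

print(f"{flag_agreement(proofademic_flags, turnitin_flags):.0%}")
```

With these made-up sets, three of the five distinct flagged sentences are shared, which is what a "60% agreement" figure would look like under this definition.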
After I revised the four high-risk sentences that Proofademic identified, I also rewrote a few transitions that felt flat. I used Walter Writes with the academic tone setting to restructure those sentences without changing my citations or argument. Then I reran both tools.
After edits:
Proofademic: 6% AI probability, no sentences above the high-risk threshold
Turnitin: 8% AI-written
That drop from 29-31% to 6-8% came from editing four sentences and a handful of transitions. Not rewriting the chapter. Not overhauling the argument. Fixing the specific lines that were triggering the pattern recognition.
How AI detection actually works (the part people skip)
Before you can understand why two tools disagree, you need a basic model of how AI detection works. Both Turnitin and Proofademic are looking for statistical patterns that differ between human-written and AI-generated text.
Two of the signals that come up most often are perplexity and burstiness.
Perplexity measures how predictable a text is to a language model: roughly, how surprised the model is by each word given the words before it. AI-generated text tends to have low perplexity because language models generate text by favoring high-probability next tokens. Human writing is less predictable. We make stranger word choices, shift register unexpectedly, and write sentences that are grammatically coherent but stylistically weird.
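In the usual definition, perplexity is the exponential of the average negative log-probability a model assigns to each token. The sketch below takes per-token probabilities as given. The numbers are invented to show the contrast. They are not output from any detector, and real tools may compute and weight this very differently.

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """Exponential of the mean negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Invented per-token probabilities: the model finds the first sequence easy to predict.
predictable = [0.9, 0.8, 0.85, 0.9]
surprising = [0.9, 0.05, 0.3, 0.1]

print(round(perplexity(predictable), 2), round(perplexity(surprising), 2))
```

The predictable sequence scores close to 1, which is the floor. A few low-probability words pull the second sequence much higher.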
Burstiness is about variation in sentence length and structure. AI output tends toward consistent sentence length and parallel construction. Human writing is messier. Long sentences followed by short ones. Fragments for emphasis. That kind of thing.
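There's no single standard formula for burstiness. A common rough proxy is how much sentence length varies relative to its average, known as the coefficient of variation. The sketch below uses that proxy with a naive regex sentence splitter. Both choices are my assumptions, not how Turnitin or Proofademic actually measure it.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, splitting naively after ., ! or ? plus whitespace."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length (stdev / mean); 0 means uniform."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


flat = ("Prior work examines strategy use. Several studies report similar effects. "
        "Other authors note comparable trends.")
varied = ("Prior work is mixed. Several large studies report that explicit strategy "
          "instruction improves outcomes, but only for older students. Why? Nobody agrees.")

print(round(burstiness(flat), 2), round(burstiness(varied), 2))
```

Three five-word sentences in a row score zero. Mixing a long sentence with fragments pushes the score well above zero, which is the "messier" pattern described above.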
The reason literature synthesis chapters get flagged more than other academic genres is that the formal register flattens both of these signals. You hedge. You cite. You use passive constructions. The writing sounds like AI because it’s following a genre convention that happens to overlap with AI output patterns.
For what it’s worth, that overlap is the reason Proofademic is calibrated specifically for academic writing. Its model is trained on academic text, which should make it better at distinguishing between “this reads formal because it’s formal academic writing” and “this reads like AI output.” General-purpose detectors don’t make that distinction as well.
The disagreement between tools is the most important finding
I want to stay on this for a second, because it’s the takeaway most people miss: the tools disagreed on 40% of the flagged content.
That’s not a small number. It means that if you run your chapter through Turnitin and get a 25% score, fixing everything you think Turnitin flagged might still leave you with a 10% score on Proofademic. You can only infer what Turnitin flagged, since it doesn’t tell you, and the two tools aren’t identifying the same sentences as problems.
The practical implication: running only one tool before filing is incomplete preparation. You don’t know what the other one will catch.
My workflow now is: run Proofademic first for sentence-level diagnosis, revise based on what it shows me, then run Turnitin as the final gate check before submission. Proofademic functions as the editor. Turnitin is the gatekeeper. You want to satisfy the editor before you face the gatekeeper.
What this means for dissertation chapters specifically
Not all chapters carry equal risk. In my experience running detection across all five chapters, the risk ranking goes something like this:
Chapter 3 (literature synthesis) is the highest-risk by a significant margin. The genre is the problem, not the writing. Citation-dense synthesis of existing research produces the exact statistical signature that detectors flag.
Introductions and conclusions are moderate risk, especially if they’re written to be clear and accessible rather than technical. The plain language and forward momentum can read as AI.
Methods chapters are generally lower risk. The specificity of procedural description, the jargon, the reference to your actual instruments and protocols, all of that introduces enough unpredictability to lower the AI probability score.
Results chapters are the lowest risk in most cases. Numbers, tables, and technical interpretation are hard for AI detection models to flag confidently.
If y’all are behind on dissertation writing and running out of time, at least run your lit review chapter through detection before you submit anything. That’s where the flags are going to hit.
The false positive problem is real and underdiscussed
Here’s the part that frustrates me about how institutions are handling this: false positive rates for academic writing aren’t zero, and no one talks about them clearly.
I wrote every word of Chapter 3. I used AI to help structure my outline, which is explicitly within my institution’s policy. And I still got a 29% score on first pass.
If I hadn’t run detection in advance and revised, I’d have submitted a chapter flagged as nearly one-third AI-written. And I’d have had to defend that to my committee without understanding where the score was coming from, because Turnitin doesn’t tell you.
A 2024 Stanford study on AI detection accuracy found that false positive rates for human-written academic text can reach 12-17% on standard detection tools, even without any AI assistance in the writing process. That’s not a rounding error. That’s a meaningful risk to graduate students who are writing in genres that pattern-match to AI output.
Proofademic’s sentence-level breakdown at least gives you something to work with. You can point to specific sentences and say: here is the language I used, here is why it’s flagging, and here is what I changed. That’s a defensible paper trail. A single percentage from Turnitin isn’t.
The pre-filing checklist I’m now using
After this test, here’s what I’m doing before I file each chapter:
Run Proofademic on the full chapter and note every sentence above 70% AI probability. Revise those sentences first, focusing on sentence structure and register rather than the content itself. Don’t change your argument; change the phrasing patterns.
Run Walter Writes on the transitions and summary sentences specifically, with the academic tone setting, to vary the sentence rhythm without losing the formal register.
Rerun Proofademic to confirm the scores dropped. Then run Turnitin as a final check.
If Turnitin still shows above 20% after Proofademic clears it, look at the sentences Proofademic scored between 40% and 70%. They fell below your revision cutoff, but Turnitin may weight them differently and be catching them.
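The checklist’s thresholds amount to a simple triage rule. Here is a minimal sketch of it. It assumes you have copied per-sentence probabilities out of a Proofademic report by hand. I’m not assuming any export format or API. The sentence labels and scores are made up, and the bucket names are my own.

```python
def triage(scores: dict[str, float], high: float = 0.70,
           low: float = 0.40) -> dict[str, list[str]]:
    """Bucket sentences by AI probability using the checklist's cutoffs."""
    buckets: dict[str, list[str]] = {
        "revise_first": [],            # above 70%: rewrite these before anything else
        "check_if_turnitin_high": [],  # 40-70%: revisit only if Turnitin stays above 20%
        "leave": [],                   # below 40%: no action
    }
    for sentence, p in scores.items():
        if p > high:
            buckets["revise_first"].append(sentence)
        elif p >= low:
            buckets["check_if_turnitin_high"].append(sentence)
        else:
            buckets["leave"].append(sentence)
    return buckets


# Hypothetical sentence labels and scores, for illustration only.
example_scores = {"S1": 0.86, "S2": 0.55, "S3": 0.12, "S4": 0.71}
print(triage(example_scores))
```

Keeping the cutoffs as parameters makes it easy to tighten them, for example to the 80% high-risk line Proofademic used on my excerpt.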
That workflow got me from 29% to 8% on Turnitin without touching my citations, my sources, or my argument. The writing that changed was stylistic, not substantive.
FAQ
Does Turnitin detect all AI writing tools?
Turnitin’s AI detection isn’t tool-specific. It detects patterns associated with AI-generated text, not the output of any particular model. It doesn’t know you used ChatGPT vs. Claude vs. Gemini. It identifies statistical properties of the text.
Can Turnitin detect AI writing after it’s been edited?
Often, depending on how much editing was done. Light editing that changes word choice without changing sentence structure often isn’t enough to lower the AI probability significantly, because the patterns Turnitin and other detectors flag are largely structural rather than lexical. Substantive revision of sentence structure, rhythm, and phrasing is more effective.
Is Proofademic more accurate than Turnitin for academic writing?
Proofademic is calibrated specifically for academic text, which should reduce false positives on formal, citation-heavy writing. Whether it’s “more accurate” depends on what you mean: Proofademic is more precise in identifying which sentences to revise, while Turnitin provides the institutional benchmark your committee and filing office will actually see.
Should I run AI detection on my dissertation before submitting?
Yes, especially on literature review and synthesis chapters. Running detection before filing gives you time to revise and reduces the risk of a score you can’t explain or defend.
For what it’s worth, the tool that actually helped me understand what was happening in my chapter was Proofademic. Not because it gave me a lower score than Turnitin, but because it showed me exactly where the problems were. If you’re ABD and Chapter 3 is the one keeping you up at night, start there. Run the scan. Know what you’re dealing with before your committee does.
Hit reply if you’ve been through this. I’m curious what detection scores others are seeing on lit review chapters specifically.


