
I gave an AI a qualitative analysis that had taken me two months. Here's what happened (and what I'm still processing)
📑 View contents
It started out of curiosity. I wasn’t trying to prove anything, I didn’t want to validate AI or defend qualitative work, I had no hypothesis. I had just seen a Storytelling with Data challenge a few days earlier and thought: what if I ran that 2024 study through a Gemini GEM, the one with 21 interviews that took me two months to analyze by hand, with printed sheets I carried everywhere and codes I would first build in my head before moving them to Excel. What could go wrong, and I’d learn a lot.
Well, I learned a lot. Just not what I expected to learn.
The experiment, told dry
In 2024 I ran a study for a client in an industry I can’t name here, but I can say the size of the work: 21 in-depth interviews, plus a complementary survey I left out of this exercise. At the time I didn’t have paid software for qualitative analysis, so the workflow was manual from start to finish. I recorded the sessions, listened to each one again, and transcribed everything in Word. Then I coded each interview by hand, moved the codes into Excel noting the source of each code by interviewee, built patterns, and from there produced the final presentation.
The analysis combined two logics, as I usually work: a deductive one, with pre-codes defined from the study’s objectives (closer to how I learned to work in UX), and an inductive one closer to grounded theory (the one I trained in through psychology and my master’s, and the one I feel most comfortable with).
Without counting the interview execution itself — another full job of coordination and facilitation — the analysis alone took me between 60 and 80 hours. I don’t remember the exact number, but it was two months of work, several weekends, and a lot of coffee.
The experiment a few days ago: build a GEM in Gemini, connect it to the transcripts, ask for the same analysis. It took me 5 hours to set up. Three more hours for it to return the results structured in tables, distinguishing which interviewee said what, the way it mattered to me in 2024.
The deductive codes it returned were practically the same as mine. I expected that.
The inductive codes, too. I didn’t expect that, and it’s what shook me most.
Before I go on, something important about the input.
What the AI analyzed wasn’t just any raw material. It was 21 interviews done with a specific purpose, with people recruited because they met criteria that mattered for what the client needed to understand. And those 21 were preceded by work the GEM could never have done.
There was a kickoff with the client to understand what hurt, what hypotheses they had, what decisions they wanted to make later with the findings. From that kickoff came the recruitment criteria: who we were looking for, why those people and not others. Then the interview phase was planned, the script was designed, and that script was iterated with the team. The questions weren’t generic — they targeted specific frictions we had already detected in previous conversations. They aimed at real pain, not hypothetical pain.
And on the road to those 21 interviews, another 16 were scheduled that never happened. People who confirmed and didn’t show. I was there, waiting.
I’m saying all this because when the AI gave me back codes almost identical to mine, it was easy to think “how impressive, it replicated my analysis”. And yes — but it replicated an analysis done on material that was good. And the material was good because there were human decisions before the first transcript even existed.
The first discomfort
The inductive codes — the ones that are supposed to “emerge” from the data, the ones you don’t go looking for because they come from the implicit — appeared almost identical to the ones I built in 2024. With slightly different emphasis, yes. But there they were.
And reading them, I felt something I didn’t anticipate: a strange detachment from the findings.
In 2024 those codes were mine. I knew where each one came from, who had said it, in what context, why I had built that one and not another, what priority I assigned to it when placing it in the report. When a code was cited by a specific interviewee, I remembered the pause before they said it, or the face, or the sentence that followed.
Reading the results the AI handed me, I recognized them but didn’t inhabit them. It was like reading the summary of a book I had written. Everything was there. And at the same time nothing was there — that texture of “I was there, I know why this code weighs more than the other one” wasn’t transferring.
I don’t know if other people who do qualitative analysis experience the same thing. I sat staring at the horizon for a good while.
The first learning I can actually name
Here comes what I’ve been able to integrate, even though not much time has passed.
The 5 hours it took me to set up the GEM weren’t 5 hours of “AI is fast”. They were 5 hours of me operationalizing my own qualitative analysis process.
For the GEM to work I had to think, step by step, what I do first when I sit down in front of a transcript. What I do next. How I decide what counts as a code and what doesn’t. When I move from codes to patterns. How I distinguish a deductive code from an inductive one in practice, not in theory. When I stop to re-read instead of moving forward. What kind of deliverable I produce and for whom.
I had been doing all of that for years. But I had never written it down. It was implicit craft, decisions automated by experience, an internal logic that worked but that I had never looked at from the outside.
The GEM forced me to look at it from the outside. Systematically. And that, for someone coming from clinical psychology where you yourself are the instrument, is an uncomfortable and revealing move at the same time.
That was clear learning number one: AI didn’t save me 65 hours of work. It forced me to make explicit a kind of work I had been doing implicitly for years. And now that it’s explicit, I see it differently. Not better or worse, just differently.
What the AI didn’t do
Here comes the second learning, and it’s the one that matters most to me for working with clients going forward.
When I delivered the results in 2024, I didn’t make a single report. I made three readings inside the same presentation. One for the client’s stakeholders, with emphasis on the strategic and on how the findings affected business decisions. Another for the product team, focused on concrete iteration opportunities. And another for engineering, with emphasis on findings that touched technical architecture and feasibility. Same data, three readings, three levels of abstraction.
The GEM, by default, gave me a single result, flat, oriented mostly toward product. Not because it’s limited — because I never gave it that context. I never told it who was going to read this, what decisions would be made from the findings, what level of abstraction worked for each audience. It also didn’t see the product screens, or the flows, or the team’s context. And that was information I did have and that I used without thinking when I built the presentation.
There’s something important there: qualitative analysis is only one part of the work. The other part, just as heavy, is the situated interpretation — reading the findings in relation to who will use them, for what, in what context, with what level of decision-making power. That’s what a B2B client buys when they buy research. Not the coding, but the reading of the data.
AI can assist with the coding. The situated interpretation, as far as I’ve seen, remains human work. And not because AI is “limited”, but because the client’s context, the internal dynamics, the teams, the priorities in tension — that’s information the AI simply doesn’t have. I didn’t have it complete either, but I had a lot more.

What I’m still processing
That’s the sharp learnings. What follows is what still unsettles me and I haven’t fully integrated yet.
The detachment. I still don’t know what to do with the feeling of reading my own findings without inhabiting them. It hadn’t happened to me before. I have a hypothesis: that when you do the analysis by hand, the process itself is part of the knowledge. It’s not just what you found, but how you found it, what you thought when reading a certain quote, what you discarded and why. When AI does that process, the results arrive clean but without the memory of the journey. And that memory of the journey was, without my knowing it, an important part of how I later defended the findings in a meeting.
I don’t know if this is a problem to solve or a new feature I’m going to adapt to. For now I name it and leave it as a question.
Bateson coming back. These days I pulled out an old photocopy from a Bateson book. Bateson was central to the systemic therapy training I went through. Systems, contexts, parts that only make sense in relation to other parts.
What happened with the GEM can be read in systemic terms: AI isn’t a substitute, it’s a new part of the system in which I do research. It changes the other parts, forces them to become visible, redistributes what each part does. It doesn’t replace the system.
That reading helps me. But I’m still developing it.
The guild question. This is the most uncomfortable one and I leave it for last because I haven’t resolved it. When I started writing this, I thought about not publishing it. I thought it would sound like a betrayal of other people who work qualitatively, who are going through the same question without saying it out loud. I thought it would sound like “I’m replaceable”, which is not what I’m saying, but could be read that way.
I still think the risk exists. But I also think that even if we want to close our eyes, this doesn’t go away. If those of us who do qualitative work don’t write about this, the conversation will be left in the hands of people who never did qualitative analysis, and the market response will be a caricature. I’d rather this voice be there, even while processing, than only the voice of someone selling courses on “AI replaces the researcher”.

What I changed in my practice since then
Since this experiment, two things have changed in how I work:
One. I’m writing down my process. As an internal working document. If I had to distill my workflow for a GEM to follow it, it makes sense to have it documented for myself too. To use in other studies, to iterate on it, so that at some point I can teach it better than I can now.
Two. When I use AI for analysis, I now think of it as “how I would do it”, not as what I expect the AI to do for me. I’m not sure that distinction comes through. I give it client context, the audiences for the report, the decisions that will be made from the findings. I didn’t do that before because I assumed the AI only needed the data. Now I understand that AI, without situated context, delivers analysis that’s technically correct but strategically flat. That context is still my job. And it will probably be increasingly my main job.
There’s a phrase from Cole Knaflic, the author of the Storytelling with Data challenge that triggered all of this, that keeps echoing for me. In the video Data Storytelling in the Age of AI, around minute 2:36, she says: “What AI still can’t do, however, is decide what matters. AI doesn’t know what’s right for this specific audience at this specific moment in time.”
I think the same goes for qualitative analysis, but one level deeper. AI can code, but it can’t decide what story the codes are telling, or for whom, or with what emphasis. That (still) remains ours.
Sources
- Cole Nussbaumer Knaflic. Data Storytelling in the Age of AI: What is the human role when AI can generate anything? [Video, YouTube]. Direct quote from minute 2:36. Link:
- Storytelling with Data Community. May 2026 Challenge: Human + AI. Link: https://community.storytellingwithdata.com/challenges/may-2026-human-ai


