🔥 SEOs Diners Club #203: How ChatGPT Works as a Search Engine (Deep Dive)
This week, I'm decoding how ChatGPT works as a search engine from a strategic perspective. I'm sharing actionable SEO tactics to survive this ruthless filtering process and get your content cited by AI. Plus, agentic commerce announcements from Google and a new tool from the industry.

Hey there,
I'm back with another newsletter keeping my finger on the pulse of digital marketing and AI. The industry moves so fast that sometimes I believe the smartest strategy is to pause, understand the "magic," and take the machine apart. That's exactly what this week's main topic is about: How does ChatGPT—this tool I use every day and integrate into my work—actually work? Is it a magical oracle that knows everything, or is there much more to it?
My strategic focus isn't on ChatGPT's unpredictable and probabilistic generation process, but on the deterministic—the traceable and optimizable—information retrieval process that feeds it. Understanding this mechanism is the only key to ensuring your content gets found and cited by AI.
Drawing from David McSweeny's eye-opening analysis "ChatGPT Is a Search Engine. Here's How It Works" published on QueryBurst, along with my own observations, this week I'm lifting the hood on the machine and laying out what you can do to dominate this new competitive landscape. The module names below are coined by David McSweeny and the actual names may differ.
🚀 The ChatGPT Information Retrieval Pipeline: Surviving the 7-Stage Filtering Process
When we ask ChatGPT a simple question, the fluid and confident answer we receive is typically imagined as the work of a single massive, all-knowing AI model. But the reality is far more layered, more mechanical, and holds far more opportunity for us SEO professionals.
This is a ruthless filtering pipeline where each stage determines whether a website makes it into the final answer. Contrary to popular belief, complex queries aren't answered by a single giant model. Instead, a multi-stage "pipeline" of smaller, specialized models analyzes the query, searches the web, filters the results, and ultimately builds a refined context for the main language model to synthesize. This makes ChatGPT behave much like a traditional search engine, and every stage is a point where your content, or a competitor's, can be eliminated.
Understanding this mechanism forms the foundation of what is called "Answer Engine Optimization" (AEO) or "Generative Engine Optimization" (GEO). Our goal isn't to predict the probabilistic final output, but to ensure our content successfully passes through each stage of this deterministic filtering funnel. Here's that fascinating and ruthless 7-stage process:

1. Classification (Triage): The First Greeting at the Door
Your query is first met by a lightning-fast small classifier model called "Sonic." Its sole job is to understand the nature of the incoming query and determine the next step. Think of it like the first operator when you call a customer service center. The operator listens to your problem and decides in milliseconds: Is this a simple request they can solve immediately, like "tell me a joke"? Or is it a complex query that requires current information or in-depth analysis, like "what are the best SEO trends for 2026?", and therefore needs to be transferred to a specialist team (in this case, the web search process)? This initial classification lets the system use resources efficiently and avoids triggering the costly and slow full search process for every query. Whether your content even enters the race depends on this model's decision right from the start.
2. Strategy Development ("Thinky"): The Query Brain
If a search is decided upon, the expert model called "Thinky", the real brain of the process, takes the stage. Thinky turns your query into a search strategy for the web. It doesn't produce a single query; it typically creates 3 to 5 different ones. Some are simple keyword queries like what we'd type into Google ("how ChatGPT works"). Others are very long, complex "semantic search" queries that can look nonsensical to a human reader. These are written to be converted into vector embeddings, which capture the intent and context behind the query more deeply than keywords do. With this multi-query strategy, Thinky maximizes the chance of finding the best sources that can address the topic from different angles.
3. Elimination (Metadata Filter): The Importance of Your SERP Presence
Thinky's generated queries are sent to search engines, and a pool of approximately 40–50 potential URLs is returned for each. This is where the first ruthless elimination begins. Thinky doesn't waste time reading the full content of these hundreds of candidates. Just like a user scanning search result pages, it only looks at the page title, meta description, and URL structure. These three elements provide strong signals about what the page is about. If your title and meta description don't resonate with the user's (in this case, Thinky's) intent, no matter how great your content is, you're eliminated at this stage. A weak SERP (Search Engine Results Page) presence means your content is knocked out before it's even seen.
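To make the metadata filter concrete, here is a minimal Python sketch of scoring a candidate using only its title, meta description, and URL. The term-overlap scoring, the field weights, and the function names are my own assumptions for illustration; the real filter is a model and its logic is not public.

```python
import re

def tokens(text):
    """Lowercase alphanumeric tokens; hyphens and slashes in URLs act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def metadata_score(query, title, description, url):
    """Weighted share of query terms found in each metadata field.

    The weights (0.5 / 0.3 / 0.2) are assumed, not taken from the analysis.
    """
    q = tokens(query)
    if not q:
        return 0.0
    weighted_fields = ((title, 0.5), (description, 0.3), (url, 0.2))
    return sum(w * len(q & tokens(field)) / len(q) for field, w in weighted_fields)

def shortlist(query, candidates, keep=10):
    """Keep the `keep` best candidates; the full page content is never read."""
    ranked = sorted(
        candidates,
        key=lambda c: metadata_score(query, c["title"], c["description"], c["url"]),
        reverse=True,
    )
    return ranked[:keep]
```

A page titled "Welcome to our blog" with a URL like /p?id=42 scores zero here no matter how good its body text is, which is exactly the failure mode this stage punishes.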
4. Data Retrieval ("The Big Filter"): The Vital Role of Speed
This stage is the most ruthless gate, and it proves how critical technical SEO is. The system begins fetching the content of the approximately 10–20 pages that passed the metadata filter. But there's a very strict rule here: roughly a 2-second timeout. If your server is slow, your TTFB (Time to First Byte) is high, or your page loads late due to JavaScript and large images, the system won't wait. Your content is either fetched incompletely ("truncated") or the fetch is canceled entirely and you're eliminated. This is a death sentence for slow hosting services and poorly optimized sites in particular. Your perfect content can get stuck in this "big filter" because of a purely technical problem.
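The arithmetic behind that timeout is worth seeing once. The sketch below models a fetch that is cut off after a fixed budget: the roughly 2-second figure comes from the analysis above, while the constant-throughput model and the function name are simplifying assumptions of mine.

```python
def fetch_outcome(ttfb_s, page_bytes, bytes_per_s, budget_s=2.0):
    """Return (status, bytes_received) for a fetch that is cut off at budget_s.

    Assumes the body arrives at a constant bytes_per_s after the first byte,
    which is a simplification of real network behavior.
    """
    if ttfb_s >= budget_s:
        # The first byte never arrived in time: nothing to work with.
        return ("dropped", 0)
    transfer_window = budget_s - ttfb_s
    received = min(page_bytes, int(transfer_window * bytes_per_s))
    status = "complete" if received == page_bytes else "truncated"
    return (status, received)
```

With a 1.5-second TTFB, a 400 KB page served at 500 KB/s gets only half a second of transfer time, so `fetch_outcome(1.5, 400_000, 500_000)` returns `("truncated", 250000)`. The same page with a 0.2-second TTFB arrives complete.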
5. Scoring (Semantic Comparison): The Battle of Chunks
The content of pages that passed the performance test is divided into small text pieces ("chunks") of approximately 128 tokens (~100 words). Then those semantic queries Thinky created in step 2 come into play. Each text chunk is mathematically scored based on how semantically close it is to these semantic queries. This isn't simple keyword matching; it's an evaluation based on meaning and context. Every section of your page is individually assessed for its potential to meet the query's intent.
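Here is a rough Python sketch of the chunk-and-score idea. Two loud assumptions: I split on words (about 100 per chunk) instead of real tokens, and I use a bag-of-words cosine similarity as a stand-in for the embedding model, which we cannot see. The shape of the computation is the point, not the exact scores.

```python
import math
import re
from collections import Counter

def chunk_words(text, size=100):
    """Split text into consecutive chunks of about `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def bag_of_words(text):
    """Term-count vector; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def score_chunks(page_text, semantic_query, size=100):
    """Return a (score, chunk) pair for every chunk of the page."""
    q = bag_of_words(semantic_query)
    return [(cosine(bag_of_words(c), q), c) for c in chunk_words(page_text, size)]
```

Notice that every chunk is scored in isolation. A chunk full of filler scores near zero even if the paragraph right after it is excellent.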
6. Final Selection ("Auditions"): The Moment of Fate
From among all the chunks of each page (around 20 for a 2,000-word article), the single highest-scoring piece becomes that page's "audition chunk." In other words, the fate of your 2,000-word article depends on the performance of a single ~100-word text chunk deemed most relevant within it. Thinky evaluates these best chunks (one from each candidate page) one final time, narrowing the final list down to 3–5 "winning" pages. Everyone else is eliminated at this stage. This is the critical moment that shows how much the densest, most valuable, and most relevant part of your content matters.
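The selection logic described here can be sketched in a few lines of Python. The input shape (a dict mapping each URL to its scored chunks) and the function name are hypothetical, and the final judgment is made by a model in reality, not by a plain sort as in this simplification.

```python
def pick_winners(chunk_scores_by_page, max_winners=5):
    """Pick each page's best chunk, then keep the top pages.

    chunk_scores_by_page maps url -> list of (score, chunk_text) pairs.
    Returns a list of (score, url, chunk_text), best first.
    """
    auditions = []
    for url, scored in chunk_scores_by_page.items():
        if not scored:
            continue  # a page with no fetched content cannot audition
        best_score, best_chunk = max(scored, key=lambda pair: pair[0])
        auditions.append((best_score, url, best_chunk))
    auditions.sort(key=lambda a: a[0], reverse=True)
    return auditions[:max_winners]
```

One strong chunk beats many average ones: a page whose chunks score 0.9 and 0.1 outranks a page whose chunks all score 0.6.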
7. Synthesis and Response (The Grand Finale): The Conductor Takes the Stage
Only at this final stage does the large, slower language model we know (like GPT-5.2) take the stage. Like a conductor, it works with the 3–5 sources that have survived elimination, been filtered and distilled, and proven most relevant. It synthesizes this information, combines it with its own training data, and writes that smooth, fluent, and human-like response for us. Even ready-made summaries from user-generated content platforms like Reddit or predetermined "VIP lane" trusted sources like Time and Forbes can be injected into this mix with preferential treatment. This shows the system doesn't just rely on information from the web, but also prioritizes certain sources.

Strategic Assessments and Common Pitfalls
This mechanical process makes ChatGPT vulnerable to manipulation, especially spam. ChatGPT doesn't measure page reputation with complex backlink profiles like Google does; it largely relies on domain names and superficial signals. This can make it unable to distinguish a low-quality press release published on a trusted site from an award-winning article on a lesser-known blog. Short-term "GEO Spam" tactics that try to exploit this vulnerability can put your brand's long-term search engine health and reputation at risk.
Also, trying to track fixed "rankings" in ChatGPT is chasing an illusion. Due to personalization (query generation based on the user's chat history) and probabilistic generation (creating different texts from the same sources), results vary for each user. The real goal isn't capturing a specific answer, but consistently getting into this "retrieval pool" for your target audience's queries.
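If rankings are an illusion, a more useful number is how often your domain shows up among the cited sources across repeated runs of the same prompt. Below is a small Python sketch of that metric. The input format (one list of cited URLs per run) is an assumption, and collecting those URLs is up to you; no ChatGPT API is called here.

```python
from urllib.parse import urlparse

def belongs_to(url, domain):
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)

def inclusion_rate(runs, domain):
    """Share of runs in which at least one cited URL belongs to `domain`.

    `runs` is a list of runs; each run is a list of cited URLs.
    """
    if not runs:
        return 0.0
    hits = sum(any(belongs_to(u, domain) for u in cited) for cited in runs)
    return hits / len(runs)
```

Tracking this rate per target query over time tells you whether you are reliably in the retrieval pool, which is the thing you can actually influence.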
📊 Algorithm & Search Engine Updates
This week the search world was quite active. Google announced a significant algorithm update on January 8th aimed at improving search quality and user experience. Early signals suggest Google has begun more aggressively filtering generic AI-generated content. Real human experience, expertise, and personal brand are becoming more valuable than ever.
The most strategic news from Google came from CEO Sundar Pichai's speech at the National Retail Federation (NRF) event. Pichai opened the doors to a new era he calls "agentic commerce." In his remarks, he spoke about a future where AI agents will manage the entire customer journey from discovery to purchase. The cornerstone of this vision is a new open-source protocol they call "Universal Commerce Protocol" (UCP). Developed with giants like Shopify, Walmart, and Target, this protocol will enable us to see direct "buy" buttons on Google surfaces (including AI Mode and Gemini). Most importantly, he emphasized that in this process, the retailer will remain the "merchant of record" and continue to own the customer relationship. This signals a future where search behavior evolves from keywords to natural conversations and purchase decisions are made directly within the search interface.
🤖 From the AI World
There are exciting developments on the AI front too. On my YouTube channel this week, I covered two important topics. The first was a video on dynamic note-taking and competitive analysis with NotebookLM. The second was the first in my new Claude.ai video series. In this first video titled "Creating a Project from Scratch with Claude.ai," I connected Semrush to Claude.ai and analyzed our website Stradiji.com through a Claude project I created.
There's also great news from our industry. My dear friend Yusuf Özbay announced his web application Schema Data Generator, which he developed as a solution to a practical need. In Yusuf's own words, "developing a solution at light speed while experiencing a problem feels incredibly good." This tool allows users to dynamically generate Schema data in JSON-LD or Microdata format for FAQPage (Frequently Asked Questions) from content or URLs using their Gemini API keys. Considering how important FAQs are for GEO (Generative Engine Optimization), I believe this is an MVP (Minimum Viable Product) that will make everyone's work easier. Yusuf mentions that updates like nested Schema data and more LLM integrations are coming soon. Well done, Yusuf!
Meanwhile, according to TechCrunch, Gmail is preparing to offer a personalized AI inbox. This new feature will summarize emails, prioritize important ones, and even prepare response drafts for you. This could be a big step for productivity and email management.
Finally, OpenAI announced ChatGPT Health. Noting that 230 million users ask health-related questions each week, OpenAI aims to provide more reliable and accurate health information with this new feature. This is an indication of how AI can play a role in a sensitive and important field.
💡 Practical SEO Tips: Optimization for Retrieval and Citation
Now that we understand ChatGPT's ruthless filtering process, you should focus your strategy on ensuring your content successfully passes through each deterministic filtering stage, rather than chasing probabilistic final answers:
1. Technical Performance to Overcome Stage 4: No matter how good your content is, if the system can't fetch it in time, it will remain invisible. Low TTFB (Time to First Byte) and fast page load speed directly increase the likelihood of your content being fetched within that narrow 2-second window.
2. Designing the "Audition Chunk" to Conquer Stages 5–6: Any ~100-word chunk of your page can end up as its "audition chunk," so treat the first ~100 words as the elevator pitch for your entire article. This opening should be dense, clear, and as semantically close to the targeted search intent as possible, and it should be free of navigation text like "skip to content."
3. Optimizing SERP Presence to Pass Stage 3: Create titles and meta descriptions that directly respond to user intent to signal to the "Thinky" model that your page is valuable. Clear URL structures and Structured Data usage will also help you pass this first elimination.
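For tip 2, a quick way to check what your opening chunk really contains is to extract the first ~100 words of text from the raw HTML and look for navigation boilerplate. This is a minimal sketch using Python's standard html.parser module. The phrase list is an assumed starting point, and the parser does not know about CSS-hidden elements, so treat the result as a rough signal.

```python
from html.parser import HTMLParser

# Assumed examples of boilerplate phrases; extend for your own templates.
BOILERPLATE = ("skip to content", "accept cookies", "toggle navigation")

class TextCollector(HTMLParser):
    """Collects words in document order, ignoring non-content elements."""
    SKIP = {"script", "style", "noscript", "title"}

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())

def opening_words(html, n=100):
    """Return the first n words of page text as a single string."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.words[:n])

def boilerplate_hits(html, n=100):
    """List the boilerplate phrases found in the first n words."""
    opening = opening_words(html, n).lower()
    return [phrase for phrase in BOILERPLATE if phrase in opening]
```

If `boilerplate_hits` returns anything for your key pages, the words you want scored are being pushed further down the page by menu and banner text.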
Closing
That's all for this week! As you can see, AI isn't just a tool—it's a force fundamentally changing how we work, how we access information, and even how we think. Understanding this transformation and shaping your strategies according to this new reality will be the most critical competency of the coming period.
If you want to take your company's SEO and content strategies one step ahead in this new AI era, reaching the right audience not just in search engines but also in AI answers, we at Stradiji would be happy to help. With my conversion-focused and data-driven approaches, I can prepare your brand for the future.
Best,
Mert Erkal
About Mert Erkal
Mert Erkal is an expert with 15+ years of experience in digital marketing and SEO. As the founder of Stradiji, he provides consulting services to corporate companies on SEO strategies, conversion rate optimization, and AI integration.
More Information:
Company: https://www.stradiji.com
Digital Marketing Notes: https://www.merterkal.com/
Twitter: https://twitter.com/merterkal
LinkedIn: https://www.linkedin.com/in/merterkal
YouTube: https://www.youtube.com/@stradiji
I love sharing the latest developments and strategies in the SEO world with you. If you find my content helpful, you can support me by buying a coffee. ☕ Click 'Buy Me a Coffee' to contribute to knowledge sharing. Let's achieve more together!