Thought-Execution Protocol (TEP) - A Framework for AI-Machine Coordination
Nino Stephen
A Casual Intro (aka The Why of TEP?)
Why is someone who doesn’t even work in AI writing about an AI protocol? Fair question.
Truth is, this isn’t my first idea in this space. Some of my past ideas turned out to already exist as research papers. When a good idea strikes, chances are, someone else has had the same lightbulb moment. But this one? This one has been stuck in my head for over seven months.
Back then, I called it Cognition Link Protocol (CLP). I even wrote it down in my private GitBook, long before I joined Google.
So why talk about it now? Simple:
- I want this idea out there.
- I want smarter people to help refine it.
- I want to contribute something meaningful to AI’s future.
I don’t expect to be the next Tim Berners-Lee, but I’d like to live up to my own expectations of making a dent in this field.
Let’s be real. I’m not a visionary, not a protocol designer, and not a computer scientist. But does that mean I shouldn't share my thoughts? Absolutely not.
If this blog post sparks your imagination and leads to something world-changing, I’ll be as happy as Satoshi Nakamoto the day Bitcoin went live.
How Did It All Start?
What is this all about? Don’t we already have AI agents that take action?
Good question. Let’s talk about the Rabbit R1 fiasco.
A company claimed their product could book cabs, order food, and take action online, except it turned out to be just Playwright scripts running in the background. The AI wasn’t reasoning; it was just running hardcoded automation.
That got me thinking.
The Web was built for humans, not AI.
- Websites are designed for visual presentation and user flow—not for machines to navigate efficiently.
- Playwright scripts and similar automation tools weren’t built to assist AI systems.
- Minor UI changes can break automation, making AI fragile and unreliable.
So instead of holding everything together with duct tape, the way we do with today's web, why not design a system from the ground up where AI can actually take action without hacks or workarounds?
That’s where Thought-Execution Protocol (TEP) comes in.
The Core Idea: AI Needs to Work as a Group, Not an Island
Darwin’s “survival of the fittest” is often misread as a story about lone individuals. Much of his argument was actually about groups: populations whose members cooperate tend to out-survive those whose members don’t.
Right now, companies are forcing LLMs to do everything—from generating text to reasoning to taking action. But is that really the best way?
💡 LLMs don’t need to be all-powerful. They need to be team players.
Instead of expecting a single model to handle everything, what if AI agents could:
- Take instructions
- Identify the best AI or service provider for the task
- Coordinate execution seamlessly
That’s exactly what TEP aims to standardize: a Layer 7 protocol (like HTTP) that enables AI to communicate, delegate, and execute tasks efficiently.
Imagine an AI that doesn’t just generate text, but actually books your cab, processes a refund, or schedules your appointments reliably.
The Challenge: Making AI Responses Deterministic
One of the biggest roadblocks? LLMs are not deterministic.
Most non-AI systems are strict, rule-based, and expect precise inputs/outputs.
Protocols like HTTP, SMTP, or gRPC work because their responses (and their structure) are 100% predictable.
But AI? AI is unpredictable by nature.
We can’t have a system where:
- AI spits out too little or too much information
- AI responds in an inconsistent format
- AI hallucinates critical execution steps
If we don’t solve this, AI-driven automation will be unreliable and frustrating.
By the end of this blog, I want to explore possible middle-ground solutions. Maybe we can structure responses better, or maybe there’s a way to ensure compliance.
At worst, I’ll have rubber-duck-debugged the problem. At best? Maybe you have a solution.
How It Could Work (Examples Coming Up)
A reverend once said that the people of Israel in the time of Jesus listened better when given examples.
Maybe that applies here too.
Let’s walk through a few example scenarios of AI-to-AI and AI-to-Machine communication under TEP to see how this could actually work in practice.
Scenario 1: Generate me an image
Let’s say we ask an AI system to generate an image. What actually happens?
If we’re dealing with a generic LLM, we have a problem—it’s going to try to generate the image itself, which isn’t what we want. Instead, our specialized model should handle it.
The diagram I’ve shared lays out how AI-to-AI coordination should work. I would assume this is what we would observe for AI-to-Machine coordination as well. Most of it is straightforward, but a few key steps deserve a closer look:
- Step 3: Service Availability Check – How do we know the model is capable of handling the request?
- Step 4: Response to Service Availability Request – What details does the agent need to move forward?
- Step 6: Step-by-Step Execution Instructions – How does the agent ensure the right workflow?
- Step 7: Action Execution – How do we actually execute the steps and act on the response?
Let’s break it down.
Service Availability Check
Before sending a request, the agent needs to confirm that the service can actually generate the image we want.
{
"action": "capability_check",
"query": "image_generation",
"requirements": "color image, cartoon, 4K resolution"
"requester": "123456"
}
Here, the agent is asking:
- Can you generate a colorful, cartoon-style image in 4K?
- If not, what are the limitations?
If the service can’t handle it—maybe it doesn’t support 4K, or it’s overloaded—it should respond accordingly. The agent can then either look for an alternative provider or adjust the requirements before proceeding.
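To make this concrete, here’s a rough Python sketch of how an agent might run that negotiation loop against several providers. The endpoints are made up, and the fallback logic is my own assumption rather than anything TEP prescribes:

import requests

# Hypothetical TEP-speaking image services; none of these URLs exist.
CANDIDATE_SERVICES = [
    "https://images.provider-a.example/tep",
    "https://images.provider-b.example/tep",
]

def find_capable_service(requirements):
    """Ask each candidate service whether it can satisfy the requirements."""
    check = {
        "action": "capability_check",
        "query": "image_generation",
        "requirements": requirements,
        "requester": "123456",
    }
    for url in CANDIDATE_SERVICES:
        try:
            resp = requests.post(url, json=check, timeout=5).json()
        except (requests.RequestException, ValueError):
            continue  # unreachable or non-JSON reply; try the next provider
        if resp.get("supported"):
            return url, resp
    return None  # no provider matched; relax the requirements or give up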
Prediction: We’ll eventually need a search engine or marketplace for discovering AI services. Otherwise, how does an agent even find the best provider in the first place?
Service Availability Response
Once the check is complete, the service responds with what it can and can’t do.
{
"action": "capability_response",
"query": "image_generation",
"supported": true,
"parameters": ["prompt", "size", "style"],
"example_request": {
"action": "generate_image",
"prompt": "funny cat",
"size": "1024x1024",
"style": "cartoon"
},
"request_id": "123456"
}
Now the agent knows:
- The service can generate images.
- It expects a prompt, size, and style as inputs.
- It provides an example request to clarify the expected format.
At this point, the agent can structure its request properly.
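As a small sketch of that last step, here’s how an agent might use the advertised parameters to build and sanity-check its request before sending it. The build_request helper and its validation rule are hypothetical, not part of any spec:

def build_request(capabilities, user_params):
    """Build a generate_image request using only parameters the service advertised."""
    allowed = set(capabilities["parameters"])
    unknown = set(user_params) - allowed
    if unknown:
        raise ValueError(f"Service does not accept: {sorted(unknown)}")
    return {"action": "generate_image", "request_id": "123456", **user_params}

request = build_request(
    {"parameters": ["prompt", "size", "style"]},  # from the capability_response
    {"prompt": "funny cat", "size": "1024x1024", "style": "cartoon"},
)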
Steps to Take
Once we confirm the service is available, the agent needs a workflow. That’s where step-by-step execution comes in. (The workflow_id below is internal bookkeeping for the agent; it never leaves the agent’s environment.)
{
"action": "execute_steps",
"workflow_id": "987654", # this is for agent (internal)
"steps": [
{
"step_id": "1",
"description": "Send image generation request with parameters",
"dependencies": [""],
"action": "generate_image",
"parameters": {
"prompt": "funny cat",
"size": "1024x1024",
"style": "cartoon"
}
},
{
"step_id": "2",
"description": "Save to Drive",
"dependencies": ["1"],
"action": "save_to_drive",
"parameters": {
"account": "username@google.com",
"folder": "/path/to/folder",
"name": "Cat Pic"
}
}
],
"request_id": "abcdef"
}
This structure ensures:
- Each step is explicitly defined and executed in order.
- Dependencies are tracked: Step 2 won’t run until Step 1 is complete.
- The workflow is flexible: users can add more steps, like sharing the image after saving.
Without this, agents would be stuck improvising workflows, leading to unpredictable failures.
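Here’s a minimal sketch of what a dependency-aware executor could look like on the agent side. The handler functions are stubs standing in for real service calls:

def run_workflow(steps, handlers):
    """Execute steps in order, refusing to start one before its dependencies finish."""
    results = {}
    for step in steps:
        deps = [d for d in step.get("dependencies", []) if d]
        missing = [d for d in deps if d not in results]
        if missing:
            raise RuntimeError(f"Step {step['step_id']} is waiting on {missing}")
        handler = handlers[step["action"]]  # e.g. generate_image, save_to_drive
        results[step["step_id"]] = handler(**step["parameters"])
    return results

# Stub handlers; run_workflow(workflow["steps"], handlers) would run both steps above.
handlers = {
    "generate_image": lambda **p: f"generated image for {p['prompt']!r}",
    "save_to_drive": lambda **p: f"saved {p['name']!r} to {p['folder']}",
}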
Action Execution
Once the image is ready, how does the agent act on it?
{
"action": "fetch_resource",
"task_id": "step7-fetch",
"resource_type": "image",
"source": "https://image-generator-service.com/job/123456",
"destination": "temp_storage",
"request_id": "req-xyz-001"
}
A few considerations here:
- Is the result streamed or fetched as a static file?
- How does the agent know when the resource is ready?
- Should the agent retry if the request fails?
Right now, I assume the instruction set shown above is purely internal—meaning it happens inside the agent and doesn’t leave its execution environment. But there’s room for debate on the best approach.
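To illustrate one possible answer to those questions, here’s a sketch of a poll-then-fetch loop with a simple retry policy. The job URL shape and the status/result_url fields are my assumptions, not a real service’s API:

import time
import requests

def fetch_when_ready(job_url, retries=5, delay=2.0):
    """Poll a job until it reports done, then download the result."""
    for attempt in range(retries):
        status = requests.get(job_url, timeout=5).json()
        if status.get("status") == "done":
            return requests.get(status["result_url"], timeout=30).content
        time.sleep(delay * (attempt + 1))  # simple linear backoff
    raise TimeoutError(f"Resource at {job_url} never became ready")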
Scenario 2: Book me an Uber ride to my office
This scenario works much like Scenario 1, but there’s an added layer of context-handling because the request is vague.
When the user says, “Book me a ride to the office,” they don’t include all the details. But here’s where the magic of AI-to-Machine coordination shines—agents can pre-fetch and infer missing information by combining memory and real-time data from the service.
Let’s imagine what’s happening in the AI Agent’s "head":
“I can book a ride, but I need the user’s current location, destination, and price cap. Oh wait, they usually leave at 8 AM, prefer rides under $15, and always head to Office Y on weekdays. I can work with that!”
With this, the agent fills in the blanks and sends a well-structured request to the service.
Here’s a simplified breakdown of what this process might look like:
- Service Availability Check – “Can the ride service handle this request?”
- Context Enrichment – “Let me fetch stored preferences and recent ride details to make this seamless.”
- Request Formation – “Here’s the current location, destination, price cap, and any extra info the service needs.”
- Action Execution – “Send the request and process the response.”
Of course, this is just a simplified view. There’s likely more context-switching and back-and-forth behind the scenes. But the important thing is how AI systems can fill in the gaps so you don’t have to spell everything out.
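As a sketch, here’s what that gap-filling step might look like. The preference store and its keys are hypothetical; a real agent would pull them from its memory layer:

# Hypothetical preference store; a real agent would read this from memory.
STORED_PREFERENCES = {
    "weekday_destination": "Office Y",
    "price_cap_usd": 15,
    "usual_departure": "08:00",
}

def enrich_ride_request(utterance, current_location):
    """Fill in what the user left unsaid, using stored preferences."""
    return {
        "action": "book_ride",
        "pickup": current_location,
        "destination": STORED_PREFERENCES["weekday_destination"],
        "max_price_usd": STORED_PREFERENCES["price_cap_usd"],
        "requested_by": utterance,  # keep the raw request for auditability
    }

request = enrich_ride_request("Book me a ride to the office", "current GPS fix")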
Prediction: As AI systems evolve, they’ll need more advanced ways to manage memory and context to make vague queries like this feel intuitive.
Nondeterminism and the Kidney Stone
Now for the fun part—what happens when things don’t go according to plan?
AI isn’t deterministic. It can’t be trusted to generate perfect JSON every time. If we rely on it to structure requests, we’ll inevitably see failures.
One option? AI responds with CLI commands as output. The command itself matters, not the extra text.
Another option? Use structured RPC calls. Remove all ambiguity from the AI’s role in the process.
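And here’s a rough sketch of a related guardrail: ask the model for JSON, validate it against a minimal schema, and retry with a corrective prompt when it drifts. The llm_generate function is a stand-in for a real model call, not an actual API:

import json

def llm_generate(prompt):
    """Stand-in for a real model call; returns a canned response here."""
    return '{"action": "generate_image", "request_id": "123456"}'

REQUIRED_FIELDS = {"action", "request_id"}

def ask_until_valid(prompt, max_attempts=3):
    """Demand schema-compliant JSON from the model, retrying with corrections."""
    for _ in range(max_attempts):
        raw = llm_generate(prompt)
        try:
            msg = json.loads(raw)
        except json.JSONDecodeError:
            prompt += "\nRespond with valid JSON only."
            continue
        if REQUIRED_FIELDS <= msg.keys():
            return msg
        prompt += f"\nYour JSON must include: {sorted(REQUIRED_FIELDS)}."
    raise ValueError("Model never produced a compliant TEP message")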
There might be a better approach—I just haven’t found it yet.
What do you think?
Conclusion
As we push the boundaries of AI-to-AI and AI-to-machine communication, one thing is clear: coordination isn’t just about capability. It’s about reliability, and about matching the right service to the need.
This is only the beginning. The systems we’re building today are laying the foundation for a world where machines collaborate as fluidly as humans. The real question is: how much trust can we place in their hands—and how can we design systems that earn it?
The answer is somewhere in the messy middle, but one thing’s for sure: we’ll get there, one misstep, workaround, and kidney stone at a time.
Here’s to figuring it out together,
Nino Stephen