Real Artists Ship
Steve Jobs said it to the Macintosh team in January 1983. They had been refining, debating, polishing — doing everything except finishing. “Real artists ship.” Three words that separated the people who make things from the people who talk about making things.
Forty-two years later, most AI projects still haven’t heard them.
The Pilot Problem
Industry estimates consistently show that the vast majority of AI pilot projects never reach production deployment. Gartner and IDC have both reported, in global enterprise surveys, that only a fraction of AI initiatives progress beyond the pilot stage within eighteen months of inception. The rest remain in some variant of “proof of concept,” “evaluation phase,” or “stakeholder alignment” — which is corporate vocabulary for standing still.
The rate is worse for SMEs. Smaller enterprises lack the dedicated engineering teams and integration infrastructure that move pilots to production. Among micro-enterprises, the pilot-to-production conversion barely registers.

These are not failed projects. They are projects that never attempted to succeed. A pilot is not a product. A pilot is a controlled environment where failure has no consequences and success has no users. It is theatre with a budget line.
Why Pilots Don’t Become Products
Three structural reasons. Not one of them is technical.
The first is accountability diffusion. A pilot belongs to everyone and no one. The innovation team proposed it. IT approved the infrastructure. The business unit provided the use case. The steering committee reviews the quarterly update. Five groups are involved. Zero groups are responsible for putting the tool into the hands of the people who will use it daily.
In a production deployment, someone’s name is on it. Someone decided this tool ships on this date to these users. That decision is uncomfortable. Pilots exist to avoid it.
The second is success criteria inflation. Pilots begin with modest goals: “Can the model classify customer inquiries with 85% accuracy?” The model achieves 87%. Success. But then the success criteria shift. Can it handle edge cases? Can it integrate with the ERP? Can it process inquiries in four languages? Can it run on-premises? Each question is reasonable. Together, they form an infinite qualification loop that ensures the pilot never ends because the finish line keeps moving.
Enterprise survey data across multiple sources shows this pattern clearly. Among companies that report “AI in evaluation,” evaluation periods routinely stretch beyond a year. A year or more of evaluating whether a tool works, while the team that would use it waits — or, more likely, builds a spreadsheet workaround and moves on.
The third is fear of adoption failure. This is the real one. A pilot that stays a pilot cannot fail publicly. A product that ships to 200 users and gets ignored is a visible failure — in the budget, in the metrics, in the hallway conversations. The pilot is a hedge against embarrassment. Keep it small, keep it contained, keep it away from the people who might reject it.
But rejection is data. Rejection tells you what the tool actually needs. A pilot that runs for a year and produces a positive evaluation tells you nothing about whether anyone will use the thing. Adoption is the only metric that matters, and you cannot measure adoption without shipping.
What “Ship” Actually Means
Jobs was specific. Shipping was not releasing. Shipping was not making available. Shipping was putting a finished product into the hands of the people who would use it, in their actual environment, with their actual constraints.
For AI tools in a European SME, shipping means:
The tool is accessible to the people it was designed for — not the innovation team, not the IT department, but the procurement officer, the customer service agent, the logistics coordinator. The actual users.
The tool is integrated into the actual workflow. Not a separate tab. Not a new login. Not a dashboard nobody visits. Integrated into the place where the work happens.
The tool has a feedback mechanism. Users can report what works and what doesn’t, and someone acts on those reports within days, not quarters.
The tool has an owner. One person whose job includes making sure this tool stays useful. Not a committee. Not a channel. A name.
Bluewaves calls this the “three-week test.” If the tool isn’t in daily use within three weeks of deployment, something is wrong — not with the tool, but with the deployment architecture. Three weeks. Not three months. Not “after the next training session.” Three weeks.
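The three-week test is concrete enough to run against usage logs. Here is a minimal sketch, assuming you can extract the dates on which any target user actually used the tool; the 80%-of-weekdays threshold is my assumption for what “daily use” means, not a Bluewaves specification:

```python
from datetime import date, timedelta

def passes_three_week_test(deploy_date, usage_dates, window_days=21):
    """Return True if the tool saw use on most working days in the
    first three weeks after deployment (80% threshold is an assumption)."""
    window = {deploy_date + timedelta(days=i) for i in range(window_days)}
    weekdays = [d for d in window if d.weekday() < 5]
    # Count distinct weekdays in the window on which the tool was used.
    active_weekdays = {d for d in usage_dates if d in window and d.weekday() < 5}
    return len(active_weekdays) >= 0.8 * len(weekdays)

# A tool used every weekday for three weeks passes; an untouched tool fails.
deploy = date(2025, 10, 1)
every_day = [deploy + timedelta(days=i) for i in range(21)]
print(passes_three_week_test(deploy, every_day))  # True
print(passes_three_week_test(deploy, []))         # False
```

The useful property of a check like this is that it cannot be argued with in a steering committee: either the usage dates exist or they don’t.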
The Prototype Is the Argument
Leonardo da Vinci kept notebooks full of ideas. He also built things. The difference mattered. An idea in a notebook is speculation. An idea in the world is an argument — it argues for its own existence by working or failing. Both outcomes are useful. Only one is available to the idea that never ships.
The same principle applies to every AI deployment. A model in a Jupyter notebook is a hypothesis. A model in production is an argument. It argues that this particular task, done this particular way, produces better outcomes than the previous method. The argument is testable. The hypothesis is not.
I have built eight companies across six countries. Every one of them started with a prototype that shipped before it was ready. Not because impatience is a virtue — because feedback from real users is the only input that matters, and you cannot get it from a pilot environment.
The first version of every good product is embarrassing in retrospect. The first version of every good product also taught its creators more in two weeks of real use than six months of internal testing.
The Cost of Not Shipping
A failed AI pilot costs an SME between €10,000 and €50,000 in direct spend, depending on company size and project scope — licensing, compute, consultant hours, internal time allocation. These figures do not include opportunity cost — the competitive advantage that accrues to the company that ships while you evaluate.
But the real cost is cultural. Every pilot that dies teaches the organisation a lesson: AI is experimental. AI is not for us. AI is something the innovation team plays with while we do real work. This lesson compounds. After the second failed pilot, the third one faces a credibility deficit that no steering committee presentation can overcome.
The inverse is also true. One tool that ships, that works, that people actually use — that single deployment changes the organisation’s relationship with AI permanently. The team that names the tool (a reliable sign of adoption, as Érica has documented) becomes an advocate. The team that sees results becomes curious. The cultural momentum from one successful deployment is worth more than ten successful pilot evaluations.
The European Disadvantage That Isn’t
There is a narrative that European companies are slower to adopt AI because of regulation, because of risk aversion, because of cultural conservatism. The narrative is wrong — or rather, it’s imprecise enough to be useless.
European companies are slower to adopt AI because they over-pilot. They evaluate longer, qualify more thoroughly, and build more comprehensive business cases before committing. These are not character flaws. In many contexts, they are strengths. European manufacturing quality, European financial system stability, European product safety records — all of these come from a culture of thoroughness.
But thoroughness applied to pilot projects produces thoroughness without shipping. The same rigour that ensures a German car doesn’t break should ensure that an AI deployment works. Instead, it ensures that the AI deployment never leaves the test track.
The EU AI Act, which applies in stages, with most provisions taking effect by August 2026, actually provides a framework for responsible shipping. The risk classification system (Article 6) tells you exactly what level of oversight each deployment requires. The provider obligations (Articles 16-22) and conformity assessment procedures (Article 43) define what “ready to ship” looks like for high-risk systems. These are not obstacles — they are specifications. An engineer reads a specification and builds to it. A committee reads a specification and schedules a meeting about it.
Regulation is a creative constraint. The best products in history — from the original Macintosh to the Volkswagen Golf to the EU’s own SEPA payment system — were built within tight constraints. Constraints don’t prevent shipping. They define what shipping looks like.
The Riff and the Performance
There’s a moment in live music when a guitarist has rehearsed a riff a thousand times and still hesitates before playing it on stage. The rehearsal room is safe. The stage is not. The audience will hear every imperfection. The temptation is to rehearse once more, to refine once more, to wait until it’s perfect.
David Gilmour doesn’t wait. He plays. And the slight imperfections — the human timing, the breath before the bend — are what make it real. The studio version is perfect. The live version is true.
AI deployment works the same way. The pilot environment is the rehearsal room. Production is the stage. The tool will encounter inputs you didn’t predict, users you didn’t train, workflows you didn’t map. Some of those encounters will produce imperfect outputs. Good. Now you know what to fix. You cannot learn that from the rehearsal room.
What We Actually Do
At Bluewaves, the build methodology is three waves of three weeks each. Not because three weeks is a magic number — because three weeks is long enough to build something real and short enough to make hiding in a pilot impossible.
Wave one: build and deploy. The tool goes to real users on real tasks within the first three weeks. Not a demo. Not a sandbox. Real.
Wave two: observe and adjust. Watch what people actually do with the tool. Not what they say they’ll do. What they do. Adjust the tool based on observed behaviour, not reported preferences.
Wave three: optimise and document. The tool works. Now make it faster, more accurate, better integrated. Document what was learned for the next deployment.
Nine weeks. Three iterations. One deployed product. Not perfect. Deployed.
The alternative — the twelve-month evaluation cycle, the quarterly steering committee, the stakeholder alignment sessions — is more comfortable. Nobody’s name is on a failure. Nobody’s reputation is at risk. Nobody ships.
The Compound Effect
The difference between a company that ships its first AI tool in October 2025 and a company that ships in October 2026 is not twelve months. It is twelve months of compound learning.
The company that ships in October 2025 will have twelve months of production data by October 2026. Twelve months of user feedback. Twelve months of adjustments, improvements, and accumulated knowledge about how its specific users interact with AI tools in its specific operational context. The model will have been refined. The workflows will have been optimised. The team will have developed fluency. The organisation will have absorbed the cultural shift from “we have an AI strategy” to “we use AI.”
The company that ships in October 2026 will be starting from zero. Same technology. Same features. Same model capability. Zero accumulated learning. Zero production data. Zero organisational muscle memory.
The compound effect in AI deployment is not about the technology. The technology improves regardless of whether you use it. The compound effect is about operational knowledge — the organisation’s understanding of how AI tools interact with its specific workflows, its specific customers, its specific constraints. This knowledge compounds. It cannot be accelerated. It can only be started.
Every month of delay is a month of forgone compound learning. The cost is not linear. It is exponential — because each month of learning makes the next month more productive, and the gap widens with time.
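The widening-gap claim is just compound-interest arithmetic. A toy model makes it concrete; the 5% monthly gain is an illustrative assumption, not a measurement:

```python
# Toy model of compound operational learning (illustrative numbers only):
# each month in production multiplies a team's effectiveness by a fixed
# factor, while a team that has not shipped stays at baseline 1.0.
def effectiveness(months_in_production: int, monthly_gain: float = 0.05) -> float:
    """Effectiveness multiplier after compounding monthly learning."""
    return (1 + monthly_gain) ** months_in_production

# Team A ships in October 2025; Team B ships twelve months later.
lead_after_year_one = effectiveness(12) - effectiveness(0)
lead_after_year_two = effectiveness(24) - effectiveness(12)

print(round(effectiveness(12), 2))            # 1.8
print(lead_after_year_two > lead_after_year_one)  # True: the gap widens
```

At 5% per month the early shipper is roughly 80% more effective after a year, and the absolute gap in year two is larger than in year one — which is all “the cost is not linear” means.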
This is why “let’s wait for better models” is the most expensive sentence in AI strategy. The models will be better in six months. They will also be better in twelve months. And in twenty-four months. The model improvement is continuous and external. The operational learning is internal and must begin. The best model in the world, deployed to a team with no operational experience, will underperform a mediocre model deployed to a team with twelve months of production learning.
The surfer who waits for the perfect wave never learns to surf. The waves keep coming. The learning only happens in the water.
Ship early. The compound effect starts at deployment. It starts nowhere else.
The Uncomfortable Truth
Most AI projects die not because the technology fails but because nobody commits to the moment where the tool meets its users. The technology is ready. The infrastructure exists. The regulatory framework is defined. The use case is clear. What’s missing is the decision: this ships on this date to these people.
That decision requires someone to accept that the first version will be imperfect. That some users will be frustrated. That some use cases won’t work as expected. That the dashboard will show adoption metrics that start low and rise slowly — if the deployment is done right — or start low and stay low, which is also useful information.
The decision requires someone who cares more about deploying a working tool than about presenting a successful pilot.
The EU has approximately 33 million enterprises. According to Eurostat’s December 2025 data, roughly 20% of enterprises with 10 or more employees have adopted AI in some form. The 80% that haven’t are not waiting for better technology. They are waiting for someone to say: this ships.
The Anti-Pilot Manifesto
Let me be explicit about what I am arguing, because the conventional wisdom pushes back hard.
I am not arguing against testing. Test rigorously. Test with real data. Test with edge cases. Test with hostile inputs. Testing is engineering. Engineering is non-negotiable.
I am not arguing against planning. Plan the deployment. Map the workflow. Identify the users. Design the integration. Planning is architecture. Architecture is non-negotiable.
I am arguing against the pilot as a permanent state. The pilot that runs for six months without a ship date. The pilot that is renewed quarterly because “we need more data.” The pilot that has become a comfortable, low-risk, low-accountability activity that lets the organisation say “we’re working on AI” without ever putting a tool in front of a user.
The pilot is not inherently wrong. A two-week pilot that validates a use case and then ships is a powerful tool. A two-week pilot that validates a use case and then becomes a four-month evaluation that becomes a twelve-month assessment is not a pilot. It is avoidance with a timeline.
The distinction is the ship date. A pilot with a ship date is an engineering activity. A pilot without a ship date is an organisational comfort mechanism. The ship date forces a decision: this is good enough to deploy, or this is not worth deploying. Both outcomes are useful. Neither is available to the pilot that never ends.
Set the ship date before the pilot begins. Write it down. Tell the team. Tell the stakeholders. Tell the board. The tool ships on this date, or the project is cancelled on this date. There is no third option.
Real artists ship. Real engineers ship. Real companies — the ones that will still be competitive in 2030 — ship.
The pilot is over. Ship it or shut it down.