Your Data Is Not Their Platform
Every time your customer service team sends a query to a third-party AI platform, you are sending your customer data, your operational language, your domain expertise, and your competitive intelligence to a server you do not control. The response comes back. The data stays.
This is not a privacy argument. This is an architecture argument.
The Rented Platform Problem
The standard AI adoption path for a European SME in 2025 looks like this: sign up for a managed AI service, feed it your company data, let it learn your patterns, depend on its outputs. The setup takes a week. The dependency takes a quarter.
The GDPR — specifically Article 28, which governs data processor obligations — requires a contractual framework between the data controller (you) and the data processor (the platform). Most companies check this box. Few companies understand what happens to the derivative value of their data once the platform processes it.
The distinction matters. Your customer data, in isolation, is yours. The patterns extracted from your customer data, combined with patterns from ten thousand other companies’ customer data, become a training signal. That training signal improves the platform’s general model. The general model is then sold back to you — and to your competitors — as a feature.
You are subsidising a product that will be used against you. With your own data.
What Data Sovereignty Actually Means
Data sovereignty is not about keeping data in a vault. It is about controlling the chain of value extraction. Three levels.
Level one: storage sovereignty. You know where your data physically resides. This is the GDPR baseline. Articles 44 through 49 govern international data transfers. Most EU companies have addressed this — or think they have. EDPB guidance on cloud service providers has added specificity: knowing the country is not enough. You need to know the specific data centres, the subprocessors, and the conditions under which data may be accessed by third-party entities.
Level two: processing sovereignty. You control how your data is processed. This goes beyond GDPR’s Article 5 purpose limitation. Processing sovereignty means that when your data is used to train, fine-tune, or adjust a model, the resulting model improvements are attributable and controllable. Most managed AI platforms do not offer this level of transparency. The processing happens in a black box. The value extraction is opaque.
Level three: insight sovereignty. The patterns, predictions, and decisions derived from your data remain yours. Not as a legal claim — as a technical architecture. The insights generated from your operational data feed back into your systems, not into a general-purpose model that serves your competitors.
Most companies operate at level one and assume they’ve solved the problem. They have not.
The Architecture of Independence
Building data sovereignty into an AI deployment is not philosophical. It is architectural. Four technical decisions.
Decision one: where the model runs. A model running on your infrastructure (or dedicated cloud infrastructure with contractual guarantees) processes your data without transmitting it to a shared platform. This is not about building your own GPT. It is about deploying fine-tuned models — open-weight models like Mistral, Llama, or Qwen — on infrastructure you control. The compute cost is higher than a managed API. The sovereignty is absolute.
For most SMEs, the practical middle ground is a dedicated instance of a managed model with contractual guarantees that your data is not used for training, is not combined with other customers’ data, and is deleted after processing. Anthropic, OpenAI, and Mistral all offer such guarantees — but you have to read the specific contract, not the marketing page. The model card (a document I’ll write about separately) tells you more about what the model actually does than the sales deck.
Decision two: where the fine-tuning happens. If you fine-tune a model on your domain data — your customer support transcripts, your product specifications, your operational procedures — the resulting adapted model contains your competitive intelligence in its weights. That model should live on infrastructure you control. Fine-tuning on a rented platform means your domain expertise is embedded in a system you don’t own. If the platform changes its terms, raises its prices, or discontinues the service, your fine-tuned model goes with it.
Decision three: where the vectors live. RAG (retrieval-augmented generation) architectures use vector databases to store embeddings of your documents. Those embeddings are a compressed representation of your knowledge base. They should live on infrastructure you control — not on a managed vector service that co-mingles your embeddings with other customers’ data. Hosting your own vector database (Qdrant, Milvus, pgvector in a managed PostgreSQL instance) costs between €50 and €300 per month for a typical SME workload. That is the cost of owning your knowledge architecture.
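Owning the vector store does not require heavy machinery. A minimal sketch of the idea, using SQLite and a brute-force cosine-similarity scan in place of a real vector database, with toy three-dimensional vectors standing in for a real embedding model's output:

```python
import json
import math
import sqlite3

# Minimal self-hosted embedding store on SQLite (illustrative; in production
# you would use pgvector, Qdrant, or Milvus as discussed above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, text TEXT, embedding TEXT)")

def add_doc(doc_id: str, text: str, embedding: list[float]) -> None:
    # Embeddings are stored as JSON so the format stays open and portable.
    conn.execute("INSERT INTO docs VALUES (?, ?, ?)",
                 (doc_id, text, json.dumps(embedding)))

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query_embedding: list[float], k: int = 3) -> list[tuple[str, float]]:
    # Brute-force scan: fine for SME-scale corpora; swap in an ANN index later.
    rows = conn.execute("SELECT id, embedding FROM docs").fetchall()
    scored = [(doc_id, cosine(query_embedding, json.loads(emb)))
              for doc_id, emb in rows]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

# Toy embeddings; a real deployment would use a proper embedding model.
add_doc("refund-policy", "Refunds within 30 days.", [0.9, 0.1, 0.0])
add_doc("shipping", "Ships in 2-3 business days.", [0.1, 0.9, 0.0])
results = search([0.8, 0.2, 0.0], k=1)
print(results[0][0])  # refund-policy
```

The architectural point survives the swap to a production database: the embeddings live in a store you administer, in a format you can read back out.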
Decision four: where the feedback loop closes. When users interact with your AI tool, their feedback — corrections, preferences, rejected suggestions — is the most valuable data in the system. It tells you where the model fails on your specific tasks. This feedback loop should close within your systems. If the feedback flows to a managed platform, the platform learns from your users’ corrections. You paid for the deployment. They get the learning.
The GDPR Article 22 Dimension
Article 22 of the GDPR gives individuals the right not to be subject to decisions based solely on automated processing. This is usually discussed as a compliance requirement. It is also an architectural requirement.
If your AI tool makes decisions that affect individuals — credit scoring, hiring screening, service eligibility — Article 22 requires meaningful human oversight. “Meaningful” is the operative word. The Hamburg DPA’s 2025 enforcement action (a €492,000 fine for automated credit decision-making without meaningful human oversight) demonstrated that “meaningful” means the human reviewer must have the technical ability and the operational authority to override the automated decision. A rubber-stamp review process does not qualify.
When this automated decision-making runs on a third-party platform, the technical architecture for meaningful human oversight becomes more complex. The human reviewer needs access to the model’s reasoning (or at least its confidence signals), the input data, and the alternative decisions the model considered. If those are generated on a rented platform, the review process depends on the platform’s explainability features — which may be limited, may change without notice, and may not satisfy the DPA’s definition of “meaningful.”
On your own infrastructure, you control the explainability layer. You decide what the human reviewer sees, what override mechanisms exist, and how decisions are logged.
Owned Channels: The Content Parallel
The data sovereignty argument has a content parallel that is equally important and equally underappreciated.
Most companies produce content on rented platforms: LinkedIn posts, Instagram stories, Medium articles. The platform controls distribution. The algorithm determines reach. The terms of service define what you can say. Your audience is one algorithm change away from disappearing.
Owned channels — your website, your email list, your direct customer relationships — are the content equivalent of data sovereignty. You control the distribution. You own the relationship. The audience belongs to you, not to the platform.
At Bluewaves, every piece of content we produce lives on our own domain first. It may be syndicated elsewhere, but the canonical version lives on infrastructure we control. Every subscriber relationship is direct — no algorithm between us and the reader. Every piece of performance data flows to our analytics, not to a platform’s dashboard that can be deprecated without notice.
The same principle applies to AI deployment. Your AI tool should run on channels you own, serve users you have a direct relationship with, and generate data that feeds back into your systems. Renting reach is tempting because it’s fast. Owning reach is harder because it requires infrastructure. But rented reach is rented, and the landlord can change the terms at any time.
The Cost Comparison Nobody Does Honestly
Managed AI platforms price on usage: per token, per query, per API call. The marginal cost feels low. At scale, it compounds.
A 200-person company running a customer service AI tool that handles 500 queries per day at an average of 2,000 tokens per query is processing 1 million tokens per day. At current managed API prices (approximately $3–$15 per million input tokens depending on model and provider), that’s $90–$450 per month for inference alone. Affordable.
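The arithmetic above can be checked in a few lines (the per-token prices are the illustrative range from the paragraph, not any specific provider's rate card):

```python
# Reproduces the worked example: 500 queries/day at 2,000 tokens each.
queries_per_day = 500
tokens_per_query = 2_000
tokens_per_day = queries_per_day * tokens_per_query          # 1,000,000

price_low, price_high = 3.0, 15.0   # USD per million input tokens
daily_low = tokens_per_day / 1_000_000 * price_low
daily_high = tokens_per_day / 1_000_000 * price_high

monthly_low, monthly_high = daily_low * 30, daily_high * 30
print(f"${monthly_low:.0f}-${monthly_high:.0f} per month")   # $90-$450 per month
```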
But add fine-tuning costs, vector database hosting, monitoring, and the implicit cost of data flowing to a third party, and the comparison shifts. A dedicated deployment on a managed Kubernetes cluster with an open-weight model costs €400–€1,200 per month for the same workload — with full data sovereignty, no per-token pricing, and no dependency on a provider’s pricing decisions.
The upfront cost is higher. The ongoing cost is lower. The strategic cost — the cost of dependency on a platform that controls your data pipeline — is zero.
Most companies never do this comparison because the managed API is faster to set up. Speed of setup is not a strategic advantage. Speed of setup is a tactical convenience that becomes a strategic liability.
The ECB Dimension
The ECB’s November 2025 Financial Stability Review noted that “concentration risk in cloud and AI service providers represents a systemic concern for EU financial stability.” The report specifically flagged the dependency of EU financial institutions on a small number of US-based AI infrastructure providers.
This is the macro version of the same argument. When thousands of companies depend on the same three AI platforms, a pricing change, a service disruption, or a policy shift affects all of them simultaneously. Concentration risk at the individual company level is dependency. Concentration risk at the EU level is a systemic vulnerability.
For an individual SME, the response is not to build your own cloud. It is to ensure that your AI architecture is portable — that you can move your models, your data, and your workflows to a different provider (or to your own infrastructure) without rebuilding from scratch. Portability is the architectural expression of sovereignty.
Open-weight models are portable by definition. A model you fine-tuned on Mistral can run on any infrastructure that supports the model format. A model you fine-tuned on a managed platform may or may not be exportable — check the contract.
Your vector database is portable if it uses open formats and open protocols. Your RAG pipeline is portable if it’s built on open-source components. Your feedback data is portable if it’s stored in a format you control.
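Portability can be made concrete. Embeddings exported to an open line-based format such as JSON Lines can be re-imported into any vector store; the record fields here are illustrative:

```python
import io
import json

# Export embeddings to JSON Lines, an open format any vector store can
# ingest. Portability means this file, not a proprietary export feature,
# is your migration path.
records = [
    {"id": "doc-1", "text": "Refunds within 30 days.", "embedding": [0.9, 0.1]},
    {"id": "doc-2", "text": "Ships in 2-3 days.", "embedding": [0.1, 0.9]},
]

buffer = io.StringIO()  # stands in for a file on your infrastructure
for record in records:
    buffer.write(json.dumps(record) + "\n")

# Re-importing elsewhere is symmetric: parse one JSON object per line.
restored = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(restored[0]["id"])  # doc-1
```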
Portability is not a feature. It is an architectural decision made before the first line of code.
What This Means Operationally
For an EU SME with 50 to 500 employees, data sovereignty in AI deployment means:
Use managed APIs for experimentation, not for production. Test models, evaluate capabilities, prototype use cases on managed platforms. When the use case is validated, build the production deployment on infrastructure you control. The pilot runs on their platform. The product runs on yours.
Fine-tune on your infrastructure. If your AI tool needs domain-specific knowledge, fine-tune an open-weight model on your data, on your infrastructure. The resulting model is yours — the weights, the adaptations, the competitive intelligence embedded in those adaptations.
Own the feedback loop. Every user interaction with your AI tool generates data. Corrections, preferences, usage patterns, failure modes — this data is more valuable than the original training data because it represents what your specific users actually need. Store it in your systems. Use it to improve your model. Do not send it to a managed platform where it becomes part of their general training signal.
Build for portability. Use open formats, open protocols, open models. When you can switch providers in a week rather than a quarter, you have sovereignty. When switching takes six months of re-engineering, you are a tenant, not an owner.
Read the contract, not the marketing. The terms of service for AI platforms are not marketing documents — they are legal instruments that define what happens to your data. Read them. Specifically: does the provider use your data for model training? Under what conditions? Can you export your fine-tuned model? Your vector embeddings? Your usage logs? If the answer is no, you know what you are buying.
The Build-vs-Buy Decision, Reframed
The conventional build-vs-buy decision in AI focuses on capability: can you build a model as good as the managed service? The answer, for most SMEs, is no. The managed models are trained on more data, with more compute, by more researchers than any SME can replicate.
But the decision is not about capability. It is about control.
Buy the capability. Own the data. This is the practical middle ground that most sovereignty discussions miss.
Use the managed model’s API for inference — for generating outputs, answering questions, classifying inputs. The model’s capability is rented. The data that flows through the model is not.
Own the data pipeline: the inputs, the outputs, the feedback, the corrections, the usage patterns. Store them in your systems. Analyse them with your tools. Use them to evaluate, improve, and eventually replace the managed model with a fine-tuned open-weight alternative.
Own the vector database: the embeddings of your knowledge base, your documents, your operational procedures. These are your organisational knowledge in compressed form. They should not live on a shared platform.
Own the evaluation framework: the benchmarks, the test cases, the quality criteria that determine whether the model’s outputs are good enough for your specific use case. The managed platform’s generic benchmarks do not capture your domain requirements.
The sequence is: rent the capability, own the data, build the independence. The independence does not happen on day one. It happens over months, as your owned data accumulates, your evaluation framework matures, and your understanding of what you need from an AI model becomes specific enough to justify a dedicated deployment.
The managed API is a starting point. It should not be the architecture.
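The "buy the capability, own the data" pattern can be a thin wrapper: every exchange with the rented model is written to your own store before the response is returned. In this sketch, call_managed_model is a stub standing in for any provider's API:

```python
import sqlite3

# Thin pipeline wrapper: the managed model supplies the capability, but
# every input and output is logged to a store you control.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interactions (prompt TEXT, response TEXT)")

def call_managed_model(prompt: str) -> str:
    # Placeholder for the real API call to whichever provider you rent.
    return f"[model answer to: {prompt}]"

def answer(prompt: str) -> str:
    response = call_managed_model(prompt)
    # The owned copy of the exchange is the asset that compounds: it feeds
    # evaluation, fine-tuning data, and the eventual open-weight replacement.
    conn.execute("INSERT INTO interactions VALUES (?, ?)", (prompt, response))
    return response

answer("What is your refund policy?")
count = conn.execute("SELECT COUNT(*) FROM interactions").fetchone()[0]
print(count)  # 1
```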
The Principle
Your data is not neutral raw material that gains value only when processed by a platform. Your data is your competitive advantage, your operational intelligence, your customer relationships expressed as information. It is the product of years of work, thousands of interactions, millions of decisions.
When you send it to a platform you don’t control, you are exchanging sovereignty for convenience. The convenience is real. The cost is hidden — until the platform changes its pricing, its terms, or its API, and you discover that the foundation of your AI capability belongs to someone else.
Own your data. Own your models. Own your channels. Own the infrastructure that turns your knowledge into competitive advantage.
The alternative is building your house on rented land and hoping the landlord never raises the rent.
The landlord always raises the rent.
The architecture of independence is more work upfront. It is less work in total. And the work produces something that rented convenience never produces: an asset that compounds.
Your data, your models, your feedback loops — these compound. Every month of operation makes the next month more valuable. Every user interaction improves the next interaction. Every correction makes the system more accurate.
On a rented platform, the compounding benefits the platform. On your own infrastructure, the compounding benefits you.
Own the compound. The rent is never worth it.