

Physical AI with Helsing: When Models Have to Act

The hardest deployment environments reveal where physical AI actually stands, and what it takes to close the gap. Earlybird investor Laurin Class shares his perspective below.

Mar 6, 2026


Earlybird News


Last week we hosted an AI breakfast roundtable in Munich with Robert Fink, CTO of Helsing. Around the table: robotics founders, AI researchers, and defense engineers, all building systems that have to work in the real world. Robert walked us through Helsing's product stack with videos and architecture diagrams, from strike drones to autonomous fighter jets to underwater sensor systems. What followed was an honest conversation about what happens when AI models have to act, and where they work best when complemented with classical systems.

Defense is physical AI on hardcore mode. Every constraint that civilian robotics encounters in moderated form shows up in defense at maximum intensity: scarce data, adversarial environments, limited compute, no connectivity, and regulatory friction. GPS gets jammed. Communication gets denied. Your adversary actively wants to deceive your sensors. There is no cooperative environment. Studying how AI performs under these conditions reveals, more honestly than any lab demo, where physical AI actually stands today and where it breaks. The lessons generalize far beyond the battlefield.

The data problem in physical AI is well understood in principle, but the reality is worse than most people assume. In defense, there is effectively no open data. Electronic warfare, the discipline of reading and classifying radar pulses, has no equivalent of ImageNet. You cannot download a dataset of Russian tank signatures. Sensor data from military exercises often literally evaporates because there is no infrastructure to store it, and when it does exist, it is typically locked behind proprietary interfaces with no documentation and heavy legal or regulatory restrictions. 

Helsing's response is pragmatic: to train and evaluate models for their underwater sensor systems, they rent boats and drop hydrophone sensor rigs into the water to collect training data themselves. For their electronic warfare models, the majority of the training pipeline runs on simulation, because real data of “wartime modes” simply does not exist. This is an extreme case, but the pattern holds across physical AI more broadly. Anyone building robots for construction, agriculture, inspection, or warehouse logistics faces a version of the same problem: the real world does not come with labels, and the interesting edge cases, the ones that actually matter for deployment, are by definition rare. Simulation-first development is increasingly the only viable path forward. And the quality of your simulator becomes your moat.
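The appeal of simulation-first development is that labels come for free: you know the ground truth because you generated it. Below is a minimal, hypothetical sketch of that idea for radar-pulse data; the emitter classes, parameter ranges, and noise model are all illustrative assumptions, not anything Helsing has described.

```python
import random

# Hypothetical domain-randomized generator for synthetic "radar pulse" data.
# Class names and parameter ranges are illustrative assumptions only.
EMITTER_CLASSES = {
    # class -> (pulse-width range in µs, pulse-repetition-interval range in µs)
    "search_radar":   ((1.0, 5.0), (500.0, 2000.0)),
    "tracking_radar": ((0.1, 1.0), (50.0, 500.0)),
}

def synth_pulse_train(emitter, n_pulses=32, seed=None):
    """Return (timestamps_us, widths_us, label) for one randomized emitter."""
    rng = random.Random(seed)
    (pw_lo, pw_hi), (pri_lo, pri_hi) = EMITTER_CLASSES[emitter]
    pri = rng.uniform(pri_lo, pri_hi)   # pulse repetition interval, randomized
    jitter = 0.02 * pri                 # assumed 2% receiver timing noise
    t, times, widths = 0.0, [], []
    for _ in range(n_pulses):
        times.append(t + rng.gauss(0.0, jitter))
        widths.append(rng.uniform(pw_lo, pw_hi))
        t += pri
    return times, widths, emitter

def make_dataset(n_per_class=100):
    """Labeled training set: the label is free because we simulated it."""
    data = []
    for cls in EMITTER_CLASSES:
        for i in range(n_per_class):
            data.append(synth_pulse_train(cls, seed=i))
    return data
```

Randomizing the physics parameters on every sample is what makes a model trained on this data transfer to unseen real-world emitters; the breadth and realism of that randomization is exactly where the "simulator as moat" argument lives.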

For their AI fighter pilot, a reinforcement learning agent that consistently beats real human pilots in simulated dogfights and has actually flown on a Swedish Gripen jet, Helsing trains through a cascade of simulators with increasing fidelity. The first stage teaches flight physics on open, unclassified models. The next adds simplified missile dynamics from customized off-the-shelf or in-house flight simulators. The final stage of fine-tuning happens in classified facilities where you can bring data in but not out. The model that emerges is small enough to run on tiny embedded chips while being superhuman at its task. This cascading approach, from cheap and broad simulation to expensive and narrow reality, is a pattern worth studying for any physical AI company, whether the domain is dogfights or dish-washing.
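The cascade idea can be sketched in a few lines: train in a cheap, low-noise environment first, then continue training the same policy (never restarting) in progressively richer ones. Everything here is a toy stand-in, assuming a 1-D "policy" and a hill-climbing loop in place of real RL; the stage names merely echo the article's description.

```python
import random

class ToyEnv:
    """Stand-in environment: higher-fidelity stages are noisier to train in."""
    def __init__(self, noise):
        self.noise = noise
    def episode_return(self, policy_gain, rng):
        # Reward peaks at gain = 1.0 (the "correct" behavior).
        return -abs(policy_gain - 1.0) + rng.gauss(0.0, self.noise)

def train_stage(env, gain, episodes, rng):
    """Crude hill-climbing stand-in for RL: keep perturbations that help."""
    best = sum(env.episode_return(gain, rng) for _ in range(8)) / 8
    for _ in range(episodes):
        cand = gain + rng.gauss(0.0, 0.1)
        score = sum(env.episode_return(cand, rng) for _ in range(8)) / 8
        if score > best:
            gain, best = cand, score
    return gain

def cascade(seed=0):
    rng = random.Random(seed)
    stages = [
        ("open flight physics",  ToyEnv(noise=0.01), 200),  # cheap, broad
        ("missile dynamics",     ToyEnv(noise=0.05), 100),
        ("classified fine-tune", ToyEnv(noise=0.10), 50),   # expensive, narrow
    ]
    gain = 0.0                      # untrained policy
    for _name, env, episodes in stages:
        gain = train_stage(env, gain, episodes, rng)  # weights carry over
    return gain
```

The structural point is in `cascade`: most of the training budget is spent where episodes are cheapest, and each stage inherits the previous stage's parameters rather than starting over.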

One of the sharpest insights from the discussion was about the coexistence of classical and learned methods. The discourse tends to frame this as a binary: either you use traditional control theory, or you go end-to-end neural. The reality at Helsing, and likely at most serious physical AI companies, is far more pragmatic. You use what works best where it works best. 

For flight control on strike drones, classical control theory works very well and carries the enormous advantage of being understood and trusted by regulators. No AI needed, and that is fine. For navigation without GPS, Helsing combines classical visual-inertial odometry (tracking pixels across frames and fusing them with motion sensor data) with AI methods for grounding against satellite imagery. Two algorithms, each doing what it does best, integrated into one capability.
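The navigation example has a classic structure: a smooth dead-reckoning estimate that drifts, corrected by sparse absolute fixes. A minimal 1-D sketch of that pattern is below; the complementary-filter gain, the drift model, and the "grounding fix" stub (standing in for an AI model matching camera frames to satellite imagery) are assumptions for illustration, not Helsing's implementation.

```python
def fuse(position, velocity, dt, absolute_fix, gain=0.5):
    """One step of a 1-D complementary filter."""
    predicted = position + velocity * dt       # classical odometry: smooth, drifts
    if absolute_fix is None:
        return predicted                       # no fix available (e.g. cloud cover)
    return predicted + gain * (absolute_fix - predicted)  # pull toward the fix

def run(true_velocity=1.0, vio_bias=0.05, steps=100):
    """Simulate biased odometry with sparse fixes; return (estimate, truth)."""
    est, truth = 0.0, 0.0
    for t in range(steps):
        truth += true_velocity
        # Odometry reports slightly biased velocity -> unbounded drift if uncorrected.
        fix = truth if t % 10 == 0 else None   # sparse "AI grounding" fixes
        est = fuse(est, true_velocity + vio_bias, dt=1.0, absolute_fix=fix)
    return est, truth
```

Without the fixes, the 5% velocity bias compounds without bound; with a fix every ten steps, the error stays within a small steady-state band, which is the whole point of combining the two methods.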

For target acquisition in difficult conditions (think fog, rain, and whilst under attack), the AI-based approach delivers performance that is obviously superhuman, identifying targets at distances and in conditions no human operator could match. Here, AI is a categorically different capability. The principle that emerges is knowing precisely where each approach excels and combining them without ideology. This applies well beyond defense: any physical AI system that ships to production will likely be a hybrid of established methods and learned components, assembled pragmatically.

Edge computing in defense is not a philosophical choice. It is the only option. When your communication bandwidth tops out at one kilobit per second and everything is actively jammed, there is no cloud debate. Models must be small, efficient, and run on embedded hardware. Helsing's fighter pilot agent fits comfortably on minimal silicon. Their strike drone perception runs entirely onboard. Their underwater detection system operates with zero connectivity by design. This forced discipline of building capable systems under extreme resource constraints may actually produce more robust architectures than the cloud-first approach dominant in civilian AI. For anyone building physical AI systems that need to operate reliably, whether on a factory floor with intermittent connectivity or on a delivery robot navigating a dead zone, the lesson is useful: design for the edge first, and treat cloud as an optimization you can add later.
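"Edge first, cloud as an optimization" also implies a specific code shape: the onboard path must always return an answer, and the remote path is an opportunistic upgrade that may fail without consequence. A minimal sketch, with both model stubs being illustrative assumptions:

```python
def onboard_model(frame):
    """Small embedded model: always available, coarser output."""
    return "vehicle"

def cloud_model(frame, connected):
    """Larger remote model: finer output, but the link may be down or jammed."""
    if not connected:
        raise ConnectionError("link denied")
    return "vehicle, refined"

def classify(frame, connected):
    label = onboard_model(frame)               # edge path: guaranteed result
    try:
        label = cloud_model(frame, connected)  # optional refinement
    except ConnectionError:
        pass                                   # degrade gracefully to the edge answer
    return label
```

Note the asymmetry: the cloud call sits inside a try/except and can only ever improve the answer, so losing connectivity degrades quality, never availability.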

A common misconception about neural networks in safety-critical systems is that they are inherently non-deterministic. Helsing's approach to certification cuts through this confusion. Once you fix the weights, a neural network is deterministic: it will produce the same output for the same input every time. The non-determinism lives in training; at inference, the model is as predictable as any classical system. This reframing matters enormously for regulation. Traditional safety assurance works by having humans inspect code line by line against requirements. For AI models, assurance must also embrace automatable, statistical methods: large test suites, behavioral thresholds, automated checks. And if you want to keep pace with adversaries or competitors who iterate in days and weeks, the assurance process itself must move from years to days or hours. Building the tooling to automate continuous evaluation of physical AI systems is an underexplored area with significant compounding value, and it applies equally to autonomous vehicles, surgical robots, and industrial automation.
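Both halves of that argument are checkable mechanically. The sketch below illustrates them on a toy model, assuming nothing about any real certification process: first a repeatability check (fixed weights, identical inputs, identical outputs), then assurance expressed as a statistical gate over a large generated test suite rather than line-by-line inspection.

```python
import random

def tiny_model(weights, x):
    """Stand-in 'neural network': a fixed linear layer, no randomness at inference."""
    return sum(w * xi for w, xi in zip(weights, x))

def determinism_check(weights, inputs, runs=3):
    """Same weights + same input must give the same output on every run."""
    baseline = [tiny_model(weights, x) for x in inputs]
    return all([tiny_model(weights, x) for x in inputs] == baseline
               for _ in range(runs))

def behavioral_gate(weights, n_cases=1000, threshold=0.95, seed=0):
    """Pass if the model meets a behavioral spec on >= threshold of cases."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_cases):
        x = [rng.uniform(-1, 1) for _ in weights]
        y = tiny_model(weights, x)
        # Hypothetical spec: output stays within a bounded envelope.
        if abs(y) <= sum(abs(w) for w in weights):
            passed += 1
    return passed / n_cases >= threshold
```

The second function is the one that scales: regenerating the test suite and re-running the gate on every new set of weights is what lets the assurance loop run in hours instead of years.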

Underneath all of this runs a sovereignty question that most of the AI industry prefers to ignore. Robert laid out the dependencies with unusual clarity. Today, state-of-the-art vision-language models either come with prohibitive license clauses or a heritage incompatible with defense applications. If the US were to impose export controls on frontier AI models or advanced AI chips, European defense and potentially European industry more broadly would lose access overnight. Most off-the-shelf PCBs have a transitive supply chain across Asia. Cloud infrastructure in Europe that meets sovereign security requirements barely exists.

"Should Helsing own a mine?" was posed half-jokingly but reflects a real strategic calculus. The supply chain is simply not ready for this shift, and the dependencies run deeper than most founders realize. This is a defense-specific problem to a degree, but every physical AI company that relies on frontier models, specialized chips, or cloud infrastructure from a single geography carries a version of this risk.

Physical AI is still early. The algorithms largely work. What is missing is the infrastructure around them. Data pipelines that treat sensor data as a strategic asset. Simulation toolchains sophisticated enough to close the sim-to-real gap across domains. Regulatory frameworks that can evaluate statistical safety evidence at iteration speed. Sovereign compute and model supply chains that hold up under geopolitical stress. These are institutional, industrial, and political problems, and they will gate the deployment of physical AI far more than any model architecture choice. Defense reveals this honestly because the stakes strip away comfortable assumptions. The frontier is real. The infrastructure to reach it is still being built.

Read more on Laurin’s Substack here.
To not miss similar events, subscribe to the Gradient Descending calendar here.