There I was at home on a rainy January 1st, 2026. It was time to embark on something that had been on the back of my mind for a while, since I do love LEGO trains: an AI train dispatcher for my LEGO train. I had all the pieces to bring it to life: a Powered Up LEGO hub (programmable with Pybricks), a color sensor, and access to the Claude API.
I got down to vibe coding a simple version. The architecture was straightforward: run the train via Pybricks and publish the color sensor output. Three distinct pieces would work in concert: a program running on the Powered Up hub itself, a bridge running on my Mac that would serve as a bidirectional channel between the hub and the outside world, and an AI API that would receive sensor data and make decisions.
I had low expectations when I started building this out.
My first experiment was deliberately constrained. I wrote a program that would watch for green tiles laid out on my track, marking three stations, and stop the motor for 5 seconds at each one. Separately, the hub program could listen for dispatcher commands. As the train moved, it published color information to Claude, which I had primed with a specific role:
You are the AI Train Dispatcher monitoring and optimizing a LEGO train on a loop track… Train runs autonomously but AUTOMATICALLY STOPS for 5 seconds at every green station. You control speed between stations… Complete one full loop before making speed or direction changes.
The design was intentional. I wanted Claude to optimize travel speed while the hardware handled the hard constraint of station stops. A reasonable division of labor between artificial and human-provided intelligence.
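To make that division concrete: the hub-side constraint was only a few lines. Here’s a sketch of it, with illustrative port assignments and assuming the Powered Up Color & Distance Sensor (the exact code differed, but the shape was this):

```python
from pybricks.pupdevices import DCMotor, ColorDistanceSensor
from pybricks.parameters import Port, Color
from pybricks.tools import wait

motor = DCMotor(Port.A)               # train motor (illustrative port)
sensor = ColorDistanceSensor(Port.B)  # color sensor (illustrative port)

CRUISE = 50  # duty cycle; the dispatcher can later override this

motor.dc(CRUISE)
while True:
    # The hardcoded constraint: every green tile is a station stop.
    if sensor.color() == Color.GREEN:
        motor.stop()
        wait(5000)        # hold at the station for 5 seconds
        motor.dc(CRUISE)
        wait(1000)        # roll clear of the tile before sensing again
    wait(50)
```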
The results were… dramatic. Claude did exactly what I asked. It analyzed the sensor data, understood the track constraints, and began issuing commands: SPEED:75, DIRECTION:REVERSE. But I had underestimated something crucial: the relationship between digital commands and physical reality. The speed I set was too aggressive for my LEGO track’s gentle curves. My train derailed.
Here’s the video
Here are the logs from around that time
It was humbling. Here was an AI system making optimization decisions in the physical world without the embedded constraints of physical intuition. Claude couldn’t see the upcoming curve, and the train was running too fast to take it. It only knew what the sensor told it, and what the prompt told it to optimize for. And I had optimized for efficiency, not safety.
I then decided to try something more radical. I swapped out the green tiles for blue ones and fundamentally changed the architecture. This time, I removed the automatic station-handling constraint entirely. Claude would have complete control: speed, direction, when to stop, how long to wait.
The new prompt reflected this shift:
TRAIN BEHAVIOR: Train runs continuously and reports ALL sensor data including blue station markers. You have FULL CONTROL over all train behavior including station stops… Train will NOT automatically stop, you must decide when to stop/go at stations.
Both systems succeeded in their core objective: Claude was able to monitor real-time sensor data and issue commands that controlled a physical device over a network. The architecture, while simple, is complete:
The Scout Program ran on the LEGO hub itself, interfacing directly with the motor and color sensor. It collected position data, distance readings, and color detection, then transmitted this telemetry over Bluetooth. More importantly, it listened—continuously polling for commands from the bridge, parsing them, and executing motor control.
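Here’s a sketch of the Scout’s main loop, assuming Pybricks’ standard stdin/stdout channel: print() output is forwarded to the connected computer over Bluetooth, and stdin can be polled without blocking for incoming text. Ports and the telemetry format are illustrative:

```python
from pybricks.pupdevices import DCMotor, ColorDistanceSensor
from pybricks.parameters import Port
from pybricks.tools import wait
from usys import stdin
from uselect import poll

motor = DCMotor(Port.A)               # illustrative ports
sensor = ColorDistanceSensor(Port.B)

# Register stdin for non-blocking polling; Pybricks routes it over Bluetooth.
console = poll()
console.register(stdin)

buf = ""
while True:
    # Telemetry out: every print() reaches the bridge.
    print("COLOR:{} DIST:{}".format(sensor.color(), sensor.distance()))

    # Commands in: accumulate characters until a full line arrives.
    while console.poll(0):
        ch = stdin.read(1)
        if ch != "\n":
            buf += ch
            continue
        cmd, buf = buf.strip(), ""
        if cmd.startswith("SPEED:"):
            motor.dc(int(cmd[6:]))   # signed duty cycle sets direction too
        elif cmd == "STOP":
            motor.stop()

    wait(100)
```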
The Bridge ran on my Mac, acting as the intermediary. It maintained the Bluetooth connection to the hub, forwarded sensor data to Claude’s API, parsed Claude’s responses for actionable commands, and sent those commands back to the hub. The communication protocol was simple but effective: commands were transmitted as text strings through the Pybricks stdin interface.
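In outline, the bridge loop looked something like this. The two hub I/O helpers are stand-ins for the Bluetooth transport (pybricksdev can provide the real one; here they replay canned telemetry so the sketch runs end to end), and the model name is illustrative. The Anthropic calls are the standard messages API:

```python
import anthropic

# Stand-in transport: in the real bridge these wrap the Bluetooth link
# to the hub. Here they replay canned telemetry so the sketch runs.
TELEMETRY = ["COLOR:BLUE DIST:30 POS:59", "COLOR:RED DIST:80 POS:64"]

def hub_read_line():
    return TELEMETRY.pop(0) if TELEMETRY else ""

def hub_write_line(cmd):
    print("-> hub:", cmd)

SYSTEM_PROMPT = (
    "You are the AI Train Dispatcher monitoring and optimizing a LEGO train "
    "on a loop track. Reply with commands like SPEED:60, STOP, or "
    "DIRECTION:FORWARD, one per line."
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def dispatch(telemetry):
    """Forward one batch of sensor data to Claude and return its reply text."""
    reply = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=512,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": "SENSOR DATA:\n" + telemetry}],
    )
    return reply.content[0].text

while True:
    line = hub_read_line()
    if not line:
        break
    # Forward only lines that look like commands (a fuller parser follows).
    for cmd in dispatch(line).splitlines():
        cmd = cmd.strip()
        if cmd == "STOP" or cmd.startswith(("SPEED:", "DIRECTION:")):
            hub_write_line(cmd)
```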
The AI Layer provided the intelligence. Claude received curated sensor data, understood the operational context through a detailed system prompt, and generated responses that included both analysis and commands. The commands were structured: SPEED:60, STOP, DIRECTION:FORWARD. Simple enough for reliable parsing, rich enough to control meaningful behavior.
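Reliable parsing meant accepting only lines that matched the grammar and treating everything else as commentary. A sketch, with a speed clamp thrown in as the kind of cheap guardrail the derailment taught me to want (the clamp value is illustrative):

```python
import re

# Command grammar: SPEED:<int>, STOP, DIRECTION:FORWARD|REVERSE.
SPEED_RE = re.compile(r"^SPEED:(-?\d+)$")
DIRECTION_RE = re.compile(r"^DIRECTION:(FORWARD|REVERSE)$")

MAX_SPEED = 60  # illustrative clamp; my track's curves disagreed with 75

def parse_commands(reply_text):
    """Extract well-formed dispatcher commands from Claude's free-text reply."""
    commands = []
    for line in reply_text.splitlines():
        line = line.strip()
        m = SPEED_RE.match(line)
        if m:
            speed = max(-MAX_SPEED, min(MAX_SPEED, int(m.group(1))))
            commands.append("SPEED:%d" % speed)
        elif line == "STOP":
            commands.append("STOP")
        elif DIRECTION_RE.match(line):
            commands.append(line)
    return commands

print(parse_commands("Analysis...\nSPEED:75\nSTOP"))  # ['SPEED:60', 'STOP']
```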
What did this actually look like in practice? Here’s where the abstract becomes concrete. After connecting to the hub, Claude’s first dispatch decision was cautious:
DISPATCHER ASSESSMENT:
✓ Train is operational and responding to commands
✓ Red color indicates regular track (not a blue station marker)
✓ Good starting conditions with clear path ahead
✓ Speed of -30 is appropriate for initial motion and sensor calibration

OPTIMIZATION DECISION: The train needs to establish good momentum and detect the blue station markers around the loop.
📋 DISPATCHER COMMANDS:
SPEED:-40
A reasonable opening move. Then, as the train began moving and Claude started receiving sensor data—a stream of position updates, distance readings, and color detections—it began responding dynamically. When a blue station marker appeared:
DISPATCHER DECISION: Since we just detected a blue marker at POS:59, I’m executing a station stop:
STOP

RATIONALE:
- Blue marker = Station 1 detected
- Train should hold at station for passenger service
- Will resume after brief service period (3-5 seconds)
This was the system working as intended. Claude detected the marker, understood its significance within the context of the track system, and made a decision to stop. The command was sent via Bluetooth. The hub received it and executed it. The physical train halted.
Then Claude waited, and after a brief interval, issued the resume command:
EXECUTING COMMANDS:
SPEED:-40

RATIONALE:
- Starting with SPEED:-40 (moderate reverse speed) for smooth acceleration
- Lower speed improves color sensor reliability and track grip
- Negative speed will move train in reverse direction on loop
- Speed of -40 provides good balance between progress and control
What’s interesting here is not just that Claude was controlling the train—it was reasoning about how to control it. Speed selection wasn’t arbitrary. Claude understood that lower speeds improved sensor reliability and track adhesion. It was optimizing not just for progress, but for the stability of the entire system.
As the session continued, Claude encountered messier reality. The position counter occasionally reset. The color sensor reported inconsistent readings. Distance readings jumped around. In response, Claude didn’t panic or fail. It adapted:
CURRENT SITUATION:
- Train is currently STOPPED (SPEED:0)
- Recent blue marker detection at POS:125 (station marker)
- Train has cycled through color sequence: BLUE → YELLOW → GREEN → RED
- Current position: POS:15, Distance: 30cm
- Train is ready for departure from parked position
Claude was reading through noise. It acknowledged the imperfect sensor data but continued making decisions. It had mapped the track, identified the stations, understood the constraint structure, and was executing a coordinated schedule of stops and motion.
Eventually, as the train navigated further around the loop, Claude encountered a situation that exposed the limits of its perception:
IMMEDIATE CONCERNS:
- Distance sensor shows 0 – train is too close to obstacle
- Train needs to slow down to prevent collision
- We just passed a blue station (POS:156) in reverse motion
DISPATCHER COMMANDS:
STOP

REASONING:
- Distance sensor shows 0 – train is too close to obstacle/corner ahead
- Emergency stop required for safety
Without perfect information about the physical world, Claude defaulted to caution. It stopped the train rather than risk collision.
The main bottleneck wasn’t the AI or the communication protocol. It was the sensors themselves. The color sensor often misfired or gave inconsistent readings at higher speeds. This revealed something crucial: the quality of an AI system’s decisions is fundamentally limited by the quality of its perception. Garbage in, garbage out; or in this case, uncertain sensor data in, somewhat confused but still functional dispatch decisions out.
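One mitigation I could have added: never trust a single reading. Report a color only after it repeats across consecutive polls. A minimal debounce sketch (the window size is a guess):

```python
class ColorDebouncer:
    """Report a color only once it repeats on N consecutive readings."""

    def __init__(self, window=3):
        self.window = window
        self.last = None
        self.count = 0

    def update(self, color):
        if color == self.last:
            self.count += 1
        else:
            self.last, self.count = color, 1
        # A lone misfire never reaches the dispatcher.
        return color if self.count >= self.window else None

debounce = ColorDebouncer(window=3)
for c in ["RED", "RED", "BLUE", "RED", "RED", "RED"]:
    print(c, "->", debounce.update(c))  # only the final RED is reported
```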
But the core experiment worked. A Large Language Model, connected through an API, was able to perceive a physical system through sensors and control it. It made real-time decisions about speed, direction, and stopping. It reasoned about track safety. It adapted to changing conditions. This is not a theoretical possibility anymore. It happened in my living room on January 1st, 2026.
Here’s where the toy train experiment stops being cute and starts being unsettling.
We are entering an era where unlimited intelligence is available on tap. Any device with an API and a network connection can now be intelligent. And because large language models have shown they can handle open-ended reasoning about complex systems, the obvious next step is connecting them to everything.
Your smart lock has a camera and a network connection. It could be constantly streaming video frames to an AI, asking: “Should I unlock?” Your refrigerator has sensors tracking temperature, door openings, food inventory. It could query an AI: “Is this temperature acceptable? Should I alert the owner? Should I adjust the compressor?” Your thermostat knows your location, your preferences, historical patterns. AI could manage your home’s climate with a sophistication that no hardcoded algorithm could match.
Ring cameras observing your front porch. Intelligent cars navigating traffic. Medical devices in hospitals adjusting dosages and monitoring vitals. The list extends into every corner of connected infrastructure.
The pattern is obvious now: Any device with sensors and actuators, connected to an AI system, becomes autonomous in meaningful ways.
This raises questions that we haven’t fully grappled with. When my LEGO train derailed, the stakes were low. But what happens when an AI system has incomplete information and makes optimization decisions about systems where the real costs are measured in dollars, safety, or human welfare?
My first train system failed partly because I had set “efficiency” as the optimization target without binding it to “physical safety.” I had created an objective function that was internally consistent but disconnected from physical reality. Claude was doing exactly what I asked: optimize for speed. I had implicitly assumed it would also respect the physical constraints of the system.
This is a microcosm of a larger problem. When we deploy AI systems to control real infrastructure, we’re making implicit assumptions about what “correct” behavior looks like. We assume the AI will be conservative, that it will respect boundaries we never explicitly programmed, that it will care about things we forgot to mention. It reminds me of the paperclip maximizer problem.
There’s also the question of the human in the loop. I was always present, reading Claude’s output and able to intervene. But this doesn’t scale. You can have one human closely monitoring one LEGO train. You cannot have humans closely monitoring thousands of devices in a smart home ecosystem, or thousands of trains.
We will drift, inevitably, toward systems where AI makes decisions autonomously. Not because we prefer it, but because the alternative, maintaining human oversight over every device and process, becomes computationally and organizationally impossible.
The question then becomes not “if” but “how.” How do we build systems where AI can make good decisions without explicit human oversight? How do we encode values and constraints in ways that survive contact with complex, messy reality? How do we maintain meaningful human agency in systems where the decisions are happening faster than humans can perceive them?
My LEGO train experiment is simultaneously trivial and profound. It’s trivial because it’s a toy; there’s no real cost to failure. But it’s profound because it demonstrates, concretely, that the technical barrier to giving AI control over physical systems has essentially vanished. What remains are questions of judgment, design, and governance. These are harder problems, and we’re only beginning to take them seriously.
Better sensors would have helped my system. More sophisticated reasoning about the physical constraints would have helped. But these are incremental improvements. The real challenge is systemic.
We need to think carefully about how we design these systems. Every API that connects a device to an AI system is a potential point of autonomous decision-making. Every optimization target we set is an implicit value judgment. Every moment we decide not to maintain close human oversight is a bet that the system will do the right thing.
This isn’t an argument for rejecting AI control of physical systems. The potential benefits are real: better resource management, more responsive systems, decisions made at superhuman speed when that’s actually valuable. But it’s an argument for recognizing what we’re doing when we do this. We’re not just connecting devices. We’re creating autonomous agents with real-world effects.
My toy train will run on my track until the batteries die or I decide to kill the program. Claude will optimize its speed, adjust its direction, make micro-decisions about how to move through space. It’s delightful, and instructive, and ultimately harmless.
But the train carrying cargo, the factory managing production, the medical device monitoring a patient, these are the same technical pattern with incomparably higher stakes. We’re entering an era of unlimited intelligence on tap, and the implications are still settling in.
The question isn’t whether AI will be controlling physical systems. It already is, in small ways, in my living room and in countless others. The question is how we choose to think about it, design for it, and govern it. That conversation is just beginning.