By Karan Singh
In the previous instalments of our “How FSD Works” series, we explored the foundational elements of Tesla’s approach to autonomous driving and how FSD understands its environment.
Now, we’re diving a bit deeper into how FSD actually transforms the flat, 2D images from its cameras into the rich, dynamic 3D world it operates in. Two recent Tesla patent applications shed some light on this, covering Vision-Based Occupancy Determination and Vision-Based Surface Determination. Together, they reveal how FSD aims to perceive everything from other vehicles to the very texture of the road surface using only vision.
We’d recommend reading some of our other series to get a good background before you dive into this technical piece.
Creating a 3D World from 2D Images
The fundamental challenge for any system that operates with vision is to accurately perceive a three-dimensional world from two-dimensional camera images. Unlike systems that rely on LiDAR to directly measure distance and depth, Tesla’s vision-only approach requires FSD to infer depth, shape, motion, and context from pixel patterns, lighting, and the relative motion of objects across multiple camera views. Putting all that together is the key to building the 3D environment that FSD lives in.
Part 1: Vision-Based Occupancy Determination
Tesla’s patent, “Artificial Intelligence Modeling Techniques for Vision-Based Occupancy Determination,” details how FSD identifies which objects are present in the car’s surroundings and the space they occupy.
Tesla previously used bounding boxes to outline the rough space an object occupied, but newer iterations of FSD have evolved well beyond that. Rather than drawing simple 2D boxes around objects, the system now builds a true volumetric understanding.
FSD Pipeline: Pixels to Objects
The patent outlines a sophisticated AI pipeline to achieve this:
- Image Input: The process begins with raw image data from the vehicle’s cameras, capturing different viewpoints around the car at a single point in time.
- Image Featurization: Raw pixel data by itself isn’t very useful. Tesla uses specialized neural networks, which it calls Featurizers, to process these images and extract relevant visual details: patterns, textures, edges, anything that helps describe the scene.
- Spatial Transformation: Here is the critical step. FSD uses a transformer model, a type of neural network that is particularly good at understanding context and relationships. The transformer takes the 2D features from all the camera views and, using what Tesla calls a “spatial attention mechanism”, projects and fuses them into a unified 3D representation of the environment. Tesla internally refers to this as the vector space that FSD’s path planner operates inside.
- Temporal Alignment: The world isn’t static, so the system then fuses those 3D representations across consecutive points in time. This makes the features spatio-temporal, meaning that FSD doesn’t just capture how things are in an instant, but how they are moving over time.
- Deconvolution: After processing, FSD uses deconvolution, a mathematical operation, to transform the fused data back into distinct predictions for each voxel* in the 3D grid.
*Think of a voxel as a 3D pixel – a tiny volumetric cube that represents a single point in the 3D space around the car. FSD divides the entire environment into a dense grid of these voxels.
With the deconvolved data, FSD then predicts several key outputs, including occupancy – is this voxel occupied, or is it free space? If it is occupied, what is its velocity vector? Then, using the inferred data from earlier, FSD can group multiple voxels together to build a more detailed understanding of the space around it. That helps FSD know what a voxel is occupied by, whether that is a static or moving object, like a building or another vehicle.
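To make those per-voxel outputs concrete, here is a toy sketch of an occupancy grid with a velocity vector per voxel. This is purely our illustration: the grid size, resolution, and array layout are assumptions, not Tesla’s actual data structures.

```python
import numpy as np

# Hypothetical grid: 4 m x 4 m x 2 m of space at 0.5 m voxels (toy numbers).
GRID = (8, 8, 4)

# Per-voxel occupancy probability (0 = free space, 1 = occupied).
occupancy = np.zeros(GRID)

# Per-voxel velocity vector (vx, vy, vz) in m/s.
velocity = np.zeros(GRID + (3,))

# A small moving object: a 2x2x2 block of voxels travelling at 5 m/s in +x.
occupancy[3:5, 3:5, 0:2] = 0.9
velocity[3:5, 3:5, 0:2] = [5.0, 0.0, 0.0]

# Grouping occupied voxels back into objects is then a clustering problem
# over the indices where occupancy exceeds a threshold.
occupied_indices = np.argwhere(occupancy > 0.5)
print(len(occupied_indices))   # 8 voxels make up the object
```

The key idea is that every property the article describes (occupancy, velocity, object type) lives in a parallel array indexed by the same voxel coordinates, so downstream code can read them all for any point in space.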
The Output
All of this information is then compiled into an occupancy map. This means that FSD’s planning system, which is distinct from the Occupancy system detailed above, can ask specific questions about the environment to determine its next moves: is a given space clear, is the object in a given voxel moving, what is it, and is it relevant? The path planner then takes the answers from this 3D model and uses them to make its decisions.
In simpler terms, this means that FSD builds a live, constantly updating 3D video game version of the world around it, where every important element has properties such as location, motion, and type. This detailed digital replica of reality is what the FSD planning module uses to make its driving decisions, moment by moment.
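As a minimal sketch of what “asking the map specific questions” could look like in code, here is a hypothetical query interface. The class and method names are entirely our invention for illustration, not Tesla’s API.

```python
from dataclasses import dataclass

@dataclass
class VoxelState:
    occupied: bool
    speed: float      # m/s, magnitude of the velocity vector
    label: str        # e.g. "vehicle", "building"

class OccupancyMap:
    """Toy occupancy map that a planner can interrogate."""
    def __init__(self):
        self.voxels = {}   # (x, y, z) index -> VoxelState

    def is_clear(self, idx):
        state = self.voxels.get(idx)
        return state is None or not state.occupied

    def is_moving(self, idx, min_speed=0.1):
        state = self.voxels.get(idx)
        return state is not None and state.speed > min_speed

world = OccupancyMap()
world.voxels[(10, 2, 0)] = VoxelState(True, 4.2, "vehicle")

print(world.is_clear((0, 0, 0)))     # True: nothing recorded there
print(world.is_moving((10, 2, 0)))   # True: a vehicle moving at 4.2 m/s
```

The point of the sketch is the separation of concerns the article describes: perception fills in the voxel states, and the planner only ever sees the answers to simple queries.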
Part 2: Vision-Based Surface Determination
Knowing where objects are is only half the battle. An autonomous vehicle also needs an incredibly detailed understanding of the surface it’s driving on and the terrain around it. Tesla’s patent WO2024237939A2, “Artificial Intelligence Modeling Techniques for Vision-Based Surface Determination,” addresses exactly this.
Surface Understanding
FSD needs to know more than just “there’s a road here.” It needs to understand the road’s geometry (is it flat, uphill, banked?), its material (asphalt, dirt, gravel?), the location of curbs, lane markings, speed bumps, potholes, and whether a surface is actually navigable. This patent details how Tesla aims to achieve this nuanced understanding using only camera inputs, a crucial step towards navigating complex environments without relying on pre-existing, high-definition (HD) maps.
Yes, for the eagle-eyed observers, this means that FSD does indeed look for potholes – though whether the path planner takes them into account is still up in the air. Back in 2020, Elon confirmed this was the case.
Yes! We’re labeling bumps & potholes, so the car can slow down or steer around them when safe to do so.
— Elon Musk (@elonmusk) August 14, 2020
Tesla also said it was working on adjusting air and adaptive suspensions based on a road roughness map. These roughness maps could be generated with a combination of Vision-Based Surface Determination, alongside new tech like the smart tire tread sensors that Tesla is now equipping certain flagship vehicles with.
Predicting Surface Attributes from Vision
Similar to Part 1, this half of the system uses another AI model to analyze camera imagery, but instead of predicting object occupancy, it predicts a range of surface attributes for the space around the vehicle.
These attributes include elevation, whether a surface is navigable and safe to drive on, what material the surface is made of, and surface features such as lane lines, markings, curbs, speed bumps, potholes, hill crests, and banked or flat curves.
Building the 3D Surface Mesh
Taking all that critical information and putting it together allows FSD to build a 3D mesh that represents the environment around it, all taken from 2D images. This mesh is an array of points, where each point has X, Y, and Z coordinates and is tagged with the attributes from above. This helps to build that overall 3D environment that FSD navigates inside of.
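One simple way to picture an array of tagged mesh points is a structured array, where each point carries its coordinates plus the surface attributes from above. This is our illustration only; the attribute names and layout are assumptions.

```python
import numpy as np

# Structured array: each mesh point has a position plus surface attributes.
point_dtype = np.dtype([
    ("x", "f4"), ("y", "f4"), ("z", "f4"),   # position in metres
    ("navigable", "?"),                      # safe to drive on?
    ("material", "U10"),                     # e.g. "asphalt", "gravel"
    ("feature", "U10"),                      # e.g. "lane_line", "pothole"
])

mesh = np.zeros(4, dtype=point_dtype)
mesh[0] = (0.0, 0.0,  0.00, True,  "asphalt", "none")
mesh[1] = (1.0, 0.0,  0.02, True,  "asphalt", "lane_line")
mesh[2] = (2.0, 0.0, -0.10, False, "asphalt", "pothole")
mesh[3] = (3.0, 2.0,  0.15, False, "grass",   "curb")

# A planner-style query: which points are safe to drive over?
drivable = mesh[mesh["navigable"]]
print(len(drivable))   # only the navigable points remain
```

Because position and semantics live on the same point, filtering by any attribute (material, feature type, navigability) immediately yields the 3D region it applies to.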
Training for Surface Recognition
The training process for this AI is detailed in the patent, and it is quite sophisticated. Tesla pulls ground-truth information from sensors like LiDAR, which it uses during testing and data generation, or from techniques like photogrammetry. That data is then correlated with camera images from real vehicles, which teaches the system to estimate distance and surface properties from vision alone.
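The patent doesn’t publish code, but the ground-truth side of that training loop can be sketched as binning a LiDAR point cloud into a ground grid to produce per-cell elevation labels, which camera-only predictions would then be trained to match. Everything below is a toy sketch with made-up numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "LiDAR" point cloud: 1000 points with (x, y, z) in metres.
points = rng.uniform([0, 0, 0.0], [10, 10, 0.3], size=(1000, 3))

# Bin points into a 1 m x 1 m ground grid; the label per cell is the
# highest elevation observed in that cell.
grid = np.zeros((10, 10))
ix = points[:, 0].astype(int)
iy = points[:, 1].astype(int)
np.maximum.at(grid, (ix, iy), points[:, 2])

# These per-cell elevation labels would be paired with camera images of
# the same scene to supervise a vision-only elevation predictor.
print(grid.shape)
```

The same binning idea extends to other labels (material, navigability) once the source data carries them; the camera network never sees the LiDAR at inference time, only during training.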
Occupancy + Surfaces = Unified World Model
Crucially, these two patented systems – occupancy determination and surface determination – are designed to work in concert. The surface determination patent explicitly describes how its methods can be combined with the occupancy detection paradigm.
Imagine a scenario where the occupancy system detects an object (such as a traffic cone). The surface determination system simultaneously understands that the road ahead is a steep hill. By combining these, FSD can accurately place the traffic cone on the hill’s slope within its 3D world model.
This simple split between detecting objects and understanding the surfaces they are on enables much more robust and accurate perception, especially in complex environments with varying elevations. It can even effectively expand the vertical range in which objects can be accurately identified and positioned.
This means FSD doesn’t just see a collection of objects and a flat plane; it builds a truly three-dimensional and semantically rich understanding of what is around the car and the attributes of the environment it’s all situated in.
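The cone-on-a-hill example can be made numeric: the occupancy system supplies the object’s ground-plane position, and the surface mesh supplies the elevation underneath it. A toy illustration with hypothetical numbers:

```python
import numpy as np

# Hypothetical surface elevation profile for a road climbing a hill:
# the road rises 0.1 m for every metre travelled in x.
xs = np.arange(0, 20)
elevation = 0.1 * xs

# The occupancy system reports a traffic cone at x = 12 m (ground plane).
cone_x = 12

# Combining the two systems: place the cone at the surface height below it.
cone_z = round(float(elevation[cone_x]), 3)
print(cone_z)   # 1.2 m up the slope, not at z = 0 as a flat-world model assumes
```

Without the surface model, the cone would be placed at ground level zero and appear to float (or sink) relative to the real road, which is exactly the error the combined pipeline avoids.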
What Fuels FSD’s Decisions
This incredibly detailed, real-time 3D world model, built from occupancy and surface data, serves as the fundamental input for all subsequent stages of the FSD stack: prediction (what will other road users do?), path planning (what’s the safest and most efficient route?), and control (how to execute that path smoothly).
You can read about just how Tesla addresses that in Part 1 of the series.
While the journey to full self-driving is ongoing and filled with immense challenges, these methods for occupancy and surface determination represent critical building blocks. They are essential for creating an AI that doesn’t just detect patterns but also truly perceives and comprehends the dynamic, complex environments our vehicles navigate every day.
As Tesla continues to refine these systems, the detail and accuracy of FSD’s world model will only continue to grow, paving the way for increasingly capable autonomous systems.
By Nehal Malik
The Tesla Cybertruck is famous for its stainless steel exoskeleton and futuristic looks, but the real magic is actually happening underneath the metal. One of the most talked-about features since its launch is the advanced four-wheel steering system. This technology allows a massive, heavy-duty pickup truck to move with the agility of a much smaller sedan, completely changing the game for how large vehicles handle tight spaces.
In action pic.twitter.com/x007hRtcDz
— Cybertruck (@cybertruck) December 5, 2023
Tesla previously shared a detailed look at how this system works through its official @cybertruck account, providing visualizations of the wheels in motion. By combining four-wheel steering with a revolutionary steer-by-wire system, Tesla has solved the biggest headache of truck ownership: maneuvering a giant vehicle in a crowded world.
Low-Speed Agility and the End of 10-Point Turns
If you’ve ever tried to park a full-sized F-150 or Silverado in a standard parking lot, you know the struggle. The Cybertruck avoids this entirely by allowing the rear wheels to rotate up to 10 degrees. According to Tesla, “Four-Wheel Steering gives Cybertruck a tighter turning radius than a Model S.”
At low speeds, the rear wheels rotate in the opposite direction to the front wheels. This essentially “shortens” the wheelbase of the truck, enabling much tighter maneuvering around a job site or inside a parking garage.
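A back-of-the-envelope bicycle-model approximation shows why counter-steering the rear axle “shortens” the wheelbase: the rear angle adds to the effective steering angle, shrinking the turn radius. All numbers below are illustrative, not Tesla specifications.

```python
import math

def turn_radius(wheelbase_m, front_deg, rear_deg=0.0):
    """Bicycle-model approximation: a counter-steered rear axle adds to
    the effective steering angle, tightening the turn."""
    return wheelbase_m / (math.tan(math.radians(front_deg)) +
                          math.tan(math.radians(rear_deg)))

WHEELBASE = 3.6   # metres, roughly full-size-truck territory (illustrative)

front_only = turn_radius(WHEELBASE, front_deg=30)
with_rear = turn_radius(WHEELBASE, front_deg=30, rear_deg=10)

print(round(front_only, 2))  # front steering alone
print(round(with_rear, 2))   # rear wheels counter-steered 10 degrees
```

With a 10-degree rear contribution, the radius drops by roughly a quarter in this toy model, which is the geometric intuition behind a big truck out-turning a smaller sedan.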
Four-Wheel Steering gives Cybertruck a tighter turning radius than a Model S
At low speeds, rear wheels rotate opposite to the front wheels—enabling tighter maneuvering around the jobsite, parking lots, etc pic.twitter.com/nxDiRTZKEI
— Cybertruck (@cybertruck) December 5, 2023
This is also what allows the truck to perform its famous crab walk, moving diagonally to escape tricky off-road situations. Because the truck uses steer-by-wire, there is no mechanical link between the steering wheel and the wheels. Instead, software-driven actuators move the wheels based on your input, making the steering feel incredibly fast and responsive without needing to spin the wheel hand-over-hand.
High-Speed Stability and the “Glide” Factor
The system doesn’t just help with parking and low-speed maneuverability; it makes the highway experience much safer, too. When you are traveling at high speeds, the front and rear wheels turn in sync. Tesla explains that this allows the truck to “glide between lanes,” which increases stability and reduces the body roll that typically makes passengers feel a bit seasick in tall SUVs.
At high speeds, front and rear wheels turn in sync—allowing you to glide between lanes
This translates to increased stability and reduces body roll that typically causes passenger discomfort pic.twitter.com/iQsJATQGmJ
— Cybertruck (@cybertruck) December 5, 2023
By turning all four wheels in the same direction during a lane change, the Cybertruck moves laterally rather than pivoting. This makes the truck feel more planted and reduces the “top-heavy” sensation often associated with lifted pickups. This level of control is integrated into Tesla’s new 48-volt electrical architecture, which uses thinner wires and higher voltage to power these heavy-duty steering motors more efficiently.
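The speed-dependent behavior described above (opposite-direction rear steering at low speed, in-phase steering at highway speed) can be sketched as a simple policy function. The threshold, ratios, and limits here are invented for illustration and bear no relation to Tesla’s actual tuning.

```python
def rear_steer_angle(speed_kmh, front_deg, max_rear_deg=10.0,
                     low_speed_kmh=60.0):
    """Toy four-wheel-steering policy: below the speed threshold the rear
    axle counter-steers to tighten turns; above it the rear axle steers
    in phase with the front for lateral stability."""
    rear = min(abs(front_deg) * 0.5, max_rear_deg)
    if speed_kmh < low_speed_kmh:
        return -rear if front_deg > 0 else rear   # opposite direction
    return rear if front_deg > 0 else -rear       # same direction

print(rear_steer_angle(20, front_deg=30))   # parking lot: counter-steer
print(rear_steer_angle(100, front_deg=4))   # highway lane change: in phase
```

The sign flip at the threshold is the whole trick: the same hardware produces a tighter pivot at parking speeds and a flatter, more lateral motion at highway speeds.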
A New Foundation for Future Teslas
The Cybertruck is effectively a rolling laboratory for Tesla’s next generation of tech. Features like steer-by-wire (and potentially four-wheel steering) are expected to make their way into future models, including the rumored “CyberSUV” and eventually even mass-market cars like the Model 3.
While traditional car enthusiasts initially worried that removing the physical connection to the wheels would ruin “road feel,” Tesla has used software to simulate feedback through the steering wheel. This allows the car to filter out vibrations from potholes while still giving the driver a sense of the road. When you pair this with other Cybertruck-firsts like Powershare (bi-directional charging) and the Etherloop communication system, it’s clear that the Cybertruck is less of a niche product and more of a preview for the entire industry.
As these systems become more common, we might finally see the end of the bulky, difficult-to-drive utility vehicle. If a truck this size can turn tighter than a luxury sedan, the future of the automotive industry looks a lot more agile.
By Nehal Malik
Tesla is making its autonomous ride-hailing service a bit more family-friendly. In a recent update to its Rider Terms of Service, the company has officially lowered the minimum age for passengers using its Robotaxi network, signaling that it is getting more comfortable with its software’s performance across diverse groups of riders.
According to information shared by Tesla watcher @sawyermerritt, the minimum rider age has been slashed from 13 down to just 8 years old. While this opens the door for younger children to experience the future of transport, Tesla isn’t letting them ride solo just yet. “Riders aged 8-17 must be accompanied by a parent, legal guardian, or other authorized adult for the duration of the ride,” the new policy states.
Expanding the Network
This policy shift comes at a time when Tesla is aggressively scaling its ride-hailing ambitions. During the Q4 2025 earnings call, Tesla laid out a massive roadmap for the first half of 2026. While the service is currently operational with a safety driver in the SF Bay Area and is ramping up unsupervised rides in Austin, several new cities are next on the list.
The company is preparing to launch the service in Dallas, Houston, Phoenix, Miami, Orlando, Tampa, and Las Vegas within the next few months. Las Vegas and Dallas appear to be next in line, with Model Y Robotaxi fleets recently being spotted in the cities. Meanwhile, Arizona has also granted statewide approval for the service, making it a key battleground for Tesla as it takes on robotaxi competition like Waymo.
Preparing for the Cybercab
While current rides are mostly happening in Model 3 and Model Y vehicles, the real “gear shift” for this network will be the arrival of the Cybercab. Designed specifically for autonomy with no steering wheel or pedals, the Cybercab is expected to drastically lower operating costs once it enters mass production next month.
As Tesla pushes to expand its Robotaxi network, the company has also been hiking its prices to make the service commercially viable. Earlier this month, the company raised per-mile costs by 40% and tripled the base fare from $1 per ride to $3. This suggests that even as Tesla lowers the age barrier for riders, it is confident enough in the demand for its “unsupervised” experience to charge a premium for it.
By updating its terms now, Tesla is ensuring that the legal groundwork is ready for a summer where families in multiple states can finally call a Tesla to their door. As the network transitions from a “pilot program” into a legitimate transportation alternative, these small policy tweaks are what will eventually make robotaxis feel like a normal part of everyday life.
