Tesla is stepping up its game with the production of its new supercomputer, known as Dojo, designed specifically to train the Autopilot system. Currently, Tesla relies on a supercomputer built around Nvidia A100 GPUs: 5,760 GPUs spread across 720 nodes of eight GPUs each. This setup can deliver 1.8 exaFLOPS (quintillions of floating-point operations per second). However, Dojo is expected to surpass these specifications easily.
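As a rough sanity check, the quoted figures line up with Nvidia's published A100 peak throughput. The sketch below assumes eight GPUs per node and the A100's FP16 tensor-core peak of 312 TFLOPS; both are assumptions from public Nvidia specs, not stated in the article.

```python
# Back-of-the-envelope check of the quoted cluster specs.
GPUS = 5760
GPUS_PER_NODE = 8        # assumption: standard 8-GPU node layout
A100_PEAK_TFLOPS = 312   # assumption: A100 FP16 tensor-core peak

nodes = GPUS // GPUS_PER_NODE
total_eflops = GPUS * A100_PEAK_TFLOPS / 1e6  # TFLOPS -> exaFLOPS

print(nodes)                    # 720 nodes
print(round(total_eflops, 2))   # ~1.8 exaFLOPS
```

The arithmetic reproduces both the 720-node count and the 1.8 exaFLOPS figure, which suggests that number refers to FP16 tensor-core throughput.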
Unlike the current setup, Dojo does not depend on third-party chips. Instead, it uses Tesla's custom-designed D1 chip, manufactured by TSMC on a 7 nm process. Each D1 contains more than 300 computing cores, and 25 D1 chips are clustered to form a training tile. Each System Tray holds six tiles, a cabinet holds two System Trays, and ten cabinets combine to form one ExaPOD.
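The assembly hierarchy can be totaled up in a few lines. This sketch assumes 354 cores per D1 chip and 25 chips per tile, figures reported at Tesla's AI Day presentation; treat them as reported rather than independently verified.

```python
# Dojo's reported assembly hierarchy, chip to ExaPOD.
CORES_PER_D1 = 354        # assumption: reported core count per D1
D1_PER_TILE = 25          # assumption: reported chips per training tile
TILES_PER_TRAY = 6
TRAYS_PER_CABINET = 2
CABINETS_PER_EXAPOD = 10

chips_per_exapod = (D1_PER_TILE * TILES_PER_TRAY
                    * TRAYS_PER_CABINET * CABINETS_PER_EXAPOD)
cores_per_exapod = chips_per_exapod * CORES_PER_D1

print(chips_per_exapod)  # 3000 D1 chips per ExaPOD
print(cores_per_exapod)  # 1,062,000 cores per ExaPOD
```

Under these assumptions an ExaPOD packs 3,000 D1 chips, consistent with Tesla's claim of over a million compute cores per ExaPOD.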
At Tesla's AI Day event last year, the company gave a detailed presentation on Dojo's architecture. Dojo's primary goal is to speed up auto-labeling, a process central to training the Autopilot model. Faster labeling means more training data can be processed, helping Autopilot better recognize real-world objects, understand varied scenarios, and ultimately make more accurate decisions.
However, even during early testing, Dojo reportedly drew so much power that it overloaded the local power grid, a sign of the scale and capability of Tesla's new supercomputer.