Cerebras Enables Even Larger Generative AI Models With CS-3 And Qualcomm
- Lily
- Mar 18, 2024
- 3 min read

Cerebras, an innovator in silicon and systems technology, made several key announcements last week: the next generation wafer scale engine AI processor (WSE-3) and server (CS-3), the next Cerebras supercomputer Condor Galaxy 3 (CG-3) based on the CS-3, and a collaboration with Qualcomm to support inference processing.
Continuing the Momentum with Cerebras AI
Cerebras has had an eventful year. Through a partnership with G42, an AI development holding company based in Abu Dhabi, Cerebras transitioned from being a systems vendor to being a service provider, with plans to build three supercomputing centres in the United States based on its AI platforms, later expanded to nine. The deal also marked a shift from specialised technology provider to competitor in the AI training market. This is significant because most early AI startups had a simple business model: develop some intellectual property (IP) and then sell the company to a larger semiconductor vendor, systems OEM, or hyperscaler for a large sum of money, a model under which most of them failed. Few recent semiconductor startups have long-term business plans; Ampere and Cerebras are two that have built lasting businesses in the semiconductor industry.

Cerebras has significant engineering capabilities that set it apart from the competition, and each new product generation has presented significant engineering challenges. The first was designing, manufacturing, and operating a single chip the size of a 300mm (12 inch) silicon wafer, known as a "wafer scale engine" or WSE, to train some of the world's largest language models efficiently and quickly while maintaining high accuracy. Early sales success stemmed from collaboration with government and commercial entities dealing with large data sets and unique challenges, such as pharmaceutical research. The company now boasts a diverse set of customers from healthcare, energy, and other industry segments, as well as hyperscalers.

The second major engineering challenge was scaling the platform across multiple systems to achieve a data centre scale solution. Cerebras launched the CS-2 in 2021. In collaboration with G42, Cerebras built its first two supercomputers, the Condor Galaxy 1 (CG-1) and Condor Galaxy 2 (CG-2), in California in 2023. Each achieved four exaFLOPS of AI compute performance at FP16 data precision using only 2.5MW of power, a fraction of the power of a traditional data centre.

Cerebras' third generation of solutions builds on that engineering and market momentum. It starts with the third generation of the wafer scale engine, the WSE-3, which once again breaks the record for the number of transistors in a single chip design. Built on the TSMC 5nm process, the WSE-3 has four trillion transistors, including 900,000 processing cores optimised for sparse linear algebra and 44 GB of on-chip memory. The result is 125 petaFLOPS of AI performance (a petaFLOPS is 10^15, or one thousand million million, floating point operations per second). There is no reasonable comparison to any other semiconductor solution in terms of size or single-chip performance. Cerebras does not, however, sell chips; it sells large, complex servers. The new server, the CS-3, has a new chassis design and, according to the company, delivers twice the performance of the previous generation CS-2 at the same power and price. By that standard, Moore's Law is very much alive! Furthermore, up to 2,048 CS-3 systems can be clustered together, a tenfold increase over the CS-2, resulting in 256 exaFLOPS (an exaFLOPS is 10^18 FLOPS) of AI performance.
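Those headline numbers are easy to sanity-check. The short Python sketch below reproduces the cluster-scale and efficiency arithmetic using only figures quoted in this article; the teraFLOPS-per-watt result is derived here for context and is not a Cerebras claim.

```python
# Sanity check of the scaling figures quoted above, using only
# numbers taken from the article itself.

PETA = 1e15
EXA = 1e18

cs3_flops = 125 * PETA        # AI performance of one CS-3 (article figure)
max_cluster_size = 2048       # maximum CS-3 systems per cluster (article figure)

cluster_flops = cs3_flops * max_cluster_size
print(f"Peak cluster performance: {cluster_flops / EXA:.0f} exaFLOPS")  # -> 256

# Condor Galaxy 1/2: four exaFLOPS at FP16 in 2.5 MW, per the article.
cg_flops = 4 * EXA
cg_power_watts = 2.5e6
print(f"CG-1/CG-2 efficiency: {cg_flops / cg_power_watts / 1e12:.1f} teraFLOPS/W")  # -> 1.6
```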
A New Level of AI Training
This absurd level of performance makes it possible to efficiently train ever-larger Large Language Models (LLMs) for generative AI, particularly models of one trillion parameters and beyond. According to Cerebras, a single CS-3 can train a trillion parameter model while requiring far less time and code, delivering 10x better FLOPS per dollar and 3.6x better compute performance per watt than some current AI training platforms. Tirias Research cannot verify these figures.
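For context on what training at that scale entails, a common back-of-envelope puts dense transformer training compute at roughly 6 FLOPs per parameter per training token. The sketch below applies that rule of thumb; the token count and sustained utilisation are illustrative assumptions, not Cerebras figures.

```python
# Back-of-envelope training-time estimate using the common
# ~6 * parameters * tokens FLOP approximation for dense transformers.
# Token count and sustained utilisation are illustrative assumptions,
# not vendor figures.

def training_days(params, tokens, peak_flops, utilisation=0.4):
    total_flops = 6 * params * tokens            # approximate total training FLOPs
    seconds = total_flops / (peak_flops * utilisation)
    return seconds / 86_400                      # seconds per day

PETA = 1e15
CS3_FLOPS = 125 * PETA                           # per-system figure from the article

# One trillion parameters, two trillion training tokens (assumed),
# on a single CS-3 versus a full 2,048-system cluster.
print(f"Single CS-3: {training_days(1e12, 2e12, CS3_FLOPS):,.0f} days")
print(f"2048 CS-3s:  {training_days(1e12, 2e12, CS3_FLOPS * 2048):,.1f} days")
```

The gap between the two results illustrates why cluster scale, not just single-system capability, determines practical training time for models of this size.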