How Does The US Decide Which AI Chips to Control?
How does the US Bureau of Industry and Security decide which chips enable China to create “advanced AI with military applications”?
On October 7, 2022, the BIS released a significant new export control policy in order to restrict China’s “ability to both purchase and manufacture certain high-end chips used in military applications.” Just over a year later, after public scrutiny found a number of loopholes in the controls, BIS released an updated (and still current) version of the controls, especially targeting datacenter chips for AI. These updated controls are strict enough that NVIDIA immediately reported that all of its major datacenter GPUs—the H100, A100, H800, and A800 (the latter two of which were specifically designed to undercut the previous controls for sale to the Chinese market)—fall under the controls, fully severing its connection to the hungry Chinese market.
These controls are worth understanding in detail for a number of reasons. For one, the US government is asking a huge question of China: Can you indigenize the semiconductor industry? Given China purchases over 50 percent of all chips globally, most produced overseas, analysts answered in the negative immediately after the controls were announced. However, there are some indications that this is still an open question. For the best overview of the controls and their implications, read this report by Gregory Allen.
I want to focus here instead on a very specific subset of the controls: the classification benchmarks. I have found it much more difficult than it ought to be to find out exactly what criteria BIS uses to decide whether a specific chip is controlled under its current policy. In fact, this information is spread across thousands of pages of government filings and assorted public internal documentation. I will do my best to bring it into the light.
The Classification Process
To understand how BIS classifies chips, we first have to understand how advanced semiconductors optimized for AI tasks work. In short, these chips, typically called Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs), perform a massive number of mathematical calculations, typically on “floating point numbers” (numbers stored with a sign, an exponent, and a fraction, trading a fixed budget of bits between range and precision). Because these chips are optimized for the repetitive calculations of AI training and inference (e.g. long sequences of matrix multiplications), they can achieve a very high number of Floating Point Operations per Second (FLOPS). This is the most important metric for AI chip performance, the equivalent of a racecar’s top speed.
Floating point numbers stored in fewer bits can be multiplied and added more quickly, so reported FLOPS values are typically higher at lower precisions. For example, NVIDIA reports the following FLOPS specifications for its current best GPU, the H100:
Probably the most widely reported metric is TF32 (TensorFloat-32), a reduced-precision format NVIDIA designed for AI workloads that keeps the dynamic range of a 32-bit float. At TF32, the H100 can reach a mind-boggling 989 trillion FLOPS.
Luckily for us, peak FLOPS is effectively the only performance metric BIS cares about when deciding whether to control a chip, though not in a simple way. The only other input is the chip’s die area: the physical surface area of the silicon itself.
Two Important Metrics
Suppose you’re the BIS staffer tasked with evaluating a newly announced AI chip; let’s call it the MAGMA1.¹ Your job is simple: determine whether or not the chip can be legally exported to China. In government language, your job is to determine whether the MAGMA1 falls under the ECCN (Export Control Classification Number) 3A090 classification.
You must calculate two metrics: the Total Processing Performance (TPP) of the chip, and its Performance Density. To calculate TPP, we check MAGMA’s website or marketing material for reports of theoretical peak performance in tera operations per second (TOPS) (as in the NVIDIA figure above). For each floating point precision, multiply the bit length of the operation by the TOPS at that precision. The maximum of these values is the TPP.
Suppose MAGMA reports two performance metrics for the MAGMA1: 400 TOPS at 8-bit precision and 210 TOPS at tensor float 16. Then the TPP of the MAGMA1 is the max of 400 × 8 = 3200 and 210 × 16 = 3360, which is 3360.
To calculate the chip’s Performance Density, we divide its TPP by its “applicable die area,” measured in millimeters squared (this includes only logic die area (p. 23)). It looks like the MAGMA1 has 1000 mm2 of applicable logic die area, so its Performance Density is 3360 / 1000 = 3.36.
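The two calculations can be sketched in a few lines of Python. The spec figures are the hypothetical MAGMA1 numbers from this example, and the 1000 mm² die area is an assumption chosen to be consistent with the 3.36 Performance Density used in the classification below:

```python
# Hypothetical MAGMA1 spec sheet: (bit length of operation, reported peak TOPS)
magma1_specs = [(8, 400), (16, 210)]

# TPP: for each precision, multiply bit length by TOPS, then take the maximum.
tpp = max(bits * tops for bits, tops in magma1_specs)

# Performance Density: TPP divided by applicable (logic-only) die area in mm^2.
die_area_mm2 = 1000  # assumed figure for this hypothetical chip
performance_density = tpp / die_area_mm2

print(tpp)                  # 3360
print(performance_density)  # 3.36
```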
A brief aside: Why does BIS bother with the Performance Density metric at all? The metric, added in the 2023 update to the export controls, is designed to preempt a particularly sneaky way that chip designers might attempt to bypass controls. A designer could split the performance of a single powerful package across many very small dies (chiplets), each of which could technically be classified as a separate chip. Each chiplet would report a modest TOPS figure, so no single “chip” presented to BIS would trip the TPP threshold, even though the assembled package matches a powerful datacenter chip. However, because of their small size, those chiplets would have a very high Performance Density.
Now that you have both of the important metrics in hand, ask yourself: Does this chip seem to be designed for use in data centers or supercomputing clusters? If yes, check both ECCN 3A090.a and 3A090.b; otherwise, check only ECCN 3A090.a.
First, the non-datacenter chips (ECCN 3A090.a). The BIS rule states the following:
The revised 3A090.a control parameter will control ICs [integrated circuits, i.e. chips] with one or more digital processing units having either: (1) a ‘total processing performance’ of 4800 or more, or (2) a ‘total processing performance’ of 1600 or more and ‘performance density’ of 5.92 or more. (Interim Final Rule BIS-2022-0025, p.61)
The MAGMA1 has a TPP of 3360 and a Performance Density of 3.36. This means that, under ECCN 3A090.a, it is not controlled. However, MAGMA1 was designed as a (lower-end) data center chip for training AI models. Thus, we also need to check the requirements of ECCN 3A090.b. That rule states that BIS
will control ICs with one or more digital processing units having either: (1) a ‘total processing performance’ of 2400 or more and less than 4800 and a ‘performance density’ of 1.6 or more and less than 5.92, or (2) a ‘total processing performance’ of 1600 or more and a ‘performance density’ of 3.2 or more and less than 5.92 (Interim Final Rule BIS-2022-0025, p.61)
The MAGMA1’s TPP of 3360 and Performance Density of 3.36 mean it fulfills the first (and second, but it need only fulfill one) criterion, which places it under 3A090.b and prevents it from being exported to almost any Chinese entity.
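Putting both checks together, here is a minimal sketch of the classification logic in Python, with thresholds taken directly from the rule text quoted above (the function name and interface are my own):

```python
def classify_3a090(tpp, density, datacenter):
    """Return the ECCN paragraph a chip falls under, or None if uncontrolled.

    Thresholds are those quoted from Interim Final Rule BIS-2022-0025 (p. 61).
    """
    # 3A090.a applies to every chip, datacenter-oriented or not.
    if tpp >= 4800 or (tpp >= 1600 and density >= 5.92):
        return "3A090.a"
    # 3A090.b applies only to chips designed for data centers / supercomputers.
    if datacenter:
        if 2400 <= tpp < 4800 and 1.6 <= density < 5.92:
            return "3A090.b"
        if tpp >= 1600 and 3.2 <= density < 5.92:
            return "3A090.b"
    return None

print(classify_3a090(3360, 3.36, datacenter=True))   # 3A090.b
print(classify_3a090(3360, 3.36, datacenter=False))  # None
```

Note how the MAGMA1 escapes control entirely if it is not deemed a datacenter chip, which is why the “designed for data centers” judgment matters so much.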
To put the power of this chip in perspective, a rough calculation says one could use a thousand MAGMA1s in a data center to train GPT-3 in approximately 48 days.² The same task with one thousand A100s can be achieved in approximately 34 days. That’s not a weak chip, so it makes sense that it is controlled as a data center chip. However, a somewhat weaker chip would evade the controls entirely: for example, a theoretical GPU that could train GPT-3 in approximately 100 days (at one thousand units).
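The rough arithmetic behind those day counts (spelled out in the footnote) is just inverse scaling of training time with per-chip throughput:

```python
# Reported baseline: 1024 A100s at ~140 TOPS each train GPT-3 in ~34 days.
a100_tops, a100_days = 140, 34

# The hypothetical MAGMA1 sustains ~100 TOPS at the same utilization ratio.
magma1_tops = 100

# With the same chip count, training time scales inversely with throughput.
magma1_days = a100_days * a100_tops / magma1_tops
print(round(magma1_days, 1))  # 47.6
```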
So What?
I should mention that this order also contains controls on non-logic chips, especially DRAM and NAND memory chips. (Memory can be an important driver of AI chip performance; see HBM, for example.) These controls, however, are targeted mostly at the manufacturing equipment used to produce leading-edge (advanced) memory chips, which falls under ECCN 3B001 and 3B002. This equipment is produced almost entirely outside of China, in US-allied nations.
Yet Chinese companies have found ways to skirt these controls as well. For example, BIS explicitly bans the export of any manufacturing equipment that can be used to create NAND flash memory with 128 or more layers. So Chinese memory producer YMTC has developed a chip that stacks two 116-layer memory arrays on top of each other using a 3D packaging technique it calls Xtacking, essentially creating an ‘illegal’ 232-layer NAND chip without any controlled equipment.
The transparency of the export control classification system presents an interesting landscape for chip development firms. Many expect to see clever packaging and chip designs which undercut the thresholds of these controls while delivering a marketable product to the Chinese market, despite Secretary of Commerce Gina Raimondo’s warning that if a company designs a chip in this way, BIS will “control it the very next day.”
The US is not known to take its national security interests lightly. Watch for changes to these thresholds, or to the metrics themselves, over the coming years as the fog of war around China’s nascent leading-node semiconductor industry clears.
1. MAGMA: Meta, Amazon, Google, Microsoft, Apple
2. This paper reports training a GPT-3-size model in a data center at roughly 140 TOPS per chip with 1024 A100s, completing in 34 days. The MAGMA1 sustains about 100 TOPS at the same utilization ratio, which corresponds to 34 × (140/100) = 47.6 days.