Licensing AI Like Airplanes: Some Initial Thoughts
Can the FAA’s method for plane certification work on AI?
American commercial aviation has a stellar safety record. Since 2010, only two passengers have died due to accidents in the United States. This incredible consistency is in large part attributable to the process used by the Federal Aviation Administration (FAA) to certify aircraft (and their parts, such as engines and propellers) for development, flight, and production. This process, known as Type Certification (TC), spans initial design, prototyping, and testing, and follows airplane models through to their retirement.
Today, artificial intelligence (AI) systems are becoming increasingly capable, and many experts and policymakers are warning of potential catastrophic risks associated with the technology. Despite these concerns, the only current regulation of AI development in the US involves requirements to report “ongoing or planned activities related to training, developing, or producing dual-use foundation models” involving more than 10^26 FLOPs. These preliminary measures could be succeeded by a model development licensing process not unlike the FAA’s TC process.
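For intuition on where that threshold sits, here is a minimal sketch, assuming the common 6·N·D approximation for dense-transformer training compute and entirely hypothetical model sizes, of how a developer might check whether a planned run triggers the reporting requirement:

```python
# Rough illustration (not part of any regulation): the common
# FLOPs ≈ 6 * N * D approximation for dense transformer training,
# compared against the 10^26 FLOP reporting threshold.

REPORTING_THRESHOLD_FLOPS = 1e26

def estimated_training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute for a dense transformer."""
    return 6.0 * n_params * n_tokens

def must_report(n_params: float, n_tokens: float) -> bool:
    return estimated_training_flops(n_params, n_tokens) >= REPORTING_THRESHOLD_FLOPS

# A hypothetical 1-trillion-parameter model trained on 20 trillion tokens:
flops = estimated_training_flops(1e12, 20e12)  # 1.2e26 FLOPs
print(must_report(1e12, 20e12))  # True
```

By this rough estimate, a run of that scale would sit just above the reporting line, while today's smaller models fall well below it.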
In both aviation and AI, the consequences of failure are severe, meaning the acceptable level of risk is very near zero. Even so, airplanes and AI systems are not identical from a regulatory standpoint. Here we will cover the TC process in depth, discuss the features of AI that differ from airplanes, and propose a process for Model Certification (MC) that could form the basis of AI licensing in the future.
Type Certification in Aviation
A TC is a design approval for a specific model of airplane1 issued by the FAA to an applicant (typically an aviation firm like Boeing or Airbus) who is seeking to bring a new plane model to the market. The rigorous TC application process has five major phases:
Conceptual Design
Requirements Definition
Compliance Planning
Implementation
Post-Certification Activities
Each phase ends with a ‘gate’: a step that must be completed before the applicant and FAA proceed to the next phase. For major projects (i.e. new or significantly updated airplane models from large firms), the time from whiteboarding to issuance of a TC can be several years. For example, with the Airbus A380—the world’s largest passenger aircraft—the project was formally announced in 2000, a prototype was built in 2005, and the TC was approved in December 2006.
Figure 1 shows the FAA’s detailed graphic for the TC process, which will be useful as a reference.

Conceptual Design
Before initiating a TC application, and typically before full progress has been made on aircraft design, experienced firms brief the FAA on their proposed design. The FAA is interested especially in “technical issues and unique or novel features” (FAA Order 8110.4C, 20). Initiating discussions at this phase increases the likelihood of smooth and timely certification of the aircraft. Based on these discussions, the design proposal, potential regulatory challenges, and the FAA’s level of familiarity with the applicant, an FAA “level of project involvement (LOPI) will be determined,” setting the relative priority and resource requirements of the project for regulators (Industry Guide, 3).
In this phase, the applicant is also responsible for submitting to the FAA a certification plan. This plan will serve as the roadmap for interactions between the applicant and FAA throughout the certification process, and includes, among other requirements:
Sketches and schematics of the proposed design;
The airplane’s intended regulatory operating environment;
A description of how compliance will be shown sufficient to determine that all necessary FAA data will be collected;
A description of how continued operational safety requirements will be met after a TC is issued; and
A detailed project schedule (FAA Order 8110.4C, 21-22).
The need to agree upon a project schedule in advance of submitting the TC application ensures that any certification delays are the responsibility of the applicant.
Requirements Definition
The ‘gate’ for entering the Requirements Definition phase is a submitted TC application. The application must contain a three-view drawing of the aircraft; all known relevant basic data such as weight, airspeed, capacity, maximum altitude, etc.; and the certification plan described above. When submitting the TC application, the applicant will also have the option of submitting a Production Certification (PC) form, which is required for legal production of the TC’d aircraft.
The FAA’s Aircraft Certification Service (AIR), which includes “more than 1400 engineers, scientists, inspectors, test pilots, and other experts,” is then responsible for appointing engineers and pilots who verify the adequacy of the submitted type design and related data (FAA AVS - Offices).
Once the TC application is verified, the FAA considers whether this project is “of a certain magnitude” to warrant the appointment of a type certification board (TCB) to oversee the project, composed of the project’s senior oversight officials from the FAA and AIR (FAA Order 8110.4C, 25). It also may request expert participants, such as a domain expert when a design includes a niche technology. A TCB is typically established for projects with major type design changes, generally significant or high-profile projects, or projects involving new type certification (rather than updates to an existing TC).
The major document gate that concludes Requirements Definition is known as the Certification Basis. This document establishes precisely which parts of the TC specifications (from various FAA documents) must be met, and to what extent, in order for a TC to be issued. In many cases, these are only the relevant criteria laid out in Federal Code 14 CFR.
However, two types of requirements can be added to the 14 CFR criteria: Special Conditions and Equivalent Level of Safety (ELOS) findings. A Special Condition for a novel or unusual design feature is issued “if the existing applicable airworthiness standards do not contain adequate or appropriate safety standards for the aircraft… because of novel or unusual design features” (Ibid., 31). On the other hand, an ELOS finding indicates that a criterion in 14 CFR has been replaced for the purposes of this TC due to alternative guarantees of safety. On significant projects, there can be many of these changes to the Certification Basis. The Airbus A380 project had 32 Special Conditions and 25 ELOS findings (A380 TCDS, 6-7).
Once the applicant and FAA agree to a Certification Basis, it is mostly set in stone for the duration of the project, barring significant dangerous findings.
Compliance Planning
Compliance Planning guarantees that the FAA and applicant understand each other with regard to the tests required and planned to fulfill the Certification Basis. This phase concludes with a Project Specific Certification Plan (PSCP), from which “the certification team should be able to determine that, if the plan was successfully executed, its results would show compliance” (FAA Order 8110.4C, 39-40).
Implementation
By the time an applicant’s project has entered the Implementation phase, at least one airplane of the relevant model has been produced. During this phase, the FAA and applicant conduct the certification project by implementing the PSCP. In Implementation, there are roughly three parts: Compliance Data Generation, Compliance Substantiation, and Compliance Finding.
Before physical tests proceed, an FAA manufacturing inspector verifies that the product(s) conform to submitted drawings and specifications. While tests are performed (which can include flight tests, weather tests, stress tests, etc.), an FAA witness must be present. This includes certification flight tests, which generally require verification of a qualitative target outcome (e.g. successful takeoff and landing), and engineering certification tests, which require verification of quantitative outcomes (e.g. braking time).
It is critical that the FAA does not release proprietary information submitted by the applicant. The FAA has the ability to indefinitely protect such information based on specific exemptions in the Freedom of Information Act Program, FAA Order 1270.1 (Ibid., 48). No later applicant may request access to another applicant’s data.
The Compliance Substantiation responsibility of the applicant involves successfully arguing the compliance of the aircraft on the basis of submitted data:
Compliance reports are the applicant’s way of proving compliance (that is, showing compliance). Adequate compliance reports present appropriate evidence to convince the FAA of the overwhelming likelihood of the claim. The claim is a declaration that the type design meets a particular airworthiness… requirement levied by regulations identified in the certification basis. The substantiation case presents and explains the inter-relationship of the evidence in a logical order leading from the requirement to the claim. (Ibid., 49)
Compliance Finding is the responsibility of the FAA to verify that submitted data and argumentation lead directly to compliance with the Certification Basis. For example, the FAA has the right to question any textbook equations used by the applicant, and FAA test pilots verify all flight test reports with their own flights (Ibid., 50-51).
If the FAA finds successful compliance with the Certification Basis, a final meeting between the applicant and TCB is held to issue Instructions for Continued Airworthiness (ICA), an Aircraft Flight Manual (AFM) which details the operating limitations and procedures of the aircraft, and the Type Certification (Ibid., 56).
Post-Certification Activities
Following the issuance of a TC, a Certification Summary Report (CSR) is generated, which involves an “executive summary containing a high-level description of major issues and their resolution” (Ibid., 57). This document can be used as a resource for foreign aviation administrations to view concerns raised by the FAA when they are certifying the same aircraft.
AIR is responsible for continued airworthiness inspections and requirements for “the preservation of the product’s level of safety as defined at the time of certification… [through] the end of the product life cycle” (Ibid., 58).
Any issued TC “is effective until revoked or suspended,” although it is worth noting that nontrivial updates to aircraft design, software, or production will require another TC process (Ibid., 67).
Artificial Intelligences and Airplanes
I suppose the immediate question is: why should we care about Type Certification?
At face value, it seems like many of the high-level steps in certifying an aircraft map nicely onto the steps in producing a frontier foundation model in AI. Conceptual design is required, where scientists determine which algorithms to use, which data to train on, and how much compute (how many computers and for how long) to use. There is a training phase which culminates in something like a prototype of the foundation model. Post-training optimizations like RLHF and adversarial fine-tuning are made before the model is (typically) red-teamed internally. Finally, a model is deployed in some environment (public, internal, limited internal, etc.) as a system for a lifespan involving interaction with millions of customers.
Despite these similarities, we do not and should not expect the Type Certification process to be the only inspirational model for an AI certification regime. However, I would like to explore what such a regime would look like, drawing inspiration (mostly) from the (very successful) FAA TC process. We will examine the hypothetical Federal Artificial Intelligence Administration’s (FAIA) licensing process: the Model Certification (MC).
Model Certification for AI
Although the FAA certifies both aircraft and their constituent parts, the FAIA MC should likely consider only models, their deployment environments (DEs), and the data centers used to train and serve them. These are analogous to the TC’s airplanes, regulatory environment, and Production Certification, respectively. An MC could in theory also license integrated circuits (ICs), data sources/sets, and specific algorithms used, but this seems to bring us far afield from the core focus of certification while requiring significantly more talent in the FAIA (a likely bottleneck for the organization).
The FAA benefits heavily from the existence of peer organizations in other nations, such as the European Union Aviation Safety Agency (EASA). International coordination of this type is important when dealing with technologies that fly between nations or propagate through the public internet. The EU AI Act and semi-regular AI Safety Summits between nations may encourage simultaneous development of a peer agency to the FAIA in the EU, UK, and elsewhere, but this remains to be seen.
Implementation of an MC is heavily dependent on the ‘innovation tax’ imposed on the American AI industry generally due to regulation. Because the US sees AI as a significant national security issue due to competition with China, policy that undercuts AI innovation in America is unlikely to succeed. Thus, it is incredibly important to ensure the MC does not significantly extend AI development timelines or lead to significant expenditure on the part of the AI development firms. Until robust international agreements can be made on AI, this political reality must be accommodated.
Nevertheless, an MC process which takes its inspiration from the TC should have these five phases:
Design and Consultation
Requirements Definition and Compliance Planning
Training
Compliance Implementation
Deployment Monitoring
These map onto the standard development pipeline for a frontier model in order to allow the least intrusive certification process possible.
Before we examine each step in turn, stress should be placed on the fact that the FAIA would be governed by similar guidelines to the FAA with regards to sensitive and proprietary information. Any information about algorithms, compute, data, or anything else that might threaten the national security of the US or competitive integrity of the AI industry will not be shared.
Design and Consultation
Under current development practices, before any AI developer (‘the applicant’) spends millions of dollars on data center time to train a large AI model, it must pursue research and engineering to develop a training plan. It must know what algorithms will be used (e.g. the model’s architecture), the data to be trained on, the optimal duration for training, methods for efficiently training across many ICs, and other details. These details, especially the development of novel algorithms for use in training, require significant time and scientific effort and compose the bulk of an AI firm’s intellectual capital.
As in the TC’s Conceptual Design phase, the applicant should consult early with the FAIA to determine level of project involvement, various production timelines, and especially formulate methods for demonstrating compliance of unique or novel features of the proposed model design.2
In this phase, the applicant should submit a certification plan. This is the roadmap for the full MC process and should include at least:
Descriptions of model architecture;
A description of how compliance will be shown;
The model’s intended deployment environment (e.g. open source, through an API, internal only, etc.);
A description of data and amount of compute to be used;
Which data center(s) or cloud provider(s) will be used for training and inference;
A description of planned or possible post-training improvements, such as RLHF; and
A detailed project schedule.
These documents should be submitted along with the formal MC application at the conclusion of the design phase. The MC application must also include estimates by the applicant of model capabilities at various checkpoints throughout training. These estimates must be substantiated by disclosed (to the FAIA) data and relevant arguments, likely based on scaling laws, previous model capabilities, planned compute use, architectural improvements, and other factors. If the FAIA cannot effectively determine these estimates to be reasonable with a wide margin for safety, the applicant should not be allowed to proceed into the Requirements Definition and Compliance Planning phase.
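As a sketch of what a scaling-law-based capability estimate might look like, the snippet below evaluates a Chinchilla-style loss curve L(N, D) = E + A/N^α + B/D^β. The coefficients are the published Hoffmann et al. (2022) fit, used purely as an illustration; a real applicant would disclose its own fitted curve and a mapping from predicted loss to benchmark capabilities.

```python
# Illustrative only: Chinchilla-style parametric scaling law
#   L(N, D) = E + A / N^alpha + B / D^beta
# with the published Hoffmann et al. (2022) fit. An actual MC
# application would use the applicant's own fitted coefficients.

def predicted_loss(n_params: float, n_tokens: float,
                   E: float = 1.69, A: float = 406.4, alpha: float = 0.34,
                   B: float = 410.7, beta: float = 0.28) -> float:
    """Predicted pre-training loss for N parameters and D tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Predicted loss for a hypothetical 70B-parameter model on 1.4T tokens:
print(round(predicted_loss(70e9, 1.4e12), 3))
```

Curves like this let the regulator sanity-check that projected capabilities at each training checkpoint follow from the disclosed compute and data budgets, rather than taking the estimates on faith.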
After accepting the estimates of the applicant, the FAIA should also require that a set or threshold of safety requirements be implemented by the applicant prior to training. These measures should be sufficient to protect against the possible risks of a model slightly more capable than the applicant’s estimates. This document will be reviewed and updated during the next phase, but all measures must be implemented before training can begin.
Requirements Definition and Compliance Planning
After submitting the MC application, the applicant will not yet be allowed to begin training. In cases of high-profile, potentially highly-capable, or generally novel model projects, a Model Certification Board (MCB) should be established to oversee the general project. The MCB should be composed of in-house experts, external experts disinterested in the economic outcomes of the applicant, and management staff on the MC project. For models projected to exceed state-of-the-art performance in impactful categories (such as general intelligence or CBRN applications), national security officials should be considered for appointment.
This phase has three major objectives: first, to create a Certification Basis (CB) document which establishes the specific criteria along which model compliance will be evaluated; second, to create a Project Specific Certification Plan (PSCP) in which the applicant describes its planned data gathering and demonstration procedures; and third, for the FAIA to issue a Training Authorization (TA) document, which authorizes the applicant to begin training the relevant model in a Production Certified data center.3 The TA should also contain safety measures the applicant is required to take during training, such as hierarchical evaluations and checkpoint reporting. A TA should only be issued after scrutiny of the submitted MC certification plan.
Training
With TA in hand, the applicant may begin training the model in the specified PC’d data center. Although projected capabilities will have been outlined in the MC, these projections may not be accurate. In order to mitigate risks from deviations in capability progression, models should be evaluated at various checkpoints during training. In the case of deviations, training should pause until new projections can be made, verified, and allowed under an updated TA in consultation with the FAIA.
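A minimal sketch of that checkpoint discipline follows; the benchmark score and tolerance are hypothetical stand-ins, since the real evaluation suite and thresholds would be specified in the TA.

```python
# Sketch of checkpoint monitoring during training. The 0.05 tolerance
# and the scores are invented for illustration; a real TA would name
# the evaluation suite, checkpoints, and deviation thresholds.

def monitor_checkpoint(step: int, measured_score: float,
                       projected_score: float, tolerance: float = 0.05) -> str:
    """Return 'continue' if measured capability stays within the
    projected envelope, else 'pause' pending an updated TA."""
    if measured_score > projected_score + tolerance:
        return "pause"  # capability exceeds projection: halt and re-project
    return "continue"

print(monitor_checkpoint(10_000, measured_score=0.62, projected_score=0.60))
```

The key design choice is that deviation triggers a pause by default; training resumes only once the applicant and FAIA agree on updated projections.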
The Training phase includes all changes to the model before its final specifications, detailed in the MC. For foundation models, this will generally include pretraining, which grants the model a general basis of knowledge, and fine-tuning, which directs that knowledge towards a desired task (e.g. being a helpful assistant).
At the conclusion of the planned training period, the model will become the compliance prototype and should not be deployed for any other purpose. In addition, further training will not be allowed unless recommended by the FAIA.
Compliance Implementation
During this phase, the applicant and FAIA conduct the certification project by implementing the PSCP. Pursuant to the specifications of the PSCP, the applicant will be required to pursue data generation and argue for compliance with the Certification Basis based on this data, after which the FAIA will attempt to find compliance by verifying these data and arguments.
Specific strategies for showing and verifying compliance with explicit safety standards will likely include model capability evaluations, adversarial testing (red teaming), and expert input from the MCB. Demonstrating the safety of highly capable AI systems is a difficult problem, and an expanded list and detailed methods for doing so will be pursued in future work.
The FAIA must ensure the confidentiality of all proprietary or national security-relevant information surfaced during the entire MC process.
As in the case of compliance with FAA requirements, successful substantiation of the CB should require “appropriate evidence to convince the [FAIA] of the overwhelming likelihood” of each claim in the CB (FAA Order 8110.4C, 49).
The FAIA should further verify compliance of the model with the CB by finding validity of all data used in substantiations. In addition, the FAIA should verify all quantitative and qualitative capability reports by performing its own evaluations.
During its initial meetings, the MCB will have determined the experience and trustworthiness levels of the applicant. Conformity may need to be checked with rigor according to these determinations. In aviation, safety inspectors in the FAA and firms have similar incentives. Passengers are likely to avoid dangerous or supposedly dangerous airlines. Hence airlines will not purchase planes with a poor record of safety. Thus, aviation firms suffer economically if accidents occur. However, these economic effects are much less clear in the case of AI. Because the risks associated with AI models do not necessarily fall on the direct consumer, economic success in frontier AI may not require a similar reputation of safety. Thus, the FAIA should exercise more caution when deciding whether to accept reported data and substantiations than the FAA.
During the verification process, the FAIA may have insufficient evidence to find compliance with specific requirements in the CB. If possible, the FAIA should in this case recommend further changes to the model which might increase its safety. Under these circumstances, further training can be authorized, but would be the responsibility of the applicant.
Supposing all requirements in the CB are substantiated and found (verified), the FAIA should issue the MC to the applicant. This must be accompanied by a Model Deployment Card (MDC), which details the circumstances under which the model can be legally used, and Instructions for Continued Safety (ICS), which describe the measures to be taken to ensure the safety of the model throughout its lifecycle.
Deployment Monitoring
While the model is in deployment, ICS measures should be pursued, including periodic capability evaluations and inspections of the implemented safety practices of the MC recipient. In most cases, post-deployment capability improvement should not be permitted. However, if the ICS permits this, evaluations should be more frequent and significant changes in model behavior or capability should warrant further investigation by the FAIA, potentially requiring an updated MC.
The FAIA should proactively work to ensure the model is not being used in illegal deployment contexts, which includes any environment not specified in the MDC.
Conclusion
The regulatory practices of the aviation industry, particularly the FAA’s Type Certification (TC) for airplanes, might serve as a useful inspiration for future regulation of highly-capable AI. The success of such a policy is likely contingent on the talent constraints of a hypothetical Federal AI Administration, the effectiveness of future methods for determining and demonstrating the safety of AI models, and the extent to which time and expense intrusions can be minimized so as to allow for continued American innovation in AI. Future work should tackle these questions.
To simplify discussion, we will ignore other products certified under FAA TCs, including engines, propellers, airships, gliders, etc.
The hypothetical FAIA is tasked with certifying models rather than systems. Models are simply statistical models of some data, whereas systems also include any scaffolding or complex engineering design that can be used to improve the usefulness of a model. An ideal regulatory agency would have to reckon with both, but for the purposes of this piece it seems prudent to focus only on models.
We propose that data centers be Production Certified through a separate process, potentially under the FAIA, according to their location, security, reliability, scale, and other factors. Production Certifications could be tier ranked according to these factors, and the FAIA’s Training Authorization could only allow for training at or above a specific tier data center. For example, models with projected capabilities relevant to national security could be authorized only for training in data centers with hardened security.