Test and Evaluation Challenges in Artificial Intelligence-Enabled Systems for the Department of the Air Force
(2023)
Chapter: Front Matter
Test and Evaluation Challenges in Artificial Intelligence–Enabled Systems for the Department of the Air Force
Committee on Testing, Evaluating, and Assessing Artificial Intelligence-Enabled Systems Under Operational Conditions for the Department of the Air Force
Air Force Studies Board
Division on Engineering and Physical Sciences
Consensus Study Report
THE NATIONAL ACADEMIES PRESS, 500 Fifth Street, NW, Washington, DC 20001
This activity was supported by a contract between the National Academy of Sciences and the Department of the Air Force under award number FA955016D00001 FA865121-F-9323. Any opinions, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect the views of any organization or agency that provided support for the project.
International Standard Book Number-13: 978-0-309-70439-7
International Standard Book Number-10: 0-309-70439-1
Digital Object Identifier: https://doi.org/10.17226/27092
This publication is available from the National Academies Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; http://www.nap.edu.
Printed in the United States of America.
Suggested citation: National Academies of Sciences, Engineering, and Medicine. 2023. Test and Evaluation Challenges in Artificial Intelligence–Enabled Systems for the Department of the Air Force. Washington, DC: The National Academies Press. https://doi.org/10.17226/27092.
The National Academy of Sciences was established in 1863 by an Act of Congress, signed by President Lincoln, as a private, nongovernmental institution to advise the nation on issues related to science and technology. Members are elected by their peers for outstanding contributions to research. Dr. Marcia McNutt is president.
The National Academy of Engineering was established in 1964 under the charter of the National Academy of Sciences to bring the practices of engineering to advising the nation. Members are elected by their peers for extraordinary contributions to engineering. Dr. John L. Anderson is president.
The National Academy of Medicine (formerly the Institute of Medicine) was established in 1970 under the charter of the National Academy of Sciences to advise the nation on medical and health issues. Members are elected by their peers for distinguished contributions to medicine and health. Dr. Victor J. Dzau is president.
The three Academies work together as the National Academies of Sciences, Engineering, and Medicine to provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. The National Academies also encourage education and research, recognize outstanding contributions to knowledge, and increase public understanding in matters of science, engineering, and medicine.
Learn more about the National Academies of Sciences, Engineering, and Medicine at www.nationalacademies.org.
Consensus Study Reports published by the National Academies of Sciences, Engineering, and Medicine document the evidence-based consensus on the study’s statement of task by an authoring committee of experts. Reports typically include findings, conclusions, and recommendations based on information gathered by the committee and the committee’s deliberations. Each report has been subjected to a rigorous and independent peer-review process and it represents the position of the National Academies on the statement of task.
Proceedings published by the National Academies of Sciences, Engineering, and Medicine chronicle the presentations and discussions at a workshop, symposium, or other event convened by the National Academies. The statements and opinions contained in proceedings are those of the participants and are not endorsed by other participants, the planning committee, or the National Academies.
Rapid Expert Consultations published by the National Academies of Sciences, Engineering, and Medicine are authored by subject-matter experts on narrowly focused topics that can be supported by a body of evidence. The discussions contained in rapid expert consultations are considered those of the authors and do not contain policy recommendations. Rapid expert consultations are reviewed by the institution before release.
For information about other products and activities of the National Academies, please visit www.nationalacademies.org/about/whatwedo.
COMMITTEE ON TESTING, EVALUATING, AND ASSESSING ARTIFICIAL INTELLIGENCE-ENABLED SYSTEMS UNDER OPERATIONAL CONDITIONS FOR THE DEPARTMENT OF THE AIR FORCE
MAY CASTERLINE, NVIDIA, Co-Chair
THOMAS A. LONGSTAFF, Carnegie Mellon University, Co-Chair
CRAIG R. BAKER, Baker Development Group, LLC
ROBERT A. BOND, Massachusetts Institute of Technology
RAMA CHELLAPPA (NAE), Johns Hopkins University
TREVOR DARRELL, University of California, Berkeley (until December 2022)
MELVIN GREER, Intel Corporation
TAMARA G. KOLDA (NAE), Independent Consultant, MathSci.ai
NANDI O. LESLIE, Raytheon Technologies (until December 2022)
ROBIN R. MURPHY, Texas A&M University
DAVID S. ROSENBLUM, George Mason University
JOHN (JACK) N.T. SHANAHAN, United States Air Force (retired)
HUMBERTO SILVA III, Sandia National Laboratories (until December 2022)
REBECCA WILLETT, University of Chicago
Staff
RYAN MURPHY, Program Officer
GEORGE COYLE, Senior Program Officer
EVAN ELWELL, Research Associate
CHARLES YI, Research Assistant
MARTA HERNANDEZ, Program Coordinator
AMELIA A. GREEN, Senior Program Assistant (until July 2022)
AIR FORCE STUDIES BOARD
ELLEN M. PAWLIKOWSKI (NAE), Independent Consultant, Chair
CHRISTOPHER P. AZZANO, Booz Allen Hamilton
KEVIN G. BOWCUTT (NAE), Boeing Company
RAMA CHELLAPPA (NAE), Johns Hopkins University
MARK F. COSTELLO, Georgia Institute of Technology
DANIEL A. DeLAURENTIS, Purdue University
BONNIE J. DUNBAR (NAE), Texas A&M University
JAMES M. HOLMES, Red 6
DEBORAH L. JAMES, Independent Consultant
CHRISTOPHER T. JONES (NAE), Leadership Compass
EDWARD M. LAWS (NAM), Harvard University
LESTER L. LYLES (NAE), Independent Consultant
VALERIE M. MANNING, Overair
WENDY MASIELLO, Independent Consultant
LAURA J. McGILL (NAE), Sandia National Laboratories
HENDRICK W. RUCK, Edaptive Computing, Inc.
JULIE J.C.H. RYAN, Wyndrose Technical Group
MICHAEL SCHNEIDER, Lawrence Livermore National Laboratory
Staff
ELLEN CHOU, Board Director
GEORGE COYLE, Senior Program Officer
RYAN MURPHY, Program Officer
ALEX TEMPLE, Program Officer
MARTA HERNANDEZ, Program Coordinator
EVAN ELWELL, Research Associate
AMELIA A. GREEN, Senior Program Assistant (until July 2022)
CHARLES YI, Research Assistant
DONOVAN THOMAS, Financial Business Partner
Reviewers
This Consensus Study Report was reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise. The purpose of this independent review is to provide candid and critical comments that will assist the National Academies of Sciences, Engineering, and Medicine in making each published report as sound as possible and to ensure that it meets the institutional standards for quality, objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process.
We thank the following individuals for their review of this report:
JILL CRISMAN, Digital Safety Research Institute
MICHAEL A. FANTINI, United States Air Force (retired)
SHAUN GLEASON, Oak Ridge National Laboratory
J. MARCUS HICKS, United States Air Force (retired)
MARVIN J. LANGSTON, Independent Consultant
GARRY McGRAW, Berryville Institute of Machine Learning
YEVGENIYA “JANE” PINELIS, Johns Hopkins University Applied Physics Laboratory
AMIR SADOVNIK, Oak Ridge National Laboratory
MICHAEL SCHNEIDER, Lawrence Livermore National Laboratory
ALBERT SCIARRETTA, CNS Technologies, Inc.
JONATHAN SMITH, University of Pennsylvania
REBECCA WINSTON, Winston Strategic Management Consulting
Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations of this report nor did they see the final draft before its release. The review of this report was overseen by STEVE BELLOVIN, Columbia University, and BOB SPROULL, University of Massachusetts Amherst. They were responsible for making certain that an independent examination of this report was carried out in accordance with the standards of the National Academies and that all review comments were carefully considered. Responsibility for the final content rests entirely with the authoring committee and the National Academies.
SUMMARY
1 INTRODUCTION
1.1 A Central Question: How to Achieve Sufficient Confidence in AI-Enabled Systems?
1.2 Study Questions to Be Addressed
1.3 What Do We Mean by “Artificial Intelligence”?
1.4 Current State of the Art of AI
1.5 Current State of the Practice of AI in the DAF
1.6 Algorithmic Warfare Cross-Functional Team (Project Maven) Case Study
2 DEFINITIONS AND PERSPECTIVES
2.1 AI-Enabled Systems
2.2 Role of Data in AI-Enabled Systems
2.3 History of T&E in AI-Enabled Systems
2.4 Human-Machine Teaming
3 TEST AND EVALUATION OF DAF AI-ENABLED SYSTEMS
3.1 Historical Approach to Air Force Test and Evaluation
3.2 AI and DevSecOps/AIOps in the DAF and Commercial Sector
3.3 OSD and DAF T&E Policies for AI-Enabled Systems
3.4 AI T&E in the Commercial Sector
3.5 Contrast of Commercial and DoD Approaches to AI T&E
3.6 Trust, Justified Confidence, AI Assurance, Trustworthiness, and Buy-In
3.7 Risk-Based Approach to AI T&E
4 EVOLUTION OF TEST AND EVALUATION IN FUTURE AI-BASED DAF SYSTEMS
4.1 Introduction
4.2 Appointing a DAF AI T&E Champion
4.3 Establishing AI T&E Requirements
4.4 Culture Change and Workforce Development
4.5 Summary of Implications of Future AI for DAF T&E
4.6 Recommendation Timelines
5 AI TECHNICAL RISKS UNDER OPERATIONAL CONDITIONS
5.1 Introduction
5.2 General Risks of AI-Enabled Systems
5.3 AI Corruption Under Operational Conditions
5.4 Attack Surfaces for AI-Enabled Systems
5.5 Risk of Adversarial Attacks
5.6 Network Security and Zero Trust Implications
5.7 Robust and Secure AI Models
5.8 Research in T&E to Address Adversarial AI
6 EMERGING AI TECHNOLOGIES AND FUTURE T&E IMPLICATIONS
6.1 Trustworthy AI
6.2 Foundation Models
6.3 Informed Machine Learning Models
6.4 AI-Based Data Generators
6.5 AI Gaming for Complex Decision-Making
7 CONCLUDING THOUGHTS
A Statement of Task
B Public Meeting Agendas
C Committee Member Biographical Information
D Acronyms and Abbreviations
E Testing, Evaluating, and Assessing Artificial Intelligence–Enabled Systems Under Operational Conditions for the Department of the Air Force: Proceedings of a Workshop—in Brief
At the request of the 96th Test Wing of the U.S. Air Force and Air Force Materiel Command, the National Academies of Sciences, Engineering, and Medicine were asked to convene a committee to conduct a consensus study to examine the Air Force Test Center’s technical capabilities and capacity to conduct rigorous and objective tests, evaluations, and assessments of artificial intelligence (AI)-enabled systems under operational conditions and against realistic threats.
The National Academies of Sciences, Engineering, and Medicine appointed the Committee on Testing, Evaluating, and Assessing Artificial Intelligence-Enabled Systems Under Operational Conditions for the Department of the Air Force to conduct this study, per the Statement of Task found in Appendix A and Box P-1. The committee held its initial kickoff meeting in April 2022, conducted a data-gathering workshop in June 2022 (summarized in a Proceedings of a Workshop—in Brief, found in Appendix E), and held further data-gathering sessions throughout 2022 and early 2023, including a site visit to Eglin Air Force Base. Agendas for the data-gathering meetings can be found in Appendix B. Biographies of the committee members can be found in Appendix C. Appendix D contains a list of acronyms and abbreviations used in the report.
BOX P-1
Statement of Task
The National Academies of Sciences, Engineering, and Medicine will establish an ad hoc committee to (1) plan and convene a multi-day workshop and (2) conduct a consensus study to examine the Air Force Test Center’s technical capabilities and capacity to conduct rigorous and objective test, evaluation, and assessments of artificial intelligence (AI)-enabled systems under operational conditions and against realistic threats. Specifically, the committee will:
Evaluate and contrast current testing and assessment methods employed by the Department of the Air Force and in commercial industry.
Consider examples of AI corruption under operational conditions and against malicious cyberattacks.
Recommend promising areas of science and technology that may lead to improved detection and mitigation of AI corruption.
The committee will provide a Proceedings of a Workshop—in Brief and a report summarizing the results of the consensus study.
The Department of the Air Force (DAF) is in the early stages of incorporating modern artificial intelligence (AI) technologies into its systems and operations. The integration of AI-enabled capabilities across the DAF will accelerate over the next few years.
At the request of DAF Air and Space Forces, this report examines the Air Force Test Center’s technical capabilities and capacity to conduct rigorous and objective tests, evaluations, and assessments of AI-enabled systems under operational conditions and against realistic threats. It explores both the opportunities and the challenges inherent in integrating AI at speed and at scale across the DAF.