Test and Evaluation Challenges in Artificial Intelligence-Enabled Systems for the Department of the Air Force (2023)

Chapter: Front Matter

Test and Evaluation Challenges in Artificial Intelligence–Enabled Systems for the Department of the Air Force

__________

Committee on Testing, Evaluating, and Assessing Artificial Intelligence-Enabled Systems Under Operational Conditions for the Department of the Air Force

Air Force Studies Board

Division on Engineering and Physical Sciences

Consensus Study Report

NATIONAL ACADEMIES PRESS, 500 Fifth Street, NW, Washington, DC 20001

This activity was supported by a contract between the National Academy of Sciences and the Department of the Air Force under award number FA955016D00001 FA865121-F-9323. Any opinions, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect the views of any organization or agency that provided support for the project.

International Standard Book Number-13: 978-0-309-70439-7
International Standard Book Number-10: 0-309-70439-1
Digital Object Identifier: https://doi.org/10.17226/27092

This publication is available from the National Academies Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; http://www.nap.edu.

Printed in the United States of America.

Suggested citation: National Academies of Sciences, Engineering, and Medicine. 2023. Test and Evaluation Challenges in Artificial Intelligence–Enabled Systems for the Department of the Air Force. Washington, DC: The National Academies Press. https://doi.org/10.17226/27092.

The National Academy of Sciences was established in 1863 by an Act of Congress, signed by President Lincoln, as a private, nongovernmental institution to advise the nation on issues related to science and technology. Members are elected by their peers for outstanding contributions to research. Dr. Marcia McNutt is president.

The National Academy of Engineering was established in 1964 under the charter of the National Academy of Sciences to bring the practices of engineering to advising the nation. Members are elected by their peers for extraordinary contributions to engineering. Dr. John L. Anderson is president.

The National Academy of Medicine (formerly the Institute of Medicine) was established in 1970 under the charter of the National Academy of Sciences to advise the nation on medical and health issues. Members are elected by their peers for distinguished contributions to medicine and health. Dr. Victor J. Dzau is president.

The three Academies work together as the National Academies of Sciences, Engineering, and Medicine to provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. The National Academies also encourage education and research, recognize outstanding contributions to knowledge, and increase public understanding in matters of science, engineering, and medicine.

Learn more about the National Academies of Sciences, Engineering, and Medicine at www.nationalacademies.org.

Consensus Study Reports published by the National Academies of Sciences, Engineering, and Medicine document the evidence-based consensus on the study’s statement of task by an authoring committee of experts. Reports typically include findings, conclusions, and recommendations based on information gathered by the committee and the committee’s deliberations. Each report has been subjected to a rigorous and independent peer-review process, and it represents the position of the National Academies on the statement of task.

Proceedings published by the National Academies of Sciences, Engineering, and Medicine chronicle the presentations and discussions at a workshop, symposium, or other event convened by the National Academies. The statements and opinions contained in proceedings are those of the participants and are not endorsed by other participants, the planning committee, or the National Academies.

Rapid Expert Consultations published by the National Academies of Sciences, Engineering, and Medicine are authored by subject-matter experts on narrowly focused topics that can be supported by a body of evidence. The discussions contained in rapid expert consultations are considered those of the authors and do not contain policy recommendations. Rapid expert consultations are reviewed by the institution before release.

For information about other products and activities of the National Academies, please visit www.nationalacademies.org/about/whatwedo.

COMMITTEE ON TESTING, EVALUATING, AND ASSESSING ARTIFICIAL INTELLIGENCE-ENABLED SYSTEMS UNDER OPERATIONAL CONDITIONS FOR THE DEPARTMENT OF THE AIR FORCE

MAY CASTERLINE, NVIDIA, Co-Chair

THOMAS A. LONGSTAFF, Carnegie Mellon University, Co-Chair

CRAIG R. BAKER, Baker Development Group, LLC

ROBERT A. BOND, Massachusetts Institute of Technology

RAMA CHELLAPPA (NAE), Johns Hopkins University

TREVOR DARRELL, University of California, Berkeley (until December 2022)

MELVIN GREER, Intel Corporation

TAMARA G. KOLDA (NAE), Independent Consultant, MathSci.ai

NANDI O. LESLIE, Raytheon Technologies (until December 2022)

ROBIN R. MURPHY, Texas A&M University

DAVID S. ROSENBLUM, George Mason University

JOHN (JACK) N.T. SHANAHAN, United States Air Force (retired)

HUMBERTO SILVA III, Sandia National Laboratories (until December 2022)

REBECCA WILLETT, University of Chicago

Staff

RYAN MURPHY, Program Officer

GEORGE COYLE, Senior Program Officer

EVAN ELWELL, Research Associate

CHARLES YI, Research Assistant

MARTA HERNANDEZ, Program Coordinator

AMELIA A. GREEN, Senior Program Assistant (until July 2022)

AIR FORCE STUDIES BOARD

ELLEN M. PAWLIKOWSKI (NAE), Independent Consultant, Chair

CHRISTOPHER P. AZZANO, Booz Allen Hamilton

KEVIN G. BOWCUTT (NAE), Boeing Company

RAMA CHELLAPPA (NAE), Johns Hopkins University

MARK F. COSTELLO, Georgia Institute of Technology

DANIEL A. DeLAURENTIS, Purdue University

BONNIE J. DUNBAR (NAE), Texas A&M University

JAMES M. HOLMES, Red 6

DEBORAH L. JAMES, Independent Consultant

CHRISTOPHER T. JONES (NAE), Leadership Compass

EDWARD M. LAWS (NAM), Harvard University

LESTER L. LYLES (NAE), Independent Consultant

VALERIE M. MANNING, Overair

WENDY MASIELLO, Independent Consultant

LAURA J. McGILL (NAE), Sandia National Laboratories

HENDRICK W. RUCK, Edaptive Computing, Inc.

JULIE J.C.H. RYAN, Wyndrose Technical Group

MICHAEL SCHNEIDER, Lawrence Livermore National Laboratory

Staff

ELLEN CHOU, Board Director

GEORGE COYLE, Senior Program Officer

RYAN MURPHY, Program Officer

ALEX TEMPLE, Program Officer

MARTA HERNANDEZ, Program Coordinator

EVAN ELWELL, Research Associate

AMELIA A. GREEN, Senior Program Assistant (until July 2022)

CHARLES YI, Research Assistant

DONOVAN THOMAS, Financial Business Partner

Reviewers

This Consensus Study Report was reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise. The purpose of this independent review is to provide candid and critical comments that will assist the National Academies of Sciences, Engineering, and Medicine in making each published report as sound as possible and to ensure that it meets the institutional standards for quality, objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process.

We thank the following individuals for their review of this report:

JILL CRISMAN, Digital Safety Research Institute

MICHAEL A. FANTINI, United States Air Force (retired)

SHAUN GLEASON, Oak Ridge National Laboratory

J. MARCUS HICKS, United States Air Force (retired)

MARVIN J. LANGSTON, Independent Consultant

GARRY McGRAW, Berryville Institute of Machine Learning

YEVGENIYA “JANE” PINELIS, Johns Hopkins University Applied Physics Laboratory

AMIR SADOVNIK, Oak Ridge National Laboratory

MICHAEL SCHNEIDER, Lawrence Livermore National Laboratory

ALBERT SCIARRETTA, CNS Technologies, Inc.

JONATHAN SMITH, University of Pennsylvania

REBECCA WINSTON, Winston Strategic Management Consulting

Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations of this report nor did they see the final draft before its release. The review of this report was overseen by STEVE BELLOVIN, Columbia University, and BOB SPROULL, University of Massachusetts Amherst. They were responsible for making certain that an independent examination of this report was carried out in accordance with the standards of the National Academies and that all review comments were carefully considered. Responsibility for the final content rests entirely with the authoring committee and the National Academies.

SUMMARY

1 INTRODUCTION

1.1 A Central Question: How to Achieve Sufficient Confidence in AI-Enabled Systems?

1.2 Study Questions to Be Addressed

1.3 What Do We Mean by “Artificial Intelligence”?

1.4 Current State of the Art of AI

1.5 Current State of the Practice of AI in the DAF

1.6 Algorithmic Warfare Cross-Functional Team (Project Maven) Case Study

2 DEFINITIONS AND PERSPECTIVES

2.1 AI-Enabled Systems

2.2 Role of Data in AI-Enabled Systems

2.3 History of T&E in AI-Enabled Systems

2.4 Human-Machine Teaming

3 TEST AND EVALUATION OF DAF AI-ENABLED SYSTEMS

3.1 Historical Approach to Air Force Test and Evaluation

3.2 AI and DevSecOps/AIOps in the DAF and Commercial Sector

3.3 OSD and DAF T&E Policies for AI-Enabled Systems

3.4 AI T&E in the Commercial Sector

3.5 Contrast of Commercial and DoD Approaches to AI T&E

3.6 Trust, Justified Confidence, AI Assurance, Trustworthiness, and Buy-In

3.7 Risk-Based Approach to AI T&E

4 EVOLUTION OF TEST AND EVALUATION IN FUTURE AI-BASED DAF SYSTEMS

4.1 Introduction

4.2 Appointing a DAF AI T&E Champion

4.3 Establishing AI T&E Requirements

4.4 Culture Change and Workforce Development

4.5 Summary of Implications of Future AI for DAF T&E

4.6 Recommendation Timelines

5 AI TECHNICAL RISKS UNDER OPERATIONAL CONDITIONS

5.1 Introduction

5.2 General Risks of AI-Enabled Systems

5.3 AI Corruption Under Operational Conditions

5.4 Attack Surfaces for AI-Enabled Systems

5.5 Risk of Adversarial Attacks

5.6 Network Security and Zero Trust Implications

5.7 Robust and Secure AI Models

5.8 Research in T&E to Address Adversarial AI

6 EMERGING AI TECHNOLOGIES AND FUTURE T&E IMPLICATIONS

6.1 Trustworthy AI

6.2 Foundation Models

6.3 Informed Machine Learning Models

6.4 AI-Based Data Generators

6.5 AI Gaming for Complex Decision-Making

7 CONCLUDING THOUGHTS

A Statement of Task

B Public Meeting Agendas

C Committee Member Biographical Information

D Acronyms and Abbreviations

E Testing, Evaluating, and Assessing Artificial Intelligence–Enabled Systems Under Operational Conditions for the Department of the Air Force: Proceedings of a Workshop—in Brief

Preface

At the request of the 96th Test Wing of the U.S. Air Force and Air Force Materiel Command, the National Academies of Sciences, Engineering, and Medicine were asked to convene a committee to conduct a consensus study to examine the Air Force Test Center’s technical capabilities and capacity to conduct rigorous and objective tests, evaluations, and assessments of artificial intelligence (AI)-enabled systems under operational conditions and against realistic threats.

The National Academies of Sciences, Engineering, and Medicine appointed the Committee on Testing, Evaluating, and Assessing Artificial Intelligence-Enabled Systems Under Operational Conditions for the Department of the Air Force to conduct this study, per the Statement of Task found in Appendix A and Box P-1. The committee held its initial kickoff meeting in April 2022, conducted a data-gathering workshop in June 2022 (summarized in a Proceedings of a Workshop—in Brief, which can be found in Appendix E), and held further data-gathering sessions throughout 2022 and early 2023, including a site visit to Eglin Air Force Base. Agendas for the data-gathering meetings can be found in Appendix B. Biographies of the committee members can be found in Appendix C. Appendix D contains a list of acronyms and abbreviations used in the report.

BOX P-1
Statement of Task

The National Academies of Sciences, Engineering, and Medicine will establish an ad hoc committee to (1) plan and convene a multi-day workshop and (2) conduct a consensus study to examine the Air Force Test Center’s technical capabilities and capacity to conduct rigorous and objective test, evaluation, and assessments of artificial intelligence (AI)-enabled systems under operational conditions and against realistic threats. Specifically, the committee will:

  • Evaluate and contrast current testing and assessment methods employed by the Department of the Air Force and in commercial industry.
  • Consider examples of AI corruption under operational conditions and against malicious cyberattacks.
  • Recommend promising areas of science and technology that may lead to improved detection and mitigation of AI corruption.
  • Provide workshop proceedings in brief and a report summarizing the results of the consensus study.

The Department of the Air Force (DAF) is in the early stages of incorporating modern artificial intelligence (AI) technologies into its systems and operations. The integration of AI-enabled capabilities across the DAF will accelerate over the next few years.

At the request of the DAF's Air and Space Forces, this report examines the Air Force Test Center's technical capabilities and capacity to conduct rigorous and objective tests, evaluations, and assessments of AI-enabled systems under operational conditions and against realistic threats. This report explores both the opportunities and challenges inherent in integrating AI at speed and at scale across the DAF.
