Accelerate Test or Lose

  • By Col. Matthew Bradley, 53d Wing commander
  • 53d Wing
General Charles Q. Brown, Jr., Chief of Staff of the Air Force (CSAF), penned a short paper titled Accelerate Change or Lose, challenging the Air Force to “accelerate change to control and exploit the air domain.” He stated: “the Air Force must work differently with other Department of Defense stakeholders, Congress, and both traditional and emerging industry partners to streamline processes and incentivize intelligent risk-taking.”
As the Air Force’s sole wing dedicated to operational test and evaluation, the 53d Wing believes our part of General Brown’s challenge is to Accelerate Test or Lose: deliver tactical advantage to the warfighter. Too often, we find ourselves beholden to antiquated business practices and the bureaucracy General Brown identifies in his Action Orders (Airmen, Bureaucracy, Competition, Design). Developmental Test (DT) and Operational Test (OT) practices suffer from acquisition and test models designed for the Cold War and CENTCOM conflicts. With the Pacing Challenge facing the United States, however, the Test Enterprise can no longer employ a protracted test paradigm. General Brown stated, “Competitors, especially China, have made and continue aggressive efforts to negate long-enduring U.S. warfighting advantages and challenge the United States’ interests and geopolitical position” (3). We consume too much time in acquisition and test, delaying the delivery of cutting-edge technology to the warfighter. Accelerating test will not be the Holy Grail of acquisition and fielding timeline reduction, but it is incumbent upon every organization in the Air Force to analyze its contribution to the Action Orders General Brown presented. We must improve the models, infrastructure, and organizational constructs in the Test Enterprise to account for an overall increase in weapon system complexity and the necessity to more rapidly field new systems.

When budgets and time are not constrained, it is prudent to carefully execute every test parameter to ensure full confidence in fledgling systems. Minimal risk is assumed, and the Test community can be reasonably confident it is providing a capable system with limited to no failures. Conversely, executing the minimal number of test points expedites weapon system delivery, but the consequential cost is increased risk and decreased weapon system confidence.
In both cases, cost is directly tied to the number of executed test parameters. The figure below illustrates the concept.
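The cost-versus-risk tradeoff can be made concrete with a toy numerical sketch. Everything here is an illustrative assumption, not actual test data: cost is modeled as growing linearly with executed test points while residual risk (the chance of an undiscovered deficiency) decays exponentially, and the "operate near the intersection" guidance corresponds to the point where the two normalized curves cross.

```python
import math

def normalized_cost(points, max_points=200):
    """Cost grows with every executed test point (normalized 0..1).
    The linear shape and max_points value are illustrative assumptions."""
    return points / max_points

def residual_risk(points, decay=0.02):
    """Risk of a test escape decays as more points are executed.
    The exponential shape and decay rate are illustrative assumptions."""
    return math.exp(-decay * points)

# Find the crossover: the smallest test-point count at which the
# normalized cost curve meets or exceeds the residual-risk curve.
crossover = next(p for p in range(0, 201)
                 if normalized_cost(p) >= residual_risk(p))
print(crossover)
```

Under these made-up curve shapes, executing points well past the crossover buys little additional risk reduction per dollar, which is the argument the figure makes graphically.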

We must operate in the vicinity of the intersection of the risk and cost curves, striving to balance cost, risk, and time with sufficient combat performance. To better meet General Brown’s imperative to Accelerate Change or Lose, I offer the following for Test Enterprise consideration:
Combine Test Programs:
Throughout warfighting history, separate weapon systems have operated together with varying levels of integration. The integration spectrum ranges from weapon systems operating in proximity to each other (essentially avoiding any significant level of cooperation) to fully integrated systems that share data and achieve combined effects against the threat. The future of warfighting requires every weapon system to directly share data with other systems to build what is called a “kill web,” versus the traditional “kill chain” in which each weapon system individually attacks the enemy. Unfortunately, the current construct for test planning requires each system to execute in relative isolation, where it is scored on its ability to counter individual threat systems. To use a football analogy, there is questionable value in evaluating a quarterback’s ability to take a snap, drop back, and accurately throw a pass without an offensive line or the chaos of competing teammates and opponents directly in front of him. While football teams sometimes divide into position groups for a portion of practice, the majority of a team’s practice and evaluation occurs with twenty-two players on the field. In the same manner, the ability of an F-22 to counter an SA-20 should not be part of an isolated weapon system test plan, as is currently favored. In combat, a single F-22 would not be tasked to engage an SA-20 site; it would operate in concert with other weapon systems to engage and counter the entirety of the enemy Integrated Air Defense System (IADS). Grading the F-22 alone does not represent integrated combat capability, and it increases the time and cost to accomplish those test points for each new aircraft or Operational Flight Program (OFP).
Additionally, as enemy systems become more complex and more difficult to replicate, the data garnered from these one versus one (1 v 1) scenarios does not necessarily prove or disprove blue force survivability or lethality. Instead, new systems should be part of the integrated force test package where all assets are scored together and test planning is accomplished in a coordinated fashion, thus requiring both reduced range time and focused use of limited threat assets.
Remove Classified Read-in and Enterprise IT Obstacles:
In Air Force Doctrine Publication 1 (AFDP-1), General Brown said, “Victory goes to the rapid integrator of ideas. These ideas are driven by training and the distilled knowledge all Airmen bring to the fight.” Despite this direction, his intent is hampered by security requirements that stovepipe technologies and concepts into Special Access Programs (SAP). These programs require individual read-ins and are traditionally governed by outside organizations that arbitrarily limit the number of people granted access. Personnel are blocked from delivering “distilled knowledge” to the fight because they do not have access to compartmentalized knowledge. Additionally, personnel may have access in one position, but subsequently lose it due to a location or position change. We often joke that no one possesses the “Men in Black” neuralyzer, so personnel never forget what they know. Yet unless individuals have current program access, they are unable to discuss what they previously learned. The paperwork required to justify a tester gaining program access is burdensome and requires several levels of coordination. Program read-in requests to the appropriate approval authority require justification of what that individual will work on and whether they can provide a material contribution. The answer to the question of why someone should have a read-in is, “why not?” The security world wants to reduce the risk of personnel leaking classified information, but if we cannot integrate exquisite weapon systems in the test arena due to classification issues, combat mission failure due to lack of interoperability and integration is all but guaranteed. Furthermore, Tactics, Techniques, and Procedures (TTPs) and lessons learned from dissimilar test events are difficult to fully analyze and share due to disparate classifications and classified network constraints. Every new weapon system relies on lessons learned from currently fielded systems.
The Test Enterprise must be able to contribute to new systems without having to continually justify individual program accesses. It really boils down to whether we trust our Airmen and Guardians. If a person has current access to multiple highly classified programs, what is the point of justifying access to the next classified program that will allow them to integrate with others? Program security is a bureaucratic and stove-piped system established to protect budgets and minimize inputs. In addition to integration, lack of access inhibits the discovery and analysis of potential vulnerabilities. Finally, even if personnel have read-ins, they most likely cannot analyze or transmit test data due to SAP Information Technology (IT) restrictions. Personnel have limited software privileges, are denied access to systems, or do not possess systems altogether in their workspaces. If the United States really wants to understand how its combat forces will integrate, it must integrate through every test phase.
Accept Results from Virtual Environment:
As mentioned previously, weapon systems will not enter combat alone; true test parameters must be accomplished in the aforementioned “kill web” environment (vs. “kill chain”), where multiple players contribute to Find, Fix, Track, and Target (F2T2) the enemy. Given limited range space and the inability to simultaneously execute all capabilities (Global Positioning System [GPS] jamming, electronic attack [EA], cyber-attack, etc.), the outcomes of live-fly test and evaluation sorties are becoming less and less operationally relevant. While it is difficult to flawlessly replicate real life in the virtual environment while accomplishing test at the highest classification levels, it is equally imperative that the Test Enterprise embrace virtual test results to achieve the most relevant and comprehensive test environment. You may have heard the statement, “all models are wrong, but some are useful” (attributed to British statistician George Box). The risk incurred in virtual environment testing is that of producing results, drawing conclusions, and making decisions based on an incorrect model. Effects-based models direct the simulator to display results according to specific parameters. For example, when analysts assess that a threat system would detect a stealth aircraft at a specific range in a particular environment, an effects-based threat model will automatically target the aircraft at that range without considering any other parameters. The threat will always see the aircraft at the designated range, and the pilot will always see threat display indications at the same range. Conversely, physics-based models replicate actual interactions between the threat, weapon system, and environment and more accurately represent real-world operations. These physics-based models are expensive to operate, however, and difficult to acquire. How does one replicate infrared (IR) signatures in the virtual environment? Do we possess precise threat system data and TTPs to validate every model?
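The effects-based versus physics-based distinction can be sketched in a few lines of code. This is a minimal illustration with hypothetical numbers, not a real threat model: the effects-based version scripts a fixed detection range, while the physics-based version derives detection range from a simplified radar range equation, where range scales with the fourth root of radar cross section (RCS).

```python
def effects_based_detection_range(_rcs_m2):
    """Scripted outcome: the threat always detects at the analyst-assessed
    range, regardless of target signature or environment. The 20 nm value
    is a made-up placeholder."""
    return 20.0  # nautical miles, fixed

def physics_based_detection_range(rcs_m2, reference_rcs_m2=1.0,
                                  reference_range_nm=40.0):
    """Derived outcome: detection range varies with target RCS via the
    fourth-root relationship in the radar range equation. Reference
    values are made-up placeholders."""
    return reference_range_nm * (rcs_m2 / reference_rcs_m2) ** 0.25

# A low-observable target (small RCS) looks identical to the effects-based
# model, but the physics-based model detects it much later.
for rcs in (1.0, 0.01):
    print(rcs, effects_based_detection_range(rcs),
          round(physics_based_detection_range(rcs), 1))
```

The point of the contrast: the scripted model returns the same answer no matter what the aircraft does, so it can never reveal how signature, geometry, or environment change the outcome; the physics-based model can, at far greater cost to build and validate.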
We may never achieve flawlessly accurate “real world” performance in the virtual environment, but test limitations and range restrictions equally prevent us from realizing flawlessly accurate open-air assessments. Thus, dogmatically clinging to live-fly results is proving to be a fool’s errand. Too many organizations are waiting for the perfect modeling and simulation (M&S) solution to replace live-fly sorties. While we may never achieve that perfect model, the Test Enterprise can use virtual environment results today to home in on potential TTPs, and then apply those specific TTPs to carefully choreographed open-air missions for validation. Comparison of open-air and simulator missions increases confidence in both methods when they agree. Subsequently, simulation and virtual environment results can be extrapolated to expand the total number of test runs towards an objective. This combination of live-fly and M&S will decrease over-reliance on either environment and reduce fielding time. The current effort to develop the Joint Simulation Environment (JSE) and expand the Virtual Test and Training Center (VTTC) at Nellis AFB is an example of using virtual environments to sprint towards increased capability. The VTTC requires an “All In” acquisition and programming effort. Unfortunately, agencies continue to invest in disparate programs that have been compromised by technological limitations. For instance, Distributed Mission Operations (DMO) allows operators from around the world to instantaneously connect in a combined exercise. While DMO is useful for a command-and-control exercise, aircraft that would operate within a few hundred miles of each other cannot share data at the same rate between simulators separated by thousands of miles. The speed of light actually becomes the limitation for passing data around the world to simulate data passed in a tactical formation.
The amount of information shared between aircraft, and the near-instantaneous requirement to fuse shared data, drives high-quality virtual efforts to a single location that can better represent the way actual aircraft share data. Additionally, security limitations result in inaccurate virtual environment results, due to IT disconnects with multi-level security, where systems operate at different classification levels. The previously mentioned personnel access limitations apply equally to virtual environment operations. Finally, with several ongoing simulator efforts, the U.S. Government pays the same vendor several times for the exact same model to operate in different virtual environments. Prioritizing the VTTC in the Future Years Defense Program (FYDP) will achieve higher fidelity test and training to the ultimate benefit of combat aircrew. Gathering data in the virtual environment must be embraced to make timely and informed fielding recommendations and employment decisions.
Accept Increased Risk:
Several concepts in this paper allude to the corporate Test Enterprise predilection to err towards traditional, low-risk weapon system fielding approaches. This culture requires longer timelines to accomplish data points before delivering new capabilities to the warfighter. Unfortunately, time is no longer a resource available to the United States, as China has closed the capability gap and is on the precipice of challenging U.S. military dominance. In the past, every fielded weapon system accomplished test against a validated threat system. The simulators replicated live parameters, and TTPs were codified in AFTTP 3-1. Young aircrew flew live missions and simulated missions in accordance with 3-1, and exercises like RED FLAG duplicated what they had observed in the simulator, studied in 3-1, and fought during home-station training events. When those same aircrew flew in combat (Desert Storm, Kosovo, Afghanistan, Iraq), the threats they engaged appeared on displays and operated exactly as they had seen in training. The Test Enterprise produced results that directly prepared and validated weapon systems and techniques for combat. This is not the case today. The DoD replicates threats to the best of its ability, but with increased enemy capability in the form of Low Observable (LO), EA, IR, and passive threats, confidence in threat replication is not as high as before. Additionally, the combination of all U.S. and partner capabilities has not been concurrently tested due to Federal Aviation Administration (FAA) restrictions (GPS jamming, blue EA, etc.), security limitations (both personnel and Operational Security (OPSEC) driven), and reduced high demand/low density (HD/LD) asset availability.
Without high-fidelity threat replication and a capability to execute unrestricted blue air TTPs, the best the Test Enterprise can do is produce reports and TTPs based upon limited data, resulting in educated guesses to provide the best-available military advice to strategic leaders. Unfortunately, unlike historical combat operations, the first time aircrew encounter the true threat will not be in test or training, but on Night One. Despite the inability to truly replicate the enemy, the Bureaucracy still requires “high confidence” reports based on high-fidelity data, which does not exist. Holding on to the old system of evaluating all possible test parameters hides the basic fact that those parameters are assessed in an environment of reduced-fidelity threat replication. As painstakingly detailed and time-consuming as they often are, test reports often mask the reality that the data collected is insufficient and the conclusions drawn are simply a semi-informed best guess. To rapidly deliver warfighter capability, the DoD must accept increased risk and accept fielding recommendations based on the reality of limited available data. The Test Enterprise has no motivation to sneak deficiencies past decision-makers. The very aircrew who test a capability may later find themselves flying those systems in combat. The recommendations made remain true to what the test aircrew, engineers, intelligence, and analyst personnel can determine. They make fielding recommendations to the Major Command (MAJCOM) Commanders, and there are several cases where they recommend not fielding a system. Even so, delaying fielding due to over-zealous “test sufficiency” requirements levied upon test by outside agencies is cumbersome and incurs increased cost and time.
DoD must start accepting test recommendations based upon more streamlined test planning and the best military subject matter expert advice without requiring unnecessary trials against antiquated threat systems to satisfy outdated paradigms for test planning.
Combine Test Force(s):
For many years, several offices have discussed combining the United States Air Force Warfare Center (USAFWC), Air Force Test Center (AFTC), and Air Force Operational Test and Evaluation Center (AFOTEC). Collectively known as the Tri-Center, each of these centers has its own charter as part of the Test Enterprise. AFOTEC answers directly to the Vice Chief of Staff of the Air Force (VCSAF) and is responsive to the Director, Operational Test and Evaluation (DOT&E) for major acquisition program oversight. The USAFWC answers to both Air Combat Command (ACC) and Air Force Global Strike Command (AFGSC), executes OT for the majority of Air Force programs, and responds to DOT&E as required when ACC is the lead organization. Finally, AFTC answers to Air Force Materiel Command (AFMC) and executes the majority of DT through several organizations. Combining these three centers would be politically unpopular, arduous, and likely to confuse authorities, budget lines, and expertise across the entire enterprise. However, combined test forces with matrixed DT, OT, and contractor personnel working together can achieve accelerated results at the unit level where testing is accomplished. A successful example of where this model is already in place is the Operational Flight Program-Combined Test Force (OFP-CTF) at Eglin AFB. It has a single unit commander, a position that rotates between DT and OT every two years, with authority to direct both efforts using personnel supplied by both AFTC and the USAFWC through the 96th Test Wing and 53d Wing. A similar construct exists to increase B-21 test efficiency, but without the single unit commander. This organizational framework must be the model for newer systems like Next Generation Air Dominance (NGAD). Additionally, placing OT professionals on AFMC staff, or acquisition/developmental test professionals in OT squadrons, can enhance collaboration.
Crowd-Source Flight Data:
The increased emphasis on rapid delivery of warfighter capability incurs additional risk of releasing operational capability with previously unnoticed limitations or failures (aka “test escapes”). Under the current model, if combat units discover an anomaly in a fielded weapon system (almost always software related), the issue is referred to OT and DT to fix. DT/OT organizations are typically testing future software iterations and must pause these tests to revert their aircraft to previously fielded configurations in order to analyze and correct the anomaly. This consumes valuable test time for the new software configurations. Major tech companies like Apple “crowd-source” data from all users of fielded systems. Using the iPhone as an illustration, the phone transmits user and error data back to Apple developers, who analyze the data to identify and mitigate bugs and produce the next iOS iteration. The same capability exists in test, and it is currently used by the F-35 program through the Quick Reaction Instrumentation Package (QRIP), a 12-pound device loaded onto the F-35 in a non-interference location. With QRIP, combat F-35s continuously collect data, which is downloaded upon landing and transmitted to Nellis AFB for analysis. If an anomaly is found, it can usually be corrected in a lab without interfering with testing of the next software iteration. The next step in this endeavor is to incorporate Crowd-Sourced Flight Data (CSFD) into the F-22, B-21, B-52J, and other new weapon systems. The data collection solution does not necessarily need to be QRIP, but regardless of how the data is collected, it must be compatible with standardized data ingest, transport, and analysis capabilities. Appropriately, there should be increased investment in Knowledge Management (KM) to move, store, and analyze this data.
Analyzing this data will also require dedicated investment in Artificial Intelligence (AI) resources to significantly reduce the time required to search through petabytes of recorded data. As the Air Force competes for finite budget resources, this requirement should be prioritized to minimize risk to combat forces by discovering and resolving failures in fielded weapon systems before they are required in combat.
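The crowd-sourcing triage pattern described above can be sketched in a few lines. This is an illustrative toy with entirely hypothetical field names, software versions, and fault codes (none drawn from any real program): anomaly records downloaded from fielded aircraft are aggregated on the ground so the most widespread software faults surface first, without pulling test aircraft back to old configurations.

```python
from collections import Counter

def prioritize_anomalies(reports):
    """Rank (software_version, fault_code) pairs by how many fielded
    aircraft reported them, most common first."""
    counts = Counter((r["sw_version"], r["fault_code"]) for r in reports)
    return counts.most_common()

# Hypothetical post-flight records downloaded from several fielded jets.
reports = [
    {"tail": "AF-101", "sw_version": "30P07", "fault_code": "RDR-114"},
    {"tail": "AF-102", "sw_version": "30P07", "fault_code": "RDR-114"},
    {"tail": "AF-103", "sw_version": "30P07", "fault_code": "NAV-021"},
    {"tail": "AF-104", "sw_version": "30P08", "fault_code": "RDR-114"},
]

ranked = prioritize_anomalies(reports)
print(ranked[0])  # the most widely reported version/fault pair
```

The design point mirrors the iOS analogy in the text: aggregation happens centrally, so individual units bear no analysis burden, and fix priority follows fleet-wide prevalence rather than whichever squadron complained loudest.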
Acquire Targets for All Ranges Through a Centrally Managed Office:
Unlike aerial targets, there is no System Program Office (SPO) for surface targets. The Air Force and Navy conduct test and training on multiple ranges, including the Nevada Test and Training Range (NTTR), the Utah Test and Training Range (UTTR), the Joint Pacific Alaska Range Complex (JPARC), the Pacific Missile Range Facility (PMRF), and the Eglin Gulf Test and Training Range (EGTTR). Each range oversees its own operations and acquires customer-prioritized targets that suit its needs. Money is divided at the staff level by Program Element Code (PEC). Targets that satisfy training needs are procured by one staff organization, while targets that satisfy test needs are procured by a different staff organization, despite requirement overlap. Consider acquiring a threat-representative target that satisfies training needs for a combat-coded squadron. Target acquisition and maintenance would serve the combat wing training requirements for most of the year, and then allow the Test Enterprise to execute live-fire sorties against the target that would most likely destroy it. A new target would then be acquired for the next year, and the cycle would repeat. Unfortunately, this conversation stalls at first contact with the different staff agencies governing test and training (ACC/A3AR, ACC/A589, Ranges, etc.). Who is going to buy the target? Who is going to maintain it? Who is going to contract with industry across several ranges that conduct individual contracts? The target acquisition and maintenance bureaucracy across all DoD ranges precludes efficient purchase of the threat-representative capabilities necessary to decrease risk in test plans. DoD, Headquarters Air Force (HAF), AFMC, ACC, and AFGSC should establish an office to govern target acquisition across all ranges.
Accelerating test is not the single answer to expediting warfighter lethality. However, each of the above areas addresses current Test Enterprise components and practices that slow the fielding process for new systems. Every idea presented is “easier said than done,” but CSAF challenged us to tackle these problems. He recently gave himself a “C” for his Accelerate Change or Lose effort, which means most of us get an “F.” We have Commander’s Intent, and we must invest the critical years ahead to overcome weapons capability stagnation driven by previous decisions to focus solely on Middle Eastern wars. If we do not provide tactical advantage to the warfighter at a rapid pace, we are failing to properly do our jobs and are responsible for future combat failures.
The 53d Wing provides tactical advantage to the warfighter at the speed of relevance. By testing new operational capabilities and evaluating fielded capabilities, the 53d Wing is bringing the future faster while answering the warfighter’s demands for integrated, multi-domain capabilities.