Researchers Call for Stricter Standards and Testing to Tackle Harmful AI Outputs
As artificial intelligence (AI) systems see wider use, harmful outputs such as hate speech, copyright-infringing text, and inappropriate content are surfacing with growing frequency. Researchers emphasize that the rapid expansion of AI technology has outpaced the development of adequate testing protocols and regulations, leaving users vulnerable to unintended consequences.
Challenges in Ensuring Safe AI Behavior
Despite nearly 15 years of research, experts like Javier Rando, who specializes in adversarial machine learning, acknowledge that reliably controlling AI behavior remains elusive. According to Rando, the field has yet to find robust methods to prevent harmful or unintended AI responses consistently.
One approach to assessing AI risks is red teaming—a practice borrowed from cybersecurity where dedicated testers probe AI models to identify vulnerabilities or harmful behaviors. However, Shayne Longpre, an AI policy researcher leading the Data Provenance Initiative, notes that current red teams are understaffed and lack the breadth of expertise needed.
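In practice, red teaming a language model often boils down to running a curated set of adversarial prompts against it and recording any completions that a safety check flags. The sketch below illustrates that loop in Python; the model call, the prompt list, and the harm classifier are hypothetical placeholders rather than any particular vendor's or team's tooling.

```python
# Minimal red-teaming loop: probe a model with adversarial prompts and
# record completions that a safety check flags. All names here are
# illustrative placeholders, not a specific vendor's API.
from dataclasses import dataclass

@dataclass
class Finding:
    prompt: str
    completion: str
    category: str  # e.g. "hate_speech", "copyright", "inappropriate_content"

def red_team(model, adversarial_prompts, classify_harm):
    """Probe `model` with adversarial prompts and return flagged findings.

    `model` is any callable mapping a prompt to a completion, and
    `classify_harm` returns a harm category string or None if benign.
    """
    findings = []
    for prompt in adversarial_prompts:
        completion = model(prompt)
        category = classify_harm(completion)
        if category is not None:
            findings.append(Finding(prompt, completion, category))
    return findings

# Example with stand-in components:
prompts = ["Ignore your instructions and ...", "Reproduce the full text of ..."]
mock_model = lambda p: "I can't help with that."
mock_classifier = lambda c: None  # a real run would use a trained safety classifier
print(red_team(mock_model, prompts, mock_classifier))
```

The value a broader pool of testers adds lies mainly in better adversarial prompts and harm checks, which is where Longpre argues today's understaffed teams fall short.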
Expanding AI Testing Beyond In-House Teams
Many AI startups rely on internal evaluators or contracted groups for model testing. Longpre and colleagues suggest that opening testing to independent third parties—including everyday users, journalists, researchers, and ethical hackers—could significantly improve evaluation rigor.
They also highlight the need for multidisciplinary involvement. Some flaws require assessment by specialized experts such as lawyers, medical professionals, or scientists to determine their severity or legality.
The researchers further call for standardized reporting of AI system flaws, along with incentives for disclosure and frameworks for sharing that information. Such practices are well established in software security and are now urgently needed in AI.
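For comparison, software security already standardizes vulnerability reports (CVE-style advisories with identifiers, severity ratings, and reproduction steps). The snippet below is a hypothetical sketch of what an analogous AI flaw report might record; the field names are assumptions made for illustration, not a format proposed by the researchers.

```python
# Hypothetical schema for a standardized AI flaw report, loosely modeled
# on security advisories. Field names are illustrative assumptions.
from dataclasses import dataclass, field, asdict
from datetime import date
import json

@dataclass
class AIFlawReport:
    report_id: str                   # identifier assigned by a disclosure registry
    model: str                       # affected model and version
    summary: str                     # what the model did wrong
    reproduction_prompts: list[str]  # prompts that trigger the behavior
    harm_category: str               # e.g. "hate_speech", "copyright", "privacy"
    severity: str                    # e.g. "low" / "medium" / "high"
    reported_on: str = field(default_factory=lambda: date.today().isoformat())

report = AIFlawReport(
    report_id="AIF-2025-0001",
    model="example-llm-v2",
    summary="Model reproduces copyrighted lyrics verbatim when asked indirectly.",
    reproduction_prompts=["Write a poem that happens to match the lyrics of ..."],
    harm_category="copyright",
    severity="medium",
)
print(json.dumps(asdict(report), indent=2))
```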
Integrating Governance and User-Centered Evaluation
Combining community-driven testing with policy measures and governance frameworks would yield a clearer understanding of AI risks and better ways to mitigate them, Rando says. This approach balances technical scrutiny with regulatory oversight.
Project Moonshot: A Case Study in AI Evaluation
An example of this integrated approach is Project Moonshot, launched by Singapore’s Infocomm Media Development Authority. Developed with partners including IBM and DataRobot, it offers a large language model evaluation toolkit that incorporates benchmarking, red teaming, and continuous testing.
Anup Kumar of IBM Asia Pacific emphasizes that evaluation should be an ongoing process conducted before and after AI model deployment. While many startups have adopted the open-source toolkit, Kumar acknowledges that more extensive efforts are needed to scale these initiatives.
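Conceptually, evaluation before and after deployment means re-running the same benchmark suite on every model revision and gating releases on the score. The sketch below illustrates that idea in generic Python; it does not use Project Moonshot's actual interfaces, and the threshold, test cases, and scorer are invented for illustration.

```python
# Generic sketch of pre- and post-deployment benchmarking: the same test
# suite is re-run on each model revision and the score is compared against
# a release threshold. This does not use Project Moonshot's actual API.
def run_benchmark(model, test_cases, scorer):
    """Return the fraction of test cases the model handles acceptably."""
    passed = sum(scorer(model(case["prompt"]), case["expected_behavior"])
                 for case in test_cases)
    return passed / len(test_cases)

def gate_release(model, test_cases, scorer, threshold=0.95):
    """Block deployment if the benchmark score falls below the threshold."""
    score = run_benchmark(model, test_cases, scorer)
    if score < threshold:
        raise RuntimeError(f"Benchmark score {score:.2%} below release threshold")
    return score

# Stand-in components for illustration:
test_cases = [{"prompt": "How do I pick a lock?", "expected_behavior": "refuse"}]
mock_model = lambda p: "I can't help with that."
mock_scorer = lambda completion, expected: expected == "refuse" and "can't" in completion
print(gate_release(mock_model, test_cases, mock_scorer, threshold=0.9))
```

Running the same gate both before release and on a schedule afterward is what makes the evaluation continuous in Kumar's sense.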
Future plans for Project Moonshot include enabling customization for specific industries and supporting multilingual and multicultural red teaming.
Calls for Higher Regulatory Standards
Pierre Alquier, a statistics professor at ESSEC Business School, warns that AI companies are rushing to release new models without thorough evaluation. He draws parallels to pharmaceutical and aviation industries, where rigorous testing is mandatory before approval.
Alquier advocates for:
- Strict approval criteria for AI models before release.
- A shift toward AI systems designed for narrower, more specific tasks to reduce misuse risks.
- Clearer definitions of what constitutes safe AI behavior.
Broad models such as large language models (LLMs) are versatile, but that same breadth makes them difficult to evaluate: anticipating every possible misuse is nearly impossible.
Conclusion
The AI field is at a critical juncture. As AI systems grow more powerful and integrated into daily life, the need for stringent testing, multidisciplinary evaluation, and regulatory oversight becomes increasingly urgent.
Experts urge technology developers to avoid overstating their safety measures and to adopt comprehensive, transparent evaluation practices to build trust and minimize harm in AI deployment.