Promptfoo

Open-source LLM evaluation and red team testing tool for comparing GPT, Claude, and Gemini performance with declarative settings, and vulnerability scanning for AI applications

Freemium ★ 4.3 🇺🇸 美國
Visit Website ↗

What is Promptfoo

Promptfoo is an open-source LLM evaluation and red team testing tool designed to identify vulnerabilities in AI applications before they go live. It has two main aspects: evaluation and security. The evaluation feature allows users to compare the performance of different models, such as GPT, Claude, and Gemini, on the same set of prompts using simple declarative settings. The security feature includes a vulnerability scanner that systematically attempts to jailbreak models and test for prompt injection and data leakage risks, covering the OWASP LLM Top 10.

One of its major advantages is that evaluations are run locally, directly interacting with the user's LLM, without sending prompts or data to third-party services, which is important for teams handling sensitive data. Promptfoo is an MIT-licensed open-source project with over 20,000 stars on GitHub, and it's used by companies like OpenAI and Anthropic. In March 2026, Promptfoo was acquired by OpenAI, but the official statement maintains that it will remain open-source and MIT-licensed.

Features and Use Cases

Who is it for? It's suitable for engineering teams working on LLM applications, agents, or RAGs, who need to test how changes to prompts affect performance and security. Promptfoo offers a free open-source version and an enterprise solution, which includes guardrails for real-time protection and enterprise support. While it may not be necessary for occasional manual prompt testing, it's essential for formal deployments.

Key Features

  • Declarative settings for comparing GPT, Claude, Gemini, and DeepSeek model performance
  • Red team testing: simulating jailbreaking, prompt injection, and data leakage attacks
  • Vulnerability scanning covering OWASP LLM Top 10 risks
  • Local evaluation execution, directly interacting with LLM, without sending data to third-party services
  • Integration with CI/CD for automated regression testing, with optional real-time guardrails

Pros

  • MIT open-source, local execution, and no sensitive data leakage
  • Comprehensive evaluation and security testing in one package, integrable with CI/CD
  • Used by OpenAI and Anthropic, with high community trust and support

Cons

  • Command-line and configuration file-oriented, with a higher barrier for non-engineering backgrounds
  • Enterprise-level features, such as guardrails, require payment
  • Long-term direction after OpenAI acquisition remains to be observed

Use Cases

  • Comparing different model performances on the same set of prompts
  • Conducting vulnerability scans for LLM applications, including jailbreaking and prompt injection
  • Integrating prompt evaluation into CI/CD for automated regression testing
  • Systematically checking RAG and agent security risks before deployment

Editor's Note

For LLM applications going live, evaluation and security testing are unavoidable. Promptfoo packages these into an open-source, locally executable tool that can integrate with CI/CD. Its use by OpenAI and Anthropic underscores its value. While its future after the acquisition is to be seen, it currently remains MIT open-source. We give it 4.3 stars.

FAQ

Will Promptfoo send my data to the cloud?

No, by default. Evaluations run locally, directly interacting with your configured LLM, without uploading prompts or test data to third-party services, making it more secure for teams handling sensitive data.

Is Promptfoo still open-source after being acquired by OpenAI?

Yes, according to official statements. After the acquisition in March 2026, Promptfoo remains open-source and MIT-licensed, though its long-term direction should be continuously monitored through official announcements.

Related AI Tools

繁體中文版 →