Generative AI — Image generation benchmark
How can a simple prompt tell more about a model than a thousand words?
I have been using the same prompt for 2 years to evaluate and benchmark image generation models.
photography style, family portrait
This prompt is stupidly simple but encapsulates the complexity of image generation:
- Very little context, which makes the request hard to interpret and forces the model to fill in everything else.
- It involves people, a highly complex subject with numerous challenging details (number of fingers, positioning of arms or individuals, etc.).
- It also reveals potential cultural biases in a model (the definition of “family” may vary from one country to another).
It’s also an effective way to measure the evolution of models and compare differences between them.
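If you want to reproduce this kind of benchmark, the process is easy to automate: send the exact same prompt to every model and archive the outputs under a date stamp, so runs taken months apart can be compared side by side. Below is a minimal Python sketch of that loop; note that the endpoints (api.example.com) and the raw-image-bytes response format are hypothetical placeholders, since each provider in this benchmark exposes its own interface (and some, like Midjourney, are web- or Discord-only).

```python
import datetime
import pathlib

import requests

PROMPT = "photography style, family portrait"

# Hypothetical endpoints for illustration only; each real provider
# has its own API (or no public API at all).
PROVIDERS = {
    "deepai": "https://api.example.com/deepai/generate",
    "flux_v1": "https://api.example.com/flux/generate",
}


def run_benchmark(out_dir: str = "benchmark") -> None:
    """Send the same prompt to every provider and save date-stamped images."""
    stamp = datetime.date.today().strftime("%d-%m-%Y")
    root = pathlib.Path(out_dir) / stamp
    root.mkdir(parents=True, exist_ok=True)

    for name, url in PROVIDERS.items():
        # Assumes the endpoint accepts a JSON prompt and returns raw PNG bytes.
        resp = requests.post(url, json={"prompt": PROMPT}, timeout=120)
        resp.raise_for_status()
        (root / f"{name}.png").write_bytes(resp.content)


if __name__ == "__main__":
    run_benchmark()
```

Saving each run into a dated folder is what makes the periodic comparisons below possible: the prompt never changes, only the date.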
Let’s dive into the results!
Disclaimer: this article will be updated every six months (or more often, depending on market announcements).
Models tested, with the date each set of images was generated:
- Deepai.org (17/01/2025)
- Pixlr.com (17/01/2025)
- Adobe Express (17/01/2025)
- Janus Pro (03/02/2025)
- Midjourney (03/02/2025)
- Flux V1 (03/02/2025)
- Grok (03/02/2025)
My personal ranking (03/02/2025):
1 — Flux V1: very natural, high-quality scenes that could have been shot by a photographer
2 — Midjourney: good scenes and diversity, but many ugly details remain and the quality is somewhat random
3 — Grok: good details, but the scenes are very classic and formal, with no diversity
Let me know your thoughts, and which other models I could add to this benchmark!