Real-ESRGAN: a GAN-based super-resolution model. The faster of the two upscalers, well suited to photos, illustrations, anime, and screenshots. Processing takes roughly 15–30 seconds.
- Scale: 2×, 4×, 8×
- Processing time: ~15–30 seconds
- Face enhance: optional GFPGAN
- Best for: photos, screenshots, anime, illustrations
- Output: clean and natural, with few visible artifacts
- Open source (BSD-3-Clause)
Clarity Upscaler: a Stable Diffusion-based upscaler that generates new detail and texture. Best for portraits, nature photography, and artwork. Processing takes roughly 45–90 seconds.
- Scale: 2×, 4×
- Processing time: ~45–90 seconds
- Creativity + resemblance controls
- Best for: portraits, nature, artistic photography
- Output: richly detailed, AI-enhanced
- Open source (Apache-2.0)
Full specification comparison
| Specification | Real-ESRGAN | Clarity Upscaler |
|---|---|---|
| Architecture | RRDB + GAN | Stable Diffusion XL (tile) |
| Scale factors | 2×, 4×, 8× | 2×, 4× |
| Processing speed | ~15–30 s | ~45–90 s |
| Output style | Clean, natural | Highly detailed, AI-enhanced |
| Face enhancement | Optional (GFPGAN) | No separate option (handled by the diffusion model) |
| Adjustable parameters | Scale, face_enhance | Scale, creativity, resemblance, prompt |
| Best image types | Photos, screenshots, anime | Portraits, nature, art |
| Hallucination risk | Very low | Moderate (at high creativity) |
| Open source | Yes (BSD-3) | Yes (Apache-2) |
| Hosting | Replicate API | Replicate API |
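The scale and timing rows of the table reduce to simple arithmetic. The sketch below copies those figures into a small data structure and estimates output dimensions and batch time. It assumes images are processed one after another and that the quoted per-image ranges hold; `UpscalerSpec`, `output_size`, and `batch_time_range` are illustrative names, not part of either project's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class UpscalerSpec:
    """One column of the comparison table (values copied from the table)."""
    name: str
    scales: tuple[int, ...]    # supported scale factors
    seconds: tuple[int, int]   # approximate (min, max) time per image


REAL_ESRGAN = UpscalerSpec("Real-ESRGAN", (2, 4, 8), (15, 30))
CLARITY = UpscalerSpec("Clarity Upscaler", (2, 4), (45, 90))


def output_size(width: int, height: int, scale: int, spec: UpscalerSpec) -> tuple[int, int]:
    """Return output dimensions, rejecting scales the table does not list."""
    if scale not in spec.scales:
        raise ValueError(f"{spec.name} does not list {scale}x")
    return width * scale, height * scale


def batch_time_range(n_images: int, spec: UpscalerSpec) -> tuple[int, int]:
    """Estimate (min, max) seconds for n images processed sequentially."""
    fastest, slowest = spec.seconds
    return n_images * fastest, n_images * slowest


print(output_size(512, 512, 8, REAL_ESRGAN))  # (4096, 4096)
print(batch_time_range(20, CLARITY))          # (900, 1800)
```

At these figures, a batch of 20 images takes roughly 15 to 30 minutes with Clarity Upscaler, compared with 5 to 10 minutes with Real-ESRGAN.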
What is Real-ESRGAN?
Real-ESRGAN is an improved version of the original ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks) model, developed by Xintao Wang (GitHub: xinntao). It is trained to handle the degradations found in real-world images (compression artifacts, blur, noise, low resolution), making it far more practical than earlier SR models that only handled simple bicubic downsampling.
The architecture uses Residual-in-Residual Dense Blocks (RRDB) as the generator, trained against a discriminator that learns what "real high-resolution" looks like. The result is an output that's convincingly sharp without over-sharpening or hallucinating texture that doesn't belong.
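For readers curious about what "residual-in-residual" means, the block structure can be written compactly. This follows the ESRGAN generator design that Real-ESRGAN reuses. The notation is introduced here for illustration: x is the input feature map, D_i is the stack of densely connected convolutions inside the i-th dense block, and β is a residual scaling factor (0.2 in the reference implementation).

```latex
% Dense block with a scaled residual connection
\mathrm{RDB}_i(x) = x + \beta \, D_i(x)

% RRDB: three dense blocks in sequence, wrapped in an outer scaled residual
\mathrm{RRDB}(x) = x + \beta \, \bigl(\mathrm{RDB}_3 \circ \mathrm{RDB}_2 \circ \mathrm{RDB}_1\bigr)(x)
```

Each block adds a small correction to its input instead of replacing it, which helps keep a deep stack of these blocks stable during training.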
When to choose Real-ESRGAN: Pick it when speed matters, when you need 8× scale, or when you're upscaling screenshots, illustrations, logos, anime, or text-heavy images where hallucinated detail would look wrong.
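Real-ESRGAN exposes only two adjustable parameters, so a request is small. The sketch below builds an input payload and validates the scale against the factors listed in the comparison table. The field names `scale` and `face_enhance` follow that table; the `image` key, the example URL, and the helper name `real_esrgan_input` are assumptions, and the hosted model's actual schema may differ.

```python
def real_esrgan_input(image_url: str, scale: int = 4, face_enhance: bool = False) -> dict:
    """Build a request payload from the two adjustable parameters.

    Field names are assumptions based on the comparison table, not a
    confirmed API schema.
    """
    allowed = (2, 4, 8)
    if scale not in allowed:
        raise ValueError(f"scale must be one of {allowed}, got {scale}")
    return {"image": image_url, "scale": scale, "face_enhance": bool(face_enhance)}


# Hypothetical example: an 8x upscale of a screenshot, no face enhancement.
payload = real_esrgan_input("https://example.com/screenshot.png", scale=8)
print(payload)
```

Face enhancement is left off by default here because the image types named above (screenshots, logos, text) rarely contain faces.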
What is Clarity Upscaler?
Clarity Upscaler (by philz1337x) is a tile-based upscaler built on Stable Diffusion XL with the Juggernaut Reborn checkpoint plus custom LoRAs for detail enhancement. Unlike GAN-based upscalers, it uses a diffusion process to actively generate new pixel content based on what it "thinks" should be there — not just pattern-matching from training data.
The creativity parameter controls how aggressively the AI adds new detail (0.3 = subtle, 0.9 = aggressive regeneration). The resemblance parameter controls how faithfully the output matches the original composition. At resemblance 1.6, the output is very close to the original but sharper. At 0.3, it's a creative reinterpretation.
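The two controls can be captured as named presets. The sketch below uses the values quoted above; pairing creativity 0.9 with resemblance 0.3 in a single "creative" preset is an assumption made for illustration, as are the helper name `clarity_input`, the `image` key, and the example URL. The hosted model's actual field names may differ.

```python
# Preset values come from the paragraph above; the pairings are illustrative.
PRESETS = {
    "faithful": {"creativity": 0.3, "resemblance": 1.6},  # subtle, close to the original
    "creative": {"creativity": 0.9, "resemblance": 0.3},  # aggressive reinterpretation
}


def clarity_input(image_url: str, preset: str = "faithful",
                  scale: int = 2, prompt: str = "") -> dict:
    """Build a request payload for a diffusion-based upscale.

    Field names are assumptions based on the comparison table, not a
    confirmed API schema.
    """
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; choose from {sorted(PRESETS)}")
    if scale not in (2, 4):
        raise ValueError(f"scale must be 2 or 4, got {scale}")
    payload = {"image": image_url, "scale": scale, **PRESETS[preset]}
    if prompt:
        payload["prompt"] = prompt  # the prompt is optional, so include it only when set
    return payload


print(clarity_input("https://example.com/portrait.jpg", preset="faithful", scale=4))
```

Starting from the "faithful" preset and raising creativity in small steps is a cautious way to find the point where added detail starts to drift from the original.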
When to choose Clarity Upscaler: Pick it when maximum output quality is the goal, especially for portraits, landscape photography, studio product shots, or any image where richer texture makes a visible difference.
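The two "when to choose" rules can be condensed into a small decision helper. This is a sketch of the guidance in this article, not an official recommendation from either project. The category labels, the function name `choose_upscaler`, and the fallback to Real-ESRGAN for unlisted image types (chosen for its lower hallucination risk) are assumptions.

```python
# Image categories taken from the guidance above; the labels are illustrative.
FLAT_OR_TEXT = {"screenshot", "illustration", "logo", "anime", "text"}
DETAIL_RICH = {"portrait", "landscape", "nature", "product", "artwork"}


def choose_upscaler(image_type: str, scale: int = 4, speed_matters: bool = False) -> str:
    """Pick an upscaler from image type, scale factor, and time pressure."""
    # Real-ESRGAN wins on speed, on 8x scale, and on flat or text-heavy images.
    if speed_matters or scale == 8 or image_type in FLAT_OR_TEXT:
        return "Real-ESRGAN"
    # Clarity Upscaler is preferred where added texture visibly helps.
    if image_type in DETAIL_RICH:
        return "Clarity Upscaler"
    # Assumption: unlisted types fall back to the lower hallucination risk.
    return "Real-ESRGAN"


print(choose_upscaler("portrait"))                      # Clarity Upscaler
print(choose_upscaler("portrait", speed_matters=True))  # Real-ESRGAN
print(choose_upscaler("logo"))                          # Real-ESRGAN
```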