Complex Scene Composition
Complex scenes are where image models stumble most. Without careful prompting, you get sparse crowds, flat atmospheres, and missed environmental details. The fix is to layer your prompt: anchor the scene, populate it, set the atmosphere, then specify the viewpoint.
The Recipe
[Viewpoint]: [aerial / eye-level / low-angle / over-the-shoulder / drone shot]
of [location — be specific, not "a city"]
Time: [time of day + season if relevant]
Weather/atmosphere: [clear / overcast / foggy / rainy / neon-lit night / etc.]
Population: [empty / sparse / busy / packed — and what people are doing]
Specific details that must be present: [list 2-3 elements you can't leave out]
Key environmental features: [what makes this location recognizable]
Lighting source: [sun position / artificial lights / mixed]
Style: [photorealistic / cinematic / painterly / editorial]
What “populated” actually means
Don’t write “people in the scene.” Write what the people are doing:
- ✅ “dense crowd in winter coats, heads tilted upward toward the fireworks”
- ❌ “crowd of people”
The more specific the activity, the more convincing the crowd.
Example prompts
City event / celebration:
Aerial drone shot directly above Times Square, New Year's Eve at midnight.
Packed shoulder-to-shoulder crowd in winter coats, all faces upturned toward
the sky. Confetti raining down in streams of silver and white. Massive
illuminated billboards and LED signs on all buildings. Glowing sphere visible
at top of frame. Photorealistic, cold blue and warm amber tones, slight motion
blur in the confetti.
Atmospheric environment:
Wide establishing shot of a fog-covered mountain town at dawn. A single lit
bakery window on the main street, warm amber glow cutting through the grey
mist. Cobblestone street, visible breath in the cold air, one figure in a
long coat walking away from the camera. Cinematic, muted palette, gentle
vignette.
Large-scale concept:
Panoramic view of a near-future city where every rooftop is a green garden.
Street level shows pedestrians and cyclists, no cars. Upper floors of
buildings covered in vertical plant walls, solar panels, and small wind
turbines. Bright midday light, clear sky, editorial concept art style,
optimistic tone.
Viewpoint guide
| Viewpoint | Best for |
|---|---|
| Aerial / drone | Crowds, urban density, geography, scale |
| Eye-level | Human stories, street scenes, immersion |
| Low angle | Power, grandeur, architecture, dramatic subjects |
| Over-the-shoulder | Intimacy, following action, personal narrative |
| Extreme close-up | Texture, detail, emotional intensity |
Common failure modes and fixes
Problem: Crowd too sparse → Add “packed”, “shoulder-to-shoulder”, “dense” — quantity words matter
Problem: Location isn’t recognizable → Add 2-3 specific architectural or cultural markers: “Empire State Building visible”, “neon kanji signage”, “red telephone box”
Problem: Flat, lifeless atmosphere → Add a specific light source and weather: “fog rolling in from the water”, “heat haze on the tarmac”, “rays of light through cloud breaks”
Problem: Everything feels the same distance away → Add a foreground element: “a couple in the foreground, blurred city behind them”
🔁 Leftover Remixes
🌶️ Spicy: “Same scene, 3 hours earlier — show the build-up. What does it look like when it’s busy but not yet at peak?”
🧊 Mild: “Same location, completely empty. 5am, nobody around. Focus on the atmosphere of the place without people.”
💰 Budget: “What are the 3 most visually distinctive elements I should include in a prompt for [location] to make it immediately recognizable?”
Prompt examples inspired by Amanda Caswell / Tom’s Guide. Adapted for The Prompt Kitchen.