Video Forge: turning a real-estate listing into a narrated video reel

May 29, 2026 · 6 min read

A real-estate listing is already a small pile of assets: a dozen photos, a price, the number of bedrooms and bathrooms, the square meters, the agent's name. Postea Casa takes that pile and turns it into social-media posts. Video Forge is the sibling that takes the exact same pile and turns it into a video — a narrated, watermarked reel you can drop straight into a Reel or a Story.

The two share a worldview. A listing is structured data plus a folder of images, and from that you should be able to generate every marketing artifact a property needs without a human opening an editor. Posts are the still version of that idea. Video Forge is the moving version.

What it is

Video Forge is a FastAPI service living at forge.solutions45.com (port 8086, a plain uvicorn process under systemd — not containerized, because it needs the system FFmpeg binary right there). You POST a composition request: the source photo URLs, the script for the voiceover, the property metadata, the agent details, and which format you want — a fast vertical reel or a longer tour. The request comes back immediately with a job id, because the actual work takes a while and you don't want to hold an HTTP connection open through an FFmpeg encode. The job runs in the background and its status lives in SQLite (video-forge.sqlite), so you poll the job id until it flips to done and hands you a Luna CDN URL.
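From the caller's side, the submit-then-poll loop might look like the sketch below. The status fetcher is passed in as a function, so the loop does not depend on any particular HTTP library. The field names (status, cdn_url) and the set of terminal states are assumptions based on the description in this post, not a documented schema.

```python
import time
from typing import Callable

# Assumed terminal states; the post describes jobs ending in "done" or "error".
TERMINAL_STATES = {"done", "error"}


def wait_for_job(
    get_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_status() repeatedly until the job is terminal or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_status()
        if job.get("status") in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not reach a terminal state in time")
        sleep(interval)
```

In practice get_status would wrap an HTTP GET for the job id with the API key header attached; the exact status route is not given here, so it is left to the caller.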

Authentication is an X-API-Key header, the same model as Luna — one key, one caller, no ambiguity about who's asking.
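A single-key check of this kind can be sketched in a few lines. This is a generic illustration, not the service's actual code: has_valid_api_key is a hypothetical helper, and the constant-time comparison is a common precaution rather than something the post states Video Forge does.

```python
import hmac
from typing import Mapping


def has_valid_api_key(headers: Mapping[str, str], expected_key: str) -> bool:
    """Return True when the X-API-Key header matches the configured key."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get("x-api-key", "")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))
```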

The pipeline

The interesting part is what happens between "POST a listing" and "here's an MP4." Four stages, each handing off to the next:

1. Download the photos. The request carries URLs, not bytes, so the first thing the job does is fetch every source image — concurrently, with httpx, because a tour can reference a dozen photos and pulling them one at a time would dominate the runtime. They land in a scratch working directory for this job.

2. Generate the voiceover. The script — usually a short narration written from the listing copy — goes to a TTS API, with the voice and tone configurable per request. What comes back is the audio track the whole video is timed against. The voiceover's length is what decides how long each scene gets to breathe; the visuals are cut to the narration, not the other way around.

3. Composite with FFmpeg. This is the muscle. FFmpeg takes the downloaded photos, the voiceover track, and the per-scene timing and builds a multi-scene video — pans across each photo, transitions between scenes, the audio bed underneath. A reel is the fast vertical cut; a tour is the longer walk-through. The text and branding don't get drawn by FFmpeg's own text filters, though — that's where Luna comes in.

4. Upload to Luna. The finished MP4 goes back to Luna CDN through upload_to_luna(), the same authenticated upload path everything else on the server uses, and the job record stores the public cdn_url. From the caller's side the whole thing is "send a listing, get back a link."
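Stages 1 and 2 can be sketched roughly as follows. The fetch function is injected so the example runs without a network; the file naming, the concurrency limit, and the even split of narration time across scenes are all assumptions made for illustration. The real service may weight scenes differently.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable
from urllib.parse import urlparse

Fetch = Callable[[str], Awaitable[bytes]]


async def download_all(
    urls: list[str], workdir: Path, fetch: Fetch, limit: int = 8
) -> list[Path]:
    """Fetch every URL concurrently (at most `limit` at a time) into workdir.

    The returned paths are in the same order as `urls`.
    """
    workdir.mkdir(parents=True, exist_ok=True)
    gate = asyncio.Semaphore(limit)

    async def fetch_one(index: int, url: str) -> Path:
        async with gate:
            data = await fetch(url)
        # Keep the URL's extension if it has one; ".jpg" is an assumed fallback.
        suffix = Path(urlparse(url).path).suffix or ".jpg"
        path = workdir / f"photo_{index:03d}{suffix}"
        path.write_bytes(data)
        return path

    return list(await asyncio.gather(*(fetch_one(i, u) for i, u in enumerate(urls))))


def scene_durations(audio_seconds: float, scene_count: int) -> list[float]:
    """Split the voiceover length evenly across scenes (an assumed policy)."""
    if scene_count <= 0:
        raise ValueError("scene_count must be positive")
    return [audio_seconds / scene_count] * scene_count
```

With httpx, fetch would typically wrap client.get(url) on a shared httpx.AsyncClient and return response.content. The point of the second function is the direction of the dependency: the audio length is the input and the scene lengths are derived from it.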

Why the watermark comes from Luna, not FFmpeg

The detail I'm proudest of is that Video Forge doesn't draw its own branding. FFmpeg can burn text and logos in with drawtext and overlay filters, but then the brand treatment for a video would live in a different codebase than the brand treatment for a still image — two implementations of the same logo placement, the same diagonal watermark, the same fonts, drifting apart the first time a client changes their logo.

Instead, Video Forge asks Luna for the branding. Luna has an endpoint, /api/overlay/generate, that takes the same body as its cover-generation endpoints but returns a transparent PNG instead of compositing onto a photo. That PNG is the branding layer alone (logo, diagonal watermark, gradients) on a clear background, and Luna does not store it anywhere. Video Forge pulls the PNG down and hands it to FFmpeg as an overlay input. The video gets composited under the brand layer, so a clip ends up with the exact same treatment as a still cover generated for the same client. One branding registry on Luna's side, two consumers: stills bake it into the image, motion lays it on top at render time.
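Laying a transparent PNG over a video is a standard use of FFmpeg's overlay filter. The sketch below only builds the argument list; the file names are placeholders, and it assumes the PNG already matches the video's resolution, so it is placed at the top-left corner without scaling. The actual filter graph Video Forge uses is not shown in this post.

```python
from pathlib import Path


def overlay_command(video: Path, overlay_png: Path, output: Path) -> list[str]:
    """Build an FFmpeg argv that lays a transparent PNG over every frame."""
    return [
        "ffmpeg", "-y",
        "-i", str(video),        # input 0: the composited scenes with voiceover
        "-i", str(overlay_png),  # input 1: transparent branding layer
        # overlay repeats the PNG's single frame by default, so it stays
        # visible for the full length of the video.
        "-filter_complex", "[0:v][1:v]overlay=0:0[v]",
        "-map", "[v]",
        "-map", "0:a?",          # carry the audio through if input 0 has any
        "-c:a", "copy",
        str(output),
    ]
```

The list can be passed straight to subprocess.run(cmd, check=True), which avoids shell quoting problems with the filter string.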

That's the whole reason the two services are friends. Luna owns what the brand looks like; Video Forge owns how a video moves. Neither reimplements the other's job.

SQLite as the job ledger

Video composition is slow and the request can't wait on it, so every composition is a row in video-forge.sqlite. The POST creates the row in a queued state, the background task moves it to processing, and it lands on done (with the CDN URL) or error (with what went wrong). The client polls the job id; the server never holds a long connection.

SQLite is the right call here for the same reason it is in Luna: it's a single-host service with one writer and no need for a database server sitting next to it. A file on disk in WAL mode is plenty, and it means the job history survives a restart without any extra infrastructure. If FFmpeg dies halfway through an encode, the row is still sitting at processing and I can see exactly which job and which input tripped it.
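A job ledger along these lines takes very little code with the standard sqlite3 module. The table and column names below are assumptions for illustration; only the four states (queued, processing, done, error) and the use of WAL mode come from the description above.

```python
import sqlite3
import uuid
from typing import Optional


def open_ledger(path: str) -> sqlite3.Connection:
    """Open (or create) the job database and switch it to WAL journaling."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " cdn_url TEXT,"
        " error TEXT)"
    )
    conn.commit()
    return conn


def create_job(conn: sqlite3.Connection) -> str:
    """Insert a new row in the queued state and return its id."""
    job_id = uuid.uuid4().hex
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO jobs (id, status) VALUES (?, 'queued')", (job_id,))
    return job_id


def set_status(
    conn: sqlite3.Connection,
    job_id: str,
    status: str,
    cdn_url: Optional[str] = None,
    error: Optional[str] = None,
) -> None:
    """Move a job to a new state, recording the CDN URL or error if given."""
    with conn:
        conn.execute(
            "UPDATE jobs SET status = ?, cdn_url = ?, error = ? WHERE id = ?",
            (status, cdn_url, error, job_id),
        )


def get_job(conn: sqlite3.Connection, job_id: str) -> Optional[dict]:
    """Return the job as a dict, or None if the id is unknown."""
    row = conn.execute(
        "SELECT status, cdn_url, error FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    return None if row is None else dict(zip(("status", "cdn_url", "error"), row))
```

The POST handler would call create_job, the background task would call set_status as it progresses, and the polling route would return get_job.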

Where it sits

Video Forge isn't trying to be a general video editor. It does one narrow thing — turn a real-estate listing into a narrated, branded reel — and it leans on the rest of the server for everything outside that lane. Photos and the finished file live in Luna. The branding lives in Luna's registry. TTS is an API call. FFmpeg is a system binary. Video Forge is the thin orchestrator in the middle that knows the order of operations and the timing, and almost nothing else.

That's the same instinct behind every piece of this server: a service should own one job completely and borrow everything else. Postea Casa makes the stills, Video Forge makes the motion, Luna holds the media and the brand, and a listing walks in one end as a folder of photos and comes out the other as a link you can paste into a Story.