Google just dropped something interesting into Google Photos: an Auto frame feature that doesn’t just crop or zoom, but virtually repositions the camera after you’ve already taken the shot.
We’ve all been there. You snap a selfie, the smile is perfect, but the wide-angle lens makes your hand look like a giant claw. Or you catch a group shot, but one person’s face is half-hidden. Classic editing tools can’t fix that. Cropping doesn’t change perspective, because the camera stays exactly where it was. Zooming in just magnifies the same bad angle.
This new approach, detailed in a Google Research blog post by Marcos Seefelder and Pedro Velez, treats your 2D photo as a frozen 3D moment. It figures out where the camera was, what the scene looks like in three dimensions, and then generates what would be visible if you had shot from a different position. It’s live now in Google Photos.
How it works: 3D estimation first, generation second
Most generative editing tools try to just “imagine” new content directly. Google’s method splits the problem into two clear stages, which I think is the right call.
First, it estimates a 3D point map from the original image. For every pixel, it figures out where that surface patch sits in 3D space, and it also approximates the original camera’s focal length. They specifically tuned this model to handle human bodies and faces well, which matters because nothing ruins a portrait faster than a warped face.
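To make "a 3D point for every pixel" concrete, here is a minimal sketch of what a point map looks like. It is not Google's model: it assumes we already have a per-pixel depth value and a focal length in pixels, and it uses a plain pinhole camera with the principal point at the image center. The function name `unproject_depth` and the toy numbers are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def unproject_depth(depth: List[List[float]], focal_px: float) -> List[List[Point]]:
    """Lift every pixel to a 3D point (x, y, z) in the camera's frame.

    Assumes a pinhole camera whose principal point is the image center.
    """
    height = len(depth)
    width = len(depth[0])
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0

    point_map = []
    for v in range(height):
        row = []
        for u in range(width):
            z = depth[v][u]
            # Invert the pinhole projection u = f * x / z + cx (same for v).
            x = (u - cx) * z / focal_px
            y = (v - cy) * z / focal_px
            row.append((x, y, z))
        point_map.append(row)
    return point_map


# Toy 3x3 "photo": a near surface in the middle, background 2 units away.
depth = [
    [2.0, 2.0, 2.0],
    [2.0, 1.0, 2.0],
    [2.0, 2.0, 2.0],
]
points = unproject_depth(depth, focal_px=2.0)
print(points[1][1])  # center pixel
print(points[0][0])  # top-left pixel
```

The focal length matters here: the same depth values with a different `focal_px` spread the points over a wider or narrower area, which is why the estimate has to include it.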
Second, it uses classical 3D rendering to project that point map into the new camera position. This part is deterministic, no AI magic. But moving the virtual camera always reveals gaps, areas that were behind the subject or outside the original frame. That’s where the generative model comes in.
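The gaps are easy to reproduce with a toy renderer. The sketch below is my own illustration, not Google's code: it assumes a pinhole camera, a pure sideways translation with no rotation, and a one-row "image" of five points. The name `reproject` and the color values are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def reproject(points: List[Point], colors: List[int], cam_shift: Point,
              focal_px: float, width: int, height: int) -> List[List[Optional[int]]]:
    """Render colored 3D points from a camera moved by cam_shift (no rotation).

    Pixels that no point lands on stay None: those are the holes.
    """
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0
    image: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    zbuf = [[float("inf")] * width for _ in range(height)]

    for (x, y, z), color in zip(points, colors):
        # Express the point in the moved camera's frame.
        xc, yc, zc = x - cam_shift[0], y - cam_shift[1], z - cam_shift[2]
        if zc <= 0:
            continue  # behind the new camera
        u = round(focal_px * xc / zc + cx)
        v = round(focal_px * yc / zc + cy)
        # Z-buffer test: the nearest surface wins the pixel.
        if 0 <= u < width and 0 <= v < height and zc < zbuf[v][u]:
            zbuf[v][u] = zc
            image[v][u] = color
    return image


# One row of five pixels: background at depth 4, a subject (99) at depth 2
# covering the middle pixel. The wall directly behind the subject was never seen.
points = [(-4.0, 0.0, 4.0), (-2.0, 0.0, 4.0), (0.0, 0.0, 2.0),
          (2.0, 0.0, 4.0), (4.0, 0.0, 4.0)]
colors = [10, 11, 99, 13, 14]

moved = reproject(points, colors, cam_shift=(2.0, 0.0, 0.0),
                  focal_px=2.0, width=5, height=1)
holes = [u for u, c in enumerate(moved[0]) if c is None]
print(moved[0])  # [99, None, 13, 14, None]
print(holes)     # [1, 4]
```

Pixel 1 is the patch of background the subject used to hide, and pixel 4 is scene content that was outside the original frame. Those are exactly the two kinds of gap a generative model has to fill.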
A latent diffusion model, trained on pairs of images with known camera parameters, fills those holes. During training, it learns to take the 3D projection of one image and reconstruct the second image from it. At inference, it uses classifier guidance with regional scaling to keep the generated content consistent with the original scene.
What it can actually do
The examples Google showed are telling. They demonstrate adjusting camera angle to include more of a face, lowering the perspective on a selfie, and changing focal length to reduce distortion from wide-angle lenses. These are the exact pain points that make people delete otherwise good photos.
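A bit of arithmetic shows why moving the camera back and lengthening the focal length tames the giant-claw effect. Under a pinhole model, on-image size is focal length times real size divided by distance. The distances below (face at 0.6 m, hand 0.3 m closer, camera pulled back 1 m) are illustrative numbers I picked, not figures from Google's post.

```python
def apparent_size(real_size_m: float, distance_m: float, focal_px: float) -> float:
    """Pinhole projection: size on the image, in pixels."""
    return focal_px * real_size_m / distance_m


def hand_to_face_ratio(face_dist_m: float, hand_reach_m: float) -> float:
    """How much larger a hand looks than an equally sized object at the face."""
    return face_dist_m / (face_dist_m - hand_reach_m)


focal_px = 1000.0      # assumed wide-angle selfie focal length, in pixels
face_dist = 0.6        # camera to face, meters
hand_reach = 0.3       # hand is this much closer to the camera than the face
pull_back = 1.0        # how far the virtual camera moves away

selfie_ratio = hand_to_face_ratio(face_dist, hand_reach)
reframed_ratio = hand_to_face_ratio(face_dist + pull_back, hand_reach)

# Lengthen the focal length so the face stays the same size in the frame.
new_focal_px = focal_px * (face_dist + pull_back) / face_dist
face_before = apparent_size(0.2, face_dist, focal_px)
face_after = apparent_size(0.2, face_dist + pull_back, new_focal_px)

print(f"hand looks {selfie_ratio:.2f}x too big in the selfie")
print(f"hand looks {reframed_ratio:.2f}x too big after reframing")
print(f"face size before/after: {face_before:.0f}px / {face_after:.0f}px")
```

The face fills the same share of the frame, but the hand's exaggeration drops from 2x to roughly 1.2x. A crop or a digital zoom can't do this, because neither changes the distances in the ratio.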
I’ve tested similar features from other companies, and they often produce creepy results, especially around faces. Google’s approach seems more conservative, which I appreciate. The 3D estimation step explicitly limits reconstruction artifacts that could harm identity preservation. That’s smart.
The catch
This isn’t magic. It works best on scenes with clear spatial layout, like portraits, group shots, and landscapes with distinct foreground/background separation. Complex scenes with lots of fine detail or transparency, like hair blowing in the wind or glass reflections, will still trip it up. And the generative inpainting isn’t perfect. You’ll occasionally see artifacts, especially in areas the model had to invent entirely.
Also, it’s only available through the Auto frame feature in Google Photos. No API, no standalone tool. If you’re not a Google Photos user, you’re out of luck.
Where this is going
This looks like the next step in computational photography. We’ve already seen Google use ML for HDR, portrait mode, and Magic Eraser. Now they’re manipulating the fundamental geometry of a photo. It’s not hard to imagine this evolving into something that lets you freely navigate a photo like a 3D scene, or even combine multiple photos taken from different angles into a single reframed shot.
For now, it’s a genuinely useful feature that solves a real problem. No more settling for “almost perfect.” That’s a win in my book.