apple ml-sharp

Sharp Monocular View Synthesis in Less Than a Second

[Interactive demo: SHARP input image (50 mm focal length), depth map, and 3D Gaussian splats; interactive view synthesis with < 10 ms inference. Move the cursor to experience 3D parallax.]

How it Works

SHARP is not just a model; it's a highly optimized feedforward pipeline designed for speed and fidelity.

1. Input Image: any single RGB photo
2. SHARP Network: Depth Pro + Gaussian regression
3. 3DGS Parameters: thousands of Gaussians generated in a single pass
4. Real-time Render: 100+ FPS free-viewpoint viewing

Faster than fast.
The power of feedforward.

Most modern view synthesis methods rely on slow iterative diffusion processes. SHARP changes the game by directly predicting 3D structure in a single feedforward pass, eliminating wait times.
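To make the contrast concrete, here is a toy sketch of feedforward Gaussian regression. It is a stand-in for illustration only, not SHARP's actual architecture (ToyGaussianHead and its layer sizes are invented), but it shows the key point: a single network pass emits every 3DGS parameter at once, with no denoising loop.

PYTHON
# Toy stand-in for a feedforward Gaussian regressor (NOT SHARP's real
# architecture): one pass maps an RGB image to per-pixel 3DGS parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGaussianHead(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # 3 (position offset) + 3 (scale) + 4 (rotation) + 1 (opacity) + 3 (color) = 14
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 14, kernel_size=1),
        )

    def forward(self, rgb: torch.Tensor) -> dict:
        p = self.net(rgb)  # (N, 14, H, W): one Gaussian per output pixel
        return {
            "xyz_offset": p[:, 0:3],
            "scale": p[:, 3:6].exp(),                    # strictly positive
            "rotation": F.normalize(p[:, 6:10], dim=1),  # unit quaternion
            "opacity": torch.sigmoid(p[:, 10:11]),       # in (0, 1)
            "color": torch.sigmoid(p[:, 11:14]),
        }

# One forward pass, no iterative denoising loop.
with torch.no_grad():
    gaussians = ToyGaussianHead()(torch.rand(1, 3, 256, 256))
print(gaussians["scale"].shape)  # torch.Size([1, 3, 256, 256])

A diffusion sampler would instead wrap a network like this in tens of denoising iterations per frame; the feedforward design is where the speedup comes from.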

Generation Time (Lower is Better)

Diffusion models (Gen3C, ViewCrafter): ~60-300 sec
Apple ml-sharp: < 1 sec

* Performance measured on an NVIDIA A100 GPU. SHARP achieves a speedup of two to three orders of magnitude.

Zero-shot Generalization

No re-training required for specific scenes. SHARP is pre-trained on massive real-world data and works out of the box.

Metric Scale

Generated scenes have true physical metric scale: moving 1 meter forward in VR corresponds to 1 meter in the real scene.
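In practice, metric scale means a novel-view camera pose can be specified directly in meters, with no scene-specific scale factor. A minimal sketch; the OpenGL-style convention (camera looking down its local -z axis) is an assumption for illustration:

PYTHON
import numpy as np

# Camera-to-world pose of the original shot (identity: camera at origin).
cam_to_world = np.eye(4)

# Step 1 meter forward. Assuming an OpenGL-style convention where the
# camera looks down local -z; in a metric-scale scene this translation
# is literally one meter.
step_forward = np.eye(4)
step_forward[2, 3] = -1.0  # -1.0 m along local z = 1 m forward

novel_view = cam_to_world @ step_forward
print(novel_view[:3, 3])  # new camera position in meters: [0, 0, -1]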

Core Metrics

25-34%

LPIPS Reduction

Significantly lower perceptual error than the previous state of the art, meaning renderings closer to human visual realism (a measurement sketch follows these metric cards).

< 1 sec

Generation Time

Sub-second generation on standard consumer GPUs puts an end to long rendering waits.

100+

Rendering FPS

Generated .ply files are optimized for smooth real-time performance in existing 3DGS viewers (a file-inspection sketch follows below).
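For reference, LPIPS scores like the reduction quoted above are typically computed with the lpips package. A minimal sketch: the random tensors are stand-ins for a rendered view and a ground-truth photo, and the AlexNet backbone is the package default, not necessarily what SHARP's evaluation used.

PYTHON
import torch
import lpips  # pip install lpips

loss_fn = lpips.LPIPS(net="alex")  # learned perceptual distance

# Stand-ins for a rendered novel view and the ground-truth photo,
# shape (N, 3, H, W), values scaled to [-1, 1].
rendered = torch.rand(1, 3, 256, 256) * 2 - 1
ground_truth = torch.rand(1, 3, 256, 256) * 2 - 1

with torch.no_grad():
    score = loss_fn(rendered, ground_truth)
print(f"LPIPS: {score.item():.4f}")  # lower is better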
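To sanity-check a generated file before loading it into a viewer, the plyfile package works. A minimal sketch, assuming the output follows the common INRIA 3DGS property layout; the filename is illustrative:

PYTHON
from plyfile import PlyData  # pip install plyfile

ply = PlyData.read("output_gaussians/example.ply")
verts = ply["vertex"]

print(f"{verts.count} Gaussians")
# Typical 3DGS properties: x/y/z (position), scale_0..2,
# rot_0..3 (quaternion), opacity, f_dc_0..2 (base color).
print([p.name for p in verts.properties])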

Frequently Asked Questions

What hardware do I need to run SHARP?

For running inference (generating the 3D model), any standard GPU setup works. However, the real-time visualization features provided in the repository currently require a CUDA-enabled GPU.

How is SHARP different from diffusion-based view synthesis?

Diffusion models generate views by iteratively denoising an image, which is slow (minutes) and often inconsistent across frames. SHARP is a feedforward network that predicts the 3D structure (Gaussians) directly in one pass (< 1 second), ensuring consistency and speed.

What license are the code and models released under?

The code and model weights are released under specific licenses. Please check the LICENSE and LICENSE_MODEL files in the GitHub repository for the most accurate and up-to-date usage terms.

Does SHARP run on Apple Silicon?

You can run the Python inference scripts on Apple Silicon, but the real-time CUDA-based renderer is not compatible. You would need to use a compatible viewer for the generated .ply files.
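If you adapt the inference scripts yourself, the standard PyTorch device check covers both setups. A generic sketch, assuming the scripts are plain PyTorch; it is not SHARP-specific code:

PYTHON
import torch

# Prefer CUDA, fall back to Apple's Metal backend (MPS), then CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
print(f"Running inference on: {device}")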

Ready to dive in?

Join thousands of developers exploring the future of monocular 3D vision. The code is fully open source.

Terminal
1. Setup Environment
BASH
git clone https://github.com/apple/ml-sharp
cd ml-sharp
conda create -n sharp python=3.13
conda activate sharp
pip install -r requirements.txt
2. Run Prediction
BASH
# Automatically downloads the model & processes images
sharp predict \
  -i ./input_images \
  -o ./output_gaussians \
  --render
