modal

Run Python code in the cloud with serverless containers, GPUs, and autoscaling. Use when deploying ML models, running batch processing jobs, scheduling compute-intensive tasks, or serving APIs that require GPU acceleration or dynamic scaling.

Install


Download and extract to your skills directory

Or copy the command below and send it to OpenClaw for auto-install:

Download and install this skill https://openskills.cc/api/download?slug=k-dense-ai-scientific-skills-modal&locale=en&source=copy

Modal - Cloud-based Python Serverless Execution Platform

Overview


Modal is a serverless cloud computing platform designed for Python, letting you run Python code in the cloud without configuring servers. It supports GPU acceleration, automatic scaling, and pay-as-you-go billing, making it especially suitable for ML model deployment, batch data processing, and scheduled tasks. Sign up and receive $30/month in free credit.

Use Cases

1. Machine Learning Model Deployment


Deploy trained LLMs, image generation, or embedding models as cloud APIs with GPU inference acceleration. Modal automatically handles container configuration, load balancing, and elastic scaling—you only need to define the model and service logic.
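As a rough sketch of this workflow (the model name, endpoint shape, and GPU choice here are illustrative, and Modal's web-endpoint decorator name has changed across SDK versions):

```python
# Hypothetical sketch: serve an embedding model as an HTTP API on Modal.
import modal

app = modal.App("embedding-service")
image = modal.Image.debian_slim().pip_install("sentence-transformers")

@app.function(image=image, gpu="T4")
@modal.fastapi_endpoint(method="POST")
def embed(item: dict):
    # Import inside the function so it resolves in the remote container.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    return {"embedding": model.encode(item["text"]).tolist()}
```

Deploying with `modal deploy` would expose this as a public URL; Modal handles the container build, load balancing, and scaling described above.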

2. GPU-accelerated Compute Tasks


Compute tasks that require GPUs (e.g., model training, inference, rendering) can request T4, A100, H100, and other GPUs directly on Modal, billed by usage time, with no GPU servers to maintain.

3. Large-scale Batch Processing


Distribute data-processing tasks across thousands of containers to run automatically in parallel—suitable for massive datasets, bulk file conversion, or distributed scientific computing.

Core Features

1. Declarative Container Image Definition


Define the runtime environment in Python code. Supports installing PyPI packages and system dependencies, adding local code modules, or building on an existing Docker image. Images are built automatically at deploy time, ensuring a consistent environment.
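A minimal sketch of such an image definition (the package names and local module name are illustrative; exact builder-method names may vary with the SDK version):

```python
import modal

# Build the runtime environment declaratively in Python:
# a slim Debian base, plus system deps, PyPI packages, and local code.
image = (
    modal.Image.debian_slim(python_version="3.11")
    .apt_install("ffmpeg")                # system dependency
    .pip_install("numpy", "pandas")       # PyPI packages
    .add_local_python_source("mymodule")  # local code module (hypothetical name)
)

app = modal.App("image-demo", image=image)
```

Every function attached to the app then runs in a container built from this image, so the environment is identical on each deployment.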

2. Flexible GPU and Resource Configuration


Choose the GPU type and count to match the task (from a single T4 up to eight H100s), and customize CPU cores, memory, and ephemeral disk space. Billing can be based on reserved resources or actual usage.
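A sketch of how these resources are requested per function (the specific values are illustrative, and parameter names such as `ephemeral_disk` should be checked against your installed SDK version):

```python
import modal

app = modal.App("resource-demo")

@app.function(
    gpu="H100:8",            # eight H100s; "T4", "A100-80GB", etc. also work
    cpu=8.0,                 # CPU cores
    memory=32768,            # RAM in MiB
    ephemeral_disk=512_000,  # scratch disk in MiB
)
def train():
    ...  # training logic runs with the resources reserved above
```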

3. Auto Scaling and Parallel Execution


Use the .map() method to automatically distribute tasks across multiple containers for parallel execution. Supports configuring minimum/maximum container counts, reserved buffer containers, and other strategies to enable elasticity from zero to thousands of instances.
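A minimal fan-out sketch using `.map()` (the container-count parameter name here reflects recent SDK versions and is an assumption; older releases used different names):

```python
import modal

app = modal.App("map-demo")

@app.function(max_containers=100)  # cap the parallel fan-out (assumed parameter name)
def square(x: int) -> int:
    return x * x

@app.local_entrypoint()
def main():
    # .map() distributes the inputs across containers and runs them in parallel.
    results = list(square.map(range(1000)))
    print(sum(results))
```

Running `modal run` on this file would spin containers up from zero, process the thousand inputs in parallel, and scale back down when done.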

Frequently Asked Questions

What is Modal? What scenarios is it suitable for?


Modal is a Python-focused serverless cloud computing platform. You define functions and runtime environments in Python, and Modal automatically handles container deployment, scaling, and resource management. It's particularly well suited for ML model deployment, GPU training/inference, batch data processing, scheduled tasks, and serverless APIs.

Which GPUs does Modal support? How do I choose?


Modal supports T4 and L4 (economical inference), A10, A100, and A100-80GB (standard training/inference), L40S (48 GB, strong price/performance), H100 and H200 (high-performance training), and B200 (flagship performance). For inference, L40S is a good default; for training, H100 or A100. Specify a GPU with @app.function(gpu="A100"); for multiple GPUs, use gpu="H100:8".

How much free credit does Modal offer? How is billing handled?


New users receive $30/month in free credit upon registration. Billing is based on the compute resources used (CPU, GPU, memory, storage) and supports charging based on reserved or actual usage (whichever is higher). Functions do not incur charges when they are not running. See the Modal console for detailed pricing.