What Is Ollama
Ollama is an open-source tool for running large language models locally on your own machine. It handles downloading, configuring, and serving models through a local API, so you can use AI models without sending data to external services.
What Ollama does
Ollama manages the full lifecycle of running local models: pulling model weights from the Ollama model library, loading them into memory, running inference, and exposing a REST API on localhost:11434. You interact with it through the ollama CLI or by making HTTP requests to the local API.
The core workflow is straightforward: install Ollama, pull a model, and start using it. For example, ollama run gemma3 downloads and starts a chat session with Google's Gemma 3 model. Models are cached locally after the first download.
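A typical first session looks like this (a minimal sketch; gemma3 is just one of many available models):

    # Pull the model weights into the local cache
    ollama pull gemma3

    # One-shot prompt from the command line
    ollama run gemma3 "Summarize what Ollama does in one sentence."

    # List models cached locally
    ollama list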
Supported platforms
Ollama runs on macOS, Linux, and Windows. It is also available as an official Docker image (ollama/ollama) on Docker Hub. Installation options include:
- macOS: curl -fsSL https://ollama.com/install.sh | sh, or download Ollama.dmg directly
- Linux: curl -fsSL https://ollama.com/install.sh | sh, with a manual install option
- Windows: irm https://ollama.com/install.ps1 | iex, or download OllamaSetup.exe directly
- Docker: pull the ollama/ollama image from Docker Hub (a run example follows below)
Additional installation channels include Homebrew, Pacman (Arch Linux), Nix, a Helm chart (for Kubernetes), Gentoo, and Flox.
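For the Docker route, the pattern documented for the ollama/ollama image is to run the server in a container and invoke the CLI inside it:

    # Start the Ollama server in a container (CPU-only); the named volume
    # persists downloaded models across container restarts
    docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

    # Run a model through the CLI inside the container
    docker exec -it ollama ollama run gemma3

GPU passthrough requires additional flags, for example --gpus=all with the NVIDIA Container Toolkit installed.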
Model capabilities
Models available through Ollama support the following capabilities, depending on the specific model:
- Completion -- Text generation and chat. The core capability for conversational AI and text output.
- Vision -- Processing and understanding images alongside text input.
- Tool calling -- Letting models invoke external functions or tools during a conversation.
- Embedding -- Generating vector representations of text, used for search, retrieval, and RAG applications.
- Thinking/Reasoning -- Extended reasoning where the model shows its chain of thought before answering.
- Image generation -- Creating images from text prompts.
- Audio -- Processing audio input, including transcription.
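As a concrete example of the embedding capability, the local API can turn text into a vector with a single request (a sketch assuming an embedding model such as nomic-embed-text has already been pulled):

    # Generate an embedding vector for a piece of text
    curl http://localhost:11434/api/embed -d '{
      "model": "nomic-embed-text",
      "input": "Ollama runs language models locally."
    }'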
The model library
Ollama provides a model library at ollama.com/library with a catalog of open-source models ready to run. Models include Gemma 3, Llama, Mistral, Phi, and many others across different sizes and specializations. You can also create custom models using a Modelfile, import models, and push models to the library.
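A Modelfile is a small declarative file that layers parameters and a system prompt on top of a base model. A minimal sketch (the name my-assistant is just an example):

    # Modelfile: derive a custom model from Gemma 3
    FROM gemma3

    # Sampling temperature (higher = more varied output)
    PARAMETER temperature 0.7

    # System prompt baked into the custom model
    SYSTEM You are a concise technical assistant.

Build and run it with:

    ollama create my-assistant -f Modelfile
    ollama run my-assistant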
REST API and compatibility endpoints
Ollama exposes a local REST API with endpoints for:
- /api/chat -- Conversational inference (chat completions)
- /api/generate -- Text generation (completions)
- /api/embed and /api/embeddings -- Generate text embeddings
- /api/pull and /api/push -- Download and upload models
- /api/tags -- List locally available models
- /api/show -- Show model details
- /api/create -- Create a model from a Modelfile
- /api/copy -- Copy a model
- /api/delete -- Delete a model
- /api/ps -- List running models
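A typical call to the native chat endpoint looks like this (a minimal sketch; streaming is on by default, so it is disabled here to get a single JSON response):

    # Single-turn chat against the native API
    curl http://localhost:11434/api/chat -d '{
      "model": "gemma3",
      "messages": [
        {"role": "user", "content": "Why is the sky blue?"}
      ],
      "stream": false
    }'

    # List locally available models
    curl http://localhost:11434/api/tags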
Ollama also provides OpenAI-compatible endpoints, so tools and libraries built for the OpenAI API can work with Ollama by pointing their base URL at http://localhost:11434/v1:
- /v1/chat/completions -- OpenAI-compatible chat completions
- /v1/completions -- OpenAI-compatible text completions
- /v1/embeddings -- OpenAI-compatible embeddings
- /v1/models -- OpenAI-compatible model listing
- /v1/responses -- OpenAI Responses API compatibility
- /v1/images/generations and /v1/images/edits -- OpenAI-compatible image generation and editing
- /v1/audio/transcriptions -- OpenAI-compatible audio transcription
An Anthropic-compatible endpoint is also available at /v1/messages.
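In practice, compatibility means an OpenAI-style request works unchanged against the local server (a sketch; Ollama does not validate the API key, so any placeholder value works for clients that require one):

    # The same request shape the OpenAI API expects, pointed at Ollama
    curl http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gemma3",
        "messages": [{"role": "user", "content": "Hello"}]
      }'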
Official client libraries
Ollama maintains two official client libraries:
- Python -- pip install ollama (ollama-python)
- JavaScript -- npm i ollama (ollama-js)
Both libraries wrap the REST API and support chat, generation, embeddings, and model management operations. A large ecosystem of third-party integrations exists for other languages and frameworks, including LangChain, LlamaIndex, Spring AI, and SDKs for Ruby, Rust, Go, C++, Java, Swift, .NET, and more.