Course Outline
AI Sovereignty and LLM Local Deployment
- Risks associated with cloud LLMs: data retention, training on user inputs, and foreign jurisdiction issues.
- Ollama architecture: model server, registry, and OpenAI-compatible API.
- Comparison with vLLM, llama.cpp, and Text Generation Inference.
- Model licensing: terms for Llama, Mistral, Qwen, and Gemma.
Installation and Hardware Setup
- Installing Ollama on Linux with CUDA and ROCm support.
- CPU-only fallback options and AVX/AVX2 optimization.
- Docker deployment and persistent volume mapping.
- Multi-GPU configuration and VRAM allocation strategies.
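The Docker deployment above can be sketched as a minimal Compose file. This assumes the official `ollama/ollama` image, the default port 11434, and NVIDIA GPUs exposed via Compose's device-reservation syntax; the named volume keeps pulled models (stored under `/root/.ollama`) across container restarts.

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama   # persist pulled models across restarts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all            # or a specific count for multi-GPU splits
              capabilities: [gpu]

volumes:
  ollama_data:
```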
Model Management
- Downloading models from the Ollama registry (e.g. 'ollama pull llama3').
- Importing GGUF models from Hugging Face (e.g. TheBloke's quantized releases).
- Quantization levels: analyzing trade-offs between Q4_K_M, Q5_K_M, and Q8_0.
- Managing model switching and concurrent model loading limits.
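The quantization trade-off can be made concrete with a back-of-the-envelope size estimate. The bits-per-weight figures below are rough averages for llama.cpp K-quants (actual sizes vary by architecture and tensor layout), so treat this as a sizing sketch, not an exact calculator.

```python
# Approximate effective bits per weight for common llama.cpp quantizations.
# These are rough averages; exact figures depend on the model's tensor mix.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
}

def estimated_size_gb(n_params: float, quant: str) -> float:
    """Rough on-disk / VRAM footprint of the quantized weights, in GB."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

# Sizing an 8B-parameter model at each level:
for quant in BITS_PER_WEIGHT:
    print(f"8B model at {quant}: ~{estimated_size_gb(8e9, quant):.1f} GB")
```

The gap between Q4_K_M and Q8_0 is roughly 4 GB on an 8B model, which often decides whether the model fits in consumer GPU VRAM alongside its KV cache.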
Custom Modelfiles
- Crafting Modelfile syntax: utilizing FROM, PARAMETER, SYSTEM, and TEMPLATE.
- Tuning temperature, top_p, and repeat_penalty.
- Engineering system prompts for role-specific behavior.
- Creating and publishing custom models to the local registry.
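A minimal Modelfile tying these directives together might look as follows; the base model and system prompt are illustrative.

```
FROM llama3

PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1

SYSTEM """You are a concise technical support assistant.
Answer only questions about the product; refuse everything else."""
```

Building and running the custom model then uses the standard commands:

```
ollama create support-bot -f Modelfile
ollama run support-bot
```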
API Integration
- Using the OpenAI-compatible /v1/chat/completions endpoint.
- Implementing streaming responses and JSON mode.
- Integrating with LangChain, LlamaIndex, and custom applications.
- Managing authentication and rate limiting via reverse proxy.
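A stdlib-only sketch of calling the OpenAI-compatible endpoint, assuming Ollama is serving on its default port 11434; the model name and prompts are placeholders.

```python
import json
import urllib.request

# Default local Ollama endpoint (OpenAI-compatible chat completions).
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, user_prompt: str, stream: bool = False) -> dict:
    """Assemble an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
        "stream": stream,
    }

def chat(model: str, prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

Because the request shape matches OpenAI's API, the official `openai` client also works by pointing its `base_url` at `http://localhost:11434/v1`.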
Performance Optimization
- Configuring context window sizing and managing KV cache.
- Executing batch inference and handling parallel requests.
- Allocating CPU threads and ensuring NUMA awareness.
- Monitoring GPU utilization and memory pressure.
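The KV-cache portion of memory pressure follows directly from the model's attention dimensions: keys and values are cached for every layer and every token position. The example below uses Llama-3-8B-style dimensions (32 layers, 8 KV heads via grouped-query attention, head dimension 128) with an fp16 cache; these figures are assumptions for illustration.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, dtype_bytes: int = 2) -> int:
    """Bytes held by the KV cache: keys + values (factor of 2) per layer,
    per KV head, per head dimension, per cached token position."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * dtype_bytes

# Llama-3-8B-style dimensions, fp16 cache, 8192-token context:
gib = kv_cache_bytes(32, 8, 128, 8192) / 2**30
print(f"KV cache: {gib:.2f} GiB")  # → 1.00 GiB, growing linearly with context
```

Doubling the context window doubles this figure, and each concurrent request holds its own cache, which is why context sizing and parallelism limits have to be tuned together.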
Security and Compliance
- Establishing network isolation for model serving endpoints.
- Implementing input filtering and output moderation pipelines.
- Maintaining audit logs of prompts and completions.
- Verifying model provenance and hash integrity.
Requirements
- Intermediate proficiency in Linux and container administration.
- High-level understanding of machine learning and transformer models.
- Familiarity with REST APIs and JSON.
Audience
- AI engineers and developers transitioning from cloud LLM APIs.
- Organizations whose data-sensitivity constraints prohibit the use of cloud models.
- Government and defense teams requiring air-gapped language models.
14 Hours