Course Outline
EXO Infrastructure as Code
- Overview of EXO deployment patterns: single-node, multi-node, and RDMA clusters
- Automating dependency installation (Xcode, uv, Node.js, Rust) with configuration management
- Using Nix flakes for reproducible EXO builds and developer environments
- Writing Ansible playbooks or shell scripts for unattended cluster provisioning
Reproducible Builds and CI Integration
- Pinning dependencies and building the dashboard in CI pipelines
- Running EXO smoke tests in GitHub Actions or GitLab CI runners
- Creating golden images and snapshot-based rollback workflows for macOS and Linux VMs
- Versioning custom model cards alongside application code
Cluster Discovery and Networking Automation
- Configuring mDNS and static DNS for reliable libp2p node discovery
- Automating network profile creation and Thunderbolt bridge management on macOS
- Using custom namespaces (EXO_LIBP2P_NAMESPACE) to separate dev, staging, and prod clusters
- Firewall rules and network segmentation for multi-tenant environments
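One way to keep dev, staging, and prod clusters apart is to set EXO_LIBP2P_NAMESPACE per environment before launching a node. The sketch below is illustrative only: the namespace values and the `exo_env` helper are hypothetical, and it assumes the variable is read from the process environment at startup.

```python
import os
from typing import Dict, Optional

# Hypothetical naming scheme: one namespace string per environment.
NAMESPACES = {"dev": "exo-dev", "staging": "exo-staging", "prod": "exo-prod"}


def exo_env(environment: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with EXO_LIBP2P_NAMESPACE set."""
    if environment not in NAMESPACES:
        raise ValueError(f"unknown environment: {environment!r}")
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base is None else base)
    env["EXO_LIBP2P_NAMESPACE"] = NAMESPACES[environment]
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when a provisioning script starts the node.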
Storage and Model Lifecycle Management
- Designing EXO_MODELS_DIRS and EXO_MODELS_READ_ONLY_DIRS strategies
- Mounting NFS or SAN shares as read-only model repositories for fast provisioning
- Garbage collection of stale caches and versioned weight retention policies
- Automating model pre-downloads and health checks before rolling updates
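A retention policy for cached weights can be sketched as "always keep the newest N versions, then delete anything older than a cutoff". The example below only selects candidates and deletes nothing. The directory layout (one subdirectory per model version under a cache root) and the `stale_model_dirs` helper are assumptions made for illustration, not EXO's actual cache structure.

```python
import time
from pathlib import Path
from typing import List, Optional


def stale_model_dirs(
    cache_root: Path,
    keep_latest: int = 2,
    max_age_days: float = 30.0,
    now: Optional[float] = None,
) -> List[Path]:
    """Return subdirectories eligible for garbage collection.

    The newest `keep_latest` directories (by modification time) are always
    retained; of the rest, only those older than `max_age_days` are returned.
    """
    now = time.time() if now is None else now
    dirs = sorted(
        (p for p in cache_root.iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    cutoff = now - max_age_days * 86400
    return [p for p in dirs[keep_latest:] if p.stat().st_mtime < cutoff]
```

Separating selection from deletion makes it easy to log or review the candidates before a scheduled job removes them.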
Monitoring and Alerting
- Shipping EXO logs to centralized logging (ELK, Loki, or Splunk)
- Building Grafana dashboards from EXO_TRACING_ENABLED output
- Alerting on cluster membership changes, OOM events, and inference latency spikes
- Correlating macmon hardware telemetry with model performance regressions
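An alert on inference latency spikes can be prototyped by comparing each sample to a rolling baseline. The sketch below assumes latency samples in milliseconds have already been extracted from logs or traces; the `latency_spikes` helper, the window size, and the threshold factor are illustrative choices, not values taken from EXO.

```python
from statistics import median
from typing import List


def latency_spikes(samples_ms: List[float], window: int = 5, factor: float = 3.0) -> List[int]:
    """Return indices of samples exceeding `factor` times the rolling median.

    The baseline for sample i is the median of the `window` samples before it,
    so the first `window` samples are never flagged.
    """
    spikes = []
    for i in range(window, len(samples_ms)):
        baseline = median(samples_ms[i - window:i])
        if baseline > 0 and samples_ms[i] > factor * baseline:
            spikes.append(i)
    return spikes
```

A median baseline is used so that a single earlier spike does not inflate the threshold for the samples that follow it.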
Update, Rollback, and Disaster Recovery
- Staging EXO binary updates in a canary node before fleet-wide rollout
- Model-level rollback: switching between quantized versions without re-downloading
- Backing up and restoring cluster state, custom namespaces, and cached weights
- Documenting recovery runbooks for total cluster rebuild scenarios
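The canary step in a staged rollout comes down to a promotion decision: compare the canary node's metrics with the current fleet baseline and continue only if they stay within bounds. The following is a minimal sketch under assumed metrics; `CanaryReport`, `should_promote`, and the thresholds are hypothetical and would need to be fed from whatever monitoring is in place.

```python
from dataclasses import dataclass


@dataclass
class CanaryReport:
    healthy: bool          # node passed its health check
    error_rate: float      # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float  # 95th percentile inference latency


def should_promote(
    canary: CanaryReport,
    baseline: CanaryReport,
    max_error_rate: float = 0.01,
    max_latency_ratio: float = 1.2,
) -> bool:
    """Decide whether an update may proceed from the canary to the fleet."""
    if not canary.healthy:
        return False
    if canary.error_rate > max_error_rate:
        return False
    # Allow at most a 20% latency regression relative to the baseline by default.
    return canary.p95_latency_ms <= max_latency_ratio * baseline.p95_latency_ms
```

If the function returns False, the rollout script would stop and trigger the rollback path instead of touching the remaining nodes.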
Security Hardening and Compliance
- Applying TLS at the reverse proxy layer (nginx, traefik) for the dashboard and API
- Implementing API rate limiting and IP whitelisting for EXO endpoints
- Isolating clusters with VLANs and zero-trust network policies
- Auditing access and maintaining an inventory of deployed models and versions
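IP allowlisting is usually enforced at the reverse proxy, but the underlying check is simple enough to sketch with Python's standard `ipaddress` module. The `is_allowed` helper and the CIDR ranges in the usage note are examples only.

```python
from ipaddress import ip_address, ip_network
from typing import List


def is_allowed(client_ip: str, allowed_cidrs: List[str]) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    addr = ip_address(client_ip)
    # Membership is False when the address and network IP versions differ.
    return any(addr in ip_network(cidr) for cidr in allowed_cidrs)
```

For example, `is_allowed("10.0.1.5", ["10.0.0.0/16"])` is true, while an address outside that range is rejected.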
Requirements
- Experience with DevOps practices (CI/CD, IaC, container orchestration)
- Familiarity with macOS or Linux system administration and package management
- Understanding of networking, DNS, and storage concepts
Audience
- DevOps engineers
- Infrastructure architects
- SREs responsible for on-premise AI workloads
21 Hours
Testimonials (2)
Craig was very actively engaged in the training, always checking that we were paying attention. He applied the examples to our day-to-day activities and answered every question, even when the information was not covered in the presentation.
Ecaterina Ioana Nicoale - BOOKING HOLDINGS ROMANIA SRL
Course - DevOps Foundation®
Machine translated
High level of commitment and knowledge from the trainer
Jacek - Softsystem
Course - DevOps Engineering Foundation (DOEF)®
Machine translated