Deploying & Interacting with LLMs at Scale

Abstract

Through live demos, this talk shows how to deploy and interact with large language models (LLMs) across cloud and HPC environments. It surveys LLM inference engines and practical deployment strategies on platforms such as Runpod Cloud and Slurm, covering single- and multi-replica setups, batch inference configurations, and container-based deployment options, including pod-based and serverless modes. The session closes with a demonstration of interacting with locally deployed models using AnythingLLM.
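
As a taste of the interaction pattern the demos cover, below is a minimal sketch of querying a locally deployed model over the OpenAI-compatible HTTP API that inference engines such as vLLM expose (and that clients like AnythingLLM can point at). The server address, model name, and prompt are illustrative assumptions, not the exact demo setup.

    import requests

    # Assumed local endpoint; engines such as vLLM serve an
    # OpenAI-compatible API under /v1 (port 8000 by default).
    BASE_URL = "http://localhost:8000/v1"

    response = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": "meta-llama/Llama-3.1-8B-Instruct",  # assumed model name
            "messages": [
                {"role": "user", "content": "Explain batch inference in one sentence."}
            ],
            "max_tokens": 128,
        },
        timeout=60,
    )
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])

The same request works unchanged against a single replica on a Slurm node or a multi-replica deployment behind a load balancer, which is what makes the OpenAI-compatible interface a convenient common denominator across the platforms discussed in the talk.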

Bio

Dr. Muhammad Bilal recently joined the SANDS Lab and the GenAI Center of Excellence at KAUST as an Applied Research Scientist. In this role, he will contribute to the center’s AI application efforts, focusing on translating academic research into impactful real-world solutions.

Before joining KAUST, Dr. Bilal spent four years at Unbabel, where he started as a senior engineer on the machine translation team and later became a senior engineering manager. In that role, he led teams working on model deployment, AI productization, and AI operations, building deep practical expertise in scalable AI systems.
