From Vision to Action: Multi-Modal Agentic AI in Real-World Use

About

Modern AI systems are evolving to see, reason, and act. In this session, we explore the design of an agentic AI system that combines computer vision with large language models (LLMs) to detect uniforms and trigger intelligent, context-aware actions such as granting access, sending alerts, or logging incidents. The system architecture includes prompt chaining, lightweight APIs, and agent frameworks, along with safeguards like confidence thresholds and human-in-the-loop logic. Attendees will gain insights into how such systems can be applied across aviation, logistics, retail, and security, integrating perception, reasoning, and response for scalable, responsible automation. The session closes with a hands-on demo using synthetic visual inputs and real-time LLM-based decision-making.
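
To make the pipeline concrete, here is a minimal Python sketch of the perception, reasoning, and response loop described above. The names (detect_uniform, query_llm) and the 0.85 threshold are illustrative assumptions for this abstract, not the session's actual implementation:

    from dataclasses import dataclass

    @dataclass
    class Detection:
        label: str         # e.g. "airline_crew_uniform"
        confidence: float  # vision-model score in [0, 1]

    CONFIDENCE_THRESHOLD = 0.85  # assumed cutoff; tuned per deployment

    def detect_uniform(image_bytes: bytes) -> Detection:
        # Stand-in for the vision-inference step; a real system would
        # run a trained uniform detector on the image here.
        return Detection(label="airline_crew_uniform", confidence=0.92)

    def query_llm(prompt: str) -> str:
        # Stand-in for the LLM reasoning step; prompt chaining and a
        # real model call would live here.
        return "grant_access"

    def decide(image_bytes: bytes) -> str:
        detection = detect_uniform(image_bytes)
        # Safeguard: below the threshold, defer to a human reviewer
        # instead of acting automatically.
        if detection.confidence < CONFIDENCE_THRESHOLD:
            return "escalate_to_human"
        prompt = (
            f"A person wearing '{detection.label}' was detected with "
            f"confidence {detection.confidence:.2f}. "
            "Choose one action: grant_access, send_alert, log_incident."
        )
        return query_llm(prompt)

    if __name__ == "__main__":
        print(decide(b"synthetic-image-bytes"))  # -> grant_access

The safeguard pattern is the point of the sketch: the LLM is only consulted to choose an action when the vision model is confident; otherwise the decision escalates to a human.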

Key Takeaways:

  • Learn how to integrate vision models with LLMs for real-world, context-aware decisions.
  • Understand the architecture of agentic AI systems: vision inference, prompt chaining, and orchestration.
  • Explore domain-specific use cases in aviation, logistics, and retail with actionable examples.
  • Watch a live demo of a multi-modal AI agent making decisions from synthetic image inputs.
