Rohan Siva

I am pursuing a B.S. in Electrical and Computer Engineering (Honors) with a minor in Statistics & Data Science at the University of Texas at Austin, graduating in May 2027.

This summer, I'm working on post-training voice models for CX at Decagon. Previously, I was a Machine Learning Intern at Cisco Hypershield working on cybersecurity data ETL agents. I am also an AI Researcher at the VITA Lab and Center for Autonomy at UT Austin under Prof. Atlas Wang and Prof. Ufuk Topcu, focusing on formal methods, neurosymbolic AI, and VLM-based planning. At the Statistical Learning & AI Group, I work on improving the efficiency of RL algorithms (GRPO/PPO). I also collaborate with Prof. Hao Tang at Peking University on diffusion models for parallelizable chain-of-thought reasoning.

My research interests include embodied AI, RL, agents, vision-language models, diffusion models, and mechanistic interpretability. I focus on building AI systems that can perceive, reason, and act in complex environments, with an emphasis on uncertainty quantification and neurosymbolic approaches for planning and perception.


Publications

What Objects Enable, Not What They Are: Functional Latent Spaces for Affordance Reasoning

Under Submission
Siva, R., Bhatt, N. P., Yang, Y., Lee, S., Gadde, N., Ellis, C., Velasquez, A., Wang, Z., Topcu, U.

A4D maps visual observations into a shared functional latent space structured around affordances (e.g., "movable"), enabling robot planning based on task-relevant object functionalities rather than appearance alone. We achieve 94% inference accuracy on existing affordances (over 15% above the state of the art) while enabling 100x faster inference.

Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework

MLSys, 2025 (Oral) 🏆
Bhatt, N. P., Yang, Y., Siva, R., Milan, D., Topcu, U., Wang, Z.

A novel framework for uncertainty-aware multimodal planning that uses conformal prediction to quantify perception uncertainty and FMDP to quantify decision uncertainty, with formal verification guarantees. Building on this, we implement active sensing and automated refinement via supervised fine-tuning (SFT) to meet task specifications, reducing variability by 40% and improving task success by 5%.

UNCAP: Uncertainty-Guided Planning Using Natural Language Communication for Cooperative Autonomous Vehicles

AAMAS, 2026 (Oral + Best Paper Nominee) 🏆
Bhatt, N. P., Li, P., Gupta, K., Siva, R., Milan, D., Hogue, A. T., Chinchali, S. P., Fridovich-Keil, D., Wang, Z., Topcu, U.

A framework for uncertainty-guided planning in cooperative autonomous vehicles using natural language communication. Leverages uncertainty quantification to improve coordination and decision-making in multi-agent autonomous driving scenarios.

VLN-Zero: Rapid Exploration and Cache-Enabled Neurosymbolic Vision-Language Planning for Zero-Shot Transfer in Robot Navigation

NEUS, 2026 (Oral) 🏆
Bhatt, N. P., Yang, Y., Siva, R., Samineni, P., Milan, D., Wang, Z., Topcu, U.

A neurosymbolic approach combining vision-language models with cache-enabled planning for zero-shot robot navigation. Enables rapid exploration and transfer learning in novel environments without task-specific training.

kRAIG: A Natural Language-Driven Agent for Automated DataOps Pipeline Generation

Under Submission
Siva, R., Cheung, K., Li, L., Sundaram, G.

An AI agent that translates natural language specifications into production-ready Kubeflow Pipelines. Introduces ReQuesAct to clarify user intent prior to pipeline synthesis, with retrieval-augmented tool generation and LLM-based validation. Achieves a 3x improvement in extraction/loading success and 25% higher transformation accuracy over state-of-the-art agentic baselines.

RepV: Safety-Separable Latent Spaces for Scalable Neurosymbolic Plan Verification

Under Submission
Yang, Y., Bhatt, N. P., Samineni, P., Siva, R., Wang, Z., Topcu, U.

A novel framework for safety-verifiable reinforcement learning that learns safety-separable latent spaces, enabling efficient neurosymbolic plan verification. Combines the representational power of deep learning with the formal guarantees of symbolic methods for scalable safety verification in complex environments.