Anubhav Paras

Interested in the art of making anything (or anyone) learn something.. :)

I am a Software Engineer at Zoox, where I work on decision-making algorithms and design scalable and distributed reinforcement learning (RL) systems for autonomous robo-taxis. My work blends research and production engineering to move ideas from academic papers to large-scale, deployed systems.

I went through a “Four-Year Transform” during my bachelor’s in Electrical and Electronics Engineering at VNIT, Nagpur. After graduating, I worked at Morgan Stanley in Bengaluru as a full-stack developer before pursuing a Master’s in Robotics at the University of Maryland, College Park.

Currently, I am exploring action-conditioned world models for learning environment dynamics, and how in-context learning can be integrated into world models or video-based planners to enable task and embodiment adaptation with improved sample efficiency. More broadly, my interests span the applied and theoretical foundations of deep learning and RL in robotics, with the goal of learning generalizable, robust long-horizon policies.

Research interests: Reinforcement Learning, Generative World Models, Multimodal LLMs, Interpretability, Computer Vision, Robotics.

Recent Activities

Fresh work and updates from the last few months.

Jan 2026: Created a fun project, Reels to Video, that extracts relevant clips from a video.
Jan 2026: New video: "Are Forward KL and Reverse KL always Mode-Covering and Mode-Seeking? The Asymmetry of KL Divergence."

Education

University of Maryland, College Park, USA

Master of Engineering, Robotics | Jan 2020 - Dec 2021 | GPA: 4.0/4.0

Teaching Assistant for CMSC426 - Computer Vision (Spring 2021).

Visvesvaraya National Institute of Technology, Nagpur, India

Bachelor of Technology, Electrical and Electronics Engineering | Jul 2012 - May 2016 | GPA: 8.64/10

Student Mentor (2014-16), UG Academic Affairs Secretary - Student Council (2015-16)

Experience

Zoox, United States

Software Engineer | Jan 2022 - Present

Designing a scalable, distributed reinforcement learning (RL) framework and algorithms to learn autonomous driving behavior policies.
Co-developed a high-throughput, low-latency JAX-based RL simulator (published at the DDADS Workshop, CVPR 2024).
Architected the Imitation Learning pipeline from the ground up, featuring a Perceiver-style transformer model and efficient data loaders, to establish a faster, memory-efficient foundation for RL training.
Enabled large-scale training by integrating Mosaic Data Streaming (MDS), eliminating critical data-copy bottlenecks and streaming massive datasets directly from cloud storage.
Improved RL policy learning and sample efficiency by introducing Prioritized Experience Replay (PER) in distributed settings to focus training on the most informative experiences.
Engineered a custom metric database tool with optimized queries and dedicated database instances, achieving a 98% reduction in data response time and accelerating model validation and analysis.
Authored and filed patents on Generative Reward Models, Interpretable Reward Decomposition, Retrieval-Augmented Policy Generation, and Hardware-Accelerated RL.

Zoox, United States

Research Intern | Jun 2021 - Dec 2021

Built statistical tools for evaluating performance, robustness, and generalization of learned policies.
Investigated quality impacts of sharing weights between value and policy networks.

GAMMA Lab, University of Maryland

Independent Researcher | Feb 2021 - May 2021

Worked with Dr. Aniket Bera on adversarial agent behavior learning in autonomous driving using deep multi-agent reinforcement learning.
Implemented an adversarial reward formulation and a defense pipeline to enhance the robustness of self-driving agents.

Morgan Stanley, India

Software Engineering Manager | Aug 2016 - Dec 2019

Developed distributed web applications for client onboarding and account management used by 15k+ financial advisors.
Created and deployed frontend applications in Angular and REST web services built with Java 8, Spring, JMS MQs, and DB2.
Built an automated meeting-minutes generator using Flask, AWS Transcribe, RNNs, and Markov chains. [Code]
Served as a subject matter expert (SME) for Blockchain, Object Oriented Design, and Machine Learning (ML).
Delivered sessions on ML pipelines, feature engineering, and neural network design and architectures.

SINE Lab, IIT Bombay

Summer Intern | May 2014 - Jul 2014

Built a Dynamixel-based snake robot with sidewinding, sliding, and rolling gaits. [Code, Video]

Publications & Preprints

Scaling Is All You Need: Autonomous Driving with JAX-Accelerated Reinforcement Learning

M Harmel, A Paras, A Pasternak, N Roy, G Linscott

DDADS Workshop, CVPR 2024

arXiv

Adversarial Agent Behavior Learning in Autonomous Driving Using Deep Reinforcement Learning

A Srinivasan, A Paras, A Bera

Preprint, 2021

arXiv

FM Based Localization and Mapping System with Real Time Implementation on FPGA

A Dubey, A Kulkarni, A Paras, A Deole, AS Gandhi, KM Bhurchandi

IEEE ICTC, 2015

IEEE, Demo

Patents

Scalable Generative Reward Models for Autonomous Driving Policies

A Paras

Filed 2025

Interpretable Reward Function Decomposition for Autonomous Driving Policies

A Paras

Filed 2025

Retrieval Augmented Policy Generation - Learning to drive from rules of the road

A Paras

Filed 2025

Preference-Based Reinforcement Learning for Autonomous Vehicles

A Paras, M Harmel, A Pasternak

Filed 2024

Reinforcement Learning Policy Test Framework

A Paras, M Harmel, A Pasternak

Filed 2024

Hardware-Accelerated Reinforcement Learning for Machine-Learned Vehicle Control Model

M Harmel, A Paras, A Pasternak, N Roy, G Linscott

Filed 2023: US Patent No: 63/610,387 (Dec 2023)

Projects

Interpretability in Autonomous Driving

Sept 2025

Visual attribution analysis of RL agents.
Code

GestureGAN Optimization for Robotics Applications with Neural Architecture Search

Dec 2020

Optimized cross-view image generation using MobileNet (5.7x parameter reduction) on Nvidia Jetson.
Code, Report

Image Segmentation with Superpixels and Zoom-out Features

Dec 2020

Computed superpixels for an input image and classified each superpixel into one of the 9 classes of MSRC v1.
Code

Linear Quadratic Regulator (LQR) and Linear Quadratic Gaussian (LQG) Controllers

Dec 2020

Controllers to balance two inverted pendulums on a moving cart using LQR and LQG.
Code

Quadrotor Autonomous Control

Nov 2020

Low-level autonomous control and tracking of a quadrotor using reinforcement learning with Proximal Policy Optimization (PPO).
Code

Self-Adjusting Roadmaps using Low Density Probabilistic Roadmaps (LD-PRM)

May 2020

Path planning using self-adjusting roadmaps for unknown environments.
Code

Visual Odometry - Camera motion estimation

May 2020

Plotting 3D motion of a car-mounted camera.
Code

Color Segmentation

Apr 2020

Underwater buoy detection using Gaussian Mixture Models (GMM) and the Expectation-Maximization (EM) algorithm.
Code

Academic Services

Reviewer

International Conference on Learning Representations (ICLR): 2025, 2026
ICLR Workshop on World Models: 2026
Artificial Intelligence and Statistics Conference (AISTATS): 2025, 2026
ICCV LIMIT Workshop: 2025
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS): 2025
Conference on Robot Learning (CoRL): 2024
Conference on Neural Information Processing Systems (NeurIPS): 2024, 2025
NeurIPS Workshop on MATH-AI: 2024
NeurIPS Workshop on Machine Learning Compression: 2024
International Symposium of Robotics Research (ISRR): 2024

Panelist

Driving Decisions: The Future of Autonomy

May 2025

Addressed an audience of 100+ participants on learning-based decision making in autonomous driving, aligning AI with human preferences, safety evaluation for self-driving cars, and student career preparation.

Event

Contact

Open to collaborations, speaking engagements, and research discussions.

Email: anubhavp@terpmail.umd.edu
