Anubhav Paras

Interested in the art of making anything (or anyone) learn something.. :)

I am a Software Engineer at Zoox, where I work on decision-making algorithms and design scalable and distributed reinforcement learning (RL) systems for autonomous robo-taxis. My work blends research and production engineering to move ideas from academic papers to large-scale, deployed systems.

I went through a “Four-Year Transform” during my bachelor’s in Electrical and Electronics Engineering at VNIT, Nagpur. After graduating, I worked at Morgan Stanley in Bengaluru as a full-stack developer before pursuing a Master’s in Robotics at the University of Maryland, College Park.

Currently, I am exploring action-conditioned world models for learning environment dynamics, and how in-context learning can be integrated into world models or video-based planners to enable task and embodiment adaptation with improved sample efficiency. More broadly, my interests span the applied and theoretical foundations of deep learning and RL in robotics, with the goal of learning generalizable and robust long-horizon policies.

Research interests: Reinforcement Learning, Generative World Models, Multimodal LLMs, Interpretability, Computer Vision, Robotics.

Recent Activities

Fresh work and updates from the last few months.

Education

University of Maryland, College Park, USA

Master of Engineering, Robotics | Jan 2020 - Dec 2021 | GPA: 4.0/4.0

Teaching Assistant for CMSC426 - Computer Vision (Spring 2021).

Visvesvaraya National Institute of Technology, Nagpur, India

Bachelor of Technology, Electrical and Electronics Engineering | Jul 2012 - May 2016 | GPA: 8.64/10

Student Mentor (2014-16), UG Academic Affairs Secretary - Student Council (2015-16)

Experience

Publications & Preprints

  • Scaling Is All You Need: Autonomous Driving with JAX-Accelerated Reinforcement Learning
    M Harmel, A Paras, A Pasternak, N Roy, G Linscott
    DDADS Workshop, CVPR 2024
  • Adversarial Agent Behavior Learning in Autonomous Driving Using Deep Reinforcement Learning
    A Srinivasan, A Paras, A Bera
    Preprint, 2021
  • FM Based Localization and Mapping System with Real Time Implementation on FPGA
    A Dubey, A Kulkarni, A Paras, A Deole, AS Gandhi, KM Bhurchandi
    IEEE ICTC, 2015

Patents

  • Scalable Generative Reward Models for Autonomous Driving Policies
    A Paras
    Filed 2025
  • Interpretable Reward Function Decomposition for Autonomous Driving Policies
    A Paras
    Filed 2025
  • Retrieval Augmented Policy Generation - Learning to drive from rules of the road
    A Paras
    Filed 2025
  • Preference-Based Reinforcement Learning for Autonomous Vehicles
    A Paras, M Harmel, A Pasternak
    Filed 2024
  • Reinforcement Learning Policy Test Framework
    A Paras, M Harmel, A Pasternak
    Filed 2024
  • Hardware-Accelerated Reinforcement Learning for Machine-Learned Vehicle Control Model
    M Harmel, A Paras, A Pasternak, N Roy, G Linscott
    Filed 2023: US Patent Application No. 63/610,387 (Dec 2023)

Projects

  • Interpretability in Autonomous Driving

    Sept 2025

    Visual attribution analysis of RL agents.

    Code
  • GestureGAN Optimization for Robotics Applications with Neural Architecture Search

    Dec 2020

    Optimized cross-view image generation using MobileNet (5.7x parameter reduction) on NVIDIA Jetson.

    Code, Report
  • Image Segmentation with Superpixels and Zoom-out Features

    Dec 2020

    Computed superpixels for an input image and classified each superpixel into one of the 9 classes of MSRC v1.

    Code
  • Linear Quadratic Regulator (LQR) and Linear Quadratic Gaussian (LQG) Controllers

    Dec 2020

    Controllers to balance two inverted pendulums on a moving cart using LQR and LQG.

    Code
  • Quadrotor Autonomous Control

    Nov 2020

    Low-level autonomous control and tracking of a quadrotor using reinforcement learning with Proximal Policy Optimization (PPO).

    Code
  • Self-Adjusting Roadmaps using Low Density Probabilistic Roadmaps (LD-PRM)

    May 2020

    Path planning using self-adjusting roadmaps for unknown environments.

    Code
  • Visual Odometry - Camera motion estimation

    May 2020

    Plotting 3D motion of a car-mounted camera.

    Code
  • Color Segmentation

    Apr 2020

    Underwater buoy detection using Gaussian Mixture Models (GMM) and the Expectation-Maximization (EM) algorithm.

    Code

Academic Services

Reviewer

  • International Conference on Learning Representations (ICLR): 2025, 2026
  • ICLR Workshop on World Models: 2026
  • International Conference on Artificial Intelligence and Statistics (AISTATS): 2025, 2026
  • ICCV LIMIT Workshop: 2025
  • IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS): 2025
  • Conference on Robot Learning (CoRL): 2024
  • Conference on Neural Information Processing Systems (NeurIPS): 2024, 2025
  • NeurIPS Workshop on MATH-AI: 2024
  • NeurIPS Workshop on Machine Learning Compression: 2024
  • International Symposium of Robotics Research (ISRR): 2024

Panelist

Driving Decisions: The Future of Autonomy
May 2025
Addressed an audience of 100+ participants on four topics: learning-based decision making in autonomous driving, alignment of AI with human preferences, safety evaluation for self-driving cars, and career preparation for students.

Contact

Open to collaborations, speaking engagements, and research discussions.

Happy Learning! 🤝
Keep Smiling.. 🙂