Hello World!!👋

I've joined Snap as a Machine Learning Engineer working on Generative 3D.

Previously, I was a graduate student in the Master of Science in Computer Vision program at Carnegie Mellon University, advised by Dr. Shubham Tulsiani. I graduated in December 2024.
I interned at Apple with the Vision Pro team in Summer 2024 as a Computer Vision Research Intern researching 3D reconstruction for AR / VR applications using Gaussian Splatting.

Prior to CMU, I worked as a Computer Vision Engineer - II at Wobot.ai, where I deployed CV solutions serving 10k+ customers and worked with Dr. Biplab Banerjee.
I completed my Bachelor of Engineering in Computer Science from Government College of Engineering, Nagpur (GCOEN).

When I'm not working, I like to play snooker, read books or listen to music.

Resume | Email

Publications

StyleSplat - In Submission

S. Jain*, A. Kuthiala*, P. Sethi, P. Saxena, “StyleSplat: 3D Object Style Transfer with Gaussian Splatting”

Listen Then See - CVPR 2024

P. Sethi*, A. Agrawal*, CMS Lezcano*, I. Heredia*, “Listen Then See: Video Alignment with Speaker Attention”, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024

Jensen-Shannon Divergence in Safe Multi-Agent RL - ICLR 2024

R. Zawar, P. Sethi, R. Roy, “Jensen-Shannon Divergence in Safe Multi-Agent RL”, in ICLR, Tiny Paper Track, 2024

Projects

Generalizable, Sparse and Unposed 3D Gaussian Splatting

We propose a novel approach that, given a few RGB images of a previously unobserved scene with or without known poses, produces a 3D scene representation in a single feed-forward pass. Specifically, our method predicts 3D Gaussians that can be rendered from any novel view at interactive speeds.

GIF-Tune

GIF-Tune is a one-shot tuning strategy for continuous text-to-GIF synthesis. The model is trained on a single text-GIF pair and can generate GIFs from any text prompt.

UAV Detection

Real-time Unmanned Aerial Vehicle (UAV) detection system. The goal of the project is to build a real-time embedded drone detection system for a flying vehicle using infrared data. The model should detect UAVs across varying UAV sizes/types, altitudes, distances, and lighting conditions.

Smart AI Autonomous Drone (Person Tracking, Intruder Detection)

A smart autonomous drone with object tracking and object detection capabilities. This project was done in two phases. Phase 1: a person-tracking algorithm with a Tello drone (find related files here). Phase 2: a custom autonomous drone with real-time intruder detection alerts (find related files here).

Fragmentation Analysis using HED

Fragmentation analysis is a key check that mining engineers perform after blasting to determine the efficacy or accuracy of a blast. It focuses on measuring the average size of the rocks/fragments generated by the blast. This is an image processing / computer vision approach using the Holistically-Nested Edge Detection (HED) algorithm.

Fume Analysis

Fume analysis is a key check that mining engineers perform during blasting. It focuses on identifying specific fumes that are toxic in nature. This is an image processing / computer vision approach that filters these toxic fumes by color and finds their percentage.
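
As a rough illustration of color-based filtering, here is a minimal sketch that counts how many pixels fall inside an HSV color band and reports them as a percentage of all pixels. It is not the project's actual code: the function name `fume_percentage`, the hue band, and the saturation/value thresholds are all hypothetical, and measuring the percentage over pixels is an assumption.

```python
import colorsys


def fume_percentage(pixels, hue_range=(0.05, 0.17), min_saturation=0.4, min_value=0.3):
    """Return the percentage of pixels whose HSV color lies inside a target band.

    pixels: iterable of (r, g, b) tuples with channels in 0-255.
    hue_range, min_saturation, min_value: hypothetical thresholds chosen only
    for illustration; real values would have to be calibrated on footage.
    """
    low, high = hue_range
    total = 0
    matched = 0
    for r, g, b in pixels:
        total += 1
        # colorsys expects channels in [0, 1] and returns h, s, v in [0, 1].
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if low <= h <= high and s >= min_saturation and v >= min_value:
            matched += 1
    return 100.0 * matched / total if total else 0.0


# Tiny hand-made "frame": two orange-ish pixels, one gray, one blue.
frame = [(230, 140, 30), (220, 150, 40), (120, 120, 120), (30, 60, 200)]
print(fume_percentage(frame))
```

On a real image the same test would typically be applied to every pixel of a frame, with the band tuned to the fume colors of interest.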

Image Forgery Detection Using Benford's Law

This project focuses on detecting a specific form of image forgery known as a copy-move attack, in which a portion of an image is copied and pasted elsewhere.
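
For readers unfamiliar with Benford's Law, the sketch below shows only the law itself: the expected frequency of leading digit d is log10(1 + 1/d), and a set of values can be scored by how far its first-digit distribution deviates from that. How the project applies the law to image data is not described here, so the helper names (`benford_expected`, `benford_deviation`) and the chi-square-style score are illustrative assumptions, not the project's method.

```python
import math


def benford_expected():
    """Expected first-digit frequencies under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Leading decimal digit of a nonzero integer."""
    return int(str(abs(n))[0])


def first_digit_distribution(values):
    """Observed first-digit frequencies of the nonzero integers in values."""
    counts = {d: 0 for d in range(1, 10)}
    used = 0
    for v in values:
        if v != 0:
            counts[first_digit(v)] += 1
            used += 1
    return {d: (counts[d] / used if used else 0.0) for d in counts}


def benford_deviation(values):
    """Chi-square-style distance between observed and expected frequencies."""
    expected = benford_expected()
    observed = first_digit_distribution(values)
    return sum((observed[d] - expected[d]) ** 2 / expected[d] for d in expected)


# Powers of two are a classic sequence that follows Benford's Law closely;
# uniformly distributed leading digits do not.
powers = [2 ** k for k in range(100)]
uniform = list(range(1, 10)) * 10
print(benford_deviation(powers), benford_deviation(uniform))
```

A lower score means the first digits look more Benford-like; any decision threshold would need to be chosen empirically.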