
Listen Then See - CVPR 2024
P. Sethi*, A. Agrawal*, CMS Lezcano*, I. Heredia*, “Listen Then See: Video Alignment with Speaker Attention”, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024
I've joined Snap as a Machine Learning Engineer working on Generative 3D.
Previously, I was a graduate student in the Master of Science in Computer Vision program at Carnegie Mellon University, advised by Dr. Shubham Tulsiani. I graduated in December 2024.
I interned at Apple with the Vision Pro team in Summer 2024 as a Computer Vision Research Intern researching 3D reconstruction for AR / VR applications using Gaussian Splatting.
Prior to CMU, I was working as a Computer Vision Engineer - II at Wobot.ai where I deployed CV solutions serving 10k+ customers and worked with Dr. Biplab Banerjee.
I completed my Bachelor of Engineering in Computer Science from Government College of Engineering, Nagpur (GCOEN).
When I'm not working, I like to play snooker, read books or listen to music.
S. Jain*, A. Kuthiala*, P. Sethi, P. Saxena, “StyleSplat: 3D Object Style Transfer with Gaussian Splatting”
P. Sethi*, A. Agrawal*, CMS Lezcano*, I. Heredia*, “Listen Then See: Video Alignment with Speaker Attention”, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024
R. Zawar, P. Sethi, R. Roy, “Jensen-Shannon Divergence in Safe Multi-Agent RL”, in ICLR, Tiny Paper Track, 2024
We propose a novel approach that, given a few RGB images of a previously unobserved scene with or without known poses, produces a 3D scene representation in a single feed-forward pass. Specifically, our method predicts 3D Gaussians that can be rendered from any novel view at interactive speeds.
GIF-Tune is a one-shot tuning strategy for continuous text-to-GIF synthesis. The model is trained on a single text-GIF pair and can generate GIFs from any text prompt.
Real-time Unmanned Aerial Vehicle (UAV) detection system. The objective of the project is to build a real-time embedded drone detection system for a flying vehicle using infrared data. The model should detect UAVs across varying sizes and types, altitudes, distances, and lighting conditions.
A smart autonomous drone with object tracking and object detection capabilities. This project was done in two phases: (1) a person-tracking algorithm running on a Tello drone, and (2) a custom autonomous drone with real-time intruder detection alerts.
Fragmentation Analysis is a key check used by mining engineers after blasting to determine the efficacy or accuracy of a blast. It focuses on measuring the average size of the rocks/fragments generated by the blast. This is an image processing / computer vision approach that uses the Holistically-Nested Edge Detection (HED) algorithm.
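As a rough illustration of the measurement step only, the sketch below assumes an edge detector such as HED has already produced a binary edge map (1 = fragment boundary, 0 = rock interior). It then labels the connected interior regions and averages their areas in pixels. The function name `mean_fragment_area` and the list-of-lists input format are assumptions made for this example, not the project's actual code.

```python
from collections import deque


def mean_fragment_area(edges):
    """Return the mean area (in pixels) of regions enclosed by edge pixels.

    `edges` is a hypothetical binary edge map given as a list of rows,
    where a truthy value marks an edge pixel and 0 marks fragment interior.
    """
    height = len(edges)
    width = len(edges[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    areas = []

    for start_y in range(height):
        for start_x in range(width):
            if edges[start_y][start_x] or seen[start_y][start_x]:
                continue
            # Breadth-first flood fill over 4-connected non-edge pixels.
            seen[start_y][start_x] = True
            queue = deque([(start_y, start_x)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not edges[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)

    return sum(areas) / len(areas) if areas else 0.0
```

Converting pixel areas into physical fragment sizes would additionally need a known scale reference in the image, which this sketch does not attempt.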
Fume Analysis is a key check used by mining engineers during blasting. It focuses on identifying specific fumes that are toxic in nature. This is an image processing / computer vision approach that filters pixels by color to estimate the percentage of these toxic fumes.
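The color-based percentage can be sketched as a simple threshold count. The example below is a minimal illustration, assuming pixels are RGB tuples and that the fumes of interest fall inside a per-channel color range. The function name `fume_percentage` and the threshold values are hypothetical and would need tuning on real footage (a real pipeline might also threshold in HSV rather than RGB).

```python
def fume_percentage(pixels, lower, upper):
    """Return the percentage of pixels whose channels all lie in [lower, upper].

    `pixels` is an iterable of (r, g, b) tuples; `lower` and `upper` are
    hypothetical per-channel bounds describing the fume color of interest.
    """
    total = 0
    matched = 0
    for pixel in pixels:
        total += 1
        # A pixel counts as fume only if every channel is inside its range.
        if all(lo <= c <= hi for c, lo, hi in zip(pixel, lower, upper)):
            matched += 1
    return 100.0 * matched / total if total else 0.0


# Example: two orange-ish pixels out of four fall inside the assumed range.
sample = [(230, 120, 30), (240, 130, 40), (10, 10, 10), (200, 200, 200)]
share = fume_percentage(sample, lower=(200, 80, 0), upper=(255, 160, 80))
```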
This project focuses on detecting a specific form of image forgery known as a copy-move attack, in which a portion of an image is copied and pasted elsewhere.
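One common baseline for copy-move detection is exact block matching: slide a small window over the image, index every block by its contents, and report pairs of identical blocks at different locations. The sketch below illustrates that baseline on a grayscale image stored as a list of rows. It is not necessarily the method used in this project, and the function name `find_copied_blocks` and its parameters are assumptions made for the example.

```python
from collections import defaultdict


def find_copied_blocks(image, block=2, min_shift=None):
    """Return pairs of top-left (y, x) positions whose blocks are identical.

    `image` is a grayscale image as a list of rows of ints. Flat blocks
    (all pixels equal) are skipped, since uniform regions match trivially.
    Pairs closer than `min_shift` pixels apart are ignored.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if min_shift is None:
        min_shift = block

    # Group block positions by exact pixel content.
    positions_by_block = defaultdict(list)
    for y in range(height - block + 1):
        for x in range(width - block + 1):
            key = tuple(tuple(image[y + dy][x:x + block]) for dy in range(block))
            if len({value for row in key for value in row}) == 1:
                continue  # skip flat blocks
            positions_by_block[key].append((y, x))

    matches = []
    for positions in positions_by_block.values():
        for i in range(len(positions)):
            for j in range(i + 1, len(positions)):
                (y1, x1), (y2, x2) = positions[i], positions[j]
                if max(abs(y1 - y2), abs(x1 - x2)) >= min_shift:
                    matches.append((positions[i], positions[j]))
    return matches
```

Exact matching breaks down once the pasted region is compressed, rescaled, or rotated, so practical detectors usually compare robust block features or keypoint descriptors instead of raw pixels.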