Hello World!! 👋

I'm a grad student at Carnegie Mellon University in the Master of Science in Computer Vision program, currently advised by Dr. Shubham Tulsiani. I expect to graduate in December 2024.
I'm currently interning at Apple on the Vision Pro team as a Computer Vision Research Intern, researching 3D reconstruction for AR / VR applications using Gaussian Splatting.
Previously, I worked as a Computer Vision Engineer - II at Wobot.ai, where I deployed CV solutions serving 10k+ customers.
I completed my Bachelor of Engineering in Computer Science from Government College of Engineering, Nagpur (GCOEN).
When I'm not working, I like to play snooker, read books or listen to music.

Resume | Email

Publications

StyleSplat - ECCV 2024

S. Jain*, A. Kuthiala*, P. Sethi, P. Saxena, “StyleSplat: 3D Object Style Transfer with Gaussian Splatting”, Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2024 (In Review)

Listen Then See - CVPR 2024

P. Sethi*, A. Agrawal*, CMS Lezcano*, I. Heredia*, “Listen Then See: Video Alignment with Speaker Attention”, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024

Jensen-Shannon Divergence in Safe Multi-Agent RL - ICLR 2024

R. Zawar, P. Sethi, R. Roy, “Jensen-Shannon Divergence in Safe Multi-Agent RL”, in ICLR, Tiny Paper Track, 2024

Projects

Generalizable, Sparse and Unposed 3D Gaussian Splatting

We propose a novel approach that, given a few RGB images of a previously unobserved scene with or without known poses, produces a 3D scene representation in a single feed-forward pass. Specifically, our method predicts 3D Gaussians that can be rendered from any novel view at interactive speeds.

GIF-Tune

GIF-Tune is a one-shot tuning strategy for continuous text-to-GIF synthesis. The model is trained on a single text-GIF pair and can generate GIFs from any text prompt.

UAV Detection

A real-time Unmanned Aerial Vehicle (UAV) detection system. The objective of the project is to build a real-time embedded drone detection system for a flying vehicle using infrared data. The model should detect UAVs across varying UAV sizes and types, altitudes, distances, and lighting conditions.

Smart AI Autonomous Drone (Person Tracking, Intruder Detection)

A smart autonomous drone with object tracking and object detection capabilities. This project was done in two phases: (1) a person tracking algorithm with a Tello drone (find related files here), and (2) a custom autonomous drone with real-time intruder detection alerts (find related files here).

Fragmentation Analysis using HED

Fragmentation analysis is a key check that mining engineers perform after blasting to determine the efficacy or accuracy of a blast. It focuses on measuring the average size of the rocks and fragments generated by the blast. This project is an image processing / computer vision approach using the Holistically-Nested Edge Detection (HED) algorithm.

Fume Analysis

Fume analysis is a key check that mining engineers perform during blasting. It focuses on identifying specific fumes that are toxic in nature. This project is an image processing / computer vision approach that filters these toxic fumes by color and estimates their percentage.

Image Forgery Detection Using Benford's Law

This project focuses on detecting a specific form of image forgery known as a copy-move attack, in which a portion of an image is copied and pasted elsewhere.