SPOTTER - AI Fitness Platform

Architected and deployed a full-stack platform that analyzes exercise form in real time from video, integrating machine learning models for pose estimation and movement classification with a responsive infrastructure (AWS, Flask, nginx) to deliver immediate technique feedback to users.

Design Decisions

For a graduate-level Fullstack IoT course with Prof. Tingjun Chen, we were tasked with building something unique. Going into my junior year with a desire to work out more, I decided that a tool that could act as a coach while I exercised would be a cool project. I broke the problem into three main parts: first, setting up the infrastructure so that communication was tight and responsive; second, classifying the specific exercise; and third, giving feedback tailored to that exercise.

To do any of this, we needed a way to extract the "landmark" joints of the body from each video frame. After investigating frameworks like OpenPose, YOLO, and MediaPipe, we settled on MediaPipe: it was slightly faster without sacrificing accuracy, and its documentation was more thoughtful. To avoid scope creep, we focused on just pushups and squats. We also set an inference time budget of 3 seconds, since that's roughly how long one rep takes.

For infrastructure, we started with a simple setup in which the Raspberry Pi SPOTTER device communicated directly with a GPU node (an nginx server on my home PC with an RTX 3080) over Duke WiFi. That setup gave us a lot of trouble: our connection would be terminated at random, and a connection usually lasted only 2-3 minutes. The network turned out to be the culprit. When I tested at a Starbucks, reliability was remarkably better on their WiFi, so something on Duke's WiFi had to be interfering; we deduced that Duke's cybersecurity was flagging our constant polling as suspicious activity. Seeing that our peers were using AWS with decent reliability (and that our deadline was coming up), we decided to switch. We used AWS as a "proxy" node that essentially passed the same info along. We still had to forward that info to the original GPU node, since cloud GPU instances are frightfully expensive (and we felt that provisioning with spot pricing wasn't fast enough for a demo). In effect, we had a publisher-subscriber architecture in which AWS was just passing the baton.

For classification, we initially considered building an ML model that took in the set of "joint points" from the image. However, we found it simpler, more efficient, and honestly easier to build a heuristic that just analyzed the relative heights of the key joints. (In a pushup, your shoulders are far more likely to be level with your hips and ankles than in a squat.)

Implementing the feedback was a little tricky. The base method was simple: we analyzed the joint angles to determine whether there was any under-extension or over-extension, and this worked perfectly well for still images. However, once you consider a still image in the context of the entire rep, things fall apart. Since we were polling every 1-2 seconds, we had no idea whether the still was taken at the "peak" of the rep, at the "rest" of the rep, or somewhere in between. So we changed our approach: I had the idea to use a simple correlation between frames to measure how much "action/movement" there was in the shot. After tuning it to be resistant to background noise, I got it to be pretty reliable. We then changed the polling to trigger only once the "action" settled (indicating that the athlete was at either the "peak" or the "rest"), and modified our classification heuristic to account for these new states: pushup_UP, pushup_DOWN, squat_UP, and squat_DOWN. Notably, even with this reduced polling, the original infrastructure was still getting flagged and terminated.
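To make the joint-based heuristic concrete, here is a minimal sketch of how relative joint heights and joint angles could be combined into the four states. It is an illustration and not the code SPOTTER actually ran: the function names, the joint dictionary layout, and both thresholds (`HORIZONTAL_RATIO`, `EXTENDED_DEG`) are assumptions. Joints are assumed to be `(x, y)` pairs in image coordinates with `y` growing downward, which is how MediaPipe reports its normalized landmarks.

```python
import math

# Hypothetical thresholds, chosen only for illustration.
HORIZONTAL_RATIO = 0.5   # below this, the body is treated as roughly horizontal
EXTENDED_DEG = 140.0     # at or above this, the limb is treated as extended


def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the segments b->a and b->c."""
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    return 360.0 - ang if ang > 180.0 else ang


def classify_exercise(joints):
    """Guess 'pushup' or 'squat' from the relative heights of key joints.

    In a pushup, shoulder, hip, and ankle sit at similar heights, so their
    vertical spread is small compared with the shoulder-to-ankle distance.
    """
    ys = [joints[name][1] for name in ("shoulder", "hip", "ankle")]
    body_length = math.dist(joints["shoulder"], joints["ankle"])
    if body_length == 0:
        raise ValueError("shoulder and ankle landmarks coincide")
    ratio = (max(ys) - min(ys)) / body_length
    return "pushup" if ratio < HORIZONTAL_RATIO else "squat"


def label_pose(joints):
    """Return one of pushup_UP, pushup_DOWN, squat_UP, squat_DOWN."""
    exercise = classify_exercise(joints)
    if exercise == "pushup":
        # Elbow angle: a straight arm means the top of the pushup.
        angle = joint_angle(joints["shoulder"], joints["elbow"], joints["wrist"])
    else:
        # Knee angle: a straight leg means the top of the squat.
        angle = joint_angle(joints["hip"], joints["knee"], joints["ankle"])
    state = "UP" if angle >= EXTENDED_DEG else "DOWN"
    return f"{exercise}_{state}"
```

The same `joint_angle` helper could also feed the under-extension and over-extension checks, by comparing the measured angle against whatever range counts as good form for that state.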
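The "wait until the action settles" trigger can be sketched in a similar way. This is a rough reconstruction under stated assumptions, not the project's implementation: frames are assumed to be flat lists of grayscale pixel values, the similarity measure is a plain Pearson correlation between consecutive frames, and `SettleDetector`, `still_threshold`, and `still_frames` are hypothetical names and values. Requiring several still frames in a row is one simple way to resist brief background noise.

```python
import math


def frame_correlation(prev, curr):
    """Pearson correlation between two equal-length grayscale frames."""
    n = len(prev)
    mean_p = sum(prev) / n
    mean_c = sum(curr) / n
    cov = sum((p - mean_p) * (c - mean_c) for p, c in zip(prev, curr))
    var_p = sum((p - mean_p) ** 2 for p in prev)
    var_c = sum((c - mean_c) ** 2 for c in curr)
    if var_p == 0 or var_c == 0:
        # Flat frames have no defined correlation; fall back to equality.
        return 1.0 if prev == curr else 0.0
    return cov / math.sqrt(var_p * var_c)


class SettleDetector:
    """Fires once when the scene becomes still again after movement."""

    def __init__(self, still_threshold=0.98, still_frames=3):
        self.still_threshold = still_threshold  # correlation counted as "still"
        self.still_frames = still_frames        # consecutive still frames needed
        self.prev = None
        self.moving = False
        self.still_count = 0

    def update(self, frame):
        """Feed one frame; return True only on the frame where motion settles."""
        if self.prev is None:
            self.prev = frame
            return False
        corr = frame_correlation(self.prev, frame)
        self.prev = frame
        if corr < self.still_threshold:
            # Frames differ noticeably: the athlete is mid-rep.
            self.moving = True
            self.still_count = 0
            return False
        if not self.moving:
            # Still, but nothing has moved since the last trigger.
            return False
        self.still_count += 1
        if self.still_count >= self.still_frames:
            self.moving = False
            self.still_count = 0
            return True
        return False
```

In a loop over camera frames, a `True` from `update` would be the moment to send a still for pose analysis, since the athlete should then be at either the "peak" or the "rest" of the rep.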

Key Learnings & Takeaways

For the final presentation, we gave a successful technical defense and a successful live demo. (Never before has 50% of my grade rested on my ability to do a pushup...) One of my key takeaways from this project was a stronger understanding of systems and infrastructure design. I found it incredibly satisfying to build a system that could improve the hot mess that I call my pushup form, and to have it perform responsively and reliably while bypassing cybersecurity restrictions.