VideoPod — Building an AI-Powered SaaS to Simplify Video Processing

I’m a full-time software professional with a robust foundation in computer science and modern application development, currently deepening my expertise in databases, cloud infrastructure, DevOps, and containerized systems. Let’s connect and discuss tech, innovation, and opportunities to build something amazing!

The Problem Statement

What if a 40-minute meeting could be reduced to 5 minutes of only the important moments?

For example, a 40-minute team meeting can be automatically converted into 6–8 short clips highlighting key discussions, saving time and improving productivity.

A Journey of Challenges and Growth

Developing a SaaS application has been one of the most enlightening experiences of my engineering journey so far. From ideation to deployment, this project tested my technical skills and taught me invaluable lessons that no book or tutorial could ever cover. Here is the story of my project—its purpose, features, challenges, and the ingenious solutions I discovered along the way.


The Idea: Simplifying Video Processing

The inspiration for this project stemmed from the increasing demand for video editing tools that are accessible, efficient, and tailored to specific needs. Most available tools are either overly complex or too basic to handle detailed processing. I wanted to create a platform where users could:

  • Compress large videos.

  • Stabilize shaky footage.

  • Extract key moments using AI.

  • Convert videos to other formats such as MOV and AVI, or extract the audio track as MP3.

  • Split videos at specific intervals.

  • Resize videos to desired dimensions.

Thus, my SaaS application was born—an intuitive, browser-based tool leveraging FFmpeg and AI to handle video processing efficiently.


Core Features

Here’s what the application currently offers:

  1. Video Compression: Reducing file size without significant quality loss.

  2. Stabilization: Fixing shaky videos to create smooth playback.

  3. AI Key Moment Extraction: Using VOSK for transcription and Google Gemini API to identify important segments. These moments are split and returned as individual clips.

  4. Format Conversion: Converting videos to MOV and AVI, and extracting the audio track as MP3.

  5. Video Splitting: Allowing users to split videos at specific timestamps.

  6. Dimension Resizing: Adjusting video dimensions to match specific requirements.

Currently, uploads are limited to 500 MB in size and 720p in resolution due to server limitations.
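
The upload limits above could be enforced with a small check before any processing starts. This is only a sketch: `UploadInfo` and `check_upload` are hypothetical names, "500 MB" is assumed to mean 500 MiB, and "720p" is assumed to refer to the shorter side of the frame.

```python
from dataclasses import dataclass

MAX_UPLOAD_BYTES = 500 * 1024 * 1024  # assumption: "500 MB" means 500 MiB
MAX_SHORT_SIDE_PX = 720               # assumption: 720p = shorter side of the frame


@dataclass(frozen=True)
class UploadInfo:
    """Hypothetical container for what the server knows about an upload."""
    size_bytes: int
    width: int
    height: int


def check_upload(info: UploadInfo) -> list[str]:
    """Return human-readable problems; an empty list means the upload is accepted."""
    problems = []
    if info.size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file is larger than 500 MB")
    # Compare the shorter side so portrait clips (e.g. 720x1280) also pass.
    if min(info.width, info.height) > MAX_SHORT_SIDE_PX:
        problems.append("resolution is higher than 720p")
    return problems
```

Returning a list of problems rather than a boolean lets the frontend show every reason a file was rejected in one response.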


The Technical Backbone

FFmpeg: The Heart of the Project

FFmpeg powers most of the video processing functionalities. Its versatility allowed me to implement compression, format conversion, stabilization, and splitting with ease. However, using FFmpeg also came with its challenges, particularly in terms of resource usage and deployment.
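
To give a feel for what these operations look like in practice, here is a rough sketch of how a Python backend might assemble FFmpeg command lines. The function names, the CRF value, the preset, and the choice of the `deshake` filter are illustrative assumptions, not the settings VideoPod actually uses.

```python
import subprocess


def compress_cmd(src: str, dst: str, crf: int = 28) -> list[str]:
    """Re-encode with H.264; a higher CRF means a smaller file and lower quality."""
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-crf", str(crf),
            "-preset", "veryfast", "-c:a", "aac", dst]


def resize_cmd(src: str, dst: str, width: int, height: int) -> list[str]:
    """Scale the video; the audio stream is copied unchanged."""
    return ["ffmpeg", "-y", "-i", src, "-vf", f"scale={width}:{height}",
            "-c:a", "copy", dst]


def stabilize_cmd(src: str, dst: str) -> list[str]:
    """Single-pass stabilization with FFmpeg's built-in deshake filter (assumed choice)."""
    return ["ffmpeg", "-y", "-i", src, "-vf", "deshake", "-c:a", "copy", dst]


def extract_audio_cmd(src: str, dst: str) -> list[str]:
    """Drop the video stream (-vn) and encode the audio as MP3."""
    return ["ffmpeg", "-y", "-i", src, "-vn", "-c:a", "libmp3lame", "-q:a", "2", dst]


def split_cmd(src: str, dst: str, start: float, end: float) -> list[str]:
    """Cut [start, end) in seconds without re-encoding; cuts snap to keyframes."""
    if end <= start:
        raise ValueError("end must be greater than start")
    return ["ffmpeg", "-y", "-ss", str(start), "-i", src,
            "-t", f"{end - start:.3f}", "-c", "copy", dst]


def run(cmd: list[str]) -> None:
    """Run a command and raise CalledProcessError if FFmpeg exits with an error."""
    subprocess.run(cmd, check=True)
```

Building each command as a list and passing it to `subprocess.run` without a shell avoids quoting problems with user-supplied file names. For example, `run(split_cmd("meeting.mp4", "clip.mp4", 10, 25))` would write a 15-second clip.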

AI Integration for Key Moment Extraction

Initially, I planned to use OpenAI’s Whisper for transcription. However, its high memory requirements caused my server to crash. Switching to VOSK, a lightweight yet powerful speech recognition library, resolved this issue. Using the transcriptions generated by VOSK, I leveraged Google Gemini API to identify key moments, which FFmpeg then processed into separate video segments.
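
The glue between these steps can be sketched roughly as follows. The word entries mirror the `word`/`start`/`end` fields VOSK returns when word timestamps are enabled. Everything else is an assumption made for illustration: the helper names, the 10-second window, and the idea that the model is asked to reply with a JSON list of `{"start": ..., "end": ...}` objects. The Gemini call itself is left out.

```python
import json


def _format(bucket: list[dict]) -> str:
    """Render one group of words as '[start-end] text'."""
    text = " ".join(w["word"] for w in bucket)
    return f"[{bucket[0]['start']:.1f}-{bucket[-1]['end']:.1f}] {text}"


def words_to_lines(words: list[dict], window_s: float = 10.0) -> list[str]:
    """Group word entries into timestamped lines of roughly window_s seconds."""
    lines, bucket = [], []
    for w in words:
        # Start a new line once adding this word would exceed the window.
        if bucket and w["end"] - bucket[0]["start"] > window_s:
            lines.append(_format(bucket))
            bucket = []
        bucket.append(w)
    if bucket:
        lines.append(_format(bucket))
    return lines


def parse_segments(reply: str, duration_s: float) -> list[tuple[float, float]]:
    """Parse the (assumed) JSON reply, clamp to the video length, drop empty ranges."""
    segments = []
    for item in json.loads(reply):
        start = max(0.0, float(item["start"]))
        end = min(duration_s, float(item["end"]))
        if end > start:
            segments.append((start, end))
    return sorted(segments)
```

The timestamped lines would go into the prompt, and each validated `(start, end)` pair would then be handed to FFmpeg to cut one clip. Clamping and filtering the reply matters because a language model can return timestamps that run past the end of the video or are reversed.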


Architecture

The application follows a microservices architecture with containerized components.

Deployment: The Real Test

Deploying the application brought an entirely new set of challenges that pushed me to learn and adapt in real-time.

Server Constraints

The backend runs on a virtual private server (VPS) with limited resources: 1 GB RAM, 35 GB storage, and 1 CPU. These constraints forced me to make tough decisions, such as limiting video uploads to 500 MB and 720p resolution. Here’s how I tackled specific deployment issues:

  1. High CPU and Memory Usage:

    • Initially, the application used 100% of the CPU, and processes shut down with exit code 139 (128 + 11, meaning the process was terminated by SIGSEGV, a segmentation fault). By analyzing container logs, I realized the need to restrict resource usage.

    • I limited Docker containers to 70% of the CPU and 50% of the RAM, and added 1 GB of swap memory. This kept the server stable during intensive processing.

  2. Python Version Conflicts:

    • Differences in Python versions between my local environment and the server led to compatibility issues. Using Docker, I containerized the application, ensuring consistency across environments.

  3. NGINX Configuration Tweaks:

    • NGINX initially blocked large video uploads, so I increased the maximum upload size (the client_max_body_size directive) to 500 MB.

    • Since video processing takes significant time, NGINX’s default timeout was cutting requests off before processing finished. I raised the timeout settings (for example proxy_read_timeout, when NGINX proxies requests to the backend) to accommodate longer processing durations.

  4. Resource-Heavy FFmpeg Tasks:

    • Processing high-resolution videos consumed excessive resources. To manage costs, I restricted uploads to 720p resolution and 500 MB size.
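
Two of the fixes above lend themselves to a short sketch: decoding an exit code such as 139, and turning the "70% CPU, 50% RAM" limits into `docker run` flags. The helper names are hypothetical. Mapping the extra 1 GB of swap onto `--memory-swap` is also an assumption, since the swap may simply have been added on the host.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain an exit status using the 128 + signal-number convention."""
    if code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"exit code {code} (unknown signal)"
    return f"exit code {code}"


def docker_limit_flags(host_cpus: int, host_ram_mb: int,
                       cpu_share: float = 0.7, ram_share: float = 0.5,
                       swap_mb: int = 1024) -> list[str]:
    """Build resource-limit flags for `docker run` from host capacity and shares."""
    cpus = host_cpus * cpu_share
    memory_mb = int(host_ram_mb * ram_share)
    return [
        f"--cpus={cpus:g}",
        f"--memory={memory_mb}m",
        # --memory-swap is the total of memory plus swap the container may use.
        f"--memory-swap={memory_mb + swap_mb}m",
    ]
```

On the 1 CPU, 1 GB server described above, `docker_limit_flags(1, 1024)` yields `--cpus=0.7`, `--memory=512m`, and `--memory-swap=1536m`, while `describe_exit_code(139)` reports SIGSEGV.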

The Deployment Stack

Frontend

The frontend is built with Next.js and deployed on Vercel. Its clean and intuitive interface ensures a seamless user experience.

Backend

The backend is containerized with Docker and deployed on DigitalOcean. This setup provides the flexibility and control needed to manage server-side operations effectively.


Lessons Learned

  1. Practical Experience Beats Theory:

    • CPU throttling, memory management, and NGINX configuration were topics I had only vaguely encountered before. Solving these problems hands-on taught me more than any book could.

  2. The Importance of Logging:

    • Analyzing Docker container logs was instrumental in diagnosing and fixing resource-related issues.

  3. The Power of Optimization:

    • Simple adjustments, like limiting resolution and adding swap memory, made a significant difference in server performance.

  4. Adaptability:

    • Switching from Whisper to VOSK saved the project from stalling. Staying flexible and exploring alternatives is crucial in development.

Conclusion

Building this SaaS application was a journey filled with hurdles and triumphs. It not only enhanced my technical expertise but also strengthened my problem-solving skills. Each challenge, from FFmpeg optimization to deployment struggles, taught me lessons that I’ll carry forward in my career.

If you’re embarking on a similar project, my advice is simple: embrace the failures. They’re not setbacks but stepping stones to success. And remember, the best learning happens outside the classroom—when you’re debugging, experimenting, and pushing your limits.

Feel free to check out the VideoPod repository. I hope this journey inspires you to take on your own ambitious projects!