I built TAWK because I was tired of typing. That is genuinely the whole origin story. I spend hours every day writing messages, emails, documents, and notes. My thoughts move faster than my fingers, and I kept wishing I could just talk and have the words appear on screen. So I built the thing I wanted.
TAWK is a voice-to-text Mac app. You press a keyboard shortcut, speak, and your words appear wherever your cursor is. It runs entirely on your machine using OpenAI's Whisper model. No cloud. No subscriptions. No sending your voice data to anyone's server. You buy it once for $19 at gettawk.com, and it is yours.
What I want to share here is not a polished startup narrative. It is the actual, messy reality of taking a personal itch and turning it into a shipped product that real people pay for. The decisions that mattered. The technical landmines I stepped on. And what I would tell anyone thinking about building their own thing.
The Decision to Build
I looked at every voice-to-text option on the Mac. Apple's built-in dictation requires an internet connection and is mediocre at best. Whisper-based solutions existed but most were command-line tools or janky wrappers. The polished options were subscription-based, which always felt wrong for something this simple. I wanted it to work offline, be private, cost a flat fee, and just work.
Nothing fit. So on a weekend, I opened my editor and started building.
The best side projects start when you cannot find the thing you need and decide to make it yourself.
The Tech Stack: Python, Whisper, and rumps
I chose Python because I know it well and because Whisper has excellent Python bindings. For the Mac menu bar interface, I used rumps, a lightweight library for building macOS status bar apps. The core flow is simple: listen for a global keyboard shortcut, record audio from the microphone, run Whisper's small model on the recording, and paste the transcribed text at the cursor position.
Whisper's "small" model hits the sweet spot between accuracy and speed. It is fast enough to feel real-time on any modern Mac while being accurate enough for everyday dictation. And because it runs locally, there is zero network latency. You finish speaking and the text appears in under a second.
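To make that flow concrete, here is a minimal sketch of one dictation cycle. It is not TAWK's actual code. The function names, and the idea of passing the recorder and paster in as plain callables, are my assumptions for illustration. The only Whisper calls used are load_model and transcribe from the openai-whisper package.

```python
from typing import Callable


def load_whisper_transcriber(model_name: str = "small") -> Callable[[str], str]:
    """Return a function that maps an audio file path to transcribed text."""
    # Imported lazily: the openai-whisper package pulls in PyTorch and is slow to load.
    import whisper

    model = whisper.load_model(model_name)

    def transcribe(audio_path: str) -> str:
        # fp16=False keeps Whisper from warning when it runs on the CPU.
        result = model.transcribe(audio_path, fp16=False)
        return result["text"].strip()

    return transcribe


def dictate_once(
    record: Callable[[], str],
    transcribe: Callable[[str], str],
    paste: Callable[[str], None],
) -> str:
    """Run one record -> transcribe -> paste cycle and return the text."""
    audio_path = record()  # hypothetical recorder: returns the path of a saved clip
    text = transcribe(audio_path)
    if text:  # nothing recognised, so nothing to paste
        paste(text)
    return text
```

Keeping the three steps as separate callables means the cycle can be exercised with stand-ins, without a microphone or the model. The menu bar app would call dictate_once from its hotkey handler.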
Why Offline-First Was Non-Negotiable
People dictate private things. Emails to their partner. Journal entries. Messages to their therapist. Business strategy notes. The idea of sending all of that to a server felt fundamentally wrong. I made the decision early on that TAWK would process everything on-device. Period.
This was also a business decision. Cloud-based speech-to-text means ongoing API costs, which means you either eat the margin or charge a subscription. Offline processing means my costs are essentially zero after the sale. That let me charge a one-time $19 fee and keep it there. No recurring revenue for me, but a much better deal for the customer. And a much simpler business.
The Prototype Was Easy. The Product Was Hard.
Getting a working prototype took a weekend. Press a key, record, transcribe, paste. Done. I used it myself for a few weeks and it worked great on my machine. Then I tried to turn it into something other people could install. That is when the real work started.
PyInstaller: The Bundling Nightmare
macOS apps are bundles. You cannot just tell someone to install Python, pip install a bunch of dependencies, and run a script. You need a proper .app that people can drag into their Applications folder. I used PyInstaller for this, and it was the most time-consuming part of the entire project.
PyInstaller does not automatically bundle Whisper's model assets. Files like mel_filters.npz that Whisper needs at runtime were simply missing from the bundle. I had to manually specify them in the PyInstaller spec file. Then there were the PyTorch dependencies, which are massive. The initial bundle was over a gigabyte. I spent days trimming unnecessary CUDA libraries and other artifacts that are not needed on macOS.
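For anyone hitting the same wall, this is roughly what "manually specify them" means. PyInstaller's spec file takes a datas list of (source file, destination folder inside the bundle) pairs. The helper below is a sketch that builds such a list by scanning a package directory. The helper name and the default suffix list are my assumptions, and other Whisper versions may ship other asset types.

```python
from pathlib import Path


def collect_asset_datas(package_dir, dest_root, suffixes=(".npz",)):
    """Build PyInstaller-style (source, destination_dir) pairs for data files.

    package_dir: the installed package folder to scan (e.g. the whisper package).
    dest_root:   the folder name the files should live under inside the bundle.
    suffixes:    file extensions to treat as runtime assets (an assumption;
                 extend it for whatever your Whisper version needs).
    """
    package_dir = Path(package_dir)
    datas = []
    for path in sorted(package_dir.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # Preserve the sub-folder layout so runtime lookups still resolve.
            rel_parent = path.parent.relative_to(package_dir)
            dest = (Path(dest_root) / rel_parent).as_posix()
            datas.append((str(path), dest))
    return datas


# In a .spec file this might be used as (script name is a placeholder):
#   import whisper
#   datas = collect_asset_datas(Path(whisper.__file__).parent, "whisper")
#   a = Analysis(["app.py"], datas=datas)
```

PyInstaller also ships a collect_data_files helper in PyInstaller.utils.hooks that does a similar job. The hand-rolled version above just makes it obvious what ends up in the bundle.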
Code Signing and Notarization
Apple requires apps to be signed and notarized before macOS will run them without scary warnings. This is reasonable from a security perspective but brutal for indie developers to implement. Here is what I learned the hard way:
- You must sign every single Mach-O binary inside the app bundle individually. Not just the .so and .dylib files, but standalone executables like protoc and the Python framework itself.
- The --deep flag for codesign is tempting but Apple explicitly discourages it for notarization. You need to sign each binary from the inside out.
- Notarization can get stuck "In Progress" for large apps that include PyTorch. This is a documented behavior. You just have to wait.
- macOS TCC (Transparency, Consent, and Control) uses CDHash for permission tracking. Ad-hoc signed apps get a new CDHash on every rebuild, which means users lose their microphone permissions every time you ship an update. You need a proper Developer ID certificate.
I spent more time on signing and notarization than on the actual voice-to-text functionality. This is the part nobody tells you about when they say "just ship it."
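The inside-out signing described in the list above can be sketched like this. It is a simplified illustration, not my actual build script. It finds Mach-O files by their magic bytes, orders them deepest path first, and signs the bundle itself last. The signing identity is a placeholder. The deepest-first ordering is a heuristic. Nested frameworks may also need to be signed as bundles in their own right.

```python
import subprocess
from pathlib import Path

# First four bytes of Mach-O files as they appear on disk.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",  # 32-bit, both byte orders
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",  # 64-bit, both byte orders
    b"\xca\xfe\xba\xbe", b"\xca\xfe\xba\xbf",  # universal (fat) binaries;
    # note that Java .class files also start with ca fe ba be
}


def is_macho(path):
    """True if path is a regular (non-symlink) file with a Mach-O magic number."""
    if path.is_symlink() or not path.is_file():
        return False
    with path.open("rb") as fh:
        return fh.read(4) in MACHO_MAGICS


def signing_order(app_bundle):
    """Mach-O files inside the bundle, deepest paths first."""
    app_bundle = Path(app_bundle)
    binaries = [p for p in app_bundle.rglob("*") if is_macho(p)]
    return sorted(binaries, key=lambda p: (-len(p.parts), str(p)))


def sign_bundle(app_bundle, identity, run=subprocess.run):
    """Sign every Mach-O file individually, then the bundle itself."""
    base = ["codesign", "--force", "--timestamp",
            "--options", "runtime", "--sign", identity]
    for binary in signing_order(app_bundle):
        run(base + [str(binary)], check=True)
    run(base + [str(app_bundle)], check=True)  # outermost bundle goes last
```

The run parameter exists only so the command list can be inspected without invoking codesign. In a real build you would leave it at the default.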
The LSUIElement Discovery
For menu bar apps on macOS, you need to set LSUIElement: true in your Info.plist. Without it, your app shows up in the Dock and the Cmd+Tab switcher, which is wrong for a utility that should live quietly in the menu bar. I also learned that calling setActivationPolicy_(Accessory) after showing a modal window causes all sorts of focus and rendering issues. The fix was to let LSUIElement handle it from the start and not touch the activation policy at runtime.
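If your build tool does not set the key for you, a small script can patch the bundle's Info.plist. This is a sketch using Python's standard plistlib. The function name is made up, and it assumes you run it before code signing, because editing Info.plist afterwards invalidates the signature.

```python
import plistlib
from pathlib import Path


def mark_as_menu_bar_app(info_plist_path):
    """Set LSUIElement to true in an existing Info.plist and return its contents."""
    path = Path(info_plist_path)
    with path.open("rb") as fh:
        info = plistlib.load(fh)
    # LSUIElement=true hides the app from the Dock and the Cmd+Tab switcher.
    info["LSUIElement"] = True
    with path.open("wb") as fh:
        plistlib.dump(info, fh)
    return info
```

With PyInstaller, the same key can instead go into the info_plist dictionary passed to BUNDLE in the spec file, which avoids a separate patch step.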
The Keyboard Event Gotcha
TAWK pastes transcribed text by simulating keyboard events using CGEventPost. What I did not realize is that CGEventPost inherits the current modifier key state. So if the user was holding Shift when they triggered TAWK, the pasted text would come out all caps or with weird characters. The fix was to explicitly clear modifier flags with CGEventSetFlags(event, 0) before posting each keystroke. A tiny detail that took hours to debug.
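Here is a sketch of the fix, assuming the pyobjc Quartz bindings and assuming each character is typed with CGEventKeyboardSetUnicodeString. That last part is my choice for the illustration and may differ from how TAWK posts its keystrokes. The quartz parameter is also an illustrative addition, so the function can be exercised without a real event tap.

```python
def type_text(text, quartz=None):
    """Post key-down/key-up events for each character with modifiers cleared."""
    if quartz is None:
        # pyobjc-framework-Quartz; only importable on macOS.
        import Quartz as quartz

    for ch in text:
        # The API wants the length in UTF-16 code units, not Python characters.
        utf16_len = len(ch.encode("utf-16-le")) // 2
        for key_down in (True, False):
            event = quartz.CGEventCreateKeyboardEvent(None, 0, key_down)
            quartz.CGEventKeyboardSetUnicodeString(event, utf16_len, ch)
            # The crucial line: drop whatever modifier keys the user is
            # holding (Shift, Cmd, ...) so they do not leak into the text.
            quartz.CGEventSetFlags(event, 0)
            quartz.CGEventPost(quartz.kCGHIDEventTap, event)
```

Clearing the flags on both the key-down and the key-up event matters, because each event carries its own copy of the modifier state.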
Going From "It Works" to "It Ships"
The gap between a working prototype and a shippable product is enormous. Here is everything I had to build beyond the core functionality:
- A proper onboarding flow that guides users through granting microphone and accessibility permissions
- File-based logging, because when your app is a .app bundle, stdout and stderr are invisible
- Automatic update checking
- A landing page at gettawk.com with Stripe checkout
- A license key system
- A download delivery mechanism through GitHub Releases
Each of these is straightforward in isolation. Together, they represent weeks of work that have nothing to do with voice-to-text and everything to do with being a real product.
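Of the items above, file-based logging is the quickest to show. This is a minimal sketch using only the standard library. The ~/Library/Logs/TAWK location, the rotation sizes, and the function name are assumptions for illustration, not a description of what TAWK actually writes.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def setup_file_logging(log_dir=None, name="tawk"):
    """Attach a rotating file handler to the named logger.

    Returns (logger, log_file_path). Defaults to ~/Library/Logs/TAWK,
    the conventional per-user log location on macOS.
    """
    log_dir = Path(log_dir) if log_dir else Path.home() / "Library" / "Logs" / "TAWK"
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / f"{name}.log"

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers if called twice
        handler = RotatingFileHandler(
            log_file, maxBytes=1_000_000, backupCount=3, encoding="utf-8"
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger, log_file
```

Call it once at startup, before anything that can fail. When a user reports a bug, you can then ask them for a single file.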
The One-Time Payment Decision
The SaaS playbook says charge monthly. Recurring revenue. Higher lifetime value. I understand the math. But TAWK is a utility. It does one thing. It does not use a server. There are no ongoing costs on my end. Charging people $8 a month for something that runs entirely on their hardware felt dishonest.
So I set it at $19 one-time. It felt right. It is cheap enough that the purchase is impulsive but expensive enough that people take the product seriously. Every time someone buys TAWK, they own it forever. That simplicity has value that does not show up in a revenue model spreadsheet.
Not every product needs to be a subscription. Some of the best tools are the ones you buy once and they just work.
What I Would Tell Anyone Building a Side Project
Build for yourself first. The strongest products come from genuine personal need. You are your own best user because you have real context, real frustration, and real taste about what the solution should feel like.
Ship ugly, ship fast. The first version of TAWK was embarrassingly rough. The menu bar icon was wrong. The onboarding was confusing. But it worked. Getting real users on a rough product teaches you more in a week than perfecting it in a vacuum for a month.
The "last mile" is 80 percent of the work. Getting something working on your machine is maybe 20 percent of the project. Code signing, notarization, licensing, landing pages, support infrastructure, and distribution are the other 80 percent. Budget for it.
Pick boring technology you know. I built TAWK in Python, which is not the "right" language for a Mac app. Swift would have been more native. But I know Python deeply, and that let me move fast. When your goal is shipping, familiarity beats elegance every time.
Logging saves your sanity. Add file-based logging from day one. When a user reports a bug, you need to be able to see what happened. For distributed desktop apps, this is not optional.
Keep it simple. TAWK does one thing. Press a key, speak, see text. Every feature request I get asks me to add something that would make it more complicated. I say no to almost all of them. The power of the product is in its simplicity.
Where TAWK Is Today
TAWK is live at gettawk.com. It is signed, notarized, and running on Macs around the world. I still use it every day. Every time I dictate a message or a note or a paragraph of this blog post, I am using my own product. That feedback loop is the reason it keeps getting better.
Building TAWK reminded me of something I already knew but sometimes forget in my day job scaling a large company: the best things start small, start personal, and start with someone scratching their own itch. If you have an idea for something you wish existed, build it. The world does not need another planning document. It needs another shipped product.