Vibe Coding the MIT Course Catalog

As a journalist, I've had my fair share of navigating complex systems and finding creative solutions to seemingly insurmountable problems. Recently, I made the bold decision to leave Microsoft and join MIT's Media Arts and Sciences program. The transition brought an immediate problem: how do you navigate course selection when faced with the "unknown unknowns"? You can easily find courses you already know you want to learn, i.e., "known unknowns." But discovering courses you never knew existed, courses that might reshape your thinking entirely, requires different tools altogether.

MIT's official course catalog runs on what appears to be a CGI script, technology dating back to the 1990s. You cannot search as you type, and everything feels slow. The popular student-built alternative, Hydrant, offers fast search but displays details for only one course at a time. Neither tool works well for browsing or screening multiple options at once, and neither is designed with LLMs in mind.

A New Challenge

I decided to build Courseek, a new tool that solves this problem while testing how far I could push AI-assisted development. Until now, GitHub Copilot had been little more than an expensive autocomplete for functions and boilerplate. Could I go all in on vibe-coding this time? I also wanted to test whether an LLM could successfully use my personal (Un)framework. It would be nice to break the React + Tailwind duopoly.

Scraping the MIT Course Catalog

MIT's course catalog contains around 2,300 courses. I scraped them from MIT Course Picker with an embarrassingly simple script that dumps a JSON array to the console. I copied the output and ran it through OpenAI's tokenizer: 343k tokens for the entire catalog! The count feels high, but JSON is more verbose than plain text and several metadata fields are irrelevant for LLM use, so this essentially establishes an upper bound.

Understanding the Current Tools

Before building anything new, I spent time with the existing tools to understand their strengths and frustrations. Hydrant represents the best of student-built solutions. Its search functionality works great, and the interface feels modern compared to MIT's official catalog. But Hydrant forces a linear browsing experience. You search, click a result, read the course description, and repeat. This workflow breaks down when you want to compare multiple courses or get a sense of the landscape within a particular domain.

The Official Course Picker

The official MIT course picker suffers from deeper structural problems. Beyond its dated interface, the tool lacks any real-time feedback. Every search requires a full page reload, making exploratory browsing painful. You cannot easily filter by multiple criteria simultaneously, and the results display minimal information, forcing yet more clicks to understand what each course actually covers.

Breaking Free from Human-Centricity

Both tools share a deeper architectural flaw: they assume humans are the primary users. In an age where AI agents should help us navigate complex decision spaces, these interfaces remain stubbornly human-centric. They lack the structured, machine-readable formats that would enable LLMs to guide course discovery intelligently.
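What would a machine-friendly format look like? One option is to strip each record down to the fields an LLM actually needs for course discovery before stuffing the catalog into a context window. A minimal sketch, assuming Hydrant-style records; all field names here are illustrative, not the actual schema:

```javascript
// Sketch: compact a course record before feeding the catalog to an LLM.
// Field names are illustrative, not Hydrant's actual schema.
function compactCourse(course) {
  const { number, title, description, units } = course;
  return { number, title, description, units };
}

// A hypothetical raw record carrying metadata an LLM doesn't need:
const raw = {
  number: "0.000",
  title: "Example Course",
  description: "A placeholder description.",
  units: 12,
  sectionIds: [1401, 1402],          // scheduling metadata
  roomAssignments: ["E14-633"],      // irrelevant for discovery
  lastUpdated: "2024-08-30T12:00:00Z",
};

// compactCourse(raw) keeps only the four discovery-relevant fields,
// shrinking the JSON, and therefore the token count, substantially.
const compact = compactCourse(raw);
```

Dropping scheduling metadata like this is what makes the 343k-token figure an upper bound rather than a hard requirement.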

Prototyping Courseek

After this exploration, I set a clear goal: achieve search-as-you-type performance while displaying multiple course details simultaneously. The first MVP was a single HTML page with all course data embedded. Type a query, show or hide results with JavaScript. It worked but felt janky. The UI would stutter when processing large result sets.

From Web Workers to Virtualization

In the second attempt, I pivoted to a web worker approach, separating rendering from data processing. The main thread stayed responsive, but the results still felt sluggish with over 1,000 matches. Debouncing helped, but it introduced unpredictable latency that made interactions feel unresponsive.
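The division of labor can be sketched as follows: the filtering itself is a pure function that runs inside the worker, and only rendering stays on the main thread. Names and wiring here are illustrative, not Courseek's actual code:

```javascript
// Pure, worker-friendly filtering: no DOM access, just data in, data out.
function filterCourses(courses, query) {
  const q = query.trim().toLowerCase();
  if (!q) return courses;
  return courses.filter((c) => c.title.toLowerCase().includes(q));
}

// In a worker file (e.g. search.worker.js), it would be wired up as:
//   self.onmessage = (e) => {
//     const { courses, query } = e.data;
//     self.postMessage(filterCourses(courses, query));
//   };
//
// And on the main thread, only rendering remains:
//   const worker = new Worker("search.worker.js");
//   worker.postMessage({ courses, query });
//   worker.onmessage = (e) => render(e.data);
```

Because `filterCourses` is pure, it can be unit-tested outside the worker entirely, which is also what makes the later pipeline refactor straightforward.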

Enter virtualization through @lit-labs/virtualizer, a pre-release library that solved the problem elegantly. Only visible rows render to the DOM, regardless of result count. Halfway through development, I discovered that Hydrant already exposes machine-readable data at hydrant.mit.edu/latest.json. My manual scraping was unnecessary.

Building the Search Engine

The real test came when building the search engine. For a corpus under 3,000 documents, I suspected that brute-force string matching could still deliver real-time results. The hunch proved correct—no inverted index needed.
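To see why brute force suffices, consider a synthetic corpus at roughly the catalog's scale: a naive linear scan with substring matching per keystroke is all it takes. This is a sketch, not Courseek's actual engine:

```javascript
// A synthetic corpus of ~3,000 titles, about the size of MIT's catalog.
const corpus = Array.from({ length: 3000 }, (_, i) => ({
  number: `0.${i}`,                         // fake course numbers
  title: `Course ${i} on Topic ${i % 50}`,  // fake titles
}));

// Brute-force search: no inverted index, just scan every title.
// At this corpus size the scan comfortably fits within one frame.
function bruteForceSearch(courses, query) {
  const q = query.toLowerCase();
  return courses.filter((c) => c.title.toLowerCase().includes(q));
}

const hits = bruteForceSearch(corpus, "topic 7");
```

An inverted index only starts paying off when the corpus is orders of magnitude larger; here it would just add code to maintain.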

However, AI-assisted programming hit its limits when it came to the search engine itself. I found myself constantly reprompting and manually tweaking edge cases. The system struggled with courses like "How To Make (Almost) Anything." Should parentheses be required in search terms? What about special characters in "C++"?
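One way to tame such edge cases (a sketch, not necessarily what the LLM produced) is a normalization pass that treats parentheses as decoration while preserving meaningful symbols like the plus signs in "C++":

```javascript
// Normalize text for matching: lowercase, drop bracket punctuation,
// collapse whitespace. '+' and '#' survive so "C++" and "C#" stay searchable.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/[()\[\]{}]/g, " ")  // parentheses are decoration, not content
    .replace(/\s+/g, " ")
    .trim();
}

normalize("How To Make (Almost) Anything");  // "how to make almost anything"
normalize("Programming in C++");             // "programming in c++"
```

With both titles and queries run through the same `normalize`, "almost anything" finds the course whether or not the user types the parentheses.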

A Three-Stage Pipeline

The final implementation settled on a three-stage pipeline: matching, scoring, and highlighting. Each stage is a pure function, making the system simple to reason about and test.

In practice, matching and scoring collapse into a single step: a course matches if its relevance score is greater than zero, and a score of zero means no match. The scoring algorithm prioritizes exact matches and matches at the beginnings of words, which aligns with how people expect search to behave. For visual feedback, the matched parts of each course title are highlighted.
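A minimal sketch of the pipeline follows; the weights are placeholders, and Courseek's actual scoring values may differ:

```javascript
// Stages 1+2: score a title against a query; 0 means "no match".
// Illustrative weights: exact match > word-start match > mid-word match.
function score(title, query) {
  const t = title.toLowerCase();
  const q = query.toLowerCase();
  if (!q) return 0;
  if (t === q) return 100;                  // exact match
  const i = t.indexOf(q);
  if (i === -1) return 0;                   // no match at all
  const atWordStart = i === 0 || t[i - 1] === " ";
  return atWordStart ? 10 : 1;              // word-start beats mid-word
}

// Stage 3: highlight, returning the title as plain/matched segments.
function highlight(title, query) {
  const i = title.toLowerCase().indexOf(query.toLowerCase());
  if (i === -1 || !query) return [{ text: title, match: false }];
  return [
    { text: title.slice(0, i), match: false },
    { text: title.slice(i, i + query.length), match: true },
    { text: title.slice(i + query.length), match: false },
  ];
}

// A course matches iff score > 0; results sort by descending score.
function search(courses, query) {
  return courses
    .map((c) => ({ ...c, score: score(c.title, query) }))
    .filter((c) => c.score > 0)
    .sort((a, b) => b.score - a.score);
}
```

Because each stage is a pure function, each can be tested in isolation, and the UI layer only has to render the segment arrays that `highlight` produces.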

Keeping Course Data Current

The final challenge was keeping course data current. Hydrant's API endpoint at https://hydrant.mit.edu/latest.json lacks CORS headers, preventing direct browser access. I added a GitHub Actions cron job to fetch this endpoint every 24 hours:

  name: Courseek
  on:
    schedule:
      - cron: '0 0 * * *'   # fetch fresh course data every 24 hours
  jobs:
    build-and-deploy:
      runs-on: ubuntu-latest
      steps:
        - name: Checkout code
          uses: actions/checkout@v4
        - name: Fetch course data
          run: curl -fsSL https://hydrant.mit.edu/latest.json -o public/latest.json  # output path illustrative
        - name: Rebuild and deploy
          run: npm ci && npm run build && npx gh-pages -d dist