Tutorial · 2026-07-28 · 13 min read

Building an AI Job Agent: Architecture Deep Dive

JobApplier.site isn't a ChatGPT wrapper. It's a multi-stage pipeline that combines GPT for natural language tasks with deterministic logic for everything else. Here's how it's built: open-source, auditable, and running on your machine.

Architecture Overview

The agent follows a pipeline architecture with 6 stages:

  1. Scan — Discover new job postings from LinkedIn, career pages, and ATS portals
  2. Filter — Apply title, location, company, and freshness filters
  3. Score — Calculate semantic match score between job description and your profile
  4. Tailor — Generate a per-JD resume and cover letter using GPT
  5. Apply — Fill and submit the application using ATS-native form handlers
  6. Track — Log the application, store screenshots, and update the dashboard
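
The stage list above can be sketched as a chain of functions. This is a minimal illustration, assuming each stage after Scan is a plain function that takes a job record and returns it (possibly enriched) or returns None to drop it. The names Job, Stage, run_pipeline, and the stub stages are hypothetical and are not taken from the JobApplier.site codebase.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class Job:
    company: str
    title: str
    location: str
    score: int = 0
    log: list[str] = field(default_factory=list)


# A stage takes a job and returns it (possibly enriched), or None to drop it.
Stage = Callable[[Job], Optional[Job]]


def run_pipeline(jobs: Iterable[Job], stages: list[Stage]) -> list[Job]:
    """Push each scanned job through the stages in order, stopping at the first drop."""
    completed = []
    for job in jobs:
        current: Optional[Job] = job
        for stage in stages:
            current = stage(current)
            if current is None:
                break
        if current is not None:
            completed.append(current)
    return completed


# Stub stages standing in for Filter, Score, and Track (Tailor and Apply omitted).
def filter_stage(job: Job) -> Optional[Job]:
    return None if "intern" in job.title.lower() else job


def score_stage(job: Job) -> Optional[Job]:
    job.score = 80  # placeholder value; the real scorer calls GPT
    return job if job.score >= 65 else None


def track_stage(job: Job) -> Optional[Job]:
    job.log.append("tracked")
    return job


scanned = [
    Job("Acme", "Senior Backend Engineer", "Remote"),
    Job("Acme", "Engineering Intern", "Remote"),
]
results = run_pipeline(scanned, [filter_stage, score_stage, track_stage])
```

Returning None to drop a job keeps each stage independent: a stage only needs to know about the job in front of it, not about the stages around it.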

Stage 1: The Scanner

The scanner runs on a configurable schedule (default: every 6 hours). It checks multiple sources:

  • LinkedIn search — Uses browser automation with stealth driver to search for roles matching your configured keywords and locations
  • Company career pages — Direct HTTP requests to known ATS endpoints (Greenhouse API, Lever API, Ashby GraphQL)
  • RSS/Atom feeds — Some companies publish job feeds that we poll

Every discovered job gets a hash derived from company + title + location. A job whose hash has already been seen within the deduplication window (configurable, default: 90 days) is treated as a duplicate and skipped.
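
A minimal sketch of hash-based deduplication with a time window. The choice of SHA-256, the lowercase/strip normalization, and the names job_hash and Deduplicator are assumptions made for illustration; the article only states that the hash is built from company + title + location.

```python
import hashlib
from datetime import datetime, timedelta


def job_hash(company: str, title: str, location: str) -> str:
    """Stable hash over normalized company + title + location."""
    key = "|".join(part.strip().lower() for part in (company, title, location))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


class Deduplicator:
    def __init__(self, window_days: int = 90):
        self.window = timedelta(days=window_days)
        self.seen: dict[str, datetime] = {}  # hash -> time the job was first seen

    def is_duplicate(self, company: str, title: str, location: str, now: datetime) -> bool:
        h = job_hash(company, title, location)
        first_seen = self.seen.get(h)
        if first_seen is not None and now - first_seen < self.window:
            return True
        self.seen[h] = now  # new job, or the old entry aged out of the window
        return False
```

Normalizing before hashing means " Acme " and "acme" collapse to the same key, which avoids re-applying because of trivial formatting differences between sources.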

Stage 2: The Filter Chain

Not every job found is worth applying to. The filter chain runs 5 passes:

  1. Title filter — Regex matching against include/exclude patterns. Include: "senior engineer", "staff", "backend". Exclude: "intern", "director", "HR".
  2. Location filter — Geo-matching with support for "Remote", specific countries, and metro areas
  3. Company filter — Include/exclude list with optional cooldown periods (don't re-apply to a company within 60 days)
  4. Freshness filter — Skip postings older than N days (default: 14)
  5. Blacklist filter — Skip specific job IDs that were manually rejected

Each filter logs its decision. You can see why any job was skipped in the dashboard.
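
A sketch of a filter chain that records a decision per filter, assuming each filter returns None to pass or a reason string to skip. Posting, run_filters, and the filter functions are hypothetical names. The word boundaries in the exclude pattern are an addition here so that "intern" does not match "International".

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Posting:
    job_id: str
    title: str
    age_days: int


# Each filter returns None to pass, or a human-readable reason to skip.
Filter = Callable[[Posting], Optional[str]]

INCLUDE = re.compile(r"senior engineer|staff|backend", re.IGNORECASE)
EXCLUDE = re.compile(r"\b(intern|director|hr)\b", re.IGNORECASE)


def title_filter(p: Posting) -> Optional[str]:
    if EXCLUDE.search(p.title):
        return "title matches exclude pattern"
    if not INCLUDE.search(p.title):
        return "title matches no include pattern"
    return None


def freshness_filter(max_age_days: int = 14) -> Filter:
    def check(p: Posting) -> Optional[str]:
        if p.age_days > max_age_days:
            return f"posting older than {max_age_days} days"
        return None
    return check


def blacklist_filter(rejected_ids: set[str]) -> Filter:
    def check(p: Posting) -> Optional[str]:
        return "job ID manually rejected" if p.job_id in rejected_ids else None
    return check


def run_filters(p: Posting, filters: list[tuple[str, Filter]]) -> list[tuple[str, str]]:
    """Run filters in order and record every decision; stop at the first skip."""
    decisions = []
    for name, f in filters:
        reason = f(p)
        decisions.append((name, "pass" if reason is None else f"skip: {reason}"))
        if reason is not None:
            break
    return decisions


chain = [
    ("title", title_filter),
    ("freshness", freshness_filter(14)),
    ("blacklist", blacklist_filter({"job-123"})),
]
```

The returned decision list is what a dashboard could display to explain why a posting was skipped. The location and company filters are left out for brevity and would follow the same shape.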

Stage 3: The Match Scorer

This is where GPT earns its keep. The scorer sends the job description and your master resume to GPT-5.4-mini with a structured prompt that returns:

  • Overall match score (0-100) — Weighted combination of skills, experience level, and cultural fit
  • Skills overlap — Which required skills you have and which you're missing
  • Level alignment — Whether the seniority matches your experience
  • Red flags — Deal-breakers like "must be in office in NYC" when you're remote-only

Jobs scoring below your threshold (default: 65) are skipped. The threshold is the single most impactful configuration parameter: set it too low and you waste applications on weak matches; set it too high and you miss viable openings.
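
A sketch of how the scorer's structured reply might be validated and compared against the threshold. The JSON field names, MatchResult, parse_match, and should_apply are hypothetical; the actual prompt and response schema are not shown in this article. Treating any red flag as a hard stop is also an assumption.

```python
import json
from dataclasses import dataclass


@dataclass
class MatchResult:
    score: int
    matched_skills: list[str]
    missing_skills: list[str]
    level_aligned: bool
    red_flags: list[str]


def parse_match(raw: str) -> MatchResult:
    """Parse the model's JSON reply; raise ValueError if the score is out of range."""
    data = json.loads(raw)
    score = int(data["score"])
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return MatchResult(
        score=score,
        matched_skills=list(data.get("matched_skills", [])),
        missing_skills=list(data.get("missing_skills", [])),
        level_aligned=bool(data.get("level_aligned", False)),
        red_flags=list(data.get("red_flags", [])),
    )


def should_apply(result: MatchResult, threshold: int = 65) -> bool:
    # Assumption: any red flag is a hard stop, regardless of score.
    return result.score >= threshold and not result.red_flags
```

Validating the reply before acting on it guards against a malformed or out-of-range score silently passing the threshold check.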

Stage 4: The Tailoring Engine

For each job that passes filtering and scoring, the engine generates:

  • Tailored resume YAML — GPT rewrites your master resume YAML with JD-specific keyword injection, bullet reordering, and title adjustment
  • PDF rendering — The tailored YAML is rendered into a professional PDF using a Jinja2 template
  • Cover letter — Company-specific, anti-cliche cover letter referencing details from the JD

The cost per tailoring pass is approximately $0.07-0.15 depending on resume length and JD complexity (GPT-5.4 input + output tokens).
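
For budgeting, the per-pass range above scales linearly with the number of applications. A small sketch, using only the $0.07-0.15 figure quoted above; the function name is hypothetical, and scoring calls are not included in the estimate.

```python
def tailoring_cost_range(
    applications: int, low: float = 0.07, high: float = 0.15
) -> tuple[float, float]:
    """Return the (min, max) estimated spend in USD for a number of tailoring passes."""
    if applications < 0:
        raise ValueError("applications must be non-negative")
    return round(applications * low, 2), round(applications * high, 2)


low_usd, high_usd = tailoring_cost_range(100)
print(f"100 applications: ${low_usd:.2f} to ${high_usd:.2f}")
# prints: 100 applications: $7.00 to $15.00
```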

Stage 5: ATS Form Handlers

This is the part that's NOT GPT. Each ATS platform has a dedicated form handler written in Python:

  • Greenhouse handler — Understands Greenhouse's form structure, custom questions, file upload API, and EEO section
  • Lever handler — Handles Lever's single-page application flow and resume parsing quirks
  • Ashby handler — Uses Ashby's GraphQL API for structured submission
  • Workday handler — The most complex — handles multi-step forms, login flows, and CAPTCHA detection

Critical questions (work authorization, visa sponsorship, salary) use deterministic handlers — hardcoded answers from your config, never GPT. This prevents hallucination on questions where a wrong answer means automatic rejection.

Freeform questions ("Why do you want to work here?") go through GPT with anti-cliche prompting that references specific details from the company and JD.
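
The split between deterministic and GPT-generated answers can be sketched as a small router. The keyword hints, the config values, and the names classify and answer_question are hypothetical placeholders; the LLM is passed in as a plain callable so the sketch runs without an API key.

```python
from typing import Callable, Optional

# Placeholder answers; in practice these would come from the user's config.
CRITICAL_ANSWERS = {
    "work_authorization": "Yes",
    "visa_sponsorship": "No",
    "salary": "150000",
}

# Keyword hints used to classify a question; illustrative, not exhaustive.
CRITICAL_KEYWORDS = {
    "work_authorization": ("authorized to work", "work authorization"),
    "visa_sponsorship": ("sponsorship", "visa"),
    "salary": ("salary", "compensation"),
}


def classify(question: str) -> Optional[str]:
    """Return the critical-question key for a question, or None if it is freeform."""
    q = question.lower()
    for key, hints in CRITICAL_KEYWORDS.items():
        if any(hint in q for hint in hints):
            return key
    return None


def answer_question(question: str, llm: Callable[[str], str]) -> str:
    """Answer critical questions from config; send everything else to the LLM."""
    key = classify(question)
    if key is not None:
        return CRITICAL_ANSWERS[key]  # deterministic path: the LLM is never called
    return llm(question)
```

Because the critical path returns before the LLM is invoked, a wrong answer on those questions can only come from the config, which the user controls and can audit.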

Stage 6: Tracking and Dashboard

Every application is logged to a SQLite database with:

  • The original JD, match score, and filter decisions
  • The tailored resume PDF and cover letter text
  • Screenshots of the submitted application form
  • Timestamps, ATS platform, and company details
  • Application status tracking (applied → callback → interview → offer/reject)
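
A minimal sketch of the tracking table using Python's built-in sqlite3 module. The table name, column names, and helper functions are hypothetical, and only a subset of the fields listed above is modeled. Allowing a rejection from any stage before an offer is an assumption.

```python
import sqlite3

# Allowed status transitions, following applied -> callback -> interview -> offer/reject.
# Assumption: a rejection can arrive at any stage before an offer.
TRANSITIONS = {
    "applied": {"callback", "reject"},
    "callback": {"interview", "reject"},
    "interview": {"offer", "reject"},
}


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS applications (
            id INTEGER PRIMARY KEY,
            company TEXT NOT NULL,
            title TEXT NOT NULL,
            ats_platform TEXT NOT NULL,
            match_score INTEGER,
            status TEXT NOT NULL DEFAULT 'applied',
            applied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )


def log_application(conn: sqlite3.Connection, company: str, title: str,
                    ats_platform: str, match_score: int) -> int:
    cur = conn.execute(
        "INSERT INTO applications (company, title, ats_platform, match_score) "
        "VALUES (?, ?, ?, ?)",
        (company, title, ats_platform, match_score),
    )
    conn.commit()
    return cur.lastrowid


def update_status(conn: sqlite3.Connection, app_id: int, new_status: str) -> None:
    row = conn.execute(
        "SELECT status FROM applications WHERE id = ?", (app_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no application with id {app_id}")
    if new_status not in TRANSITIONS.get(row[0], set()):
        raise ValueError(f"cannot move from {row[0]} to {new_status}")
    conn.execute("UPDATE applications SET status = ? WHERE id = ?", (new_status, app_id))
    conn.commit()
```

Checking transitions in code keeps the status history consistent, so an application cannot jump from applied straight to offer by mistake.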

The web dashboard (FastAPI + vanilla JS) provides a filterable view of all applications, statistics, and export functionality.

Anti-Detection Layer

Running on top of the pipeline is the anti-detection layer:

  • Warmup scheduler — 14-day ramp from low to normal activity
  • Rate limiter — Gaussian-distributed delays between actions
  • Session manager — Daily caps, maximum session duration, weekend exclusion
  • Stealth driver — Undetected Chrome with fingerprint patches
  • CAPTCHA detector — Pauses on CAPTCHA, alerts user, waits for manual resolution
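
The rate limiter, warmup ramp, and session rules can be sketched with the standard library. All numeric defaults below (8-second mean delay, caps of 3 and 25 per day) are made-up placeholders, the linear shape of the ramp is an assumption, and the function names are hypothetical.

```python
import random
from datetime import date


def gaussian_delay(rng: random.Random, mean_s: float = 8.0,
                   stddev_s: float = 2.5, min_s: float = 1.0) -> float:
    """Sample a delay from a normal distribution, clamped so it never drops below min_s."""
    return max(min_s, rng.gauss(mean_s, stddev_s))


def daily_cap(day_index: int, start_cap: int = 3, full_cap: int = 25,
              warmup_days: int = 14) -> int:
    """Linearly ramp the daily cap from start_cap to full_cap over the warmup period."""
    if day_index >= warmup_days - 1:
        return full_cap
    step = (full_cap - start_cap) / (warmup_days - 1)
    return int(start_cap + step * max(day_index, 0))


def may_act(today: date, day_index: int, actions_today: int) -> bool:
    """Allow an action only on weekdays and while under today's cap."""
    if today.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return actions_today < daily_cap(day_index)
```

Clamping the sampled delay matters because a normal distribution can produce very small or negative values, which would otherwise translate into back-to-back actions.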

Tech Stack

  • Language: Python 3.11+
  • Browser automation: undetected-chromedriver + Selenium
  • LLM: OpenAI GPT-5.4 / GPT-5.4-mini (via API)
  • Database: SQLite (local) / PostgreSQL (SaaS)
  • PDF generation: Jinja2 + WeasyPrint
  • Dashboard: FastAPI + vanilla JS
  • Config: YAML files

Run It Yourself

The entire agent is open-source under the MIT license. Clone the repo, configure your YAML files, add your OpenAI API key, and run. Your data stays on your machine; we never see your credentials, resumes, or applications.

Or, if you prefer a managed experience, the SaaS version handles hosting and updates and adds team features.

