
Stop Using Opus for Everything: A Cost-Effective Vibe Coding Workflow with Cursor and GitHub Copilot

Yilin Fang
PhD Student @ OSU CSE

If you are like me, you default to the most powerful model for every task. Planning a feature? Opus. Writing a docstring? Opus. Adding a CLI flag? Believe it or not, Opus. It feels safe — why risk a worse answer when you can just use the best?

The problem is that for most coding tasks, you are paying 9x more (in Copilot premium requests, compared with Haiku) for a difference you will never notice. After burning through my premium requests in record time, I redesigned my workflow from scratch. Here is what I landed on.

Disclaimer: All pricing, model names, premium request costs, and free-tier availability mentioned in this post are as of February 2026. These details change frequently — check the latest Cursor and GitHub Copilot documentation for the most up-to-date information.

My Setup

I have two subscriptions running in parallel, costing me $30/month total:

  • Cursor Pro ($20/month): Built-in AI editor features (Tab completion, inline edits, chat, Composer/Agent mode) plus $20 in API credits for premium models. The API budget powers your daily in-editor work — inline edits, Composer, and chat — and doubles as a fallback for planning when Copilot premium requests run out.
  • GitHub Copilot Pro ($10/month): Unlimited access to base-tier models (GPT-4o, GPT-4.1, GPT-5-mini) at no premium-request cost, plus 300 premium requests per month for stronger models. Premium models cost different amounts per call: Claude Opus 4.5 costs 3, Sonnet 4.5 costs 1, and Haiku 4.5 costs 0.33. I use Copilot through OpenCode, an open-source terminal-based coding agent.

The key insight is that these two tools excel at different things, and the models within them should be matched to task difficulty, not to your anxiety about output quality.

Assign Each Tool Its Strength

Cursor is best for interactive, in-editor work — writing code, refactoring, quick Q&A while you are actively editing files. Its Tab completion and inline diff experience is unmatched for moment-to-moment coding.

OpenCode + Copilot is best for agentic, multi-step tasks — the kind where you describe a goal and the agent plans, creates files, runs commands, and iterates autonomously from the terminal.

There is also an infrastructure reason I keep agentic work in the terminal rather than in a GUI editor: I frequently develop on remote servers, and internet connections drop. If you are running a long agentic session in Cursor and your connection dies, you lose the entire session — the agent stops mid-task, and there is no way to resume. By running OpenCode inside a tmux session on the server, the agent keeps working even if your local machine disconnects. You just reattach to the tmux session when you are back online. For any task that takes more than a few minutes, this reliability difference matters a lot.

The Real Game Changer: Model Selection

Different models on Copilot consume different amounts of premium requests:

| Model | Cost per call | Total calls from 300 requests |
|---|---|---|
| Opus 4.5 | 3 requests | 100 |
| Sonnet 4.5 | 1 request | 300 |
| Haiku 4.5 | 0.33 requests | ~900 |
| GPT-4o / GPT-4.1 / GPT-5-mini | Free | Unlimited |

This means using Opus for everything gives you only 100 calls over the billing period, while the free-tier models give you unlimited calls at zero premium cost. The question is: when does the quality difference actually matter?

Understanding Model Failure Modes

Instead of thinking about what each model is good at, think about where each one breaks down.

Free-tier models (GPT-4o, GPT-4.1, GPT-5-mini) fail when the task requires deep reasoning over complex codebases, when instructions are highly specific and the model needs to follow them precisely without drift, or when the output needs to be consistent across a long generation. They are solid for standard code generation from clear specs, but they fall short on tasks that require careful architectural thinking.

Haiku fails when the prompt is long with many interleaved constraints, when the task requires multi-step reasoning where each step depends on the previous one, when the code involves subtle correctness requirements like concurrency or cryptographic protocols, or when it needs to maintain consistency across a long output.

Sonnet fails when the problem requires genuine creativity in system design, when debugging requires understanding deep interactions across a large codebase, when the task involves reasoning about security properties or adversarial inputs, or when you need the model to push back on your plan and identify flaws you have not considered.

Opus rarely fails on coding tasks. When it does, it is usually because the problem is under-specified or the context window is insufficient, not because of reasoning limitations.

The Escalation Rule

The habit to build: always start one level lower than your instinct says.

Your instinct says Opus? Try Sonnet first. Your instinct says Sonnet? Try Haiku first. Your instinct says Haiku? Try a free-tier model first. If the output is clearly inadequate — not slightly different from what a stronger model might produce, but actually wrong or missing the point — then escalate.

In practice, you will find that free-tier models handle a surprising amount of implementation work when given a clear plan. Haiku and Sonnet pick up the moderately difficult tasks, and you only truly need Opus for the top 5–10% of problems.

Does This Task Even Need a Plan?

Before thinking about models, ask a more basic question: does this task need a plan at all?

A plan is only valuable when there are decisions to make before writing code. If there are no meaningful decisions, planning is pure overhead.

No plan needed: Adding a flag to a CLI tool. Writing docstrings. Fixing a known bug. Writing a single function with a clear signature. Anything you can describe in one or two sentences.

Plan needed: The task involves multiple files. There are design choices that affect downstream code. The task has ordering dependencies. You are not sure what the right approach is. The scope is large enough that a model could lose track mid-implementation.

The rough heuristic: if you would sketch something on paper before coding it yourself, the model also needs a plan.

Six Patterns for Every Situation

Pattern 1: Direct Execution, No Plan

When: Small, mechanical, self-contained tasks.

Flow: You → Free-tier model (GPT-4o / GPT-4.1 / GPT-5-mini)

Example: “Add a --seed argument to train.py and set torch.manual_seed.” Nothing to plan. One model, one step, zero premium cost.
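
For concreteness, the expected output is roughly this (a minimal sketch; the default value and help text are illustrative):

```python
import argparse

import torch


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # ... existing arguments ...
    parser.add_argument("--seed", type=int, default=0,
                        help="Random seed for reproducibility.")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    torch.manual_seed(args.seed)  # seed PyTorch's global RNG
    # ... rest of train.py ...
```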

Pattern 2: Self-Planned Execution

When: Moderately complex, but one model can both plan and execute in a single session.

Flow: You → Sonnet (plan + implement)

Example: “Refactor the data loading to support both CSV and HDF5 with a common interface.” Sonnet figures out the abstraction, decides on the structure, and writes the code.
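
A sketch of the kind of abstraction Sonnet tends to land on here (class and method names are hypothetical; assumes pandas and h5py):

```python
from abc import ABC, abstractmethod
from pathlib import Path

import h5py
import numpy as np
import pandas as pd


class DataLoader(ABC):
    """Common interface for loading a dataset into a NumPy array."""

    @abstractmethod
    def load(self, path: Path) -> np.ndarray: ...


class CSVLoader(DataLoader):
    def load(self, path: Path) -> np.ndarray:
        return pd.read_csv(path).to_numpy()


class HDF5Loader(DataLoader):
    def __init__(self, dataset: str = "data"):
        self.dataset = dataset  # name of the HDF5 dataset to read

    def load(self, path: Path) -> np.ndarray:
        with h5py.File(path, "r") as f:
            return f[self.dataset][:]


def get_loader(path: Path) -> DataLoader:
    # Dispatch on file extension; extend as new formats appear.
    return HDF5Loader() if path.suffix in {".h5", ".hdf5"} else CSVLoader()
```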

Pattern 3: Opus Plans, Sonnet Implements

When: Hard design decisions, but straightforward implementation once the spec exists.

Flow: You → Opus (plan) → Sonnet (implement each step)

Example: “Design a pipeline that extracts CFGs from binaries, converts them to graph representations, runs GNN inference, and produces a structured report.” Opus decides module boundaries, data formats, and API contracts. Sonnet implements each module.
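
What makes this pattern work is that the plan pins down contracts before any code exists. A hypothetical fragment of such a plan, expressed as Python stubs (all names, types, and fields are invented for illustration):

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class CFG:
    """Control-flow graph extracted from one binary function."""
    function_name: str
    edges: list[tuple[int, int]]  # (source block id, target block id)


@dataclass
class Report:
    binary: Path
    scores: dict[str, float]  # per-function model scores


# Module boundaries fixed by the plan; Sonnet fills in the bodies.
def extract_cfgs(binary: Path) -> list[CFG]: ...
def to_graph_batch(cfgs: list[CFG]) -> Any: ...  # GNN-ready representation
def run_inference(batch: Any) -> dict[str, float]: ...
def make_report(binary: Path, scores: dict[str, float]) -> Report: ...
```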

Pattern 4: Opus Plans, Free-Tier Implements

When: Complex design, but each implementation step is mechanical enough that a free-tier model can handle it.

Flow: You → Opus (plan) → Free-tier model (implement each step)

Same as Pattern 3, but for tasks where the individual steps are pure translation from spec to code. This is the most cost-efficient pattern for large tasks: you spend 3 premium requests on the Opus plan and zero on implementation.

The catch: You need to verify that the free-tier model is following the plan correctly. If it drifts, escalate that specific step to Haiku or Sonnet.
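
Concretely, a plan step is mechanical enough for a free-tier model when it reads like a spec rather than a goal. A made-up example of such a step and its direct translation:

```python
# Plan step (verbatim from the hypothetical Opus plan):
#   "Write normalize_scores(scores) -> dict[str, float] that min-max
#    scales values to [0, 1]; return {} for empty input; if all values
#    are equal, map every key to 0.0."

def normalize_scores(scores: dict[str, float]) -> dict[str, float]:
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}
```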

Pattern 5: Sonnet Plans, Free-Tier Implements

When: The task needs a plan, but the design decisions are routine.

Flow: You → Sonnet (plan) → Free-tier model (implement each step)

Example: “Set up experiment configs using Hydra with separate groups for model, dataset, training, and logging.” Multi-file with dependencies, but well-established patterns. Sonnet plans, GPT-4.1 or GPT-5-mini executes. Total cost: 1 premium request.
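
Here is roughly what the executed plan looks like; the config group layout in the comment is the part Sonnet decides, and the entry point is standard Hydra boilerplate a free-tier model can write (group and file names are illustrative):

```python
# conf/
# ├── config.yaml          # defaults list: [model: resnet, dataset: cifar, ...]
# ├── model/resnet.yaml
# ├── dataset/cifar.yaml
# ├── training/default.yaml
# └── logging/wandb.yaml

import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Composed config from the four groups; override at the CLI,
    # e.g. `python train.py model=vit training.lr=3e-4`.
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    main()
```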

Pattern 6: Fallback — Cursor Plans, Free-Tier Implements

When: You have run out of Copilot premium requests for the month, but you still need an AI-assisted workflow.

Flow: You → Cursor API (plan with a premium model) → OpenCode with free-tier models (implement)

This is your safety net. Cursor’s $20 API budget can power planning sessions using Opus, Sonnet, or GPT-5.2 inside the editor. You then take that plan to OpenCode and execute it with Copilot’s unlimited free-tier models. It is not as seamless as having everything in one tool, but it keeps you productive when your premium budget is exhausted.

The Cost Difference Is Staggering

For a complex feature implementation consisting of one planning call and ten implementation calls:

| Strategy | Planner | Implementer | Cost (premium requests) |
|---|---|---|---|
| Always Opus | Opus | Opus | 33 |
| Opus + Sonnet | Opus | Sonnet | 13 |
| Opus + Haiku | Opus | Haiku | 6.3 |
| Opus + Free-tier | Opus | GPT-4.1 / GPT-5-mini | 3 |
| Sonnet + Free-tier | Sonnet | GPT-4.1 / GPT-5-mini | 1 |
| Fallback (Cursor + Free-tier) | Cursor API | GPT-4.1 / GPT-5-mini | 0 premium (uses Cursor $) |

The “always Opus” approach costs 33 premium requests for one feature. The Opus + Free-tier pattern costs 3 for comparable quality — an 11x improvement. And once you run out of premium requests, the fallback pattern lets you keep working at zero premium cost by shifting planning to Cursor’s API budget.

The Complete Decision Flowchart

Step 1 — Does this task need a plan? Can you describe the full solution in one or two sentences with no ambiguity? If yes, no plan — go to Step 2a. If no, plan needed — go to Step 2b.

Step 2a — No plan. Who implements? Mechanical and well-defined → free-tier model. Moderate reasoning required → Haiku or Sonnet. Security-critical or subtle correctness → Sonnet or Opus.

Step 2b — Plan needed. Who plans? Routine design decisions → Sonnet plans. Genuinely hard design decisions → Opus plans. Out of premium requests → Cursor API plans.

Step 3 — Who implements the plan? Mechanical steps → free-tier model. Steps requiring moderate reasoning → Haiku or Sonnet. Critical steps needing the strongest model → Opus for those specific steps only.
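
If it helps to see the whole flowchart at once, here it is as toy Python (the task attributes are invented; this is a mnemonic, not a real router):

```python
def pick_models(task) -> tuple[str | None, str]:
    """Return (planner, implementer) following the three steps above."""
    if task.fits_in_two_sentences:  # Step 1 -> 2a: no plan needed
        if task.security_critical:
            return None, "sonnet or opus"
        return None, "free-tier" if task.mechanical else "haiku or sonnet"
    # Step 2b: plan needed
    if task.out_of_premium_requests:
        planner = "cursor-api"
    else:
        planner = "opus" if task.hard_design_decisions else "sonnet"
    # Step 3: per-step implementer; escalate individual steps as needed
    implementer = "free-tier" if task.steps_mechanical else "haiku or sonnet"
    return planner, implementer
```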

Where Each Subscription Dollar Goes

| Resource | Use for |
|---|---|
| Copilot premium (Opus) | Architecture design, complex debugging, security-critical review |
| Copilot premium (Sonnet/Haiku) | Moderate planning, non-trivial implementation, self-planned execution |
| Copilot free-tier (GPT-4o, GPT-4.1, GPT-5-mini) | Routine implementation from a plan, simple tasks, unlimited usage |
| Cursor $20 API budget | Inline edits, Composer, chat (your daily in-editor driver); fallback planning when Copilot premium runs out |
| Cursor Tab completion | Always-on, low-cost autocompletion while you type |

Practical Tips

Track both budgets. Check Copilot premium usage and Cursor API spend regularly. The goal is to exhaust Copilot premium requests right around the end of the billing cycle, not halfway through. If you are burning too fast, shift more implementation work to free-tier models.

Use project-level config files. Both Cursor (.cursorrules) and OpenCode support system prompts. Write one describing your project structure and conventions. It pays for itself immediately — especially for free-tier models, which benefit the most from clear context.
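
As an example, a minimal rules file might look like this (entirely made up; write one that matches your repo):

```
# .cursorrules (reuse the same text as your OpenCode system prompt)
This is a PyTorch research codebase. Source lives in src/, Hydra configs
in conf/, and saved plans in docs/plans/.

- Use type hints and pathlib in all new code.
- Follow the existing Hydra config groups; never hardcode paths.
- Run pytest before declaring a task done.
```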

Save your Opus plans. Store them as markdown files in docs/plans/. They double as documentation and can be re-fed to free-tier models later if you need to revisit or extend a feature.

Use Opus for plan revisions, not code fixes. If the implementation has a design flaw, do not waste tokens patching around it with weaker models. Go back to Opus for a revised plan. If it is just a syntax error, even GPT-4o can handle it.

The first message matters most. In an agentic session, the first message sets the trajectory. If you are going to spend premium requests at all, spend them on the first planning message, then drop to free-tier for execution.

When premium runs out, do not panic. Switch to the fallback pattern: plan in Cursor using the $20 API budget, implement in OpenCode with free-tier models. The workflow is slightly less convenient but still highly productive.

The One Rule

Use the strongest model for decisions. Use the cheapest model for execution. Planning is decision-making. Implementation, when the plan is detailed enough, is execution. Keep these separate, and you will get Opus-quality results at close to free-tier cost.

And when your premium requests run out? You still have Cursor and unlimited free-tier models. The workflow adapts. Your productivity does not have to stop.