Category: AI Tools

  • AI content generators charge by the token—here’s what you’re actually paying for

    AI content generators charge by the token—here’s what you’re actually paying for

    If you’re using AI to write drafts, generate social captions, or summarise research, you’ve seen the pricing: $0.002 per 1,000 tokens, $20 for 500,000 tokens, or a monthly credit pool that resets whether you use it or not. But unless you’ve dug into the billing docs, you probably don’t know what a token actually is—or why your 300-word article sometimes costs twice as much as another one the same length.

    Token-based pricing isn’t new. OpenAI, Anthropic, Cohere, and most API-first AI platforms use it. What is new is how many solo operators are now running these tools daily without understanding the unit economics. That gap shows up as surprise overage charges, underpriced client work, or abandoned workflows because “AI got too expensive.”

    Tokens are not words

    A token is a chunk of text the model processes. It’s usually a word, part of a word, or a punctuation mark. The exact split depends on the tokeniser the model uses—and different models tokenise differently.

    Claude uses a tokeniser that averages about 1.3 tokens per word in English. GPT-4 is similar. That means a 1,000-word article is roughly 1,300 tokens. But if you’re writing in a language with more complex characters, working with code, or including lots of special formatting, the ratio climbs. A Markdown-heavy draft with tables and links can push 1.8 tokens per word.

    This matters for budgeting. If you’re charging a client $50 for a 1,500-word AI-assisted article and you assume 1,500 tokens, you’ll underestimate your input cost by 30% or more once you factor in the prompt, context, and output.

    Input tokens cost less than output tokens

    Most AI platforms charge different rates for input (what you send) and output (what the model returns). As of mid-2025, Claude‘s Sonnet 3.5 charges $3 per million input tokens and $15 per million output tokens. GPT-4o is $5 input, $15 output.

    If you’re pasting a 2,000-word style guide into every prompt to keep the AI on-brand, that’s roughly 2,600 input tokens—every single time. Run that 100 times in a month and you’ve burned through 260,000 tokens before the model writes a word. At $3 per million, that’s $0.78. Not huge, but it adds up if you’re also including example posts, research notes, or previous drafts in the context window.

    Output costs more. A 1,000-word draft is 1,300 tokens of output. Generate 100 of those and you’re at 130,000 output tokens, or about $1.95 at Claude’s rates. Combined with input, a modest content operation can easily hit $50–$75/month in API costs—before you factor in revisions, which double or triple the token count.

    How to track what you’re actually spending

    Most AI platforms show token usage in the dashboard, but it’s often buried. In the OpenAI Playground, token counts appear after each response. In the API, you get them in the response payload. If you’re using a wrapper tool like Writesonic, Jasper, or Copy.ai, token reporting is inconsistent—some show it, some don’t, and some round aggressively.

    For client work or internal budgeting, track tokens at the API level. If you’re calling Claude or GPT-4 directly, log the usage object in each API response. It breaks out input tokens, output tokens, and total tokens. Export that to a spreadsheet once a week and you’ll see exactly where the spend concentrates.

    If you’re using a third-party tool, ask support how they bill tokens. Some apply a markup. Others bundle token costs into flat-rate plans but throttle you after a threshold. Jasper, for example, moved to word-based credits in 2024, but those credits map back to token estimates under the hood—and the exchange rate isn’t published.

    One non-obvious way to cut token costs

    Stop regenerating entire drafts when you only need to fix one section. Most AI tools let you highlight a paragraph and re-run just that part. If you’re using the API, trim your context window: instead of sending the full 3,000-token style guide every time, send a 200-token summary. Test whether a shorter prompt gets you 90% of the quality at 40% of the cost.

    Also: use cheaper models for simpler tasks. GPT-4o-mini costs $0.15 per million input tokens and $0.60 per million output—10x cheaper than GPT-4o. Claude’s Haiku is similarly cheap. If you’re generating meta descriptions, social captions, or reformatting lists, the cheaper model is usually fine. Save the expensive one for long-form drafts where nuance matters.

    Token pricing is transparent once you understand the math. The opacity comes from not tracking usage and not knowing which levers to pull. If you’re spending more than $20/month on AI content tools, you’re past the point where rough estimates work. Start logging tokens, compare input vs. output costs, and test cheaper models for repetitive tasks.

    Want more breakdowns like this? Subscribe to One Two Three Send and get one operator-focused deep-dive every day—no fluff, no vendor pitches.

    Heads up — some links in this article are affiliate links. If you sign up through them, we may earn a small commission at no extra cost to you. We only recommend tools we use ourselves.

  • AI writing tools forget your style guide—here’s how to fix it

    AI writing tools forget your style guide—here’s how to fix it

    You’ve spent three sessions training Claude to write in your brand voice. It nails your tone, mirrors your sentence structure, finally stops using “delve” and “unlock.” Then you open a new chat two days later and it’s back to corporate buzzword soup.

    This isn’t a bug. It’s how context windows work. And if you’re using AI to draft blog posts, social captions, or email sequences at scale, you’re losing hours re-teaching the same preferences every time you start fresh.

    The fix isn’t more detailed prompts. It’s building a reusable style anchor that travels with you across sessions, tools, and team members.

    Why AI forgets your voice

    Most AI models—Claude, ChatGPT, Gemini—treat each conversation as a contained context window. When you close the chat or hit token limits (usually 100,000–200,000 tokens depending on the model), everything you taught it evaporates.

    Some tools offer “memory” features or custom instructions, but they’re shallow. ChatGPT’s custom instructions cap at 1,500 characters. Claude Projects can hold more, but most operators work across multiple tools depending on the task. Your style guide needs to be portable, not locked into one platform’s feature set.

    The other problem: vague instructions don’t work. Telling an AI to “write conversationally” or “be punchy” produces different output every time. You need examples, constraints, and a reference text it can pattern-match against.

    Build a style anchor document

    A style anchor is a 500–800 word plain-text document that lives in your notes app, project folder, or password manager. You paste it into the start of every new AI session before asking it to write anything.

    Here’s what to include:

    • Voice principles: Three to five concrete rules. Not “be casual”—instead, “Use contractions. Start sentences with conjunctions. Write like you’re replying to an operator email, not publishing a press release.”
    • Forbidden words and phrases: List the clichĂ©s and jargon your industry overuses. For online-business writing, that’s usually “leverage,” “unlock,” “game-changer,” “dive deep,” “robust.”
    • Sentence structure preferences: Max sentence length, whether you allow one-sentence paragraphs, how you handle lists.
    • Three example paragraphs: Pull these from your best-performing posts. The AI will mimic the rhythm, syntax, and vocabulary distribution.
    • Formatting conventions: How you use em dashes, whether you write “email” or “e-mail,” if you use Oxford commas, how you format tool names.

    Keep it under 1,000 words. Longer anchors eat into the AI’s working memory and slow down responses.

    How to use it in practice

    Every time you open a new chat or switch projects, paste the full style anchor as your first message. Then prompt normally.

    If you’re working in a tool with persistent memory (Claude Projects, ChatGPT with a dedicated GPT), load the anchor once and reference it explicitly: “Follow the style guide I provided. Now write an intro for a post about WordPress caching plugins.”

    For team workflows, store the anchor in a shared doc. Anyone drafting content pastes it in before prompting. This keeps voice consistent even when three people are writing under the same byline.

    One non-obvious trick: version your anchor. When you notice the AI drifting or you refine your preferences, save the updated version as style-anchor-v2.txt. This lets you A/B test tone changes without losing the original.

    What this fixes (and what it doesn’t)

    A good style anchor eliminates 80% of voice drift across sessions. You’ll stop rewriting AI drafts from scratch and spend more time editing for accuracy and structure.

    It won’t fix factual errors, and it won’t teach the AI your audience’s specific pain points. You still need to brief it on context for every piece: who you’re writing for, what problem you’re solving, what the reader should do next.

    It also won’t replace editorial judgment. AI drafts still need a human pass for logic gaps, unsupported claims, and the occasional hallucinated stat. But you’ll spend that time on substance, not rewriting every sentence to sound like you.

    If you’re generating more than five pieces of content per week with AI—blog posts, social threads, email sequences, product docs—the style anchor pays for itself in saved editing time within a week.

    Want more operator-focused breakdowns of AI tools, hosting infrastructure, and traffic strategy? Subscribe to One Two Three Send and get one tactical deep-dive every day, no fluff.

    Heads up — some links in this article are affiliate links. If you sign up through them, we may earn a small commission at no extra cost to you. We only recommend tools we use ourselves.

  • AI prompt libraries grow stale faster than you think

    AI prompt libraries grow stale faster than you think

    Most solo operators treat AI prompt libraries like recipe books: collect a few dozen good ones, save them in Notion or a text file, and pull them out whenever you need a blog intro or a product description.

    The problem is that prompts aren’t recipes. They’re instructions written for a specific version of a specific model at a specific point in time. When the model updates—and Claude, ChatGPT, and Gemini all push updates every few weeks—your carefully curated library starts misfiring.

    A prompt that generated tight 150-word summaries in April might produce 300-word essays in June. A content-rewriting prompt that preserved your brand voice last month might flatten it this month. And you won’t notice until you’ve already published three pieces that sound slightly off.

    Why prompts degrade faster than you expect

    Model updates don’t just improve accuracy or speed. They shift behavior in ways the companies building them don’t always document.

    OpenAI’s GPT-4 updates have quietly changed default verbosity at least twice in 2026. Claude‘s June model refresh altered how it interprets role-based instructions—prompts that began with “You are a copywriter” now trigger different output than they did in May. Google’s Gemini updates adjust tone calibration, especially for business and marketing tasks.

    None of these changes show up in release notes. You only notice when your output drifts.

    If you’re using a prompt library you built three months ago, you’re running instructions optimized for a model that no longer exists. The syntax still works, but the results have shifted enough that you’re spending more time editing than you were before.

    What breaks first

    Not every prompt degrades at the same rate. The ones that fail fastest share a few characteristics.

    Tone and voice prompts. Instructions like “write in a casual, conversational tone” or “match the voice of a skeptical industry analyst” are the most fragile. Models recalibrate tone with almost every update, and what felt conversational in April can read as chatty or flat by June.

    Length constraints. Prompts that specify word count—”write a 200-word summary” or “keep the intro under 100 words”—stop working reliably after a few updates. Models don’t ignore the instruction, but their idea of what constitutes 200 words shifts. You’ll get 250, then 180, then 220.

    Negation instructions. Prompts that tell the model what not to do—”don’t use jargon,” “avoid clichĂ©s,” “don’t start with a question”—become unreliable quickly. Models interpret negation differently across updates, and a prompt that successfully blocked fluff last month might let it through this month.

    Multi-step prompts. If your prompt includes more than two conditional instructions—”if the topic is technical, use examples; if it’s strategic, cite data”—it’s more likely to misfire after an update. Models handle conditional logic inconsistently, and updates often change how they prioritize competing instructions.

    How to build a prompt system that survives updates

    The goal isn’t to create prompts that never need revision. It’s to build a system that makes revision fast and obvious.

    Version your prompts. Tag each saved prompt with the date you last tested it and the model version it was written for. When you notice output drift, you’ll know whether to tweak the prompt or rewrite it entirely. A prompt that worked well for Claude in April might need only a single word change, or it might need a full rewrite.

    Use examples, not adjectives. Instead of “write in a confident, authoritative tone,” show the model a paragraph that demonstrates the tone you want and ask it to match that style. Example-based prompts degrade more slowly because they anchor the model to concrete output rather than abstract descriptors.

    Test prompts in pairs. Run the same prompt twice with slightly different phrasing and compare the output. If both versions produce similar results, the prompt is stable. If they diverge significantly, the instruction is ambiguous and will drift further as the model updates.

    Keep a changelog. When you revise a prompt, note what changed and why. Over time, you’ll see patterns—certain types of instructions that break predictably, specific phrasings that hold up across updates. That pattern recognition cuts your maintenance time in half.

    When to rebuild instead of revise

    Some prompts aren’t worth saving. If you’ve revised a prompt three times in two months and it still produces inconsistent output, the instruction set is probably too complex or too vague for the current model generation.

    Rebuilding doesn’t mean starting from scratch. Pull a recent output you liked, reverse-engineer what worked, and write a new prompt from that foundation. You’ll spend 15 minutes now instead of 45 minutes spread across six frustrating revisions over the next quarter.

    If you’re using Claude or another AI assistant as part of your content workflow, plan to audit your prompt library once a month. Test your five most-used prompts, compare output to your archived examples, and update anything that’s drifted. It’s faster than editing your way out of stale instructions.

    Reply to this piece if you’ve built a prompt versioning system that works. I’m tracking what solo operators are doing to keep their AI workflows stable without spending half their week on maintenance.

    Heads up — some links in this article are affiliate links. If you sign up through them, we may earn a small commission at no extra cost to you. We only recommend tools we use ourselves.

  • AI content rewriters lose SEO context after three paragraphs

    AI content rewriters lose SEO context after three paragraphs

    AI rewriting tools promise to refresh old content, adapt pieces for different audiences, or polish drafts into publishable posts. In practice, most lose the thread after a few hundred words—and take your search rankings with them.

    The problem isn’t that the output reads poorly. It’s that the rewritten version drifts from the search intent and semantic context that made the original rankable. You end up with smoother prose that Google understands less clearly than the draft you fed in.

    Why context collapse happens

    Large language models process text in chunks, typically 512 to 2,048 tokens depending on the tool. When you ask Claude, ChatGPT, or a dedicated rewriter to rework a 1,500-word article, the model receives the full document but applies transformations paragraph by paragraph or section by section.

    Early paragraphs get rewritten with the full document in short-term memory. By the time the model reaches paragraph eight or nine, it’s prioritising local coherence—making each sentence flow from the last—over global semantic alignment with your target keyword and the questions that keyword implies.

    The model doesn’t forget your instructions. It just weighs them against an increasing pile of local context. Synonym substitution, sentence restructuring, and tone shifts compound. A post that ranked for “WordPress CDN setup” becomes a post about “content delivery configuration for WordPress sites”—technically accurate, lower search overlap.

    What you lose in the rewrite

    Three things degrade faster than readability:

    • Keyword density and placement. If your original placed the target phrase in the first 100 words, the H2, and the conclusion, the rewrite scatters it or replaces it with near-synonyms that don’t carry the same search volume.
    • Semantic clustering. Google’s algorithms look for related terms that confirm topic relevance—”DNS,” “origin server,” “cache purge” in a CDN article. Rewriters often swap these for vaguer language or drop them entirely in favour of smoother transitions.
    • Internal link anchor context. If you linked to a related post with anchor text like “WordPress object caching,” the rewrite might turn that into “another caching method” or a generic “learn more,” weakening the semantic signal between pages.

    You can recover readability by editing. You can’t easily recover ranking momentum once Google recrawls a diluted version and adjusts your position.

    When rewriting works anyway

    Short-form content survives AI rewrites better. A 400-word product description or email gives the model less room to drift. The entire piece fits comfortably in the context window, and the model can hold your intent steady from open to close.

    Rewriting also works when you’re not targeting search traffic. If you’re adapting a blog post into a LinkedIn update, a newsletter section, or a Twitter thread, semantic SEO doesn’t matter. Clarity and platform fit do. The model’s tendency to simplify and tighten becomes an asset.

    And if you’re refreshing content that never ranked well in the first place, you have little to lose. A rewrite that shifts keyword focus might accidentally improve relevance for a better query.

    How to preserve SEO during AI rewrites

    The most reliable fix is to rewrite in smaller sections and provide keyword guardrails in every prompt. Don’t send the full article and ask for a rewrite. Send the introduction, specify your target keyword and two related terms, then move to the next section.

    Explicit instructions help: “Rewrite this section. Keep the phrase ‘WordPress CDN setup’ in the first sentence. Retain all mentions of ‘origin server,’ ‘DNS,’ and ‘cache purge.’ Improve readability without changing technical terminology.”

    After the rewrite, run both versions through a keyword density checker or a semantic SEO tool like Surfer or Clearscope. Compare term frequency for your target keyword and the top twenty related phrases. If the rewrite drops five or more high-value terms, flag those paragraphs and restore the language manually.

    Some operators skip AI rewriting for ranking content entirely. They use the model to generate alternate introductions or tighten conclusions, but leave the body untouched. It’s slower than a one-click rewrite, but it doesn’t trade rankings for polish.

    If you’ve run AI rewrites on posts that used to rank and seen traffic drop in the weeks after republishing, this is likely why. The content didn’t get worse—it just stopped answering the question Google thought it answered before.

    Have a question about using AI tools without breaking what already works? Reply to this email—I read every message and often turn answers into future posts.

    Heads up — some links in this article are affiliate links. If you sign up through them, we may earn a small commission at no extra cost to you. We only recommend tools we use ourselves.

  • AI assistants hallucinate pricing data—here’s how to verify

    AI assistants hallucinate pricing data—here’s how to verify

    You ask Claude or ChatGPT for a quick comparison: “Which newsletter platform costs less for 10,000 subscribers?” The model replies with confident numbers—$79/month here, $99/month there—and you make a purchasing decision based on that data.

    Then you click through to the actual pricing page and discover the real number is $129, or that the plan it recommended doesn’t exist anymore, or that the feature you need is only available on enterprise.

    AI assistants hallucinate pricing information more often than almost any other category of data. They blend outdated documentation, conflated product tiers, and invented numbers into answers that sound authoritative but cost you real money when you act on them.

    Why pricing data breaks LLMs

    Large language models train on static snapshots of the web. SaaS companies change their pricing every six to eighteen months—new tiers, revised limits, seasonal promotions, regional variations. The model’s training cutoff means it’s often working from information that’s twelve months stale or older.

    Worse, many pricing pages live behind JavaScript paywalls or login gates, so the training corpus captures incomplete or misleading fragments. The model fills gaps by interpolating from similar tools, which works tolerably well for feature descriptions but fails catastrophically for hard numbers.

    You’ll also see blended answers: the AI might pull a base price from one tier, a subscriber limit from another, and a feature list from a third, then present them as a single coherent package that doesn’t exist on any real plan.

    How to verify AI-generated pricing claims

    Treat every pricing figure, plan name, or feature-limit claim from an AI assistant as a research starting point, not a fact. Here’s the verification checklist:

    • Go directly to the vendor’s pricing page. Don’t rely on third-party review sites or affiliate comparison tables—those go stale even faster than the AI’s training data.
    • Check the date on any cited source. If the AI links to a blog post or help doc, look at the publish date. Anything older than six months is suspect for pricing.
    • Open the plan details or feature matrix. Don’t assume the headline price includes what you need. Verify the specific limits—sends per month, team seats, API access—that matter to your use case.
    • Test with a pricing calculator if available. Tools like Mailchimp, ConvertKit, and Brevo offer interactive calculators that show exactly what you’ll pay at your subscriber count. Use them.
    • Email sales for enterprise or custom plans. If the AI mentions an enterprise tier, assume the pricing it provides is invented. Those numbers rarely appear on public pages.

    For high-stakes decisions—annual contracts, multi-tool migrations, anything over $500/year—don’t verify once. Check again the week before you commit. SaaS companies announce pricing changes with as little as thirty days’ notice, and your six-week evaluation window can span a price hike.

    Where AI pricing answers do work

    AI assistants handle relative comparisons better than absolute numbers. If you ask, “Which is generally cheaper for small lists, Substack or Beehiiv?” the model can give you a directionally accurate answer because the relationship holds even when the exact figures drift.

    They’re also useful for surfacing lesser-known tools in a category. You might not have heard of a newer platform, and the AI can introduce you to it—but you’ll still need to verify the details yourself.

    Use AI to draft your shortlist and identify decision criteria. Then do the pricing research manually, in a spreadsheet, with current numbers from each vendor’s site.

    What to do if you’ve already bought based on AI output

    If you signed up for a service and the price or features don’t match what the AI told you, most SaaS companies offer refunds within seven to thirty days. Contact support immediately, explain the discrepancy, and ask for a prorated refund or a plan adjustment.

    For annual contracts, you have less flexibility, but it’s still worth asking. Some vendors will let you switch tiers or pause the subscription if you catch the issue within the first billing cycle.

    Document what the AI told you—screenshot the conversation—so you have a record of the claim. It won’t obligate the vendor to honor a hallucinated price, but it helps frame the conversation as a misunderstanding rather than buyer’s remorse.

    Have you caught an AI assistant inventing pricing data? Reply with the tool and the claim—I’m tracking which categories hallucinate most often, and I’ll share the patterns in a future issue.

    Heads up — some links in this article are affiliate links. If you sign up through them, we may earn a small commission at no extra cost to you. We only recommend tools we use ourselves.

  • AI writing assistants lose your brand voice after 10,000 words

    AI writing assistants lose your brand voice after 10,000 words

    You feed Claude or ChatGPT your style guide, brand voice doc, and three sample articles. The first draft comes back clean. The fifth is still coherent. By draft twelve, you’re rewriting entire sections because the output sounds like a SaaS landing page written by committee.

    The problem isn’t the model. It’s context decay—and most solo operators don’t notice it until they’re deep into a content sprint.

    How context windows actually behave in production

    Modern AI assistants advertise context windows between 100,000 and 200,000 tokens. That sounds massive. In practice, a 2,000-word article with formatting consumes roughly 3,000 tokens. A detailed style guide adds another 1,500. Add three reference articles, a content brief, and iterative edits, and you’re at 15,000 tokens before you hit “generate.”

    The issue isn’t hitting the hard limit. It’s recency bias. AI models weight recent inputs more heavily than older ones. Your brand voice document, uploaded at the start of the session, fades in influence as the conversation grows. By message twenty, the model is prioritizing your last three corrections over the foundational voice rules you set up front.

    This isn’t speculation. Run the same prompt in a fresh session and in a thread with fifteen prior exchanges. The tone, sentence structure, and word choice drift measurably. The fresh session respects your style guide. The deep thread defaults to generic clarity.

    Where the breakdown happens

    Three scenarios accelerate context decay:

    Iterative editing. You ask for a rewrite of paragraph four. Then paragraph seven. Then a punchier intro. Each edit adds tokens. The model starts optimizing for your edits rather than your original brief. If your edits are vague (“make it snappier”), the output drifts toward the model’s base training—usually bland, corporate prose.

    Multi-article sessions. You’re batching content. Article one turns out great. Article two is fine. Article three reads like it was written by a different person. The model is still referencing article one’s context, but it’s now buried under 20,000 tokens of intermediate work. Your style guide is functionally invisible.

    Supplemental instructions mid-thread. You realize the model isn’t using contractions, so you add a note: “Use contractions. Write like a person.” That instruction applies to the current output, but it doesn’t retroactively fix the earlier drift. Worse, it competes with your original style guide, which may have said something more nuanced.

    How to architect around it

    The fix isn’t to abandon AI writing tools. It’s to structure your workflow so the model never has to remember too much at once.

    Start fresh for each piece. Don’t reuse threads across articles. A new session costs you thirty seconds of setup but guarantees your brand voice sits at the top of the context stack. If you’re batching content, open a new chat per article. Yes, you’ll paste your style guide multiple times. That redundancy is weight, not waste.

    Anchor instructions at both ends. Put your core voice rules in the first message and repeat the two most important points in your content brief. Example: if your style guide says “no jargon” and “lead with specifics,” embed those phrases in the article prompt itself. Repetition reinforces priority in the model’s attention mechanism.

    Use system prompts where available. Claude lets you set a system prompt that persists across a conversation. ChatGPT offers custom instructions. Both sit outside the regular context window and don’t decay. Load your brand voice there. Keep it under 300 words—short, imperative statements work better than discursive guidelines.

    Separate editing from generation. If you’re deep into revisions and the tone starts slipping, don’t keep editing in the same thread. Copy the draft into a fresh session, paste your style guide, and ask for a single-pass cleanup. The model will treat your draft as raw input and apply the voice rules uniformly, rather than layering fixes onto fixes.

    What this means for content operations

    If you’re publishing once a week, context decay is invisible. If you’re running a content engine—daily newsletters, multi-author blogs, high-volume SEO plays—it’s the difference between consistent voice and a patchwork of tones.

    The operators who scale AI writing successfully treat it like a stateless function. Each invocation gets the same inputs. No conversation persists long enough to drift. Workflows that rely on “the model will remember” break at volume.

    Track this in your own work. Open your last five AI-generated drafts. Read them in sequence. If draft five sounds meaningfully different from draft one—and you didn’t change your instructions—you’re watching context decay in action.

    One Two Three Send covers the tools and workflows that power solo operations. If you’re running content at scale, subscribe for weekly breakdowns of what works, what breaks, and what costs you time when no one’s watching.

    Heads up — some links in this article are affiliate links. If you sign up through them, we may earn a small commission at no extra cost to you. We only recommend tools we use ourselves.

  • AI token counters lie—here’s how to bill clients accurately

    AI token counters lie—here’s how to bill clients accurately

    If you bill clients for AI-assisted work—copywriting, image generation, data extraction—you’ve probably noticed that the token count in your API dashboard doesn’t match the estimate your prompt tool gave you. Sometimes the difference is negligible. Other times it’s 20% or more, and you’re left trying to explain why the invoice doesn’t match the quote.

    Token counting isn’t standardized across models, and the tools we use to estimate cost rarely account for the invisible overhead that APIs add. Here’s what’s actually happening, and how to track usage in a way that holds up when a client asks questions.

    Why token counts don’t match between tools

    Most AI token counters—like the estimators built into prompt libraries or third-party calculators—use open-source tokenizers that approximate how a model splits text. OpenAI’s tiktoken library is the most common. It’s accurate for GPT-3.5 and GPT-4, but it doesn’t account for function calling, system messages, or the formatting overhead that APIs inject when you use features like JSON mode or structured outputs.

    Anthropic’s Claude models use a different tokenizer entirely. If you’re using a calculator built for OpenAI and then running prompts through Claude, your estimate can be off by 15–30%. The variance gets worse with non-English text, code blocks, or anything that includes special characters.

    Then there’s the API wrapper problem. If you’re using a tool like LangChain, Make, or Zapier to call an AI model, the platform often adds its own metadata—timestamps, user IDs, retry logic—that inflates token usage without appearing in your prompt preview.

    What your API dashboard is actually counting

    Your API provider’s usage dashboard is the source of truth, but it’s counting more than you think. Every API call includes:

    • System messages that set model behavior (e.g., “You are a helpful assistant”).
    • Function definitions if you’re using tools or structured outputs.
    • Conversation history if you’re maintaining context across multiple turns.
    • Formatting tokens for JSON mode, which adds schema instructions behind the scenes.

    A 500-token prompt can easily become 650 tokens by the time it hits the API. If you quoted a client based on the prompt alone, you’re eating the difference.

    OpenAI’s API returns token counts in the response object (usage.prompt_tokens and usage.completion_tokens), so you can log the actual cost per call. Anthropic does the same in the usage field. If you’re not capturing that data, you’re guessing.

    How to track usage without undercharging

    The simplest fix: log every API response and pull token counts directly from the provider. If you’re using Python, store the usage object in a CSV or database after each call. If you’re using a no-code tool like Make or Zapier, add a step that writes the token count to a Google Sheet or Airtable base.

    For client work, I run a weekly script that sums token usage by project tag and multiplies by the current API rate (OpenAI charges $0.01 per 1K prompt tokens and $0.03 per 1K completion tokens for GPT-4). That number goes into the invoice as a line item, and I attach a CSV export if the client asks for proof.

    If you’re quoting a fixed price, pad your estimate by 25% to cover system message overhead, retries, and any follow-up calls the client requests mid-project. Estimators are useful for ballpark numbers, but they shouldn’t be the final quote.

    When flat-rate pricing beats usage tracking

    Some operators skip per-token billing entirely and charge a flat rate per deliverable—$200 for a landing page, $500 for a lead magnet, regardless of how many API calls it takes. This works if you’re confident in your workflow and don’t want to explain token math to every client.

    The tradeoff: you absorb cost volatility. If a client requests three rounds of revisions, you’re paying for the extra tokens. If you’re using a model like GPT-4 or Claude Opus, a single long-context project can cost $15–$30 in API fees. Flat pricing makes sense when your process is repeatable and your margins are wide enough to cover outliers.

    For subscription clients—content retainers, weekly reports—I set a token budget per month (e.g., 100K tokens) and log usage in a shared dashboard. If they go over, the next invoice includes an overage charge at $0.02 per 1K tokens. That rate is higher than my cost, but it discourages scope creep and keeps the math transparent.

    Want more breakdowns like this? Subscribe to One Two Three Send for weekly deep-dives on the tools and workflows that power solo content businesses.

    Heads up — some links in this article are affiliate links. If you sign up through them, we may earn a small commission at no extra cost to you. We only recommend tools we use ourselves.

  • AI assistants leak context when you switch between projects

    AI assistants leak context when you switch between projects

    If you’re using ChatGPT, Claude, or Gemini to draft emails, write product descriptions, and brainstorm content ideas in the same session, you’ve probably noticed the tone starting to bleed. A client brief starts sounding like your newsletter. Your course outline picks up jargon from a product spec you wrote an hour ago.

    That’s not you losing focus. It’s the assistant remembering too much.

    How AI memory works across conversations

    Most conversational AI tools maintain context within a single chat thread. That’s useful when you’re iterating on a draft or debugging a workflow. But many platforms now also persist memory across threads—storing facts, preferences, and patterns from past sessions to make future responses feel more tailored.

    ChatGPT’s memory feature, for example, stores details like your writing style, preferred tools, and recurring projects. Claude offers a similar feature called “custom instructions” that carries forward between chats. Gemini ties memory to your Google account, pulling in context from Gmail, Docs, and prior conversations.

    The problem: these tools don’t automatically segment memory by project, client, or business vertical. If you’re a solo operator juggling freelance work, your own newsletter, and a side product, the AI treats it all as one continuous job.

    That means a prompt like “write a welcome email” might generate copy that sounds like the SaaS client you were working with yesterday—not the D2C product you’re launching today.

    When bleed happens and why it matters

    Context bleed shows up most clearly in tone, vocabulary, and assumed audience. If you’ve been drafting B2B landing pages all morning and then ask the AI to write a casual Twitter thread, you’ll often get something that splits the difference: too formal for social, too chatty for a white paper.

    It’s worse when you’re working under NDA or handling sensitive client material. Even if you’re not pasting in proprietary data, the AI might pick up on industry-specific language, product names, or strategic framing and echo it back in unrelated work. That’s a compliance risk and a professionalism problem.

    For operators running multiple revenue streams—affiliate content, a paid newsletter, consulting, a course—context bleed also dilutes your brand. Your newsletter starts to sound like your client’s voice. Your course copy picks up affiliate-review phrasing. Readers notice when the tone shifts, even if they can’t articulate why.

    How to isolate context without losing efficiency

    The simplest fix: use separate chat threads for separate projects. Don’t rely on the AI to infer boundaries. Start a fresh conversation every time you switch contexts.

    If your AI tool supports memory or custom instructions, turn it off for client work. In ChatGPT, you can disable memory in settings under “Data Controls.” Claude lets you clear custom instructions per chat. Gemini’s memory is harder to partition, but you can use incognito mode or a separate Google account for client sessions.

    For operators who bill multiple clients or run distinct content verticals, consider using different AI accounts entirely. A free-tier ChatGPT account for client drafts, a paid Claude subscription for your own content. It’s redundant, but it enforces a hard boundary.

    Another tactic: explicitly reset context in your prompt. Start each session with a short instruction that overrides prior memory. Something like: “Forget previous projects. This is a cold email for a B2C e-commerce brand selling outdoor gear. Tone: direct, benefit-focused, no jargon.” It’s not foolproof, but it reduces bleed.

    When memory is worth keeping

    Context persistence isn’t inherently bad. If you’re working on a single long-term project—building out a course, drafting a book, developing a content calendar—memory helps the AI stay consistent without you having to repeat background in every prompt.

    The key is intentional segmentation. Treat AI memory like browser cookies: useful within a defined scope, risky when it leaks across domains.

    If you’re a solo operator, that means thinking through which projects share a voice and which need to stay isolated. Your newsletter and your Twitter threads? Probably fine to share context. Your newsletter and a white-label ghostwriting gig? Keep those separate.

    Most AI tools don’t yet offer project-level memory management. Until they do, the burden is on you to create those boundaries manually—through new threads, separate accounts, or hard resets in your prompts.

    One Two Three Send covers AI tools, workflow automation, and the infrastructure that keeps solo operators running. If this kind of tactical breakdown is useful, subscribe for one article like this every day.

    Heads up — some links in this article are affiliate links. If you sign up through them, we may earn a small commission at no extra cost to you. We only recommend tools we use ourselves.

  • AI image generators bill you per API call—here’s the math

    AI image generators bill you per API call—here’s the math

    Most solo operators experimenting with AI image generation start with a web interface—DALL·E’s playground, Midjourney’s Discord bot, or Stability AI’s DreamStudio. The pricing feels simple: buy credits, burn through them, top up when you run out.

    Then you scale. You want to automate thumbnail generation for blog posts, create social assets in bulk, or build a tool that generates images on demand. Suddenly you’re looking at API pricing, and the math gets complicated fast.

    Here’s what actually happens when you move from casual generation to programmatic use, with real numbers from three major platforms.

    DALL·E 3: pay per resolution and quality tier

    OpenAI’s DALL·E 3 API charges based on image resolution and quality setting. As of May 2026, standard quality at 1024Ă—1024 costs $0.040 per image. HD quality at the same resolution jumps to $0.080. If you drop to 1024Ă—1792 (portrait or landscape), HD pricing climbs to $0.120 per image.

    That means 1,000 standard blog thumbnails cost $40. If you want HD quality for each, that’s $80. For a daily newsletter with five images per issue, you’re looking at $1.20 per send in HD, or $730 per month if you publish Monday through Friday.

    DALL·E 3 doesn’t offer volume discounts. You pay the same rate whether you generate ten images or ten thousand. The API is fast—typically under ten seconds per generation—but there’s no batch pricing, no prepaid tiers, and no way to lock in a lower rate.

    Midjourney: seat-based pricing, not per-image

    Midjourney doesn’t sell API access the way OpenAI does. Instead, you subscribe to a plan that gives you a monthly GPU time allowance. The Basic plan costs $10/month for roughly 200 images (about 3.3 hours of GPU time). The Standard plan is $30/month for around 900 images (15 hours). Pro runs $60/month for 1,800 images (30 hours), with an option to buy additional GPU hours at $4 per hour.

    If you’re automating image generation, Midjourney’s Discord-first architecture creates friction. There’s no official REST API yet. Third-party wrappers exist, but they scrape the Discord bot and risk rate limits or account suspension. For reliable programmatic use, Midjourney isn’t viable—even though the per-image cost on a Standard plan works out to about $0.033, cheaper than DALL·E 3.

    Stable Diffusion: self-hosting vs. hosted APIs

    Stable Diffusion is open-source, which changes the cost structure entirely. You can run it locally or on your own cloud instance, paying only for compute. A mid-tier GPU instance on AWS (g5.xlarge with an NVIDIA A10G) costs around $1.006 per hour on-demand. If you generate 100 images per hour, that’s roughly $0.01 per image—75% cheaper than DALL·E 3 standard quality.

    But self-hosting requires setup: installing dependencies, managing model weights, handling queues, and monitoring uptime. For solo operators generating fewer than 500 images a month, the overhead usually isn’t worth it.

    Hosted Stable Diffusion APIs solve this. Stability AI’s own API charges $0.01 per image for SDXL (1024Ă—1024). Replicate offers SDXL at $0.0055 per image, billed per compute second. Both are significantly cheaper than DALL·E 3, but image quality and prompt adherence vary more widely. You’ll burn extra generations refining prompts.

    Hidden costs: retries, storage, and moderation

    Every AI image API occasionally returns unusable output—cropped faces, garbled text, or results that ignore your prompt entirely. DALL·E 3 is the most reliable, but you’ll still retry 5–10% of generations. Stable Diffusion can require three or four attempts to get a usable image, especially with complex prompts.

    Factor retries into your budget. If your effective cost per usable image is 1.2Ă— the API’s listed price, a $0.01 Stable Diffusion call becomes $0.012. A $0.04 DALL·E call becomes $0.048.

    Storage adds up too. A single 1024Ă—1024 PNG averages 1.5–2 MB. Generate 10,000 images and you’re storing 20 GB. At $0.023/GB/month on AWS S3, that’s $0.46/month—not huge, but it scales linearly. If you’re generating images for a public-facing tool, you’ll also need a CDN. Cloudflare’s free tier works for light use; beyond that, budget $0.01–0.02 per GB transferred.

    Content moderation is another variable cost. DALL·E 3 includes built-in filtering, but Stable Diffusion doesn’t. If you’re accepting user prompts, you’ll need a moderation layer—either OpenAI’s moderation endpoint ($0.0001 per request) or a third-party service like Sightengine, which starts at $39/month for 5,000 images.

    When self-hosting makes sense

    Self-hosting Stable Diffusion pays off when you’re generating more than 2,000 images per month and can batch them efficiently. Spin up a GPU instance, queue 500 generations, process them in parallel, then shut the instance down. You’ll pay for an hour or two of compute instead of thousands of individual API calls.

    For sporadic use—ten images one day, none for a week—stick with a hosted API. The convenience premium is worth it.

    If you’re choosing between DALL·E 3 and Stable Diffusion APIs, run a quality test first. Generate twenty images with identical prompts on both platforms. If DALL·E 3 nails the prompt 90% of the time and Stable Diffusion needs three tries per usable image, DALL·E’s 4Ă— higher price might still be cheaper per good output.

    Want more breakdowns like this? Subscribe to One Two Three Send for weekly operator-focused analysis of tools, pricing, and infrastructure decisions.

  • AI prompt libraries don’t scale past twenty prompts

    AI prompt libraries don’t scale past twenty prompts

    If you’ve been using AI tools for more than three months, you probably have a growing collection of prompts saved somewhere. A Notion database. A Google Doc. Maybe a folder of text files with names like newsletter-intro-v3-final.txt.

    The problem isn’t saving prompts. The problem is finding the right one when you need it—and knowing whether the version you saved six weeks ago is still the best approach.

    Most prompt libraries fail around the twenty-prompt mark. Here’s why, and what actually works when you’re running a content business that depends on consistent AI output.

    The retrieval problem nobody talks about

    Prompts aren’t like recipes. You don’t browse them. You need them in context, under pressure, often mid-workflow.

    A Notion database works great when you have five prompts and remember what each one does. At twenty, you’re scanning titles. At forty, you’re using Notion’s search and hoping you tagged it correctly. At sixty, you’ve forgotten half of them exist.

    The failure mode isn’t storage—it’s retrieval. You need the prompt that generates product comparison tables, but you can’t remember if you called it “compare-products” or “product-table-builder” or “comparison-prompt-v2”. So you either waste five minutes searching or you write a new one from scratch, which defeats the purpose of saving prompts in the first place.

    Text files are worse. Folder hierarchies help until you need a prompt that could live in two categories. Do you file “write a cold-email follow-up” under Email or Sales or Outreach? You’ll forget. Six months later, you’ll create a duplicate.

    What works: context-based systems, not archives

    The operators I know who’ve solved this use one of three approaches, depending on how they work.

    Custom instructions in the AI tool itself. Both ChatGPT and Claude let you set default instructions that apply to every conversation. If 80% of your prompts share the same voice, format, or constraints—”always write in second person,” “keep paragraphs under three sentences,” “never use exclamation marks”—bake that into the tool. You’ll still need specific prompts for specific tasks, but you’ve eliminated the repetitive setup.

    Claude‘s Projects feature takes this further. You can create a project for, say, newsletter writing, upload your style guide and past issues, and set project-level instructions. Every conversation in that project starts with that context loaded. You’re not hunting for the right prompt—you’re working in the right environment.

    Snippet expansion tools. If you’re using prompts across multiple AI tools—ChatGPT for brainstorming, Claude for drafting, Perplexity for research—a snippet manager like TextExpander or Espanso beats a Notion database. Type a short trigger (;newsletter-intro) and it pastes the full prompt, wherever you are. No context switching. No hunting.

    The catch: snippet tools don’t handle nested prompts or conditional logic well. If your prompt has variables or depends on prior output, you’ll need something more structured.

    A single, linear prompt doc. This sounds too simple to work, but I’ve seen it succeed with operators who run high-volume content operations. One Google Doc. Chronological. Every new prompt gets added to the top with a date and a two-line description of what it does and when you used it. No folders. No tags. Just Cmd+F and a date range.

    The advantage: you don’t have to predict future search terms. You search for the outcome (comparison table) or the date you remember using it (April), and it surfaces. The disadvantage: it only works if you actually write those two-line descriptions. Most people don’t.

    The bigger issue: prompts drift

    Even if you solve retrieval, there’s a second problem. Prompts aren’t static. Models improve. Your writing style changes. The task evolves.

    The “write a newsletter intro” prompt you saved in February might produce worse output than a simpler prompt today, because GPT-4 in May behaves differently than GPT-4 in February. Or because you’ve tightened your house style and the old prompt encourages the wrong tone.

    If you’re saving every prompt variation, your library becomes a junk drawer. If you’re overwriting old prompts, you lose the ability to compare results or roll back when a new version underperforms.

    The cleanest solution I’ve seen: version prompts like code. Keep a changelog at the top of each prompt file. v1: original. v2: shorter intros. v3: removed rhetorical questions. When you update a prompt, you document why. Three months later, when output quality drops, you know which change to revert.

    This works in snippet tools, too—just add a version tag to your trigger. ;newsletter-intro-v3 instead of ;newsletter-intro. You keep the old version accessible without cluttering your main workflow.

    When to stop collecting prompts entirely

    Here’s the contrarian part: most solo operators would get better results from fewer saved prompts and more iteration in-session.

    If you’re saving fifty prompts for fifty micro-tasks, you’re fighting the way modern AI tools actually work. They’re conversational. They improve with feedback. A mediocre starting prompt plus two rounds of clarification often beats a “perfect” saved prompt used cold.

    The prompts worth saving are the ones that encode hard-won constraints—word counts, formatting rules, audience definitions, brand voice—that you’d otherwise have to re-explain every time. Everything else is just a starting point.

    Save the structure. Improvise the rest.

    Using AI tools to run your content operation? Subscribe to One Two Three Send for weekly breakdowns of what actually works—no hype, no fluff.

    Heads up — some links in this article are affiliate links. If you sign up through them, we may earn a small commission at no extra cost to you. We only recommend tools we use ourselves.