
Don't pick your AI assistant from a leaderboard or a hype thread. Run three fifteen-minute tests on your own brand and funnel, then commit to the one that needs the least editing on the work you actually do.
Episode 1 of AI for Marketers. We start with a fast news rundown, then the anchor tutorial: how to choose and evaluate an AI assistant for your own marketing work, and how to keep reading the field as models change month to month.
News this week (June 2-9, 2026)
Tutorial: picking and evaluating your assistant
Sort the seven major assistants by your real work, your data privacy, your existing stack, and budget (the ~$20/mo individual tier is the standard). Then read the field with your own brand-voice test, research/grounding test, and factual probe instead of trusting gamed leaderboards. The biggest pitfall: hallucination (~3.1-19.1%, citations worst at ~12.4%). Demand sources and click every link. Check data controls before pasting anything sensitive.
Let's start with the week in AI marketing. Quick snapshot, not a ranking, and where a vendor's own numbers show up I'll flag them as reportedly.
Biggest story first. Around the second and third of June, Google announced a new toggle inside Search Console that lets a website opt out of AI Overviews, AI Mode, and AI Overviews in Discover, while still showing up in regular search results. This came from the UK competition regulator, which reportedly calls it a world first, and it's meant to give publishers more leverage over content licensing. A few things matter here. It won't actually take effect until June seventeenth, when Google starts acting on the signal. It does not cover the Gemini app, so opted-out content can still surface there. And Google says it won't use the toggle as a ranking signal in normal search. For a marketer, this is an all-or-nothing trade. You either stay visible in AI answers or you don't, with no granular control. So the smallest next action is simple. Do not flip that toggle reflexively before the seventeenth. First go look at your new AI impression data and see how much visibility you'd actually be giving up.
That data is our standing visibility check, and it's the second story. On June third, Google added generative-AI performance reports to Search Console, isolating impressions from AI Overviews, AI Mode, and generative AI in Discover for the first time. What's in there is impressions only, broken down by page, country, device, and date. What's missing is a lot. No clicks, no click-through rate, no query or keyword data, and no real per-link position, because every link in an AI Overview shares one position. Google notes the total impression count doesn't change. This is just a breakout of numbers that were always bundled in. For scale, Google reportedly says AI Overviews now has over two and a half billion monthly users, and AI Mode over one billion. On referral traffic, Goodie's 2026 report found ChatGPT's share of measurable business-to-business AI referrals reportedly fell from about eighty-nine percent in mid-2025 to roughly sixty-three percent, while Claude surged to about eighteen and a half percent, Gemini rose to about eleven, and Perplexity to about seven. Those four are now reportedly about ninety-nine percent of measurable AI referrals. Claude reportedly hit a single-month peak above twenty-seven percent in April. Google's own AI referrals stay largely unmeasured. So open that new report and see which of your pages already appear in AI answers.
Two quick product items. HubSpot upgraded its CRM connectors. The Claude connector now uses a SQL-based retrieval method for faster, more accurate results on big datasets, with aggregations and joins, and the ChatGPT connector now reaches campaigns, landing pages, blog posts, and more. These launched back in 2025, so this is the upgrade, not the debut. The Claude connector reportedly needs a paid Claude plan. You can now ask grounded questions like which contacts opened but didn't click. Try connecting one to a sandbox portal and running a single cross-object query.
And on June ninth, Constant Contact launched an app inside ChatGPT for small businesses and nonprofits. You can generate full email campaigns from plain prompts, spin up subject-line variations, keep your tracking codes, and publish straight back to Constant Contact. Reportedly, customers using its AI features send emails about twenty-three percent faster. Smallest next action, link an account and draft one campaign end to end to test the publish-back handoff.
Welcome to episode one. This is where the whole show begins, so let's set the table. Over the next thirty or so episodes we're going to climb a single ladder, from pasting a prompt and getting generic slop, all the way up to AI doing real, reliable, on-brand work for your marketing. But you can't climb anything until you've picked the tool you're going to stand on. So today's job is choosing and evaluating an AI assistant. Notice I said the job, not the tool. The specific assistants will keep changing. The skill of picking one, and knowing when your pick has gone stale, will not.
Before we go further, let me clear up some words, because a few of them get used loosely and that confusion costs people money.
Start with the most important distinction. An assistant is not a model. The assistant is the product you log into and chat with. ChatGPT, Claude, Gemini. The model is the engine running inside it. Things with names like GPT-5.5, Claude Opus 4.8, Gemini 3.1 Pro. One assistant offers several models, and it swaps them constantly, often without telling you. This matters because when somebody says ChatGPT got worse this week, what usually happened is the default engine quietly changed under the same app. A real example. According to TechCrunch, OpenAI swapped ChatGPT's default model to something called GPT-5.5 Instant on the fifth of May this year. Same app you opened yesterday, different engine answering you today. Hold onto that idea. It comes back later.
Next term, the context window. That's how much text the model can hold in its head at one time. Your instructions, plus whatever you paste in, plus its reply, all counted together in units called tokens. A token is roughly three-quarters of a word. The bigger the window, the more you can dump in at once, a whole brand guide, ten blog posts, a long transcript, and have it reason over all of it together. The frontier tools in mid-2026 have big windows. Gemini 3.1 Pro and the top ChatGPT tier advertise around a million tokens. Anthropic made a million-token window standard on Claude with no extra charge. To put a million tokens in human terms, at three-quarters of a word per token that's roughly a seven-hundred-and-fifty-thousand-word document. You will rarely fill it, but having the room changes what's possible.
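If you'd rather see the token math than take it on faith, here is a rough sketch in Python. It assumes the three-quarters-of-a-word rule of thumb from this episode, which is only an approximation because real tokenizers vary by model and language. The names `WORDS_PER_TOKEN`, `tokens_to_words`, `words_to_tokens`, and `fits_in_window`, the reply allowance, and the example word counts are all made up for illustration.

```python
# Rough token math, assuming ~0.75 words per token (a rule of thumb, not a real tokenizer).
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int) -> int:
    """Approximate how many words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    """Approximate how many tokens a given word count will use."""
    return round(words / WORDS_PER_TOKEN)


def fits_in_window(word_counts: list[int], window_tokens: int, reply_tokens: int = 2_000) -> bool:
    """Check whether pasted documents, plus room for a reply, fit in the window.

    reply_tokens is an arbitrary allowance for the model's answer (an assumption).
    """
    needed = sum(words_to_tokens(w) for w in word_counts) + reply_tokens
    return needed <= window_tokens


if __name__ == "__main__":
    print(tokens_to_words(1_000_000))
    # Hypothetical load: a 12,000-word brand guide plus ten 1,500-word blog posts.
    print(fits_in_window([12_000] + [1_500] * 10, window_tokens=1_000_000))
```

The point of the check is the headroom. A brand guide and ten posts use a small fraction of a million-token window, which is why you will rarely fill it.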
Third term, web grounding, sometimes just called browsing. This is whether the assistant actually looks things up on the live web right now, or whether it's answering only from what it memorized during training, which has a cutoff date. Grounded answers come with sources you can click. Ungrounded answers can be confidently, fluently out of date. Perplexity is the clearest example of a grounding-first design. Every answer ships with inline links to where it got the information.
Fourth, training on your data. This is whether the company can use what you type to make future versions of the model smarter. Consumer tiers often default to yes. Business and enterprise tiers usually default to no. We'll come back to this, because it's the factor beginners skip and later regret.
Fifth, free versus paid. Every major assistant has a free tier, but free throttles you. Fewer messages a day, an older or cheaper model, a smaller context window. The paid individual tiers cluster around twenty dollars a month, and that's what unlocks the current flagship.
And last, multimodal, sometimes omnimodal. That just means the tool handles more than text. Images, documents, audio, video. OpenAI describes its current model as natively omnimodal. When you hear those words, think, it can see and hear, not only read.
Okay. Words sorted. Let's walk the landscape.
There are seven assistants worth knowing in mid-2026. I'm going to describe them as interchangeable examples, not a ranking, because the rankings move every month. The volatile bits I'll flag as reportedly.
First, ChatGPT from OpenAI. This is the default, the one everybody's heard of, the all-rounder. Writing, brainstorming, analysis, images, working with data. Its current model is GPT-5.5, which came out in late April, and the free default became GPT-5.5 Instant in early May. It's omnimodal. On pricing, reportedly there's a free tier with a limited, older model. A cheaper Go tier around eight dollars a month that went global in January. The Plus tier at twenty dollars a month, which gets you deep research, video generation, and an agent mode. A Pro tier at a hundred or two hundred dollars a month with that roughly million-token context. And business seats around twenty to thirty dollars, plus custom enterprise. For marketing, it's the default for copy drafts, repurposing content, ad variations, built-in images, deep research, and file analysis. Important caveat. Its consumer tiers default to training on your chats.
Second, Claude from Anthropic. Claude's strengths are long-form writing, careful reasoning, handling documents, following a voice or style closely, and admitting when it's not sure, which is genuinely useful. Its models are Claude Opus 4.8, released in late May, plus Sonnet and Haiku versions. The million-token context is standard. Pricing, reportedly, free tier, Pro at twenty dollars a month, a Max tier at a hundred to two hundred, team seats around twenty-five dollars, and enterprise. The fit is brand-voice writing, editing long drafts, summarizing big documents, and nuanced rewrites. Web search is available.
Third, Gemini from Google. Its superpower is the deep tie-in with Google Workspace. Gmail, Docs, Sheets, Slides, Drive. Plus strong native image generation, with image tools nicknamed Nano Banana. Its models are Gemini 3.1 Pro with the million-token context, and a faster Flash version from mid-May. Pricing, reportedly, a free tier, a Plus tier around eight dollars a month, an AI Pro tier at about twenty dollars with the full Pro model and big context, and an Ultra tier starting near a hundred dollars a month. Crucially, it's bundled into Google Workspace Business Standard, which is about fourteen dollars per user a month on annual billing, at no extra charge. So if you already live in Google's tools, Gemini is your strongest and cheapest option.
Fourth, Microsoft Copilot. This is AI living inside Microsoft 365. Word, Excel, PowerPoint, Outlook. Pricing, reportedly, Copilot Pro at twenty dollars a month, though it needs a Microsoft 365 base subscription. A Premium bundle around twenty dollars that includes the apps, Copilot, and a terabyte of storage. Business seats around eighteen dollars, rising to twenty-one in July. And enterprise around thirty. Its fit is unbeatable when your documents, email, and decks already live in Microsoft 365. And here's a quiet fact. Copilot is partly built on OpenAI's models, so its output often overlaps with ChatGPT's.
Fifth, Perplexity. This one's different. It calls itself an answer engine. Research and live web grounding, with inline citations on every single answer. Pricing, reportedly, free tier, Pro at twenty dollars a month or two hundred a year, a Max tier at two hundred a month that includes its Comet browser and a computer agent, and enterprise seats around forty dollars. That Comet browser went free in mid-March. Perplexity's fit is the research and grounding job. Market sizing, competitor scans, finding statistics with links you can verify. It's central to any work on AI-search visibility.
Sixth, Meta AI. It's free, with no paid tier, and it's embedded right inside WhatsApp, Instagram, Facebook, and Messenger. It runs on Meta's Llama 4 model. Its fit is the social-native marketer. Caption help, quick images, fast answers right inside your direct messages. Think of it as a convenient free layer, not a long-form workhorse.
Seventh, Grok from xAI. Its thing is a tight integration with X, the platform formerly called Twitter, including real-time access to the X firehose, and a deliberately unfiltered persona. Pricing, reportedly, free through X, then tiers running from ten dollars a month up to a heavy tier at three hundred, plus business seats around thirty. Its fit is real-time social and trend monitoring on X. Niche, unless X is core to your work.
Let me give you the cheat-sheet version, the one-breath summary. For writing, brand voice, and long documents, reach for Claude or ChatGPT. For sourced research and the live web, Perplexity, or turn browsing on. If you live in Google Workspace, Gemini. If you live in Microsoft 365, Copilot. For native images, Gemini's Nano Banana, ChatGPT, or Grok's image tool. For free and social-native, Meta AI. For real-time X, Grok. And notice the pattern in the money. The twenty-dollar-a-month individual tier is the de facto standard across ChatGPT Plus, Claude Pro, Gemini AI Pro, Perplexity Pro, and Copilot Pro. They've all landed on roughly the same price.
Now, the landscape is the map. The decision is how you actually pick. So here's the framework, and the good news is it survives every model swap, because it's about you, not about this month's leaderboard.
Question one. What work do you do most? Literally sort your week by hours. If most of your hours are writing, that points to Claude or ChatGPT. If it's research, Perplexity or browsing. If it's analysis, spreadsheets and campaign data, that's Copilot inside Excel or Gemini inside Sheets. If it's images, Gemini's Nano Banana, ChatGPT, or Grok. Don't pick for the glamorous thing you do twice a year. Pick for the thing you do every Tuesday.
Question two. Where does your data go? This is the privacy and training question, and it's the one beginners skip every time. Consumer tiers frequently default to training on your chats. Business and enterprise tiers default to not training. Per OpenAI's own data controls documentation, ChatGPT Business, Enterprise, and education plans, plus the developer interface, are excluded from training by default, while consumer plans use your chats to improve models unless you go into Settings, then Data Controls, and turn it off. Here's a gotcha that surprises people. Even with training turned off, clicking the thumbs-up or thumbs-down button on a reply can authorize that specific chat for training. And there's typically a thirty-day retention window for abuse monitoring. So the rule is simple. Client-confidential data goes in a business or enterprise account. Never in a consumer free tier.
Question three. What's your existing stack? The cheapest win in all of AI is the assistant already bundled into a tool you're paying for. Google Workspace users already get Gemini on Business Standard for about fourteen dollars a user. Microsoft 365 users already get Copilot. Don't go pay twenty dollars for a second tool that does the same job your existing suite already includes. Don't pay twice.
Question four. Budget. Think of three rungs. The free rung, capable but throttled and often an older model. The roughly twenty-dollar individual rung, which is the standard flagship tier. And the team or enterprise rung, eighteen to forty dollars a seat, which buys you no-training-by-default plus admin controls. Most solo marketers live happily at one or two of those twenty-dollar subscriptions.
Question five. One assistant, or a small stable? Start with one. Lowest friction, and you learn it deeply instead of poking at five things shallowly. Add a second only when a recurring job, something you keep needing, is something your first tool keeps doing badly. A mature setup tends to be two or three tools. One writing workhorse, one research tool, and whatever's already bundled in your office suite. What you want to avoid is collecting subscriptions you never actually open.
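Question one, sorting your week by hours, is simple enough to sketch in a few lines of Python. Treat this as an illustration, not a recommendation engine. The mapping repeats the pairings from this episode, while the function name `shortlist` and the hours in the example are hypothetical.

```python
# Pairings repeat the episode's "question one"; the hours further down are hypothetical.
SUGGESTIONS = {
    "writing": ["Claude", "ChatGPT"],
    "research": ["Perplexity", "any assistant with browsing on"],
    "analysis": ["Copilot in Excel", "Gemini in Sheets"],
    "images": ["Gemini (Nano Banana)", "ChatGPT", "Grok"],
}


def shortlist(hours_by_work: dict[str, float]) -> tuple[str, list[str]]:
    """Return the work type with the most weekly hours and the assistants to test first."""
    if not hours_by_work:
        raise ValueError("log at least one work type")
    top = max(hours_by_work, key=hours_by_work.get)
    # Work types outside the four above get an empty list rather than a guess.
    return top, SUGGESTIONS.get(top, [])


if __name__ == "__main__":
    week = {"writing": 14, "research": 6, "analysis": 4, "images": 1}
    work, tools = shortlist(week)
    print(f"Most hours: {work}. Test first: {', '.join(tools)}")
```

It only answers question one. Privacy, your existing stack, and budget still have to narrow the shortlist before you test anything.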
Now we get to the heart of this whole episode. The durable skill. And it's this. Learn to read the field for yourself, instead of trusting a leaderboard or a hype thread. Because this month's winner is not next month's winner, and the only thing that lasts is your ability to evaluate a tool for your work.
Let me show you how fast the field moves. Just in the first five months of 2026, look at the churn. A new Gemini Flash in mid-May. GPT-5.5 in late April, and then that quiet free-default swap on the fifth of May. Claude Opus 4.8 in late May. A new Grok. Perplexity making its browser free in mid-March. Any best AI assistant article you read is stale within weeks. So here's the mindset. Expect churn. Re-test, don't re-read.
And please, be skeptical of the public leaderboards and the vibes rankings. Here's why. There's an old idea called Goodhart's Law. When a measure becomes a target, it stops being a good measure. The crowdsourced arenas where people blind-vote on AI answers have become influential enough that the labs actively optimize to win them. So a high score there tells you people preferred an answer in a blind vote. It does not tell you the answer was accurate, or on-brand, or good at your specific task. The technical benchmarks, the ones with names like MMLU or SWE-Bench, measure coding, math, and reasoning. None of them measures "writes our newsletter in our voice." And the hype threads on social? They measure novelty, not fit.
So here's the fix, and it's the most important thing I'll say today. Run your own small evaluations. I'll give you three repeatable tests, each about fifteen minutes, and you run them on your own brand and your own funnel, not on some generic example.
Test one, the brand-voice test. Paste in three to five of your best existing pieces, things you're proud of, and say, "Study the voice in these. Now write a new email about [your topic] in exactly this voice." Then judge. Does it sound like you, or does it sound like generic AI? And here's the trick that makes it honest. Have someone who actually knows your brand read it cold, without knowing AI wrote it, and flag where the voice drifts. One documented trial trained on about twenty posts and reached roughly eighty-five percent tone match. That's your bar to compare against.
Test two, the research and grounding test. Ask it a real question one of your customers would ask. Something like, what's the best tool in my category for this use case in 2026? Demand sources. Then click every single link. Does it cite real, current pages? Or does it invent sources that look plausible and lead nowhere?
Test three, the factual-accuracy probe. Ask it something you already know cold. A statistic, a date, a competitor's pricing. Something you can check in two seconds. Because here's the logic. If it's confidently wrong about the thing you can verify, you cannot trust it on the things you can't. That's the whole point of the probe.
And don't score these on vibes. Use a shared rubric. Five or six things. Accuracy. On-brand voice. Did it cite sources. How much editing effort it needed. Speed. And whether it hedged when it wasn't sure, which is actually a good trait, not a weakness.
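A spreadsheet is all you need for the rubric, but if you prefer a script, here is a minimal sketch in Python. The six criteria are the ones just listed. The 1-to-5 scale, the equal weighting, the function names `rubric_score` and `rank`, and the placeholder scores are all assumptions for illustration.

```python
from statistics import mean

# The six rubric lines named in the episode; the 1-5 scale is an assumption.
CRITERIA = ("accuracy", "on_brand", "cited_sources", "editing_effort", "speed", "hedging")


def rubric_score(scores: dict[str, int]) -> float:
    """Average the 1-5 scores. For editing_effort, 5 means almost no editing was needed."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    for c in CRITERIA:
        if not 1 <= scores[c] <= 5:
            raise ValueError(f"{c} must be between 1 and 5")
    return round(float(mean(scores[c] for c in CRITERIA)), 2)


def rank(results: dict[str, dict[str, int]]) -> list[tuple[str, float]]:
    """Sort assistants from highest to lowest average score."""
    scored = ((name, rubric_score(s)) for name, s in results.items())
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder scores for two unnamed tools, not real results.
    results = {
        "Tool A": dict(accuracy=4, on_brand=5, cited_sources=3, editing_effort=4, speed=4, hedging=4),
        "Tool B": dict(accuracy=3, on_brand=3, cited_sources=5, editing_effort=2, speed=5, hedging=3),
    }
    for name, score in rank(results):
        print(name, score)
```

Equal weighting is the simplest choice. If editing effort matters most to you, weight it more heavily, but fix the weights before you look at any output.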
Where do you watch the field without checking it every single day? Three places. The assistant's own model release notes or changelog. The company blogs. And this show's news segment, which is exactly why we open every episode with one. Re-run your three tests when outputs visibly change, or when a new flagship ships. Quarterly is plenty. You do not need to chase this daily.
Alright. Let's make this concrete with a copyable starter workflow. This is the thing to actually do this week. No code, no automation, nothing fancy. Just you, a real task, and a couple of free accounts.
Step one. Pick a real task you actually owe this week. Not a toy. For example, a one-hundred-fifty-word promo email for your June webinar, or a one-paragraph competitor summary with sources.
Step two. Pick two or three assistants to test. A good starter trio is one writing-leaning tool, Claude or ChatGPT, one research-leaning tool, Perplexity, and whatever's bundled in your office suite, Gemini if you're on Google, Copilot if you're on Microsoft. Free tiers make this cost you nothing.
Step three. Write one prompt, and reuse it word for word in each tool. Same prompt every time, so it's a fair test. Your prompt should include who you are, your audience, your goal, your voice with two or three samples pasted in, and the format and length you want. Here's a skeleton you can adapt. "You're writing for [my brand], a business that sells [product] to [audience]. Match the voice in these samples: [paste samples]. Write a one-hundred-fifty-word webinar promo email with a subject line and a call to action. Use only facts I provide, and if you need a fact I didn't give you, ask for it instead of inventing it."
Step four. For the research task, demand citations and open every link to confirm they're real and current.
Step five. Score each tool on that five-line rubric. Accuracy, on-brand, cited sources, editing effort, speed. One sheet, three columns.
Step six. Run the factual probe in each. Ask the thing you already know the answer to.
Step seven. Decide and commit for about thirty days. Pick the one, or two, that needed the least editing on the work you actually do. Re-run only when a new flagship ships or the outputs visibly change.
Step eight, and this one is non-negotiable. Before you paste anything sensitive, check the data setting. In ChatGPT that's Settings, then Data Controls, and there's an equivalent in each tool. Client-confidential material goes in a business or enterprise account. Or you simply don't paste it.
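None of these steps needs code, but the prompt skeleton from step three is easy to keep identical across tools if you generate it once. Here is a minimal sketch in Python. The function name `build_prompt`, its parameters, and the example brand, samples, and facts are all hypothetical. The wording follows the skeleton from step three.

```python
def build_prompt(brand: str, product: str, audience: str,
                 samples: list[str], task: str, facts: list[str]) -> str:
    """Assemble one prompt string to paste, word for word, into each assistant."""
    if not samples:
        raise ValueError("paste at least one voice sample")
    sample_block = "\n\n".join(
        f"Sample {i}:\n{s.strip()}" for i, s in enumerate(samples, 1)
    )
    fact_block = "\n".join(f"- {f}" for f in facts) or "- (none provided)"
    return (
        f"You're writing for {brand}, a business that sells {product} to {audience}.\n\n"
        f"Match the voice in these samples:\n\n{sample_block}\n\n"
        f"Task: {task}\n\n"
        f"Facts you may use:\n{fact_block}\n\n"
        "Use only the facts I provide. If you need a fact I didn't give you, "
        "ask for it instead of inventing it."
    )


if __name__ == "__main__":
    # Every value below is a made-up example.
    print(build_prompt(
        brand="Example Co",
        product="lesson planners",
        audience="primary school teachers",
        samples=["Short, warm, no jargon.", "We plan so you can teach."],
        task="Write a 150-word webinar promo email with a subject line and a call to action.",
        facts=["The webinar is free", "It runs for 45 minutes"],
    ))
```

Generating the prompt once and pasting the same text into every tool keeps the comparison fair, which is the whole point of step three.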
Now I have to warn you about the single most expensive mistake beginners make. Trusting the output as fact. The technical word is hallucination, which just means the AI confidently makes something up. A fabricated statistic or a fake citation in your published content can quietly torch your credibility.
Here's the scale of it in mid-2026. Frontier hallucination rates run anywhere from about three percent to about nineteen percent, depending on the model and the task. No 2026 model can be trusted on facts without verification. And citations are the worst category of all. Even with extended reasoning turned on, the citation hallucination rate is around twelve percent. The assistants invent plausible-looking sources, complete with fake reference numbers. They look completely real.
So how do you recognize it? The tell is an answer that's fluent, specific, and unsourced. Confident numbers with no link. Or links that lead to a dead page, or to something totally unrelated. And flip it around. A model that says, I'm not certain about this, or that asks you for the fact instead of guessing, is actually behaving better. Reward that.
How do you avoid it? Demand sources and click them. Cross-check any statistic, quote, or claim before you publish. Turn on web grounding for anything factual. And never, ever publish a citation you didn't open yourself. The extended-thinking modes roughly cut the rate in half, but they do not eliminate it.
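To see why "click every link" is not overkill, here is a back-of-envelope sketch in Python. It takes the roughly 12.4 percent citation figure quoted in this episode and assumes, purely for illustration, that each citation fails independently at that same rate. Real errors cluster by topic and model, so treat the output as an order of magnitude, not a measurement. The function names are made up.

```python
# Back-of-envelope risk, assuming each citation fails independently at the same rate.
CITATION_HALLUCINATION_RATE = 0.124  # the ~12.4% figure quoted in this episode


def expected_fabricated(citations: int, rate: float = CITATION_HALLUCINATION_RATE) -> float:
    """Expected number of fabricated citations in an answer."""
    return citations * rate


def chance_of_at_least_one(citations: int, rate: float = CITATION_HALLUCINATION_RATE) -> float:
    """Probability that at least one citation is fabricated (independence assumed)."""
    return 1 - (1 - rate) ** citations


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(f"{n} citations: expect {expected_fabricated(n):.2f} fabricated, "
              f"{chance_of_at_least_one(n):.0%} chance of at least one")
```

Under those assumptions, a ten-source answer has a better than seventy percent chance of containing at least one invented citation, which is why spot-checking a couple of links is not enough.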
And a couple of related traps to keep in the back of your mind. One, pasting confidential client data into a consumer tier that might train on it. You recognize it when you notice you're on a free or personal login and you just pasted a client's unreleased plan. The fix is a business account, or don't paste it. Two, the free tier silently serving you the older, weaker model. You recognize it by message caps and a model badge you don't recognize. The fix is to check which model you're actually on before you judge the tool. And three, picking by hype instead of testing. You recognize it when you catch yourself saying, I switched because of a thread, not because I tested it on my own work. The fix is to run your three tests on your own work before you switch.
So here's where we land. The tools will change. They'll change next month, and the month after that. But the job doesn't. Sort your week by hours, mind where your data goes, use what's already bundled, and pick one tool to start. Then read the field with your own three tests on your own brand and your own funnel, and trust what you can verify over what you merely read in a ranking. Do the starter workflow this week. Pick a real task, run two or three assistants, score them, and commit for thirty days. That's the foundation. Everything we build in this show stacks on top of it.