How do I trigger Goal mode in the Codex desktop app?

The /goal command is currently an experimental Codex CLI feature, not a documented desktop app command. For desktop app users, the equivalent capability is using Codex Cloud (chatgpt.com/codex), which is async-first by design — open a thread, give it a long-horizon objective, and walk away.

What is AGENTS.md in Codex?

AGENTS.md is a plain text file Codex reads at the start of every session, carrying your tone preferences, terminology, and folder conventions. The Codex desktop app has a UI for it in Settings → Personalization → Custom instructions — no file editing required.

The Codex Playbook for Procurement Teams | Molecule One

Practical Playbook · 2026 Edition

The Codex Playbook for Procurement Teams

Q: Is Codex the same as ChatGPT?

No. ChatGPT is OpenAI's conversational assistant; Codex is OpenAI's agent, included with every paid ChatGPT plan (Plus, Go, Pro, Business, Edu, Enterprise). Codex reads and writes files on your machine, runs multi-step workflows, executes skills, schedules automations, and pursues long-horizon objectives in cloud sandboxes.

Q: Does OpenAI train on our data when we use Codex?

By default, no, for ChatGPT Business and Enterprise plans: Business and Enterprise contracts explicitly exclude training. On Plus and Pro consumer plans, training is on by default and you can opt out in settings.

A practical guide for procurement practitioners (analysts, category managers, sourcing leads, contract managers, buyers, SRMs) who want to use OpenAI's Codex as a working tool on their own desk. Not as a coding assistant. As a general-purpose agent that can take 30% off your week, starting tomorrow.

By Molecule One

16 Sections · ~20,000 words

No developer skills required

The Procurement Playbook Series


You are here

The Codex Playbook for Procurement Teams

A practical guide for procurement practitioners using OpenAI's Codex day-to-day. Setup, skills, automations, credit budgeting, and a 30/60/90 rollout plan.


Companion

The Claude Cowork Playbook for Procurement Teams

The sibling playbook for teams in the Anthropic / Claude ecosystem. Mirror structure, different stack. Read both if you're choosing or already running both.


How to use this document set

This is a long document. Most people will not read it end to end on the first sitting, and that's fine. We've written it so you can land in the section that matches where you are today.

1. New to Codex? Start at Section 2 (Getting Set Up) and work through it in order up to Section 5 (Building Your First Skill). Stop there, build the Supplier Research Brief, and come back when you're ready for more.

2. Already using the Codex CLI or the IDE extension? Skim Section 2, then jump to Section 5 (Building Your First Skill) and Section 6 (The Procurement Skill Library). The orientation you have as a developer doesn't fully transfer to procurement work; Section 7 (Automation) and Section 8 (Connectors) cover the real differences.

3. Rolling out to a procurement team? Read the Introduction, then jump to Section 11 (Managing Credits), Section 12 (Rolling This Out to a Team), and Section 13 (What Not to Use Codex For Right Now). Come back to the early sections for context once your governance position is set.


Introduction

Why this playbook exists, and what makes Codex different.

For most of the last decade, the tools sold to procurement teams have shared a pattern: vendor builds an opinionated platform, procurement reshapes its workflow to fit the platform, and three years later half the team is back in Excel. The reshaping is the expensive part. The reshaping is also the part that almost never gets factored into the ROI deck.

We wrote a Claude Cowork playbook last quarter on the same premise: that the era of bending procurement workflows around new technology is over, and the tools that win from here will be the ones that bend to the workflow instead. Cowork delivered on that promise. Quietly, while everyone was watching Claude, OpenAI was rebuilding Codex from a developer-only coding assistant into a general-purpose desktop agent. The April 2026 desktop app update was titled, in OpenAI's own words, "Codex for almost everything." That phrase deserves a procurement test.

So we ran one. Since the April 16 "Codex for almost everything" release, we've been running the same procurement workflows through Codex that we ran through Cowork (supplier research, RFP drafting, spend variance scans, category strategy briefs, scorecard refreshes, contract renewal radars), and we kept notes on what worked, what didn't, and what only worked one way. This playbook is those notes, structured.

A few things up front, because they shape the entire document. First, Codex was built for engineers, and that origin is still visible in places. Some setup decisions feel awkward if you've never opened a terminal; we route around those wherever we can. Second, Codex moves fast. OpenAI shipped updates to the desktop app, the CLI, and the cloud surface roughly every week through Q2 2026. By the time you read this, at least two things in here will have improved, and we've flagged the parts most likely to age. Third, this is a guide for procurement professionals inside the OpenAI / ChatGPT ecosystem. The honest reality is that most procurement teams don't pick their AI ecosystem: your organization is already on one stack or another, usually because IT or engineering chose first, or because a company-wide license deal locked the decision in before procurement was at the table. We include an honest comparison with Claude Cowork in Section 3 for the small number of readers who genuinely have the freedom to choose, but the rest of this document assumes Codex is the tool on your desk and gets on with showing you how to make it earn its keep.

This playbook is your starting point. What you build with it is yours.

Who This Playbook Is For and How to Use It

Audience, prerequisites, and reading paths

Primary audience

You're a procurement practitioner: an analyst, category manager, sourcing lead, contract manager, tactical buyer, or supplier relationship manager. You spend your day doing the actual procurement work: supplier research, RFPs, spend variance scans, contract analysis, scorecard refreshes, vendor onboarding, the things that fill a calendar.

You don't write code, you don't want to write code, and you don't need anyone's permission to start using a better tool on your own desk. You're not waiting for a committee to approve a transformation program. You want to know whether Codex can take 30% off your week starting tomorrow morning, and if so, exactly how to set it up before lunch.

This playbook is written for that person. It's task-first, not strategy-first. Every section asks the same question: can you use this tomorrow? If the answer is no, the section doesn't survive.

If you're a procurement leader thinking about a wider rollout, the team-level material is in Section 12, but the rest of the document is sized to be useful to a single working professional reading on a Tuesday afternoon with two RFPs on her desk.

Prerequisites

To follow this playbook, you'll need three things:

A ChatGPT plan that includes Codex access: Plus, Go, Pro, Business, Enterprise, or Edu. Section 2 covers which tier to pick.

A workstation (Mac or Windows) you can install the Codex desktop app on.

A folder on that workstation you're willing to let Codex read and write to. The desktop app uses what OpenAI calls a "local environment": you pick the folder, Codex stores its config in a hidden .codex/ subfolder, and your files stay on your machine. Nothing in this playbook requires you to connect a GitHub account or sync work to a remote repository. (If you later decide to use Codex Cloud, Codex in Slack, or the GitHub code-review integration, those surfaces do need GitHub. We flag where that becomes relevant in Section 8.)

Choose your reading path

Three ways to use this document.

Start with foundations. If you're new to Codex, read Sections 1–5 in order. By the end you'll have Codex installed, an AGENTS.md file tuned to how you work, your first skill built, and a working sense of what the tool actually does on real procurement work. About a 90-minute investment, and most of that is hands-on, not reading.

Level up. If you've already played with Codex casually but never built anything reusable, start at Section 4 (What Codex Uniquely Does Well) and read through Section 9. This is where the leverage is: the difference between using Codex as a fancier chatbot and using it as an agent that finishes work while you're in meetings.

Bring it to your team. If you've been using Codex solo for a few weeks and a colleague is asking how to start, jump to Section 12 (Rolling This Out to a Team) and Section 13 (What Not to Use Codex For). These are the sections you'll send to the colleague, and to the manager whose sign-off you'll eventually want for the team plan.

Getting Set Up

Plan, install surfaces, AGENTS.md, plugins, and the first 30 minutes

This is the longest setup section in the playbook. If you do nothing else, do this. Most of the value from Codex comes from the decisions you make in the first hour of using it.

The plan you need

Codex is available on every ChatGPT plan: Free, Go, Plus, Pro, Business, Edu, and Enterprise (OpenAI's official line). The tiers that matter for procurement work are below. We've checked every list price against OpenAI's published pricing page rather than retyping what some blog said; the Enterprise range is the exception, because it comes from third-party deal data.

Plan	Price	Codex usage	When to pick it
ChatGPT Free	$0	Light usage	Curiosity only. You'll hit limits in your first real workflow.
ChatGPT Go	$8/mo	Modest uplift over Free	Almost no one's right answer. Skip.
ChatGPT Plus	$20/mo	1× (the baseline every other plan is measured against)	Solo trial. One person, light use, decide-in-30-days.
ChatGPT Pro $100	$100/mo	10× Plus through May 31, 2026 (drops to 5× from June 1)	Daily individual use. Launched April 9, 2026, deliberately priced to match Anthropic's Claude Max.
ChatGPT Pro $200	$200/mo	20× Plus on an ongoing basis, plus a temporary 25× boost on the 5-hour limits through May 31, 2026	Power user. Long Cloud runs and parallel cloud tasks.
ChatGPT Business	$20/seat/mo (annual) or $25/seat/mo (monthly) as of April 2, 2026	Pooled, admin controls; Codex-only seats available with no fixed seat fee (pay-as-you-go on tokens)	Team rollout. The minimum tier for shared governance.
ChatGPT Enterprise	Custom (third-party deal data clusters around $45–$75/seat/mo, ~$60 typical; 150-seat minimum; annual contract)	Custom, with Compliance API, OpenTelemetry export, SSO/SCIM/EKM/RBAC	Regulated industries. Audit logging, retention controls, larger orgs.
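If you're comparing Business billing options, or planning around the Pro $100 multiplier change, the arithmetic is simple enough to sanity-check yourself. A minimal Python sketch that treats the list prices in the table as fixed inputs; it ignores tax, discounts, and Codex-only pay-as-you-go seats, and the function names are our own, not anything OpenAI ships:

```python
from datetime import date

# List prices for ChatGPT Business from the table above, in USD per seat per month.
BUSINESS_RATES = {"annual": 20, "monthly": 25}

# Pro $100 usage vs Plus: 10x through May 31, 2026, then 5x from June 1.
PRO_100_BOOST_ENDS = date(2026, 5, 31)


def business_cost_per_year(seats: int, billing: str) -> int:
    """Yearly seat cost for a Business team (excludes Codex-only pay-as-you-go seats)."""
    if billing not in BUSINESS_RATES:
        raise ValueError("billing must be 'annual' or 'monthly'")
    return seats * BUSINESS_RATES[billing] * 12


def pro_100_multiplier(on: date) -> int:
    """Codex usage on Pro $100 as a multiple of Plus on a given date."""
    return 10 if on <= PRO_100_BOOST_ENDS else 5


if __name__ == "__main__":
    seats = 8
    annual = business_cost_per_year(seats, "annual")
    monthly = business_cost_per_year(seats, "monthly")
    print(f"{seats} seats: ${annual:,}/yr on annual billing, ${monthly:,}/yr on monthly")
    print("Pro $100 on June 1, 2026:", pro_100_multiplier(date(2026, 6, 1)), "x Plus")
```

For an eight-seat team, annual billing comes to $1,920 a year against $2,400 on monthly billing.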

A note on the pricing model. On April 2, 2026, OpenAI moved Codex from per-message billing to token-based billing, but at that time the change applied only to Business customers and new Enterprise contracts. Plus and Pro tiers are still billed on the older message-based rate card and are scheduled to migrate over the following weeks (existing Enterprise/Edu accounts started migrating April 23). In practice, Codex usage is increasingly measured in input and output tokens, the way the OpenAI API has always been billed, and a heavy day (a multi-hour Cloud Goal-mode run plus three parallel cloud sandboxes) can consume several days' worth of light use. If you're paying for the seat yourself or watching a small budget, you'll feel this. Section 11 covers credit budgeting in detail.
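To see why a heavy day stings under token billing, compare it to a light day in raw token counts. The sketch below is illustrative only: every token figure is a hypothetical placeholder rather than an OpenAI rate or a measurement, and real billing may weight input and output tokens differently, which this simple sum ignores.

```python
# Every number here is a hypothetical placeholder, not an OpenAI rate or measurement.
LIGHT_DAY = {"input": 200_000, "output": 40_000}

HEAVY_DAY = [
    {"task": "multi-hour Cloud goal run", "input": 1_500_000, "output": 250_000},
    {"task": "parallel sandbox A", "input": 300_000, "output": 60_000},
    {"task": "parallel sandbox B", "input": 300_000, "output": 60_000},
    {"task": "parallel sandbox C", "input": 300_000, "output": 60_000},
]


def total_tokens(usage: dict) -> int:
    """Input plus output tokens, unweighted."""
    return usage["input"] + usage["output"]


def heavy_day_in_light_days(runs: list, light_day: dict) -> float:
    """How many light days' worth of tokens a heavy day consumes."""
    heavy = sum(total_tokens(run) for run in runs)
    return heavy / total_tokens(light_day)


if __name__ == "__main__":
    ratio = heavy_day_in_light_days(HEAVY_DAY, LIGHT_DAY)
    print(f"Heavy day is about {ratio:.1f} light days of tokens")
```

With these placeholder numbers the heavy day works out to roughly 11.8 light days; swap in figures from your own usage to get a real ratio.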

The five surfaces (and the one that matters most for you)

Codex isn't one app. It's a suite of surfaces, and a meaningful chunk of the procurement-onboarding confusion comes from people not knowing which one to use. Here's the field:

Codex Desktop App (Mac / Windows). Released as a full general-availability product alongside the April 2026 upgrades. This is the surface this playbook spends 90% of its time on. It's the closest analog to Claude Cowork: a window on your machine where you talk to Codex, point it at files, run skills, schedule automations, and watch it work. Install this first.

Codex Cloud (web). Browser surface at chatgpt.com/codex. You log in, kick off a task, close the tab, and check back later. Codex runs the task in OpenAI's own sandboxes, so it uses none of your machine's resources. This is where you'll run long autonomous sessions (Goal mode) and parallel jobs. Install nothing; just bookmark the URL.

Codex in Slack. A Slack app you add to your workspace. Mention @Codex in any channel, hand it a prompt, and it creates a cloud task and replies in the thread. Setup takes about 5 minutes. The procurement use cases for this are larger than they sound, and Section 8 covers them.

Codex IDE extension. Plugin for VS Code, JetBrains, Cursor, or Windsurf. Skip unless someone on your team is technical and wants the developer surface. You'll never need this for procurement work.

Codex CLI. Terminal tool. Skip entirely for this playbook. We'll mention it twice in a sidebar; otherwise it does not exist for your purposes.

Your workspace folder

The desktop app needs a folder on your machine to read from and write to. The same way Cowork uses a workspace folder, Codex uses what it calls a project folder, and you point Codex at it during setup.

A folder structure that holds up:

```
~/CodexProcurement/
  /00_inbox          → scratch space, exports you haven't processed
  /01_categories     → one subfolder per category you manage
  /02_suppliers      → one subfolder per active supplier
  /03_contracts      → contract working copies
  /04_rfps           → RFPs in progress, archived RFPs
  /05_skills         → your saved skills
  /06_templates      → reusable doc templates
  /99_archive        → done work
```

The numeric prefixes are a quiet trick: Codex sorts alphabetically when it scans a folder, and the prefixes let you keep the most-used folders at the top.
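If you'd rather not create eight folders by hand, a short Python script can scaffold the layout. This is a sketch under our own assumptions (the ~/CodexProcurement root and the folder names from the tree above); it only creates folders that don't exist yet, and the sorted listing it returns shows why the numeric prefixes keep 00_inbox at the top and 99_archive at the bottom:

```python
from pathlib import Path

# Folder names from the suggested layout; the numeric prefixes drive sort order.
SUBFOLDERS = [
    "00_inbox", "01_categories", "02_suppliers", "03_contracts",
    "04_rfps", "05_skills", "06_templates", "99_archive",
]


def scaffold(root: Path) -> list:
    """Create the workspace folders if missing and return them in sorted order."""
    for name in SUBFOLDERS:
        # exist_ok=True makes the script safe to re-run on an existing workspace.
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    for folder in scaffold(Path.home() / "CodexProcurement"):
        print(folder)
```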

Custom instructions (AGENTS.md): your highest-leverage setup move

This is the single highest-leverage setup move in this whole document. Good news for the non-technical reader: you don't have to open a text editor to do it.

What this is, in one sentence: a set of instructions Codex reads at the start of every session, telling it how you work, your tone preferences, your team's terminology, your folder conventions, the things you never want it to touch. Think of it as the one-pager you'd hand to a new contractor on their first day. Under the hood it's a file called AGENTS.md, but you almost never need to touch the file directly. The Codex desktop app has a UI for it.

The UI path (recommended for most readers). Open Codex. Press Cmd+, (Mac) or Ctrl+, (Windows), or pick Settings from the app menu. Go to the Personalization section. You'll find:

A choice of default personality: Friendly, Pragmatic, or None. For procurement work we recommend Pragmatic; it produces cleaner, less-padded prose. Pick None if you want zero personality flavor.
A free-text field labelled custom instructions. Paste the starter block below into it and edit the bracketed placeholders for your role, company, and category. Save.

That's it. The app writes your text into your personal AGENTS.md automatically and Codex reads it on every session (official confirmation).

The file path (only if you need it). If you want to share instructions across the whole team, or you want different instructions per project, you may want to edit the file directly. Codex looks for AGENTS.md in two places: globally (~/.codex/AGENTS.md), and at the root of each project folder. Global rules apply everywhere. Project rules layer on top. The total instruction budget is capped at 32 KB; the official guidance is to keep the global file under 2–3 KB so project files always fit within budget. In practice you want short and sharp, not exhaustive.
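If you go the file route, it's worth checking file sizes now and then. A minimal Python sketch, assuming 1 KB = 1,024 bytes and that the budget is shared by your global file plus a single project file (your setup may layer more files than that); the helper names and the project path are hypothetical:

```python
from pathlib import Path

TOTAL_BUDGET_BYTES = 32 * 1024   # total instruction cap quoted above
GLOBAL_TARGET_BYTES = 3 * 1024   # upper end of the 2-3 KB guidance for the global file


def file_size(path: Path) -> int:
    """Size in bytes, or 0 if the file doesn't exist."""
    return path.stat().st_size if path.is_file() else 0


def check_agents_budget(global_file: Path, project_file: Path) -> list:
    """Return human-readable warnings if the instruction files look too large."""
    warnings = []
    g, p = file_size(global_file), file_size(project_file)
    if g > GLOBAL_TARGET_BYTES:
        warnings.append(f"global AGENTS.md is {g} bytes; aim for under {GLOBAL_TARGET_BYTES}")
    if g + p > TOTAL_BUDGET_BYTES:
        warnings.append(f"global + project is {g + p} bytes; cap is {TOTAL_BUDGET_BYTES}")
    return warnings


if __name__ == "__main__":
    global_file = Path.home() / ".codex" / "AGENTS.md"
    project_file = Path.home() / "CodexProcurement" / "AGENTS.md"  # hypothetical project root
    for line in check_agents_budget(global_file, project_file) or ["within budget"]:
        print(line)
```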

Here's the starter block. Paste it into the Personalization field in Settings (or save it as a file if you've gone the file route), and edit the bracketed placeholders.

```markdown
# Procurement AGENTS.md

## About me
- I work in procurement at [COMPANY]. I lead [CATEGORY / TEAM].
- I do not write code. I work in Google Docs, Sheets, Slides, and Slack.
- I am not technical. Explain in plain English. No code unless I ask for it.

## My tone
- Direct, concise, written for executive readers.
- No filler ("As an AI…", "I hope this helps").
- No bullet points unless I ask for them.
- Match the formality of the input. RFP drafts are formal; internal Slack messages are casual.

## My terminology
- "Supplier" not "vendor" unless the doc is internal.
- "Spend" not "spending".
- "Tail spend" = anything under [$ amount] per year per supplier.
- "Strategic supplier" = one of these 8: [list].

## My workflow
- All files go in ~/CodexProcurement (see folder structure).
- Drafts go in /00_inbox. Move to category/supplier folder only after review.
- When you save a doc to Google Drive, name it [YYYY-MM-DD]_[Topic]_[v#].

## Things I never want you to do
- Never send an email, schedule a meeting, or sign anything without explicit confirmation.
- Never delete files from /03_contracts or /04_rfps.
- Never put supplier pricing or contract terms in a public Slack channel.
- Never assume. If a supplier name, contract value, or date is ambiguous, ask.

## Things I want you to do every session
- Read this file. Confirm by name what category I'm working in if you can tell from the context.
- When summarizing long docs, lead with the executive summary, not the methodology.
```
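The Drive naming rule in the starter block is easy to drift from by hand. A hypothetical Python helper that builds and checks names in the [YYYY-MM-DD]_[Topic]_[v#] pattern; turning spaces and underscores in the topic into hyphens is our own assumption, made so the underscore separators stay unambiguous:

```python
import re
from datetime import date
from typing import Optional

# Matches names like 2026-05-12_MRO-RFP-scoring_v2
NAME_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}_[^_]+_v\d+$")


def drive_doc_name(topic: str, version: int, on: Optional[date] = None) -> str:
    """Build a [YYYY-MM-DD]_[Topic]_[v#] name; defaults to today's date."""
    if version < 1:
        raise ValueError("version numbers start at 1")
    on = on or date.today()
    # Assumption: collapse spaces/underscores in the topic into single hyphens.
    clean_topic = re.sub(r"[\s_]+", "-", topic.strip())
    if not clean_topic:
        raise ValueError("topic must not be empty")
    return f"{on.isoformat()}_{clean_topic}_v{version}"


def is_valid_name(name: str) -> bool:
    """True if a name already follows the convention."""
    return NAME_PATTERN.match(name) is not None


if __name__ == "__main__":
    name = drive_doc_name("MRO RFP scoring", 2, date(2026, 5, 12))
    print(name, is_valid_name(name))
```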

After you save (either in the Personalization UI or by writing the file), open a fresh thread and ask:

If Codex echoes back what you wrote, your instructions are live. If not, check that you saved in Settings, or check the file path if you went the file route, and restart the app.

Projects: persistent workspaces

Codex Projects are containers that bundle a folder, an AGENTS.md, a set of approved plugins, a memory store, and any saved skills together as one persistent workspace. Open a project and Codex remembers everything you've done in it. Close it and the context stays. Open another project and you get a fresh start.

For procurement, the structure that works:

One project per category you actively manage. Lasts as long as you own the category.
One project per major active RFP. Closes when the RFP closes.
One "Inbox" project for ad hoc work that doesn't belong anywhere else.

Memory does not transfer between projects. If you mentioned in your Category A project that Supplier X is on a 30-day notice clause, Codex won't know that in your Category B project. This is deliberate, and it's the right design, but it's also why your AGENTS.md should carry the truly cross-cutting things (your name, role, tone, terminology) while project notes carry the category-specific things.

Plugins to set up on day one

Codex plugins are the equivalent of Cowork's MCP connectors. Plugins gained first-class status in Codex through Q1 and Q2 2026, with OpenAI announcing the official Google Drive plugin and a plugin marketplace alongside the desktop app GA. Some are first-party (built by OpenAI); some are community or third-party. Three tiers to think about:

Install on day one:

Plugin	Type	What it does for procurement
Google Drive	First-party	Read/write Google Docs, Sheets, Slides; navigate Drive folders. The single biggest win.
Web Search	Built-in	Live web access for supplier news, market data, public filings.
Slack	First-party (optional)	Receive @Codex mentions, post results. Covered in Section 8. Requires GitHub because Slack mentions trigger Cloud tasks.
GitHub	First-party (optional)	Only needed if you'll use Codex Cloud (the web surface), Slack `@Codex`, or the code-review integration. Skip it otherwise; the desktop app works fine with a local folder.

Install when you need them (Section 8 covers each in detail):

Plugin	Type	What it does
Browser Use (in-app)	First-party	Drives a browser inside the Codex desktop app for supplier portals with no API.
Computer Use (macOS)	First-party	Lets Codex operate other apps on your Mac. Higher trust, narrower use cases.

Honest gaps, no native plugin today:

Microsoft 365 (Outlook, SharePoint, OneDrive, Teams). The biggest gap for procurement. Cowork has full M365 connectors; Codex does not, as of this playbook's publication date.
DocuSign. No connector. Draft contracts in Codex, sign in DocuSign manually.
SAP Ariba, SAP Concur. No connectors. Workaround via CSV export.
ServiceNow, Microsoft Dynamics. No connectors.
Coupa, Ivalua, Jaggaer. No connectors.

Section 8 covers the workarounds and the in-app browser fallback. The gaps are real and you should plan around them, not pretend they don't exist.

Your first 30 minutes

A guided run-through. Do these five things in order. Copy the prompts as written.

1. Confirm AGENTS.md is loaded.

Expected: Codex echoes back the tone, terminology, and folder rules you set.

2. Read your folder structure.

Expected: Codex describes your folder structure and may suggest one or two improvements.

3. Read a real document.

Expected: A clean exec summary plus a recommendation. This is the moment most procurement professionals decide whether Codex is worth their time.

4. Try Plan mode.

In the composer, type /plan-mode to toggle Plan mode on. You'll see the indicator change. Then send:

Expected: A numbered plan, 5–8 steps, with what data Codex would pull and what the output would look like. You can edit the plan before approving it.

5. Save your first skill.

Expected: Codex confirms the skill is saved and offers to test it.

If all five of these worked, you are set up. Move to Section 3.


The Mental Model

How Codex fits alongside ChatGPT, Cowork, and your existing tools

Is Codex the same as ChatGPT?

Short answer: no, but it's included with every paid ChatGPT plan. The longer answer is the most useful thing in this section.

ChatGPT is OpenAI's conversational assistant, the thing at chatgpt.com where you type questions and get answers. Codex is OpenAI's agent, sold and licensed under the same ChatGPT umbrella (Plus, Go, Pro, Business, Edu, Enterprise) but built for a different job: reading and writing files on your machine, running multi-step workflows, executing skills, scheduling automations, driving a browser, and pursuing long-horizon objectives in cloud sandboxes. You don't buy Codex separately. If you have a paid ChatGPT plan, you have Codex access. You just have to install the Codex app (Section 2) to actually use it.

Where the confusion gets expensive: a procurement leader searches "how to use ChatGPT for procurement," lands on ChatGPT's conversational interface, types a few questions, and concludes "AI isn't ready for our workflows." The right tool for that procurement work (file-touching, skill-running, automation-scheduling) was Codex, sitting one install away inside the same subscription.

This playbook is about the Codex part of your ChatGPT plan.

The surface map

Most of the confusion about Codex in procurement teams comes from the fact that it shares a brand and a login with ChatGPT, and people reasonably assume they're the same product with different surfaces. They share a model family (GPT-5.5, GPT-5.3-Codex, and GPT-5.4 are all available inside Codex), but the surface, the affordances, and the workflows they enable are different enough that you should think of them as separate tools that happen to live behind the same account.

Surface	What it is	Best for	Not for
ChatGPT (chatgpt.com)	Conversational interface. No local file access. No persistent project workspace. Memory is account-wide.	Quick research questions, drafting, brainstorming, learning. The "I have a question" tool.	File-based work, automations, repeated workflows, anything that touches your folder structure.
Codex Desktop App	Agentic desktop tool. Reads local files, runs plugins, executes skills, schedules automations, runs in-app browser.	Multi-step procurement workflows. RFP drafting, scorecard refreshes, category briefs, contract analysis. The workhorse.	Anything legally binding without human signature. Anything requiring real-time market data feeds. ERP write-back.
Codex Cloud (web)	Long-running tasks in OpenAI's sandboxes. Doesn't use your machine's resources.	Overnight autonomous runs (Goal mode), parallel jobs, anything that takes >30 minutes.	Interactive conversational work; cloud sandboxes are async-first.
Codex in Slack	`@Codex` mention in a channel creates a cloud task. Replies in-thread.	Quick delegations from where your team already lives. "Pull the renewal terms on Acme."	Long, complex multi-step work; use the desktop app for those.
Codex IDE extension	Sidebar in VS Code/Cursor/Windsurf/JetBrains.	Engineers in your procurement-tech team if you have any.	Day-to-day procurement work. Skip.

The single most useful thing you can do, mentally, is split your work into "is this a question?" and "is this a task?"

Questions go to ChatGPT. Tasks (anything where a file gets read, a doc gets written, a skill gets invoked, or a folder gets scanned) go to Codex.

The Codex vs Claude Cowork decision frame

We get this question every week, and we want to be honest about who it actually applies to.

Most procurement professionals reading this don't have a free hand on AI tooling. Your organization is already in one ecosystem because IT or engineering, usually the first adopters in any company, picked first and the rest of the business followed, or because a company-wide license deal made the decision for you. If you're in that situation, the comparison below is academic. Skim it for context, then move on to Section 4 and use the rest of the playbook to get the most out of the tool you've been handed.

If you do have agency over the choice (a smaller procurement function with its own budget, a CPO with the leverage to spec the stack, or an organization deliberately leaving the call to the function), read on. The honest answer depends on which tools you already live in.

When Codex is the right fit:

You already run on Google Workspace (Docs, Sheets, Slides, Gmail, Drive). Codex's Google Drive plugin is the most polished surface in the product.
Your team lives in Slack. The @Codex integration is meaningfully better than the alternative.
You need to run multiple long-horizon tasks in parallel. Codex Cloud's parallel sandbox model has no Cowork analog.
You want delegation that doesn't require your laptop to be on. Cowork's scheduled tasks need your machine awake; Codex Cloud doesn't.
Your organization is already standardized on the OpenAI / ChatGPT stack. The procurement seats are easier to add to an existing Business or Enterprise contract.

When Cowork is the right fit:

You run on Microsoft 365 (Outlook, SharePoint, OneDrive, Teams). Cowork has full M365 connectors; Codex doesn't.
DocuSign is part of your daily workflow. Cowork connects natively; Codex doesn't.
Your spend cube lives in SAP Ariba or Concur. Cowork has CData-mediated connectors today; Codex requires CSV export.
You're in a regulated industry that needs the audit trail story right now, not in a future release.
Your team spends most of its day in Excel. The Excel integration in Cowork is significantly more mature than Codex's Google Sheets equivalent for spend-heavy work.

When both make sense:

Mixed estate, ambitious procurement function, the rare team with budget and patience to split workflows by tool. Section 16 covers the operating model, but it's a path for the few, not the many.

The framing we use internally when a team does have agency: Cowork is better at being inside your existing workflow today. Codex is better at scaling work beyond what one person can do in real time. If your bottleneck is workflow fit, Cowork wins. If your bottleneck is throughput, Codex wins. If your bottleneck is "the choice wasn't mine to make" (and for most readers, it wasn't), then the comparison was settled before you arrived, and the rest of this playbook is the guide to making the most of where you've landed.

What Codex Uniquely Does Well

Five things Codex can do that no other procurement AI tool can, yet

A new tool only earns its way into your workflow if it does something the existing tool can't. The previous section drew a fair line between Codex and Cowork on workflow fit. This section is about the dimensions where Codex is not just an alternative, but genuinely without a peer in the procurement-AI category today.

These five capabilities are why the procurement teams we've worked with this quarter, even the ones happily standardized on Cowork, keep coming back to ask whether they should add Codex as a second tool. The honest answer is: it depends on whether any of these five capabilities solve a real bottleneck for you.

1. Overnight Cloud runs (Goal mode) that finish before you arrive

Goal mode is Codex's capability to pursue a single durable objective autonomously for hours, sometimes days, across multiple sessions, token budget resets, even your laptop closing and reopening. The objective persists. Codex plans its own steps, executes them, checks its own output, corrects course when it fails, and stops only when it's confident it has reached the goal, or when it hits something it genuinely can't get past without you.

For procurement, this is the single most novel capability in the product. Cowork's scheduled tasks have to fit inside the chat session that triggers them. ChatGPT can't keep state across hours of work. Goal mode was built for exactly this, and the use cases that fall out of it are the ones procurement teams have been asking for since the GenAI wave started.

How you actually trigger it depends on which surface you use:

Codex Cloud (chatgpt.com/codex), recommended for desktop users. Cloud is async-first by design. Open a thread, hand it a long-horizon objective ("Refresh every category strategy in /01_categories/ for Q2..."), submit, close the tab, and walk away. Codex Cloud runs autonomously for hours in its own sandboxes. This is the practical Goal-mode path for non-technical users on the desktop app, and it's how the rest of this playbook talks about long autonomous runs.
Codex CLI (/goal), for the technically inclined. If you're comfortable in the terminal (or someone on your team is), /goal <objective> in the CLI is the literal command. There's a full lifecycle: /goal alone checks status, /goal pause halts, /goal resume picks back up, and /goal clear ends it. The feature is currently experimental; enable it via /experimental or by adding goals = true under [features] in ~/.codex/config.toml (official use case docs).
Codex desktop app, not yet. As of this playbook's date, /goal isn't a documented slash command in the desktop app (the app exposes only /feedback, /mcp, /plan-mode, /review, /status by default). Use Codex Cloud for the equivalent capability.

Concrete use cases:

Multi-supplier deep dives. Kick off at 6pm with "Run a full risk-and-fitness brief on all 12 candidate suppliers for the upcoming MRO RFP, using the criteria in /04_rfps/MRO_2026/eval_criteria.md. Save each brief as a separate Google Doc in /02_suppliers/MRO_candidates/. When done, produce a comparison matrix in a new Sheet." Walk in at 9am to 12 briefs and a matrix.
Quarterly category refreshes. "Refresh every category strategy doc in /01_categories/ for Q2. Pull spend from the latest GL extract in /00_inbox/Q2_GL.csv. Flag any category where spend moved more than 15% QoQ. Save updated docs back to the category folders and produce a summary memo."
Long-form contract analysis. "Read every contract in /03_contracts/Acme_Group/. Build a single matrix capturing renewal date, value, payment terms, termination clauses, and any out-of-pattern terms. Flag risks. Output to a Google Doc in /03_contracts/Acme_Group/_analysis/."
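The 15% QoQ flag in the category-refresh brief is simple arithmetic, and knowing exactly what you're asking for makes the summary memo easy to spot-check. A minimal sketch, assuming spend has already been summed per category for each quarter (the category names and figures are invented). One choice worth stating explicitly in your prompt: what to do with a category that had no spend last quarter, where a percentage change is undefined. This sketch flags it for review.

```python
from typing import Optional


def flag_qoq_moves(
    prev: dict[str, float],
    curr: dict[str, float],
    threshold: float = 0.15,
) -> dict[str, Optional[float]]:
    """Return {category: fractional change} for every category whose spend
    moved more than `threshold` (0.15 = 15%) quarter over quarter.

    Categories with no spend last quarter are flagged with None, because a
    percentage change from zero is undefined and deserves a human look.
    """
    flagged: dict[str, Optional[float]] = {}
    for category in sorted(set(prev) | set(curr)):
        before = prev.get(category, 0.0)
        after = curr.get(category, 0.0)  # missing this quarter = zero spend
        if before == 0:
            if after != 0:
                flagged[category] = None
            continue
        change = (after - before) / before
        if abs(change) > threshold:  # "more than 15%", so exactly 15% is not flagged
            flagged[category] = round(change, 4)
    return flagged


q1 = {"MRO": 100_000, "IT": 200_000, "Travel": 50_000}
q2 = {"MRO": 120_000, "IT": 210_000, "Freight": 30_000}
print(flag_qoq_moves(q1, q2))
# {'Freight': None, 'MRO': 0.2, 'Travel': -1.0}
```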

We've seen teams use overnight Cloud runs to clear, in a single night, work that would have taken an analyst three weeks. The first time you do it, it feels uncomfortable. By the third time, you start saving the harder work for the evening on purpose.

2. Parallel cloud sandboxes, "run this on all 12 at once"

When you queue multiple tasks in Codex Cloud, Codex does not work through them sequentially. It spins up a separate cloud environment for each task and works on all of them in parallel. The procurement implication is straightforward and large.

Concrete use cases:

Parallel supplier scorecard refreshes. Fire off 12 scorecard refresh tasks at once instead of waiting for them to run in sequence. What was a half-day of work compresses into the time it takes the longest single task to finish.
Multi-RFP response evaluation. Six suppliers submitted RFP responses. Send each to its own cloud task with the same evaluation rubric. Codex evaluates all six in parallel and posts the scoring back to a shared sheet.
Cross-category spend variance scans. Eight categories, same prompt, run them all simultaneously. The output is eight memos sitting in your inbox 20 minutes later.
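The time saving is easy to reason about. Run sequentially, a batch takes the sum of its task times; run in separate sandboxes, it takes roughly as long as its slowest task. A sketch with invented run times for 12 supplier scorecards; it ignores queueing and start-up overhead, so treat the parallel figure as a best case.

```python
def batch_elapsed(task_minutes: list[float], parallel: bool) -> float:
    """Wall-clock minutes for a batch of independent tasks.

    Sequential: tasks queue behind each other, so times add up.
    Parallel: each task gets its own sandbox, so the batch finishes
    when the slowest task does (ignoring start-up and queueing overhead).
    """
    if not task_minutes:
        return 0.0
    return float(max(task_minutes)) if parallel else float(sum(task_minutes))


# Hypothetical run times for 12 supplier scorecard refreshes, in minutes.
scorecards = [35, 40, 28, 52, 31, 45, 38, 29, 41, 36, 33, 47]
print(batch_elapsed(scorecards, parallel=False))  # 455.0, about 7.6 hours
print(batch_elapsed(scorecards, parallel=True))   # 52.0
```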

OpenAI's internal engineering teams call this pattern the "abundance mindset": instead of trying to perfect a single request, you launch many in parallel and curate the outputs. The procurement version works exactly the same way, and it's the single biggest mental shift separating teams that get real value from Codex from teams using it as a fancier chatbot.

3. Slack-native delegation

The Codex Slack app, generally available in 2026, lets you mention @Codex in any channel with a prompt and have Codex create a cloud task and reply in the thread. This is more useful for procurement than it sounds at first hearing.

Codex reads the prior thread context automatically. You don't have to restate. If your category team has been discussing a supplier issue in a Slack thread for a week, you can drop "@Codex pull the last three exchanges with this supplier from /02_suppliers/Acme/correspondence and summarize whether they're moving in good faith" into the thread, walk away, and have the answer when you check Slack after lunch. It works because Codex inherits the thread as context, runs the task in a cloud sandbox, and replies where the work was happening.

Concrete use cases:

Mid-meeting research. You're on a call. A supplier name comes up that no one in the room knows. Drop "@Codex who is [supplier]? Single paragraph, focus on financial stability and any recent press" into the team channel. The answer arrives before the call ends.
Async digest creation. @Codex read the last 7 days of messages in #category-mro and write a one-page summary for the QBR deck.
Quick fact lookups. @Codex what was the negotiated discount we landed on Acme's MSA last renewal? Check /03_contracts/Acme/.
Fire-and-forget delegation. @Codex pull the renewal terms on every contract in /03_contracts/ that expires in the next 90 days, post a list here.

You need a Plus, Pro, Business, Enterprise, or Edu plan, a connected GitHub account, and at least one Codex environment configured. Setup is genuinely 5 minutes if your IT team allows Slack apps.

4. Plan mode before you commit

Plan mode is a deliberate pre-flight surface. You toggle it on (type /plan-mode in the composer) and give Codex a task. Before any work runs, Codex outputs a step-by-step approach: what files it will read, what tools it will call, what the output will look like, and what it might fail on. You read it, edit it, and approve, modify, or reject it. Toggle Plan mode off when you're ready to execute.

This sounds like a small affordance and it isn't. For procurement work specifically, where the cost of a misunderstanding can be a wasted afternoon at best and a misfiled deliverable at worst, Plan mode is the single most underrated feature in the product.

Concrete use cases:

Anything irreversible. "Move every contract in /03_contracts/ older than 5 years into /99_archive/": never run this without a plan in front of you first.
Anything expensive. Before kicking off a parallel cloud job that will run on 12 suppliers, see the plan and confirm Codex understood the rubric you handed it.
Anything new. The first time you ask Codex to do a workflow you haven't tested, Plan mode is the cheap way to find out whether Codex understood the brief.
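The three rules above work as a pre-flight checklist, and some teams like to bake that checklist into a shared template. A sketch of it as code; the field names are our own, and treating "fans out into more than one cloud job" as the test for "expensive" is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    irreversible: bool = False   # moves, deletes, overwrites, or sends something
    parallel_jobs: int = 1       # how many cloud jobs it will fan out into
    run_before: bool = True      # have you run this exact workflow successfully?


def plan_mode_reasons(task: Task) -> list[str]:
    """Reasons, from the three rules above, to review a plan before running."""
    reasons = []
    if task.irreversible:
        reasons.append("irreversible")
    if task.parallel_jobs > 1:  # our stand-in for "expensive"
        reasons.append("expensive")
    if not task.run_before:
        reasons.append("new")
    return reasons


archive = Task("Archive contracts older than 5 years", irreversible=True)
print(plan_mode_reasons(archive))  # ['irreversible']
```

An empty list means it's reasonable to run directly; anything else means toggle Plan mode on first.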

5. Image in, action out

You can attach images directly in any Codex surface: screenshots, scanned documents, photos of whiteboards, charts, diagrams, and supplier portal exports that arrive as PNG. Codex reads the image and acts on the contents. This was rolled out across the Codex app, IDE, and CLI through Q1 2026 and is now table stakes inside the product.

For procurement, this closes a workflow gap that has been frustrating since GenAI showed up: the supplier portal with no export button.

Concrete use cases:

Quote extraction. A supplier sends a price list as a screenshot from their portal that doesn't have a CSV export. Drop the image into Codex. "Extract every line item and load into a clean Sheet in /02_suppliers/Acme/quotes/."
Scanned invoices. Old contracts and historical invoices arrive as PDF scans. "Read this scanned invoice. Pull supplier name, invoice number, line items, dates. Output as structured Sheet rows."
Whiteboard captures. You photographed the negotiation framework your team built in a whiteboard session. "Read this whiteboard photo and turn it into a clean negotiation playbook doc."
Chart-to-analysis. A consultant sent a Pareto chart from a PDF deck. "Read this Pareto chart. Tell me which 3 categories drive 70% of the spend shown, and what you'd recommend doing about each."
Portal screenshots. A supplier dashboard you can't export from. "Read this screenshot from the Acme supplier portal. Pull every PO listed and reconcile with our open POs in /02_suppliers/Acme/POs.csv."
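When Codex reads numbers off a chart, it's worth reproducing the arithmetic once on the figures it extracted; if your answer differs from Codex's, the extraction is what went wrong. A sketch of the "which categories drive 70% of spend" calculation, assuming Codex has already transcribed the chart into category/spend pairs (the figures below are invented).

```python
def pareto_categories(spend: dict[str, float], share: float = 0.70) -> list[str]:
    """Smallest set of largest categories whose combined spend reaches `share` of the total."""
    total = sum(spend.values())
    picked: list[str] = []
    running = 0.0
    # Walk categories from largest to smallest until the running total crosses the share.
    for category, amount in sorted(spend.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break
        picked.append(category)
        running += amount
    return picked


# Figures as Codex might transcribe them from the chart, in $K (invented for illustration).
chart = {"Logistics": 4100, "IT": 2600, "MRO": 1900, "Facilities": 800, "Travel": 400, "Marketing": 200}
print(pareto_categories(chart))  # ['Logistics', 'IT', 'MRO']
```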

This is the capability that quietly does the most work day-to-day. Half the procurement screens in the world are not exportable; Codex closes that gap.

Where these five strengths add up

None of these capabilities is a single killer feature. The combined effect is. The procurement professionals we've worked with who get the most out of Codex use all five: they plan in the desktop app, then run long jobs in Cloud; they queue parallel cloud jobs for multi-supplier work; they delegate from Slack; and they hand Codex an image whenever the workflow involves a screen with no export button. The people who use Codex like a fancier ChatGPT don't see the leverage and (rightly) wonder why they're paying for it.

If you take one thing from this section: when you sit down for your weekly hour with Codex, ask yourself, "What's the version of this that I could run autonomously in Cloud overnight, parallelize across all my suppliers, or delegate from Slack?" That single reframe is where the value is.

Building Your First Skill

Step-by-step using the Supplier Research Brief as the starter case

Codex skills are reusable, named workflows. Instead of re-explaining a complex task from scratch every time, you package the instructions, context, and tool connections once, give the skill a name, and invoke it with a single command. Skills became a first-class Codex concept alongside the desktop app GA in early 2026.

The structure is intentionally simple. A skill is a markdown file that lives in your project folder under /05_skills/. It contains a name, a trigger phrase, a description, a set of inputs, a sequence of steps, and an output specification. Codex reads the file when you invoke the trigger and follows it.
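Because a skill is just a markdown file, you can check a shared library for completeness before anyone relies on it. A sketch of such a check in Python; the required headings are the ones used in the Supplier Research Brief template later in this section, and the check is our own house convention, not something Codex requires or enforces.

```python
import re

# Section headings used by the skill templates in this playbook.
REQUIRED_SECTIONS = ["Trigger", "Goal", "Inputs", "Steps", "Output"]


def lint_skill(markdown: str) -> list[str]:
    """Return a list of structural problems; an empty list means the file looks complete."""
    problems = []
    # A '# Name' title line. '^#\s' cannot match '##' or '###' headings.
    if not re.search(r"^#\s+\S", markdown, flags=re.MULTILINE):
        problems.append("missing '# Skill name' title")
    # Second-level headings only; '### ...' output subsections are ignored.
    found = set(re.findall(r"^##\s+(.+?)\s*$", markdown, flags=re.MULTILINE))
    for section in REQUIRED_SECTIONS:
        if section not in found:
            problems.append(f"missing '## {section}' section")
    return problems
```

Run it over every file in /05_skills/ before sharing the folder; a skill missing its Output section is the one most likely to come back in a format nobody expected.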

We start every team we work with on the same first skill, for the same reason: it pays for the time invested faster than anything else, and it teaches the muscle of writing a skill that someone other than you can use.

The six-step skill-building process

Step 1. Define the trigger phrase.

The phrase you'll type to invoke the skill. Make it specific enough that Codex won't trigger it by accident, and natural enough that you'll remember it under pressure.

Good: "Run a supplier research brief on [Supplier Name]."

Bad: "research" (too generic, Codex won't know whether you mean a supplier brief, a category brief, or a market brief).
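Codex decides when a skill applies by reading your request, not by pattern-matching it, so the sketch below is an illustration rather than a description of how Codex works. What it shows is why the good trigger is easy to use reliably: a fixed phrase with a bracketed placeholder makes it unambiguous which words are the command and which word is the input. The function name and the regex approach are our own.

```python
import re


def trigger_pattern(template: str) -> re.Pattern:
    """Compile a trigger template like 'Run ... on [Supplier Name].' into a regex
    with one named group per [Placeholder]."""
    # re.split with a capturing group alternates: literal, placeholder, literal, ...
    pieces = re.split(r"\[([^\]]+)\]", template)
    regex = ""
    for i, piece in enumerate(pieces):
        if i % 2 == 0:
            regex += re.escape(piece)  # literal text must match as typed
        else:
            name = re.sub(r"\W+", "_", piece).strip("_").lower()
            regex += f"(?P<{name}>.+?)"  # placeholder slot becomes a named group
    return re.compile(regex, flags=re.IGNORECASE)


brief = trigger_pattern("Run a supplier research brief on [Supplier Name].")
match = brief.fullmatch("Run a supplier research brief on Acme Industrial Ltd.")
print(match.group("supplier_name"))  # Acme Industrial Ltd
print(brief.fullmatch("research"))   # None
```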

Step 2. Write the goal in 2–3 sentences.

What success looks like, in plain English. This is the part Codex consults when something goes off the rails.

For the Supplier Research Brief: "Produce a one-page executive brief on a single supplier covering financial health, ownership, key personnel, recent press, current relationship with us, and a preliminary fitness assessment for the category in question."

Step 3. List the inputs Codex needs.

The variables the skill takes when invoked. Be explicit. If you assume Codex will know, it will guess, and the guess will be wrong about 20% of the time.

For this skill:
- Supplier name (required)
- Category (required)
- Last contract date with us (optional; Codex should look it up in /03_contracts/ if not provided)

Step 4. Describe the steps in plain English.

The sequence Codex will follow. Don't write code. Write the steps the way you'd write them for a new hire on their first day.

For the Supplier Research Brief:
1. Search the web for the supplier's official website, headquarters, ownership structure, and parent company, if any.
2. Search for the most recent (last 12 months) press coverage. Prioritize financial press over marketing press.
3. Search D-U-N-S or equivalent for financial stability indicators. Flag anything unusual.
4. Look in /03_contracts/[Supplier Name]/ for our historical contracts. If found, extract last contract value, renewal status, and any contractual flags.
5. Look in /02_suppliers/[Supplier Name]/ for prior research, scorecards, or notes. If found, summarize them in 2 sentences.
6. Draft the brief in the standard format (see Step 5).

Step 5. Define the output format.

The structure of what Codex produces. Be precise. Codex will infer if you don't specify, and the inference is usually 80% of what you wanted.

For this skill, the output is a one-page Google Doc with seven sections:
1. Supplier at a glance (3 lines)
2. Ownership and structure
3. Financial signals
4. Recent press (last 12 months)
5. Our relationship and history
6. Preliminary fitness for [Category]
7. Open questions to validate

Save it to /02_suppliers/[Supplier Name]/[YYYY-MM-DD]_Research_Brief_v1.gdoc.
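Putting the ISO date first means briefs sort chronologically by file name in any folder view. A sketch of the naming convention, handy if you ever generate or check these paths outside Codex; using `version` for a re-run after edits is our assumption about how you'd name a v2.

```python
from datetime import date


def brief_path(supplier: str, on: date, version: int = 1) -> str:
    """Build /02_suppliers/[Supplier Name]/[YYYY-MM-DD]_Research_Brief_v1.gdoc."""
    # A '/' inside a supplier name would create an extra folder level, so swap it out.
    safe_name = supplier.replace("/", "-").strip()
    return f"/02_suppliers/{safe_name}/{on.isoformat()}_Research_Brief_v{version}.gdoc"


print(brief_path("Acme", date(2026, 3, 5)))
# /02_suppliers/Acme/2026-03-05_Research_Brief_v1.gdoc
```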

Step 6. Save it. Test it. Iterate.

Save the skill file as /05_skills/supplier_research_brief.md. Invoke it on a real supplier you know well. Read the output critically. The first version will be 70–80% right. Edit the skill file to address the gaps. Re-run on a different supplier. By the third iteration, the skill output will reliably match what you'd write yourself, faster.

The skill file

Here is the full skill file for the Supplier Research Brief. Copy it into /05_skills/supplier_research_brief.md and edit the bracketed placeholders.

# Supplier Research Brief

## Trigger
Run a supplier research brief on [Supplier Name].

## Goal
Produce a one-page executive brief on a single supplier covering financial
health, ownership, key personnel, recent press, current relationship with
us, and a preliminary fitness assessment for the category in question.

## Inputs
- Supplier name (required)
- Category (required, default: ask the user)
- Last contract date with us (optional, look it up in /03_contracts/ if
  not provided)

## Steps
1. Search the web for the supplier's official website, headquarters,
   ownership structure, parent company if any.
2. Search for the most recent (last 12 months) press coverage. Prioritize
   financial and operational press over marketing press.
3. Check D-U-N-S or equivalent for financial stability indicators. Flag
   anything unusual (administration, distressed-debt mentions, recent
   leadership turnover at C-level).
4. Look in /03_contracts/[Supplier Name]/ for historical contracts. If
   found, extract last contract value, renewal status, and any
   contractual flags.
5. Look in /02_suppliers/[Supplier Name]/ for prior research, scorecards,
   or notes. If found, summarize them in 2 sentences.
6. Draft the brief in the format below.

## Output
A one-page Google Doc saved to:
/02_suppliers/[Supplier Name]/[YYYY-MM-DD]_Research_Brief_v1.gdoc

Format:
### 1. Supplier at a glance
3 lines: HQ, ownership, size (employees or revenue), category fit.

### 2. Ownership and structure
Parent, subsidiaries relevant to us, any recent M&A activity.

### 3. Financial signals
Stability indicators, anything flagged in step 3. Plain English only.
No financial jargon unless it changes the conclusion.

### 4. Recent press (last 12 months)
Up to 5 bullet points, each one sentence, dated.

### 5. Our relationship and history
2–4 sentences. Last contract, current status, notable history.

### 6. Preliminary fitness for [Category]
1 paragraph. Honest assessment.

### 7. Open questions to validate
3–5 specific questions a sourcing lead should answer before progressing.

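Save the file. Open Codex. Type the trigger phrase, with a supplier you know well in place of the placeholder: "Run a supplier research brief on [Supplier Name]."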

Watch what happens. You'll see Codex confirm the inputs, run the steps, and produce the output. Read it carefully. Identify two things you'd change. Edit the skill file. Re-run.

By the time you've done this three times, you have a skill your whole category team can use, and you have the muscle of skill-writing. The rest of the playbook gets meaningfully easier from this point.

The Procurement Skill Library

15 skills covering the procurement workflow: 8 ship in our free starter pack today, 6 are on the roadmap, and 1 is blocked waiting on Codex

This section is the full map of what we think Codex should do for a procurement function. It's also our honest inventory of what we've built and what's coming. Eight skills ship in the free starter pack at the bottom of this playbook; you can be running them inside an hour. Six more are on our roadmap, and we'll publish them in order of which ones our readers tell us they want most. The last one is blocked until Codex gains a capability it doesn't have yet.

Every skill below has a single status tag:

✅ In the starter pack: we've built it, and you can download it today.
🟡 On the roadmap: we're building it; vote for which we ship first by emailing us the skill names you want.
🔴 Blocked, waiting on Codex: the platform doesn't yet have the capability to do this well, so we're not building it yet.

01. Supplier Research Brief, ✅ In the starter pack. Produces a one-page executive brief on a single supplier. Web search + Google Drive plugin. The skill we used as the worked example in Section 5. Trigger: "Run a supplier research brief on [Supplier Name]."

02. Spend Analysis Starter, 🟡 On the roadmap. Refreshes a category spend cube from a GL extract. Codex works with Google Sheets natively today; Excel-based spend cubes need a one-step conversion. We'll ship the skill file with both paths handled. Trigger: "Refresh the spend cube for [Category] using the latest GL extract in /00_inbox/."

03. RFP Draft Generator, ✅ In the starter pack. Drafts a full RFP document, scope, requirements, evaluation criteria, timeline, from a category brief and a list of requirements. Saves to Google Docs. Trigger: "Draft an RFP for [Category] using /01_categories/[Category]/brief.gdoc and the requirements in /04_rfps/[Project]/requirements.gdoc."

04. RFP Response Evaluator, ✅ In the starter pack. Reads supplier responses from a folder and scores each against the rubric you defined. Particularly strong as a parallel cloud job: score 6 supplier responses in parallel and get a comparison matrix in 20 minutes. Trigger: "Evaluate the supplier responses in /04_rfps/[Project]/responses/ against the rubric in /04_rfps/[Project]/rubric.gdoc."

05. Supplier Scorecard, ✅ In the starter pack. Refreshes a quarterly scorecard for a strategic supplier. Pulls KPIs from your tracking Sheet, recent press from web search, and prior scorecards for trend. Trigger: "Refresh the supplier scorecard for [Supplier] for [Quarter]."

06. Category Strategy Brief, ✅ In the starter pack. Builds a 3–5 page category strategy doc: market overview, supplier landscape, our position, key risks, savings opportunities, recommended actions. Trigger: "Build a category strategy brief for [Category] for the [Year] cycle."

07. Contract Renewal Tracker, ✅ In the starter pack. Identifies contracts approaching renewal and produces a prioritized action list. Used as the recurring-automation worked example in Section 7. There's no DocuSign connector today: Codex reads contracts stored in your project folder, so if your contracts live in DocuSign, you'll add a monthly export step. Trigger: "Run the contract renewal tracker."

08. Negotiation Playbook Generator, ✅ In the starter pack. Builds a structured negotiation playbook for an upcoming supplier negotiation: BATNA analysis, lever sequencing, counter-proposal language. Especially useful as a Plan-mode-then-Cloud pair: scope the playbook with /plan-mode in the desktop app, then hand the refined objective to Codex Cloud and let it draft the full thing autonomously overnight. Trigger: "Generate a negotiation playbook for the upcoming [Supplier] renewal."

09. Savings Tracker, 🟡 On the roadmap. Maintains the year-to-date savings register against committed targets. Works against Google Sheets natively; Excel registers need a conversion step. Trigger: "Update the savings tracker with this month's closed initiatives in /00_inbox/."

10. Supplier News Monitor, ✅ In the starter pack. Daily news digest on a watchlist of suppliers. Best run as a scheduled automation (Section 7). Web search + Google Docs output + Slack summary. Trigger: "Run the supplier news monitor for today."

11. Tail Spend Review, 🟡 On the roadmap. Identifies tail spend opportunities, suppliers under a threshold who could be consolidated, paid faster, or rationalized. Same Google Sheets / Excel path as the spend cube refresh. Trigger: "Run a tail spend review for [Category]."

12. Vendor Onboarding Checklist, 🟡 On the roadmap. Generates the standard onboarding checklist for a new supplier, required docs, internal approvals, system setup steps, first-90-days touchpoints. Trigger: "Build the vendor onboarding checklist for [New Supplier]."

13. QBR Deck Generator, 🟡 On the roadmap. First-draft QBR deck in Google Slides, your standard sections, populated with the previous quarter's data. The output is structure, not polish. You'll spend the saved time on the polish. Trigger: "Build the Q[X] QBR deck for [Audience]."

14. Maverick Spend Detector, 🔴 Blocked, waiting on Codex. Detects off-contract spend against contracted suppliers. Requires a live connector to your ERP (SAP, Oracle, Coupa, Ivalua). No first-party Codex ERP connectors today, and the CSV-export workaround loses the live detection that makes this skill valuable. We'll publish the skill when Codex ships a first-party ERP connector. For now, run this in Cowork via SAP Ariba's CData connector if maverick spend matters to your function. Trigger: "Scan the latest GL extract for maverick spend against [Category] contracts."

15. Supplier Risk Heatmap, 🟡 On the roadmap. Quarterly risk heatmap across the supplier portfolio. Works for the data Codex can see; live integrations to third-party risk feeds (Dun & Bradstreet, RapidRatings, Riskmethods) require CSV export today. Trigger: "Refresh the supplier risk heatmap for [Quarter]."

💡

🟢 Get the starter pack today. The eight ✅ skills above are ready right now. The pack contains the AGENTS.md template from Section 2; Codex-native skill files for #01 Supplier Research Brief, #03 RFP Draft Generator, #04 RFP Response Evaluator, #05 Supplier Scorecard, #06 Category Strategy Brief, #07 Contract Renewal Tracker, #08 Negotiation Playbook Generator, and #10 Supplier News Monitor; a folder structure; and a governance checklist. Drop the files into your Codex project folder and you have a working library in under 10 minutes. [Download link at the bottom of this playbook.]

🟡 Vote on what we ship next from the roadmap. Six skills are on our build queue: Spend Analysis Starter (#02), Savings Tracker (#09), Tail Spend Review (#11), Vendor Onboarding Checklist (#12), QBR Deck Generator (#13), and Supplier Risk Heatmap (#15). We publish them in order of reader interest. Email us at hello@moleculeone.ai with the skill names you want first; we ship the most-requested skill within two weeks of its request count crossing 10.

Scheduling and Automation

What to delegate to Codex Automations and Codex Cloud, and what to keep human in the loop

Codex gives you two related-but-different ways to remove yourself from a recurring task. The first is Codex Automations, scheduled tasks that run on a cadence you set, the way Cowork's scheduled tasks do. The second is Codex Cloud, one-off (or repeating) jobs that run in OpenAI's sandboxes rather than on your machine. The first is for "every Monday at 8am do this." The second is for "kick this off now, I don't want to wait, and I don't want my laptop tied up."

The two together cover almost every recurring procurement workflow worth automating. They also expose you to roughly twice as many ways to do something stupid quickly. Section 11 covers the credit budgeting; this section covers the rules of what to automate and what not to.

What is safe to automate, and what isn't

The rule we use, and the one we keep front of mind: if the action can't be undone in 10 minutes by a human, a human approves it before it runs.

Apply that rule and most of the categorization becomes obvious.

Safe to automate:

Morning supplier news digest. Daily web scan against your supplier watchlist. Output a Google Doc, post a Slack summary. Worst case it's wrong: a missed news item, recoverable in 10 minutes.
Weekly spend variance flag. Pull last week's spend, compare to prior weeks, flag anything over your threshold. Output a Sheet. Worst case: a false positive, takes minutes to dismiss.
Contract renewal radar. Monthly scan of /03_contracts/ for anything renewing in the next 90 days. Output a list. Worst case: a missed renewal, but you'd catch it in next month's scan or in your manual reviews.
Quarterly supplier scorecard refresh. Run skill 05 against every strategic supplier. Output Sheets and Docs. Worst case: a stale data point, fixable in the manual review pass.
Daily inbox triage. Read /00_inbox/, classify each file by category, move to the right folder. Worst case: a misfiled doc, two minutes to fix.

Keep human in the loop:

Supplier termination decisions. Codex can produce the analysis. A human signs.
Contract signing. Always.
RFP award notifications. The output of Codex's evaluation is input to a decision, not the decision.
Escalations to legal or to executives. Codex can draft. A human sends.
Anything triggering payment. Always, even if it's "just" approving a vendor for first payment.
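If you're writing a governance checklist for your team, the 10-minute rule is simple enough to state as code. A sketch; the undo-time estimates are yours to make per action (the numbers below are illustrative), and payment is special-cased because the list above says "always" regardless of how reversible it looks.

```python
from dataclasses import dataclass
from typing import Optional

UNDO_WINDOW_MINUTES = 10  # "can't be undone in 10 minutes" -> a human approves first


@dataclass
class Action:
    name: str
    minutes_to_undo: Optional[float]  # None = cannot be undone at all
    triggers_payment: bool = False    # payments always need a human, per the list above


def needs_human_approval(action: Action) -> bool:
    if action.triggers_payment:
        return True
    if action.minutes_to_undo is None:
        return True
    return action.minutes_to_undo > UNDO_WINDOW_MINUTES


print(needs_human_approval(Action("Morning supplier news digest", minutes_to_undo=10)))  # False
print(needs_human_approval(Action("Sign renewal contract", minutes_to_undo=None)))        # True
print(needs_human_approval(Action("Approve vendor for first payment", 5, triggers_payment=True)))  # True
```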

Your first scheduled automation: Contract Renewal Tracker

We use this skill as the worked example for two reasons. It's high-value (missed renewals are how procurement teams lose 5–10% of their year), and it's low-risk (the output is a list, not an action). The full skill file ships in the free starter pack (Section 6, skill #07), so you don't have to write it from scratch, but it's worth reading through here so you understand how a skill file is structured before you customize it.

Build the skill first as a one-shot. Once it's working, schedule it monthly.

The skill file:

# Contract Renewal Tracker

## Trigger
Run the contract renewal tracker.

## Goal
Identify every contract in /03_contracts/ that will renew or expire in
the next 90 days. Produce a prioritized action list grouped by urgency
and impact, and post a summary to the procurement Slack channel.

## Inputs
- Today's date (Codex provides automatically)
- Strategic supplier list (from /06_templates/strategic_suppliers.md)

## Steps
1. Walk /03_contracts/. For each contract doc, extract:
   - Supplier name
   - Contract value (annual)
   - Renewal date or expiry date
   - Auto-renewal clause yes/no
   - Notice period required
2. Filter to anything where the renewal/expiry date is within 90 days
   of today.
3. Tag each row:
   - Strategic supplier? (yes/no)
   - Value tier: High (>$500K), Medium ($100K-$500K), Low (<$100K)
   - Action urgency: Critical (auto-renews in <30 days), High (<60),
     Medium (<90)
4. Build a Google Sheet at:
   /03_contracts/_radar/[YYYY-MM-DD]_renewal_radar.gsheet
5. Write a 1-paragraph summary calling out the top 3 critical
   renewals by name.
6. Post the summary to #procurement-team in Slack with a link to
   the Sheet.

## Output
- Sheet at /03_contracts/_radar/[YYYY-MM-DD]_renewal_radar.gsheet
- Slack post in #procurement-team

Save the file. Run it once manually to confirm it works. Then schedule it.
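Steps 2 and 3 are deterministic, so the first manual run is easy to spot-check: pick a few contracts, work out their tags yourself, and compare against the Sheet. A sketch of that logic in Python, assuming Codex has already extracted the fields listed in step 1 (the contracts below are invented). Two things the skill file leaves open, decided here as assumptions: the 90-day window runs from 0 to 89 days out, matching the "<90" in the urgency tags, and anything under 30 days is tagged Critical whether or not it auto-renews.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Contract:
    supplier: str
    annual_value: float
    renewal_date: date  # renewal or expiry date, as extracted in step 1
    auto_renew: bool
    notice_days: int


def value_tier(value: float) -> str:
    if value > 500_000:
        return "High"
    if value >= 100_000:
        return "Medium"
    return "Low"


def urgency(days_left: int) -> str:
    # Assumption: anything under 30 days is Critical, auto-renewing or not.
    if days_left < 30:
        return "Critical"
    if days_left < 60:
        return "High"
    return "Medium"


def renewal_radar(contracts: list[Contract], today: date, strategic: set[str],
                  window_days: int = 90) -> list[dict]:
    """Steps 2-3: keep contracts renewing inside the window, tag them, and sort."""
    rows = []
    for c in contracts:
        days_left = (c.renewal_date - today).days
        if not 0 <= days_left < window_days:  # already past, or outside the window
            continue
        rows.append({
            "supplier": c.supplier,
            "days_left": days_left,
            "strategic": c.supplier in strategic,
            "value_tier": value_tier(c.annual_value),
            "urgency": urgency(days_left),
            "auto_renew": c.auto_renew,
            "notice_days": c.notice_days,
            "annual_value": c.annual_value,
        })
    rank = {"Critical": 0, "High": 1, "Medium": 2}
    # Most urgent first; within an urgency band, biggest contracts first.
    rows.sort(key=lambda r: (rank[r["urgency"]], -r["annual_value"]))
    return rows


today = date(2026, 5, 1)
contracts = [
    Contract("Acme", 750_000, date(2026, 5, 20), True, 60),
    Contract("Beta", 250_000, date(2026, 6, 15), False, 30),
    Contract("Gamma", 50_000, date(2026, 7, 20), True, 30),
    Contract("Delta", 900_000, date(2026, 12, 31), True, 90),
]
for row in renewal_radar(contracts, today, strategic={"Acme"}):
    print(row["supplier"], row["urgency"], row["value_tier"], row["days_left"])
# Acme Critical High 19
# Beta High Medium 45
# Gamma Medium Low 80
```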

To schedule it, open the Automations page in the sidebar of the Codex desktop app. Click "New automation." Pick the skill. Set the cadence (for this one, "1st of every month at 8am"). Save. You're done.

When to use Codex Cloud (vs. desktop) for one-off automations

Codex Cloud is built for tasks that meet one or more of these conditions:

They take longer than 30 minutes.
They run in parallel across multiple inputs (12 suppliers, 8 categories).
You don't want them tying up your machine.
They run during hours you're not at your desk.

Run everything else in the desktop app, where you can watch the work happen and intervene.
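The four conditions work as an "any one is enough" test. A sketch of that decision; the parameter names are our own, and the 30-minute line comes straight from the list above.

```python
def choose_surface(
    expected_minutes: float,
    parallel_inputs: int = 1,
    need_machine_free: bool = False,
    runs_while_away: bool = False,
) -> str:
    """Any one of the four conditions above is enough to send a job to Cloud."""
    if (
        expected_minutes > 30
        or parallel_inputs > 1
        or need_machine_free
        or runs_while_away
    ):
        return "Codex Cloud"
    return "desktop app"


print(choose_surface(15))                      # desktop app
print(choose_surface(15, parallel_inputs=12))  # Codex Cloud
print(choose_surface(90))                      # Codex Cloud
```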

Monthly review cadence

Once a month, audit your automations. Three questions:

Did every automation actually run? (Failures happen: connectors expire, plugin auths lapse.)
Was the output actually used? (An automation no one reads should be killed.)
Has anything changed in your workflow that the automation should reflect? (New strategic supplier, new threshold, new format.)
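The first two questions are mechanical if you keep a small log. A sketch, assuming you record each automation's cadence, its last successful run (read off the Automations page), and when someone last opened its output, which is a log you'd keep yourself. The thresholds (one day of slack on the cadence, 31 days for "unused") are our own choices. The third question still needs a human.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Automation:
    name: str
    cadence_days: int          # 1 = daily, 7 = weekly, 31 = monthly
    last_run: Optional[date]   # last successful run, read off the Automations page
    last_read: Optional[date]  # last time anyone opened the output (your own log)


def monthly_audit(automations: list[Automation], today: date) -> dict[str, list[str]]:
    """Questions 1 and 2 from the checklist; question 3 still needs a human."""
    findings: dict[str, list[str]] = {}
    for a in automations:
        issues = []
        # Q1: overdue if more than one cadence (plus a day of slack) has passed.
        if a.last_run is None or (today - a.last_run).days > a.cadence_days + 1:
            issues.append("missed runs")
        # Q2: nobody has opened the output within the last review cycle.
        if a.last_read is None or (today - a.last_read).days > 31:
            issues.append("output unused")
        if issues:
            findings[a.name] = issues
    return findings


autos = [
    Automation("Supplier news digest", 1, date(2026, 5, 28), date(2026, 5, 31)),
    Automation("Renewal radar", 31, date(2026, 5, 1), date(2026, 5, 2)),
    Automation("Spend variance flag", 7, date(2026, 5, 25), None),
]
print(monthly_audit(autos, today=date(2026, 6, 1)))
# {'Supplier news digest': ['missed runs'], 'Spend variance flag': ['output unused']}
```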

The teams we've worked with average 8–12 active automations after the first 90 days. Three or four of those will get killed in any given month and replaced. That churn is healthy.

Connecting Your Tools

Practical integration notes, what's solid, what needs workarounds, what's missing

This is the section where we are most candid about Codex's procurement-readiness today. Some connectors are excellent. Some don't exist. The honest map matters, because the wrong assumption here ("of course Codex talks to SAP, every AI tool does") is what causes failed rollouts.

We grade each integration the same way we graded skills in Section 6.

Google Workspace, Solid

The Google Drive plugin, announced as a first-party Codex plugin in 2026, is the foundation of using Codex for procurement work. It gives Codex access to Google Docs, Sheets, Slides, Drive folders, and (with a separate plugin) Gmail and Calendar.

Setup is quick: in the Codex desktop app, open Plugins → Google Drive → Connect, then authorize the OAuth scope. The plugin asks whether to grant access to your entire Drive or a single shared folder. The right answer for procurement work is "single shared folder": point it at the Google Drive equivalent of ~/CodexProcurement and nothing else.

Example workflows that work as advertised:
- Read every doc in a Drive folder and synthesize a brief.
- Build a new Google Doc from a structured prompt.
- Pull data from a Google Sheet into an analysis.
- Update an existing Sheet with new rows or recalculated cells.
- Build a Slides deck from a template (see Section 10).

The plugin is the single most polished connector in the product. If you live in Google Workspace, you'll spend most of your Codex time inside it.

Microsoft 365, Gap

There is no first-party Codex plugin for Outlook, SharePoint, OneDrive, or Teams as of this playbook's publication date. This is the biggest gap in the product for procurement teams in Microsoft-default organizations.

Workarounds, in order of preference:

Mirror to Google Drive. If your organization allows it, sync a subset of relevant files from OneDrive or SharePoint into a Google Drive folder Codex can read. Awkward but works.
In-app browser. Codex can navigate Outlook Web and SharePoint inside its in-app browser. It works but is slow, and it doesn't scale to high-volume workflows.
Cowork. If Microsoft 365 is the center of your team's workflow, Cowork is the better tool for those specific workflows. Run both (see Section 16).
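Mirroring works when both sync clients (OneDrive and Google Drive for desktop) expose their folders locally, because the job then reduces to a one-way file copy. A sketch of that copy in Python, assuming both folders are reachable as local paths; the paths and the extension list are placeholders you'd tailor. Check with IT before running anything like it, since copying files between tenants is exactly what data-handling policies cover. It never deletes, so files removed at the source linger in the mirror until you clean them up.

```python
import shutil
from pathlib import Path

# File types worth mirroring for Codex to read; adjust to your team's needs.
DEFAULT_EXTENSIONS = (".docx", ".xlsx", ".pptx", ".pdf", ".csv")


def mirror(src: Path, dst: Path, extensions=DEFAULT_EXTENSIONS) -> list[Path]:
    """One-way copy of matching files from src to dst.

    Files already present in dst with the same or newer modification time
    are skipped, so reruns only copy what changed. Nothing is ever deleted.
    """
    copied = []
    for source in sorted(src.rglob("*")):
        if not source.is_file() or source.suffix.lower() not in extensions:
            continue
        target = dst / source.relative_to(src)
        if target.exists() and target.stat().st_mtime >= source.stat().st_mtime:
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also preserves the modification time
        copied.append(target)
    return copied


# Example (placeholder paths; use the local folders your sync clients actually create):
# mirror(Path.home() / "OneDrive - YourOrg" / "Procurement",
#        Path.home() / "Google Drive" / "CodexProcurement")
```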

DocuSign, Gap

No first-party plugin. Workarounds:

Draft in Codex, sign in DocuSign. The most common pattern. Codex produces the contract draft. A human pushes it through DocuSign for signature.
Export contract data from DocuSign. Monthly CSV exports from DocuSign loaded into a Codex-readable folder. Codex can analyze. Not real-time.
Cowork. Cowork has a DocuSign connector. If contract signing is central to your daily flow, that's where to run those workflows.

SAP Ariba and SAP Concur, Gap

No first-party plugin. No reliable community MCP server we'd recommend at the time of publication. Workarounds:

CSV export. Manual or scheduled CSV exports of GL data, contract data, supplier master data, into a Codex-readable folder. Codex runs against the exports. Loses freshness but recovers most of the analytical value.
Cowork. Cowork has CData-mediated connectors to both Ariba and Concur today. For spend cube, supplier master, and PO data, Cowork is the better surface.
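The CSV route is only as trustworthy as your ability to check Codex's numbers against the export. A plain aggregation like this sketch gives you a control total to compare with what Codex reports. The column names are placeholders, because every ERP export names them differently, and amounts are assumed to be plain numbers, possibly with thousands separators.

```python
import csv
from collections import defaultdict
from decimal import Decimal
from typing import Iterable


def spend_by_category(
    lines: Iterable[str], category_col: str = "category", amount_col: str = "amount"
) -> dict[str, Decimal]:
    """Sum an exported GL extract by category.

    Decimal avoids float rounding on currency; thousands separators are stripped.
    Blank amounts are skipped rather than treated as errors.
    """
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for row in csv.DictReader(lines):
        raw = row[amount_col].replace(",", "").strip()
        if not raw:
            continue
        totals[row[category_col].strip()] += Decimal(raw)
    return dict(totals)


# Usage against an export in the project folder (path and column names are placeholders):
# with open("00_inbox/Q2_GL.csv", newline="", encoding="utf-8") as f:
#     print(spend_by_category(f, category_col="Category", amount_col="Amount"))
```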

ServiceNow and Microsoft Dynamics, Gap

No first-party plugins. Treat the same as SAP: CSV export workaround, or use Cowork.

Slack, Solid

The first-party Codex Slack app is the second-most polished connector in the product, after Google Drive. Setup is fast (5 minutes if your IT team allows Slack apps), and the procurement use cases are larger than they look. We covered the headline use cases in Section 4; here are the procurement-specific patterns we use day-to-day.

Channel-level digest. @Codex once a week, read this channel and produce a summary of the open issues, decisions made, and outstanding asks. Post on Mondays at 9am.
Vendor research from the meeting. You're on a call. Someone mentions a supplier name. @Codex who is [supplier]? Single paragraph, financial stability angle.
Document delegation. @Codex draft a follow-up email to Acme based on the last 4 exchanges in /02_suppliers/Acme/correspondence. Tone: firm but constructive. Don't send it, paste the draft here.
Status pulls. @Codex what's the status of the MRO RFP? Read /04_rfps/MRO_2026/_status.md and summarize.

A useful detail: Codex reads earlier messages in the Slack thread automatically. You rarely need to restate context. If your team's been arguing about a supplier for three days in a thread, you can drop @Codex pull this together and recommend a position into the thread and get something useful back.

Codex in-app browser, Solid for narrow use cases

Codex can drive a Chromium browser inside the desktop app. The capability landed in the April 2026 desktop app update. It's useful when there's no API, no plugin, and no export, which is most supplier portals.

What it's good at:
- Logging into a supplier portal with credentials you provide.
- Navigating menus, downloading reports, reading line items off a page.
- Filling forms (supplier registration questionnaires, RFP submission portals from the supplier side).

What it's bad at:
- Anything behind MFA that requires a phone tap. Codex can't tap your phone.
- High-velocity workflows. The browser is much slower than an API call.
- Anything where the supplier portal's UI changes weekly. Codex can adapt, but you'll spend time re-confirming.

Five good use cases for the in-app browser in procurement:
- Pull current prices from a supplier portal that has no API.
- Submit a routine supplier inquiry through a portal questionnaire.
- Download monthly statements from a supplier portal.
- Scrape a public market index page that updates daily.
- Check a public regulatory filing site for a supplier's most recent disclosures.

Five places to proceed carefully:
- Any portal requiring MFA per session.
- Any portal where logging in commits your team to terms of service Codex may not have read.
- Any portal that locks accounts after too many login attempts.
- Any portal containing PII you haven't classified.
- Any portal that changes UI weekly and breaks your workflow constantly.

GitHub: optional, surface-dependent

This is the connector most people get wrong on day one, so we want to be explicit: GitHub is not required to use Codex. The desktop app, the IDE extension, and the CLI all work happily against a local folder using what OpenAI calls a "local environment". Codex stores its config in a .codex/ subfolder inside the folder you choose, and your files stay in that folder (or in a cloud drive, if you point Codex at one); only the content Codex reads for a task is sent to OpenAI for processing.

GitHub becomes required only for these specific surfaces:

Codex Cloud (the web surface at chatgpt.com/codex): cloud sandboxes need a repository to clone. For procurement use, this "repository" is typically a private placeholder; you won't browse or edit it.
Codex in Slack: @Codex mentions create Cloud tasks, so the requirement carries over from Cloud.
The native GitHub code-review integration: obviously, since the workflow is reading PRs.

For a procurement leader who runs everything in the desktop app against a local folder synced to Google Drive, GitHub is genuinely optional. Skip it on day one. Connect it later if you decide you want overnight Codex Cloud runs or @Codex delegations from Slack.

If you do connect it, setup is one-time: create or use an existing GitHub account, authorize Codex to create or connect to private repos, done. If your IT team has policies about what can be stored on GitHub, check with them before flipping the Cloud or Slack switch.

What we wish existed and don't expect soon

Things we'd like to see land in 2026 but that haven't at publication: a first-party Microsoft 365 plugin, a DocuSign plugin, an SAP Ariba plugin. If any of these are critical to your day-to-day, the right answer today is Cowork for those workflows and Codex for everything else. We hate the dual-tool answer as much as anyone, but the gap is real, and pretending otherwise causes rollouts to stall.

Codex for Procurement Data Work

Spend analysis, scorecards, contract data, and price variance: using Google Sheets, plus the workarounds for everyone else

Cowork's procurement playbook had a section called "Claude in Excel for Procurement Data Work." This section is the Codex equivalent. The honest framing: Codex does not have a native Excel add-in. The substitute is Google Sheets via the Drive plugin, plus Codex's ability to write and run lightweight data-processing scripts on CSV exports for you, with no coding on your part.

For teams already on Google Workspace, this section will feel like a clean win. For teams on Excel, it will feel like an awkward set of tradeoffs you should weigh against just using Cowork for spend-heavy work.

Use cases and time savings

Use case	What to say to Codex	Time saved
Spend cube refresh from CSV	Take last quarter's GL extract in /00_inbox/Q2_GL.csv. Classify by category using the taxonomy in /06_templates/categories.md. Flag variances >15% QoQ. Output to a new Google Sheet at /01_categories/_cube/Q2_2026.gsheet.	~4 hours → ~5 minutes
Supplier scorecard refresh	Pull the 8 KPIs from /02_suppliers/[Supplier]/kpis.gsheet, /02_suppliers/[Supplier]/contracts.gsheet, and /02_suppliers/[Supplier]/incidents.gsheet. Build a scorecard tab for the supplier. Compare to last quarter's scorecard in the same file.	~2 hours → ~8 minutes
Contract data cleanup	Read all DocuSign-export PDFs in /03_contracts/_exports/2026Q2/. Extract renewal date, value, owner, payment terms, notice period. Output as structured rows in a new Sheet. Flag any contract where you couldn't find one of these fields.	~6 hours → ~12 minutes
Price variance scan	Compare PO prices in /00_inbox/POs_Q2.csv to the contract rate in /03_contracts/[Supplier]/master_rates.gsheet. Flag anything off by >5%. Output a Sheet with PO number, contract rate, actual rate, variance %, root cause hypothesis.	~3 hours → ~4 minutes

These are real time savings from real procurement teams. One caveat: the first time you run any of these on your data, you'll spend extra time tuning, so budget an hour. By the third run, you should be close to the time savings above.
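The price variance scan in the last row is plain arithmetic once PO lines and contract rates are joined, which makes Codex's output easy to spot-check. A minimal Python sketch, assuming a PO extract with hypothetical po_number, sku and unit_price columns and contract rates keyed by SKU; it flags the outliers and leaves the root-cause hypothesis to Codex and to you:

```python
import csv
import io

# Flag anything more than 5% away from the contract rate, matching the rule in the table.
VARIANCE_THRESHOLD = 0.05


def scan_price_variance(po_rows, contract_rates, threshold=VARIANCE_THRESHOLD):
    """Compare each PO line's unit price to its contract rate and return the outliers.

    po_rows: dicts with hypothetical keys po_number, sku, unit_price.
    contract_rates: hypothetical mapping of sku -> contracted unit price.
    """
    flagged = []
    for row in po_rows:
        actual = float(row["unit_price"])
        contract = contract_rates.get(row["sku"])
        if not contract:
            # Missing (or zero) contract rate: surface the line instead of skipping it.
            flagged.append({"po_number": row["po_number"], "contract_rate": None,
                            "actual_rate": actual, "variance_pct": None})
            continue
        variance = (actual - contract) / contract
        if abs(variance) > threshold:
            flagged.append({"po_number": row["po_number"], "contract_rate": contract,
                            "actual_rate": actual,
                            "variance_pct": round(variance * 100, 1)})
    return flagged


if __name__ == "__main__":
    sample = "po_number,sku,unit_price\nPO-1001,GLV-01,10.40\nPO-1002,GLV-01,10.90\n"
    rows = csv.DictReader(io.StringIO(sample))
    for hit in scan_price_variance(rows, {"GLV-01": 10.00}):
        print(hit)
```

Lines with no contract rate are flagged rather than dropped, because a missing rate is itself something to chase.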

The taxonomy trick

Most of the value in spend analysis comes from a good category taxonomy, and most of the time wasted comes from a bad one. Codex doesn't know your taxonomy unless you give it one.

The pattern that works: maintain a single file at /06_templates/categories.md that lists your categories with examples and rules. Every spend skill references this file. When categorization gets stale, you update one file, not 12 skills.

Example structure of that file:

# Category Taxonomy

## L1 categories

### Indirect / MRO
Includes: industrial supplies, safety equipment, janitorial, packaging,
small tools, lab consumables (non-clinical).
Excludes: capital equipment, fleet maintenance (separate category).

### Indirect / IT
Includes: software licenses, cloud services, hardware refresh, telco,
managed services.
Excludes: capital IT projects > $500K (separate review).

### Direct / Raw materials
Includes: [your materials list]...

[continue for full taxonomy]

## Edge cases: always ask
- Software in IT vs. software in R&D
- Marketing services vs. consulting services
- Travel-related spend on corporate cards

Having Codex consult this file before classifying produces meaningfully better cube refreshes than letting it infer categories from names alone.
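To see why one shared file beats a dozen copies of the rules, here is a deliberately simple Python sketch that reads the "Includes:" lines from a file shaped like the example above and routes anything unmatched or ambiguous to "ASK". Codex reasons over the file rather than keyword-matching, so treat this as an illustration of "one source of truth, ask on edge cases", not of how Codex classifies; the function names are hypothetical.

```python
import re


def load_taxonomy(markdown_text):
    """Map each '### ' category in categories.md to the lowercase terms on its Includes: lines."""
    taxonomy, current, collecting = {}, None, False
    for raw in markdown_text.splitlines():
        line = raw.strip()
        if line.startswith("### "):
            current, collecting = line[4:].strip(), False
            taxonomy[current] = []
            continue
        if current and line.startswith("Includes:"):
            collecting = True
            line = line[len("Includes:"):]
        elif not line or line.startswith(("Excludes:", "#")):
            # A blank line, an Excludes: line or a new heading ends the Includes list.
            collecting = False
            continue
        if current and collecting:
            for term in line.split(","):
                # Drop parenthetical qualifiers like "(non-clinical)" and trailing periods.
                term = re.sub(r"\(.*?\)", "", term).strip(" .").lower()
                if term:
                    taxonomy[current].append(term)
    return taxonomy


def classify(description, taxonomy):
    """Return the one category whose terms appear in the description, else 'ASK'."""
    text = description.lower()
    matches = [name for name, terms in taxonomy.items()
               if any(term in text for term in terms)]
    return matches[0] if len(matches) == 1 else "ASK"
```

This sketch ignores the "Excludes:" lines and the edge-case list; in practice those are exactly the rules you want Codex to read, which is the argument for keeping them in one file.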

Codex vs. Cowork for spend work: an honest call

If your spend cube is in Google Sheets, Codex is competitive. If it's in Excel, Cowork is the better tool today. The Codex workaround (export Excel as CSV, process, export back) works, but it loses the formatting, the named ranges, and any pivot tables you've built. For real Excel-native spend work, Cowork's add-in is meaningfully better.

This is a place where we routinely tell teams to either move the spend cube to Google Sheets (some can, some can't) or run their data work in Cowork while the rest of their procurement work runs in Codex. Section 16 covers the operating model.

Codex for Slides and Document Work

QBR decks, category strategy briefs, and savings summaries, via Google Slides and Docs

Cowork's playbook had a section on "Claude in PowerPoint." Codex's analogous capability is Google Slides via the Drive plugin. Same caveat as the data section: Codex doesn't have a native PowerPoint integration. If your CFO requires PPTX, you'll export from Slides; the PowerPoint gap subsection at the end of this section covers what gets lost in that export.

Quarterly business review decks

The QBR deck is the canonical procurement-meets-Slides workflow. You build the same 12–15 slide structure every quarter. The data changes. The narrative shifts. The skeleton is the same.

Skill prompt:

What you get: a 12–15 slide draft with your standard sections (spend trend, top suppliers, savings progress, risk heatmap, next-quarter priorities) populated.

What you do next: spend an hour refining the narrative, especially the executive summary and the priorities for next quarter. The data is right. The story still needs you.

Time saved: ~2 hours → 20 minutes for the first draft. The polish is where you earn your title.

Category strategy presentations

A different rhythm: these come up once a year for each strategic category, not quarterly, and each one is a bigger lift. Codex earns its keep here especially because the upstream document (the category strategy brief from skill 06) already exists.

Skill prompt:

The draft you get will be tighter than the QBR draft because the source document is more structured. Refinement time: 1 hour, mostly on the recommendations slides and the financial sensitivities.

Savings summary slides for finance

The 3-slide artifact your finance partner needs at month-close. Same structure every month.

Skill prompt:

Time saved: ~45 minutes → 3 minutes. This is the one finance teams adopt first because it's small, repeatable, and identical every month.

The PowerPoint gap and what to do about it

Codex's Slides integration is solid. The gap is for teams whose final delivery format is PowerPoint, which describes almost every finance organization in the Fortune 1000.

The workaround: produce in Slides, export to PPTX. The export is one click in Google Slides. The output is usable. What you lose: master-slide fidelity, embedded font behavior, and any animation. For static decks where the final output is "send the PDF" anyway, the workaround is invisible. For decks where someone will live-edit in PowerPoint after handoff, you'll notice the formatting drift.

If PowerPoint-native authoring matters to your function, this is one of the places we'd point you to Cowork's PowerPoint integration instead of Codex's Slides workaround. Section 16 covers the dual-tool case.

Managing Credits and Costs Wisely

How not to burn your Codex usage pool on the wrong tasks

OpenAI moved Codex pricing to token-based billing on April 2, 2026, and the mechanics for budgeting changed with it. The headline implication: a heavy day on Codex can consume what would have been a week of light use under the old per-message model. The surface that consumes the most tokens (long Cloud runs and the parallel cloud sandboxes that produce most of Codex's procurement value) is also the surface where a misconfigured run can keep going for hours before you notice.

This is the section that pays for itself.

What costs what

Rough relative cost of common procurement Codex actions. "Credit intensity" is the relative draw on your token budget per action; "cumulative risk" is whether the cost compounds when you run the action repeatedly or in parallel.

Action	Credit intensity	Cumulative risk
Read a single Google Doc, summarize	Low	Low
Read a Sheet, compute a few metrics	Low	Low
Web research with 5+ sources	Medium	Low
Draft a 1-page brief from inputs	Medium	Low
Run a skill that touches 10–20 files	Medium-High	Medium (when scheduled daily)
`/plan-mode` on a complex task	Low	Low
Cloud run, 30–60 minutes	High	Medium
Cloud run, 2+ hours	Very high	High
Codex Cloud parallel job (5+ sandboxes)	Very high per batch	High
Scheduled daily skill	Cumulative, budget separately	Critical
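The "Cumulative" entries come down to multiplication: a scheduled skill that looks cheap per run is charged every time it fires. A minimal Python sketch for projecting that, assuming you estimate a per-run token figure from your own past runs; the runs-per-cadence table is our approximation (30-day month, about 22 working days), and the function name and dictionary keys are hypothetical.

```python
# Approximate runs per month for common schedules (assumes a 30-day month, ~22 working days).
RUNS_PER_MONTH = {"daily": 30, "weekdays": 22, "weekly": 4, "monthly": 1}


def project_monthly(automations, monthly_allowance):
    """Return (name, projected monthly tokens, share of allowance), largest first.

    automations: dicts with hypothetical keys name, cadence, tokens_per_run,
    where tokens_per_run is your own estimate from past runs.
    """
    rows = []
    for a in automations:
        total = a["tokens_per_run"] * RUNS_PER_MONTH[a["cadence"]]
        rows.append((a["name"], total, round(total / monthly_allowance, 3)))
    return sorted(rows, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    # Placeholder figures only; substitute what your usage history shows.
    plan = [
        {"name": "Daily news digest", "cadence": "daily", "tokens_per_run": 20_000},
        {"name": "Monthly savings summary", "cadence": "monthly", "tokens_per_run": 300_000},
    ]
    for name, tokens, share in project_monthly(plan, monthly_allowance=2_000_000):
        print(f"{name}: {tokens:,} tokens/month ({share:.0%} of allowance)")
```

With these placeholder numbers, the "cheap" daily digest projects to twice the monthly draw of the "expensive" monthly summary, which is the point of budgeting scheduled skills separately.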

Five rules for credit budgeting

Rule 1: /plan-mode in the desktop app before any Cloud run longer than 30 minutes.

This is the single highest-leverage rule in the section. The cost of a Plan-mode pass is negligible. The cost of a misunderstood Cloud run is your monthly credit allowance. We've never met a procurement team that regretted over-planning. We've met several that wished they'd planned before firing off a 6-hour Cloud run at midnight against the wrong rubric.

Rule 2: Cap parallel Cloud tasks at 3 unless you've checked your usage dashboard.

Parallel cloud sandboxes are powerful and expensive. Three parallel tasks is a reasonable cap for everyday work. If you need 12 in parallel, do it deliberately: check the dashboard first, set a budget alert, and accept that the run will eat into your monthly pool.

Rule 3: Use GPT-5.4 for boilerplate. Reserve GPT-5.5 for reasoning-heavy work.

Codex lets you pick the model from the model dropdown in the composer (just above the prompt input; there's no slash command for it in the desktop app). GPT-5.5 is the recommended default and worth it for category strategy briefs, RFP evaluations, and complex syntheses. For routine work (daily news digests, simple scorecards, file moves, format conversions), GPT-5.4 is cheaper and produces results that are equivalent within the noise. Stating a model preference in your AGENTS.md doesn't change the model selected, but it does tell Codex how to behave when you're on a less capable one (e.g., "ask before tackling complex synthesis on GPT-5.4").

Rule 4: Schedule automations off-peak if you're on a pooled team plan.

Token throughput on shared plans can be throttled during peak hours. Schedule your daily and weekly automations to fire between 1am and 6am local time, when few colleagues are drawing on the same pool. Each automation finishes sooner in wall-clock terms, the cost stays the same, and people are less likely to be confused by a contradictory parallel run.

Rule 5: Audit usage monthly.

The dashboard at chatgpt.com (under your plan settings) shows usage by skill, by automation, by user. Look at it on the first of every month. The patterns we see most often: one rogue automation consuming 40% of the team's monthly budget, an experimental skill someone built and forgot to retire, or a Cloud run that ran for 6 hours and produced nothing because it hit an authentication loop and didn't give up.
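If your plan lets you export usage as a CSV (an assumption; we also assume columns named source, tokens and minutes, and a real export will differ), a few lines of Python turn the monthly review into a checklist for the first two patterns above. A minimal sketch; the function name and thresholds are hypothetical starting points:

```python
import csv
import io
from collections import defaultdict


def audit_usage(rows, share_threshold=0.25, long_run_minutes=120):
    """Flag sources that take a large share of the month's tokens, and very long runs.

    rows: dicts with hypothetical keys source (skill or automation name),
    tokens, minutes.
    """
    totals = defaultdict(int)
    long_runs = []
    for row in rows:
        totals[row["source"]] += int(row["tokens"])
        if float(row["minutes"]) >= long_run_minutes:
            long_runs.append(row["source"])
    grand_total = sum(totals.values()) or 1  # avoid dividing by zero on an empty month
    heavy = {source: round(tokens / grand_total, 2)
             for source, tokens in totals.items()
             if tokens / grand_total >= share_threshold}
    return {"heavy_sources": heavy, "long_runs": long_runs}


if __name__ == "__main__":
    export = ("source,tokens,minutes\n"
              "news_digest,400,5\nscorecard,100,10\n"
              "news_digest,400,5\nrfp_cloud_run,100,360\n")
    print(audit_usage(csv.DictReader(io.StringIO(export))))
```

A flagged long run still needs a human to open its output: the script can tell you a run took six hours, not whether it produced anything.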

Rolling This Out to a Procurement Team

Admin setup, governance, the shared workspace, and the three real barriers to adoption

Getting Codex onto your own desk is easy. Getting it onto the desks of seven of your colleagues and having them all use it productively is the hard part: the part where every AI rollout you've watched has stalled. This section is the operating playbook for that part.

Plugin marketplace and admin controls

ChatGPT Business and Enterprise plans give you an admin console for Codex. The admin console is where you control which plugins are available to your team, which can be auto-installed, which require user approval, and which are blocked. It's also where you manage Codex seats. A quiet detail worth surfacing: as of April 2, 2026, Business and Enterprise workspaces can add Codex-only seats with no fixed seat fee (usage is billed pay-as-you-go on tokens). This lowers the cost of letting someone try Codex without committing to a full ChatGPT seat for them (official announcement).

For procurement specifically, the plugin governance default we recommend:

Plugin	Setting	Why
Google Drive	Auto-install	Required for almost all procurement work.
Slack	Auto-install	Delegation surface.
Web Search	Auto-install	Built-in, required.
GitHub	Available, not auto-install	Skip unless your team will use Codex Cloud, `@Codex` in Slack, or the GitHub code-review integration. The desktop app works fine against a local folder without it.
In-app Browser	Available	High-utility but high-permission; train before unlocking.
Computer Use	Available, individually approved (macOS only)	High-trust capability, narrow procurement use cases.

The governance flow

A four-step model for keeping a team rollout in control without strangling adoption. Adapt it to whatever cadence your function runs:

1. Codex Admin defines access and plugin approvals. A single Codex Admin on your team (typically a procurement ops lead: not IT, not the CPO, but someone close enough to the work to make sensible calls) decides who gets seats, which plugins are approved, and which model is the default. The Codex Admin role is separate from Workspace Owner; keep it deliberately small.

2. Skills and automations are reviewed before being shared. Anyone on the team can build a skill for themselves. Before a skill is shared (via a team plugin, via a shared Drive folder, or even informally), someone other than the author reads it. The review takes 5 minutes, and it catches the roughly 1 in 4 skills that have a hidden assumption baked in that won't hold for someone else's data.

3. AGENTS.md is version-controlled at the team level. A team-wide AGENTS.md lives in a shared location (a shared Drive folder, or the team's GitHub repo). Individual users keep their personal additions in their own global AGENTS.md on their machine. Team rules are updated once and everyone gets them.

4. Usage and risk events flow into your governance stack. On Enterprise, Codex supports OpenTelemetry log export for user prompts, tool approval decisions, MCP server usage, and network events. Configure the export at rollout, not later. The OTel export is the single most useful audit lever in the product and most teams forget it exists.

Shared workspace folder structure

The structure we recommend for a team of 5–10 procurement people sharing a Codex workspace via Google Drive:

/CodexProcurement_Team/
  /AGENTS.md                    ← team-wide rules
  /00_inbox/                    ← personal scratch (one subfolder per person)
    /sandeep/
    /priya/
    /...
  /01_categories/               ← shared, one per category
    /MRO/
    /IT/
    /Marketing/
    /Logistics/
    /Professional_Services/
  /02_suppliers/                ← shared, one per active supplier
  /03_contracts/                ← controlled access, read for most, write for few
  /04_rfps/                     ← shared, one per active RFP
  /05_skills/                   ← team library of approved skills
  /06_templates/                ← AGENTS-referenced templates
  /99_archive/                  ← shared

A controlled-access folder around /03_contracts/ matters. Most procurement teams want everyone to read contracts but only category owners or the contract manager to modify them. Codex respects Drive permissions: what it can see is what your sharing settings allow.
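If you'd rather create the skeleton in one go than click through Drive, here is a minimal Python sketch that builds the tree above inside a local folder, for example one synced by Google Drive for desktop (that sync setup is our assumption, not something Codex requires). The member and category names are placeholders. It creates folders only; the read/write split on /03_contracts/ still has to be set in Drive's sharing settings.

```python
import sys
from pathlib import Path

# Placeholder names; replace with your own team members and categories.
TEAM_MEMBERS = ["sandeep", "priya"]
CATEGORIES = ["MRO", "IT", "Marketing", "Logistics", "Professional_Services"]
TOP_LEVEL = ["00_inbox", "01_categories", "02_suppliers", "03_contracts",
             "04_rfps", "05_skills", "06_templates", "99_archive"]


def scaffold(root):
    """Create the team workspace tree under root; safe to re-run (existing folders are kept)."""
    base = Path(root) / "CodexProcurement_Team"
    for name in TOP_LEVEL:
        (base / name).mkdir(parents=True, exist_ok=True)
    for person in TEAM_MEMBERS:
        (base / "00_inbox" / person).mkdir(exist_ok=True)
    for category in CATEGORIES:
        (base / "01_categories" / category).mkdir(exist_ok=True)
    agents = base / "AGENTS.md"
    if not agents.exists():
        # Start an empty team-wide rules file; never overwrite an existing one.
        agents.write_text("# Team-wide rules\n")
    return base


if __name__ == "__main__":
    # Pass the path of your synced Drive folder; defaults to the current directory.
    print(scaffold(sys.argv[1] if len(sys.argv) > 1 else "."))
```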

Training: teach skills, not prompting

The mistake most teams make in rollout is teaching their colleagues how to write prompts. Prompt engineering is a transient skill: as the models and tooling improve, your team has to retrain. The durable skill is recognizing which tasks belong in Codex and building or invoking a skill to do them.

Our training format, which we run as a 90-minute live session with a procurement team:
1. 15 minutes: a working demo of three real skills running.
2. 30 minutes: every attendee builds their own version of the Supplier Research Brief skill (Section 5) using their own data.
3. 30 minutes: every attendee picks a recurring task from their actual week and converts it to a skill.
4. 15 minutes: questions, governance, what not to do.

No prompt engineering is taught. By the end, each attendee has a working skill and a worked example of the skill-building process they can apply to their next workflow.

The three real barriers to adoption

We have run rollouts of Cowork and Codex inside enough procurement teams to know exactly which objections come up, in roughly which order, with roughly which weighting. These are the three that matter.

Fear of headcount loss

The fear is real, and pretending it isn't makes it worse. The honest answer:

Codex makes one analyst significantly more productive. It does not, in 2026, replace an analyst's judgment, their context on internal politics, or their ability to spot the thing the model has no way to know. The teams we've worked with that have used Codex for 6+ months have not reduced headcount. They've reduced backlog. They've reduced lead time. They've increased the complexity of what their existing team can take on, typically by moving analysts up the value curve toward more strategic work.

When you're rolling this out, say this explicitly. Don't argue with people. Show them the outputs.

Data security

The second-most-cited objection, and the one that ages worst when ignored. For Codex specifically:

OpenAI's stated position is that ChatGPT Business and Enterprise data is not used to train OpenAI's models, and Enterprise contracts include the additional commitments around data residency, retention controls, and the Compliance API. Read your contract. Have your security team read your contract.

Practically: the safe pattern is to operate inside a controlled Google Drive folder (the team workspace folder above), to keep PII and bank/payment information out of Codex entirely, and to use the Compliance API on Enterprise for the audit trail your function will eventually be asked to produce.

Codex is not less secure than ChatGPT, but it touches more data. The exposure surface is wider. Treat it accordingly.

Skepticism that a "developer tool" can do procurement work

This is the objection unique to Codex (Cowork doesn't trigger it). It sounds like "I've seen the Codex demos on Twitter, that's not what I do."

The response: that's fair. The demos are developer-focused, Codex did start as a developer tool, and you will notice its developer origins in a few places (the optional GitHub integration, the prominence of the CLI in the marketing). But the work this playbook describes is real procurement work, and Codex does it well in the desktop app against a local folder, with no GitHub required for any of it. The proof is in the first 30 minutes of Section 2: have the skeptic run those 5 prompts and see what they think.

If you want help structuring your rollout, configuring the team plugin, or running the training session for your team, email me directly at sk@moleculeone.ai, or use the form at moleculeone.ai/contact if you'd prefer.

What Not to Use Codex For Right Now

Honest guardrails given current product limitations

This is the section that makes the rest of the playbook trustworthy. Every tool has limits. The fastest way to lose your team's trust is to oversell. The fastest way to keep their trust is to draw the lines yourself before they discover them the painful way.

Seven things you should not use Codex for, today, in procurement.

1. Anything legally binding without human signature

Codex can draft a contract, a termination notice, an SLA amendment, a side letter. Codex cannot sign one for you. Even where the in-app browser could mechanically click the DocuSign signature button, you should not have it do that. Drafts are produced by agents. Signatures are produced by humans. Keep the line bright.

2. MFA-protected supplier portals

The in-app browser hits a wall here. Codex can navigate the login screen up to the MFA prompt, and then it stops. Any supplier portal that requires a phone tap or a hardware key per session is a manual workflow. Plan around it.

3. Real-time market data feeds

Codex does not stream. If your category requires real-time price updates (energy spot prices, commodity tickers, FX-sensitive raw material rates), Codex is the wrong tool. Use a dedicated data provider and have Codex consume the daily summary.

4. Audit-trail-required workflows without Enterprise

The Compliance API and OpenTelemetry export are Enterprise-only at publication. Business plans have per-user history but no team-level audit. If your function is in financial services, healthcare, defense, or anywhere regulated, the operating principle is: get the Compliance API contracted before you onboard the team. Doing it after is harder than doing it before.

5. Direct ERP write-back

No first-party Codex connectors to SAP, Oracle, Coupa, Ivalua, or Jaggaer exist at publication. Read-only via CSV export works. Write-back doesn't exist. Anything that would mutate your ERP (creating a PO, updating supplier master data, changing payment terms) is a manual workflow. Codex can prepare the data; humans push the button.

6. Org-specific knowledge Codex can't see

Codex knows what you've shown it. It doesn't know your last quarterly review, your CPO's preference for certain phrasings, the supplier issue from 2021 that's still political, or the soft commitment your team made to a board member. The most common failure mode in procurement Codex deployments is treating it as if it knew context it doesn't. The mitigation: a thorough AGENTS.md, project notes for category-specific context, and the discipline to start any tricky session with "Here's what you don't know yet."

7. Long-horizon autonomous Cloud runs on irreversible work

Autonomous Cloud runs (the Goal-mode pattern) are the most powerful capability in the product and the easiest one to get wrong. Pointing one at work that can be undone in 10 minutes is fine; pointing it at work that touches a production system, a live ERP, a contract signature workflow, or anything that triggers an outgoing communication is dangerous. The discipline: /plan-mode in the desktop app first, scope tight, sandbox if available, human approval at the end of the run. Don't let an autonomous Cloud run send anything outbound.

Confidential information

OpenAI's data handling for ChatGPT Business and Enterprise is described in the OpenAI Trust Portal. Read it. Have your security and legal teams read it. The short version, as of publication: data submitted through Business and Enterprise plans is not used to train OpenAI's models by default, and Enterprise customers can configure data retention and residency. None of that frees you from your own organization's data classification rules. If you wouldn't paste it into a public web form, treat it the same way for Codex.

Data governance for procurement teams

Four tiers we use internally. Color-coding optional but useful.

🟢 Safe freely. Published prices, public supplier filings, public market data, RFP drafts before they're sent, public company press releases, your own internal templates with no client/customer data.

🟡 Safe with care. Internal supplier scorecards (your assessments, not the supplier's confidential data), draft category strategies, internal cost models, savings figures expressed as ranges rather than exact numbers. These are fine inside Codex as long as the project is in a controlled folder and your AGENTS.md has the right rules.

🟠 Use with caution. Contract terms under active negotiation, supplier financial data shared under NDA, internal pricing positions not yet disclosed, supplier-specific risk assessments containing confidential information. Allowed if and only if your security/legal team has signed off on the Enterprise contract terms, and the project is in a folder with controlled access.

🔴 Do not put in. Personally identifiable information (employee data, supplier contact PII beyond business contact, payment instructions, bank routing details), anything covered by a supplier NDA you haven't reviewed for AI tool clauses, classified information of any kind, anything where the loss of confidentiality would trigger a breach notification obligation.
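For the 🔴 tier, a crude pre-flight screen can catch a few obvious patterns before a document lands in a Codex project folder. A minimal Python sketch, our own illustration rather than any OpenAI or Google feature: it looks only for IBANs (validated with the ISO 13616 mod-97 check), US ABA routing numbers (validated with the standard 3-7-1 checksum) and SSN-shaped strings. It is not a DLP tool, and a clean result does not clear a document.

```python
import re

IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")  # compact, upper-case IBANs only
ROUTING_RE = re.compile(r"\b\d{9}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def _iban_checksum_ok(candidate):
    # ISO 13616: move the first four characters to the end, map A..Z to 10..35,
    # and the resulting number mod 97 must equal 1.
    rearranged = candidate[4:] + candidate[:4]
    return int("".join(str(int(ch, 36)) for ch in rearranged)) % 97 == 1


def _aba_checksum_ok(digits):
    # ABA routing numbers: weights 3, 7, 1 repeated; weighted sum divisible by 10.
    weights = [3, 7, 1] * 3
    return sum(int(d) * w for d, w in zip(digits, weights)) % 10 == 0


def red_tier_hits(text):
    """Return (kind, value) pairs for red-tier patterns found in text."""
    hits = [("iban", m) for m in IBAN_RE.findall(text) if _iban_checksum_ok(m)]
    hits += [("routing_number", m) for m in ROUTING_RE.findall(text) if _aba_checksum_ok(m)]
    hits += [("ssn_like", m) for m in SSN_RE.findall(text)]
    return hits


if __name__ == "__main__":
    draft = "Please remit to GB82WEST12345698765432 by Friday."
    for kind, value in red_tier_hits(draft):
        print(f"Red-tier pattern ({kind}): {value}")
```

The checksums keep ordinary 9-digit PO numbers and random codes from tripping the screen most of the time, but about one random 9-digit number in ten will still pass the routing check.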

Five governance principles for procurement Codex use

Default to least-privilege. Codex sees what its plugins can see. Restrict the Google Drive plugin to a controlled folder, not all of Drive. Restrict the browser to the portals you've explicitly tested. Restrict MCP servers to ones your security team has reviewed.

Skills get reviewed before they get shared. Personal skills are personal. Shared skills are reviewed. No exceptions.

Automations get an owner and a review date. Every automation has a named owner. Every automation has a review date in the next 12 months. Automations without an owner die.

/plan-mode in the desktop app before any autonomous Cloud run on irreversible work. This is the discipline that lets you sleep at night.

Audit usage monthly. The dashboard. First of the month. 15 minutes. The cheapest insurance you'll buy this year.
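The owner-and-review-date principle is easy to check mechanically if you keep your automations in a simple register. A minimal Python sketch; the register format (a list of dicts with name, owner and review_date) and the function name are hypothetical, and Codex doesn't require or read such a register:

```python
from datetime import date, timedelta


def register_problems(automations, today):
    """List automations that break the owner and review-date rules.

    automations: dicts with hypothetical keys name, owner, review_date (a date or None).
    """
    horizon = today + timedelta(days=365)  # "a review date in the next 12 months"
    problems = []
    for item in automations:
        name = item.get("name", "<unnamed>")
        if not item.get("owner"):
            problems.append((name, "no owner"))
        review = item.get("review_date")
        if review is None:
            problems.append((name, "no review date"))
        elif review < today:
            problems.append((name, "review overdue"))
        elif review > horizon:
            problems.append((name, "review more than 12 months out"))
    return problems


if __name__ == "__main__":
    register = [
        {"name": "Contract Renewal Tracker", "owner": "priya", "review_date": date(2026, 11, 1)},
        {"name": "Daily news digest", "owner": "", "review_date": None},
    ]
    for name, issue in register_problems(register, date.today()):
        print(f"{name}: {issue}")
```

Running it as part of the monthly usage review keeps "automations without an owner die" from depending on anyone's memory.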

Common questions you'll hear (or be asking yourself)

We get the same five questions in roughly the same order from every team we work with. Here are the honest answers.

"Is Codex more secure than Claude Cowork?"

Different model, similar threat surface. Both Anthropic and OpenAI have published trust portals with comparable commitments: no training on enterprise data, similar SOC 2 and ISO certifications, and similar enterprise contract terms. The meaningful security difference is the connector surface. Cowork connects to more of your enterprise systems (M365, DocuSign, SAP via CData) than Codex does today, which means more places where misconfiguration can leak data, but also more places where the security team has visibility. Codex's smaller connector surface means narrower exposure today, but the in-app browser exposes a different surface (any web app the browser can reach). Neither is categorically more secure. Both require the same diligence.

"Does OpenAI train on our data?"

By default, no, for Business and Enterprise plans. The default for Plus and Pro consumer plans is opt-out (you can disable training on your data in settings, but the default is on). Business and Enterprise contracts explicitly exclude training. If your team is on Plus or Pro, check your training opt-out before doing real procurement work.

"Can I run Codex air-gapped?"

Not today. Codex is a cloud-hosted product. Some MCP servers can run locally, but model inference happens on OpenAI infrastructure. If air-gapped operation is a hard requirement for your function, neither Codex nor Cowork solves your problem, and you should be looking at open-weight models deployed on infrastructure you control. Managed services such as AWS Bedrock or Azure OpenAI can tighten data-residency controls, but they are cloud services, not air-gapped deployments.

"What happens to our work if we cancel?"

For desktop-only use, your work is already local: skills, AGENTS.md, and any files Codex produced live in the folder you pointed Codex at (and, via the Drive plugin, in your Google Drive). Cancellation removes your access to the Codex apps; the files persist in your own storage. If you've also used Codex Cloud or @Codex in Slack, any branches or pull requests those tasks produced live in the GitHub repos Codex connected to, and those persist in your account too; the task history itself sits in your ChatGPT account. Export anything operationally important before cancellation if you want a clean break.

"Will Codex replace my analyst?"

Not in 2026. It will make your analyst significantly more productive. The teams we've worked with for 6+ months have not reduced headcount; they've redirected analyst time toward higher-value work. The honest medium-term answer: the role will change, the same way every analyst role has changed over the last 20 years with every tooling wave. Plan for the role change. Don't promise headcount cuts you won't make.

A 30/60/90 Day Adoption Roadmap

A week-by-week plan for a procurement team starting today

We are routinely asked for a calendar. Here is the calendar. Adjust for your team's bandwidth and your function's risk appetite, but if you do roughly this, in roughly this order, you'll be in roughly the right place after 90 days.

Days 1–30: Land it in one person

Week 1. Sandeep (or whoever owns the rollout) installs the Codex desktop app, picks the Pro $100 tier for the personal trial, sets up AGENTS.md per Section 2, sets up the workspace folder, completes the "first 30 minutes" exercise. Two hours total. End of week 1: Codex is responding to you in your voice.

Week 2. Build the Supplier Research Brief skill from Section 5. Run it on 3 real suppliers from your active categories. Iterate the skill file until the third output is something you'd ship. Total time: 90 minutes across the week.

Week 3. Build two more skills. Pick from the library in Section 6; we recommend the Category Strategy Brief (skill 06) and the Negotiation Playbook Generator (skill 08), because both demonstrate /plan-mode and autonomous Cloud runs and produce outputs you can show colleagues. Total time: 2–3 hours across the week.

Week 4. Schedule your first automation: the Contract Renewal Tracker from Section 7. Run it once manually, then set it to monthly. End of month 1: you have 3 working skills, 1 automation, and a usable AGENTS.md.
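At its core, a renewal tracker does one date subtraction: the notice deadline is the renewal date minus the notice period. A minimal Python sketch of that check; the 30-day warning window and the OK/CRITICAL/MISSED labels are our assumptions, not the actual rules of the Section 7 skill. For instance, a contract renewing in 87 days with a 60-day notice period leaves 27 days to act.

```python
from datetime import date, timedelta


def notice_status(renewal_date, notice_days, today, warn_within_days=30):
    """Return (notice deadline, days left to give notice, status label)."""
    deadline = renewal_date - timedelta(days=notice_days)
    days_left = (deadline - today).days
    if days_left < 0:
        status = "MISSED"      # the window to give notice has already closed
    elif days_left <= warn_within_days:
        status = "CRITICAL"    # notice must go out soon
    else:
        status = "OK"
    return deadline, days_left, status


if __name__ == "__main__":
    today = date.today()
    print(notice_status(today + timedelta(days=87), notice_days=60, today=today))
```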

Days 31–60: Bring in a colleague

Week 5. Pick one colleague who already trusts you, and run them through the "first 30 minutes" exercise yourself: 60 minutes of co-working. By the end of the session, they have Codex installed and AGENTS.md customized, and they've watched you run one of your skills against your real data.

Week 6. They build their first skill (Supplier Research Brief) from scratch, using their own data and their own category. You sit in for the first 30 minutes, then leave them to it. Check in by Friday.

Week 7. Build three more skills together, picking from the library based on the workflows the two of you do most. Set up a shared folder structure following the team workspace pattern in Section 12. Move your shared skills into /05_skills/ so they can be invoked by both of you.

Week 8. Set up Codex in Slack for the two of you. Test @Codex delegations in a private channel. Run one parallel cloud job (for example, refreshing quarterly scorecards for 5 suppliers at once) and watch it work. End of month 2: two of you are productive, and you have a working pilot.

Days 61–90: Scale to the team

Week 9. Make the decision and present it. Bring the team your honest read. If Codex is in, get budget approval for Business or Enterprise seats. If Codex is staying narrow (e.g., just for parallel research and overnight Cloud runs alongside a primary Cowork deployment), document that scope clearly.

Week 10. Onboard the rest of the team. Run the 90-minute training format from Section 12: demo, build, convert, governance. Get every attendee to a working personal Supplier Research Brief skill by the end of the session.

Week 11. Set up team governance. Codex Admin assigned. Plugin defaults configured. Team AGENTS.md in a shared location. Compliance API or OpenTelemetry export configured (Enterprise). Monthly usage review on the calendar.

Week 12. Run one team-wide initiative end-to-end using Codex. We recommend a Quarterly Business Review: the deck, the data refresh, the scorecard updates, and the executive summary. The whole team contributes; Codex does the heavy lifting. End of month 3: the team has a worked example of "Codex in our actual operating rhythm" they can point to.

What a typical week looks like (once the team is running)

The narrative scenario from the Cowork playbook, retold for Codex. The same procurement team, the same Acme renegotiation, different tool.

Monday, 8:00 am. Priya (category manager) opens Slack. The Contract Renewal Tracker fired overnight in the cloud and posted to #procurement-team. Acme's MSA renews in 87 days. Value: $1.2M per year. Flagged as Critical because of an auto-renewal clause: unless notice is given at least 60 days before the renewal date, the contract rolls over, which leaves 27 days to act.
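The Critical flag comes down to simple date arithmetic. Here is a minimal Python sketch, assuming the clause means non-renewal notice is due 60 days before the renewal date. The function names, the 30-day "critical" window, and the 90-day "watch" horizon are hypothetical choices for illustration; this is not the Contract Renewal Tracker skill itself.

```python
from datetime import date, timedelta

# Hypothetical thresholds for illustration; tune to your own risk appetite.
CRITICAL_WINDOW_DAYS = 30   # notice deadline this close (or already past) => Critical
WATCH_HORIZON_DAYS = 90     # renewal this close => Watch


def notice_deadline(renewal_date: date, notice_days: int) -> date:
    """Last day to send non-renewal notice before an auto-renewal takes effect."""
    return renewal_date - timedelta(days=notice_days)


def renewal_flag(today: date, renewal_date: date, notice_days: int,
                 auto_renews: bool) -> str:
    """Classify a contract as Critical, Watch, or OK (hypothetical rule)."""
    days_to_act = (notice_deadline(renewal_date, notice_days) - today).days
    # An auto-renewal whose notice deadline is near (or already missed) is Critical.
    if auto_renews and days_to_act <= CRITICAL_WINDOW_DAYS:
        return "Critical"
    # Anything renewing within the watch horizon still gets surfaced.
    if (renewal_date - today).days <= WATCH_HORIZON_DAYS:
        return "Watch"
    return "OK"


# Acme scenario: renewal 87 days out, 60-day notice period, auto-renews.
today = date(2026, 4, 6)                  # hypothetical "today"
acme_renewal = today + timedelta(days=87)
print(notice_deadline(acme_renewal, 60))  # 2026-05-03
print(renewal_flag(today, acme_renewal, notice_days=60, auto_renews=True))  # Critical
```

With 87 days to renewal and a 60-day notice period, the deadline lands 27 days out, inside the assumed 30-day window, so the contract is flagged Critical. Without the auto-renewal clause, the same contract would only be a Watch item.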

Monday, 9:30 am. Priya, in the Codex desktop app, runs the Supplier Research Brief skill on Acme. 4 minutes later: a one-pager covering financial health (rated stable), recent press (one labor action in their European business, six months ago, flagged), and our relationship (3-year contract, no scope creep, decent SLAs).

Monday, 2:00 pm. Priya toggles /plan-mode in the desktop app and types: "Refresh the [Category] strategy doc with Acme's contribution analyzed against the three credible alternatives." Codex outputs a 7-step plan. Priya edits step 4 (changes the rubric weighting). Approves the plan.

Tuesday, 9:00 am. Priya toggles /plan-mode off, opens Codex Cloud, and hands the refined objective to a fresh Cloud thread for autonomous execution. Walks away. Goes into other meetings.

Tuesday, 4:00 pm. Priya checks back. The brief is in /01_categories/[Category]/Acme_renegotiation_brief.gdoc. Three credible alternatives identified. Pricing benchmarks pulled from public filings and press. Two specific levers Acme might respond to. One material risk noted (Acme's European labor action).

Wednesday, 11:00 am. Priya in Slack: @Codex draft an outreach email to Acme's account manager opening the renewal conversation. Tone: collaborative, no specific asks yet, propose a meeting in the next 10 days. Draft arrives in the thread 90 seconds later. Priya tweaks two sentences and sends.

Friday, 3:00 pm. Priya runs the Negotiation Playbook Generator skill on the Acme renewal. Codex produces a 4-page playbook: BATNA (move to alternative 2), opening position, walk-away, three concession levers in sequence, three counter-proposal language patterns. Priya prints it for the negotiation kickoff next week.

The following Tuesday. Negotiation kickoff. Priya walks in with the playbook, the brief, the alternatives analysis, and a refreshed scorecard. The negotiation runs three sessions over the next month. Acme renews at 8% below the previous rate with extended payment terms.

Total Codex time used. Roughly 35 minutes of Priya's active attention across two weeks. Roughly 4 hours of autonomous Cloud work done in OpenAI's sandboxes while she did other things. The work that would have taken the analyst on her team a full week of focus took Priya less than an hour of her own time and freed her analyst for a different category that had been waiting.

That's the rhythm. The work didn't change. The bottleneck did.

Role Cheat Sheets

Prompt guides by role, 15 prompts each, copy-paste ready

Each role gets its own one-page prompt guide. These are the same seven roles we wrote for the Cowork playbook: the procurement work is the same, and only the tool has changed. Every prompt below is rewritten for Codex idioms: /plan-mode in the desktop app before complex work, Cloud runs for multi-hour autonomous work, @Codex in Slack for delegations, and skill invocations by name.

We've included five sample prompt names per role; the full 15-prompt guide per role is downloadable as a PDF below.

🧮 Analyst

Tagline: Faster spend analysis, cleaner data, better questions for your manager.

If your job is to turn raw data into insight, Codex shortens the data wrangling and gives you back your thinking time. Five sample prompts:

Refresh the spend cube for [Category] using /00_inbox/[file].csv.
(Toggle /plan-mode first, then:) Build a price variance report for last quarter. I'll review the plan before you run it.
Run the supplier scorecard skill on these 4 suppliers in parallel.
@Codex what's our YTD spend with Acme? Check /01_categories/MRO/_cube/.
Read this scanned invoice and pull the line items into a Sheet.

Download the Analyst Prompt Guide →

🏷️ Category Manager

Tagline: One brain, more categories, with Cloud doing the analyst work overnight.

If you own categories, Codex's parallel cloud sandboxes and long autonomous Cloud runs are the highest-leverage feature in the product for you. Five sample prompts:

Build a category strategy brief for [Category] for the 2026 cycle.
(In Codex Cloud, as a long autonomous run:) Refresh the QBR deck for [Category] using the Q2 cube, scorecards, and savings register. Match last quarter's tone. Save to /00_inbox/.
Run the supplier news monitor every weekday at 7am for these 8 suppliers.
@Codex pull the renewal terms on every contract in /03_contracts/[Category]/ that renews in the next 90 days.
Generate a negotiation playbook for the upcoming [Supplier] renewal using the brief at /01_categories/[Category]/strategy_2026.gdoc.

Download the Category Manager Prompt Guide →

📋 Sourcing Lead

Tagline: RFPs that don't take six weeks anymore.

The RFP lifecycle is the single workflow where Codex's parallel cloud sandboxes save the most clock time. Five sample prompts:

Draft an RFP for [Category] using the requirements in /04_rfps/[Project]/requirements.gdoc.
Evaluate the supplier responses in /04_rfps/[Project]/responses/ in parallel against the rubric in /04_rfps/[Project]/rubric.gdoc.
(Toggle /plan-mode first, then:) Build a shortlist memo from the evaluation outputs. Recommend a final 3.
@Codex what's the status of the [Project] RFP? Read /04_rfps/[Project]/_status.md.
Build the vendor onboarding checklist for the winning supplier.

Download the Sourcing Lead Prompt Guide →

👔 CPO

Tagline: Better decks, sharper reads, less analyst dependency for the basics.

You don't need to operate Codex deeply. You need it to make your team faster and your board materials sharper. Five sample prompts:

Build the Q[X] QBR deck for the board review. Match last quarter's structure. Pull from /02_suppliers/_scorecards/ and /05_skills/_outputs/savings_Q[X].gsheet.
(In Codex Cloud, as a long autonomous run:) Read every active RFP in /04_rfps/. Produce a 1-page executive summary of where we are, what's at risk, and what decisions are pending.
@Codex draft a 5-line update for the CEO on the [Supplier] situation. Tone: factual, no hedging.
Read the last 3 board decks in /99_archive/. Tell me what messages we've consistently emphasized and what we've quietly dropped.
Build a supplier risk heatmap for the next board pre-read.

Download the CPO Prompt Guide →

📝 Contract Manager

Tagline: Contracts read, terms tracked, renewals never missed.

Contract work is data-heavy and detail-driven, exactly the kind of work where Codex's reliability matters most. Five sample prompts:

Read every contract in /03_contracts/[Supplier]/ and produce a matrix of renewal date, value, payment terms, termination clauses, notice period, and any out-of-pattern terms.
Run the Contract Renewal Tracker now and flag anything urgent.
(Toggle /plan-mode first, then:) Compare the proposed Acme MSA in /03_contracts/Acme/draft_2026.gdoc to our standard MSA template at /06_templates/MSA_standard.gdoc. Flag every deviation.
@Codex what's the notice period on the [Supplier] contract? Check /03_contracts/[Supplier]/.
Read this scanned legacy contract PDF and extract the key terms into our standard contract data Sheet.

Download the Contract Manager Prompt Guide →

🛒 Tactical Buyer

Tagline: Less time on the buying mechanics, more time on the buying judgment.

If your role is hands-on requisition-to-PO, Codex's gains are in the small repeated tasks that consume your day. Five sample prompts:

Read this requisition in /00_inbox/req_[id].pdf and tell me which contracted supplier is the right fit.
Build a 3-supplier RFQ for [item] using our template at /06_templates/RFQ.gdoc.
@Codex what's the contracted rate for [item] with [supplier]? Check /03_contracts/[supplier]/.
Compare these 3 supplier quotes and tell me which is the best total cost, not just the lowest unit price.
Draft a follow-up to [supplier] on the late delivery we discussed yesterday. Tone: firm, no threats yet.

Download the Tactical Buyer Prompt Guide →

🤝 Supplier Relationship Manager

Tagline: Better QBR prep, sharper insights, no missed supplier signals.

SRM is the role where AGENTS.md and the project notes earn their keep: your job is full of context that doesn't fit in a doc. Five sample prompts:

Refresh the supplier scorecard for [Supplier] for [Quarter].
(In Codex Cloud, as a long autonomous run:) Read every interaction we've had with [Supplier] in the last 6 months (emails, meeting notes, Slack) and produce a one-page health assessment.
@Codex what was [Supplier]'s response on the [issue] from last month? Check /02_suppliers/[Supplier]/notes/.
Build the agenda for next Tuesday's QBR with [Supplier]. Use last QBR at /02_suppliers/[Supplier]/QBR_Q1_2026.gdoc as the model.
Read [Supplier]'s most recent press release and tell me what, if anything, we should bring up in the next conversation.

Download the SRM Prompt Guide →

Codex + Claude Cowork: Running Both

For teams who want to hedge their bets, here's how to split the work

This section exists because reality keeps producing the dual-tool answer and we want to spare you a year of trying to make one tool fit everything.

We will not tell you to use both. Plenty of procurement teams are well-served by Cowork alone or Codex alone. The teams that benefit from both are the ones with specific shapes, and once you've seen the shapes, you'll know quickly whether you're one of them.

Three archetypes

Archetype 1: Google shop, ChatGPT-standard. Your team lives in Google Workspace. Your organization has already standardized on ChatGPT for the broader knowledge work. Your spend cube is in Google Sheets. Your team chats in Slack.

Verdict: Codex primary. Cowork is unnecessary friction. You won't get enough out of Cowork's M365 connectors to justify the second tool. Stay on Codex and accept the gaps where they show up.

Archetype 2: Microsoft shop, regulated industry. Your team lives in Outlook, OneDrive, SharePoint. DocuSign is core. SAP Ariba or Concur runs your purchasing. You're in financial services, healthcare, defense, or another industry where audit trails are non-negotiable today, not at some future release date.

Verdict: Cowork primary. Codex narrow if at all. Cowork's M365/DocuSign/SAP connectors are the right tool for 90% of your daily work. You may want Codex for specific high-leverage moves (overnight Cloud runs on category research, parallel cloud jobs on multi-supplier evaluation), but they're enhancements, not the foundation.

Archetype 3: Mixed estate, ambitious procurement function. You're partly on Google, partly on Microsoft. Your team uses Slack and Teams. Your spend data is split across Excel, Sheets, and Ariba exports. You have one or two analysts who are AI-fluent and the appetite to run a more sophisticated tool stack.

Verdict: Both, with a clear split. This is where Section 16 earns its keep. Read on.

The dual-tool operating model

If you're running both, the work splits by tool fit, not by user. Every workflow has a home. Skills exist in both tools where there's overlap, with clear precedence.

A practical split for the mixed-estate archetype:

Workflow	Tool	Why
Supplier research brief	Either; usually Codex	Roughly equivalent quality. Codex slightly faster on parallel research.
Spend cube refresh	Cowork if Excel, Codex if Sheets	Tool follows data.
Contract analysis	Cowork	DocuSign integration.
RFP drafting	Either; usually Codex	Codex's /plan-mode then Cloud-run pattern is well-suited.
RFP evaluation (parallel)	Codex	Parallel cloud sandboxes. No Cowork equivalent.
Supplier scorecards	Cowork if M365, Codex if Google	Tool follows data.
Category strategy briefs	Either; team preference	Roughly equivalent.
QBR decks	Cowork for PowerPoint, Codex for Slides	Tool follows format.
Daily news monitor	Either	Roughly equivalent.
Overnight autonomous Cloud runs	Codex	Cowork can't run with laptop closed.
Slack-native delegation	Codex	First-party Slack app.
Teams-native delegation	Cowork	Microsoft connector.
Audit-trail-required work	Cowork (or Codex Enterprise)	Per your governance contract terms.
Computer-use / browser automation	Cowork on Pro/Max; Codex in-app browser	Roughly equivalent; usability differs.

We'd love to give you a clean rule. The closest we have: tool follows data, tool follows format, tool follows the chat surface your team lives in. The rest is local optimization.

Cost of running both

Roughly $40–$300 per active user per month for both tools combined, depending on tiers.

A reasonable mid-market estimate: one Cowork Max 20x seat at $200 plus one ChatGPT Pro seat at $100 per power user, dropping to one Cowork Team seat at $125 plus a shared ChatGPT Business pool at ~$25/seat for the broader team.

For a team of 10 with 3 power users and 7 general users, the dual-tool stack runs roughly $1,200–$1,500/month at 2026 list prices. In return, the tooling frees roughly one analyst-day per week per active user, a margin that has justified the dual-tool overhead in every team we've worked with that fits the third archetype.
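The 10-person figure can be reproduced with a small seat-budget model. This is a minimal Python sketch; the prices are the ones quoted above, but the allocation is our reading of the paragraph, not a confirmed breakdown: each power user holds both a Cowork Max 20x seat and a ChatGPT Pro seat, the broader team shares a single Cowork Team seat, and each of the 7 general users gets a ChatGPT Business seat. `SeatLine` and `monthly_cost` are hypothetical names. Under that reading the stack lands at $1,200, the low end of the range.

```python
from dataclasses import dataclass


@dataclass
class SeatLine:
    """One line of a seat budget: a plan, its monthly list price, and a seat count."""
    label: str
    monthly_price: float
    seats: int

    @property
    def total(self) -> float:
        return self.monthly_price * self.seats


def monthly_cost(lines: list[SeatLine]) -> float:
    """Sum the monthly cost of every seat line."""
    return sum(line.total for line in lines)


POWER_USERS, GENERAL_USERS = 3, 7

# Prices as quoted in the text; the seat allocation is our assumption.
stack = [
    SeatLine("Cowork Max 20x (each power user)", 200, POWER_USERS),
    SeatLine("ChatGPT Pro (each power user)", 100, POWER_USERS),
    SeatLine("Cowork Team (one shared seat)", 125, 1),
    SeatLine("ChatGPT Business (each general user)", 25, GENERAL_USERS),
]

for line in stack:
    print(f"{line.label:<40} ${line.total:>6,.0f}")
print(f"Total: ${monthly_cost(stack):,.0f}/month")  # Total: $1,200/month
```

Swap in your own seat counts and negotiated prices. If each general user instead got their own $125 Cowork Team seat, the same model gives $1,950/month, well above the quoted range, so it is worth confirming which allocation your quotes assume.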

Where Molecule One stands

We will not push you toward either tool. Our practice runs both, and our recommendation to any team is the one that fits your data, your workflows, and your chat surface, not the one that's hot this quarter.

If you'd like help structuring the right split for your team, training your function on whichever stack you choose, or building the team plugin and skills library to make either tool productive on day one, email me at sk@moleculeone.ai, or use the form at moleculeone.ai/contact if you'd prefer.

★

Continue the series

If this playbook was useful, the rest of the series goes deeper on specific moves:

The Claude Cowork Playbook for Procurement Teams, the sibling article. If you read this one and want the Cowork comparison in full, this is it.
The Procurement Codex Starter Pack, free download below. The AGENTS.md template from Section 2 plus 8 ready-to-use skill files for the most-used procurement workflows.
Codex vs Claude Cowork: A Field Report from Procurement, coming soon, the head-to-head, with real workflows tested side-by-side.
GPT-5.5 vs Claude Opus 4.7 for Procurement Knowledge Work, coming soon, model comparison from the procurement-buyer's seat.
Building Your First Codex Skill: Supplier Research Brief Walkthrough, coming soon, Section 5 of this playbook, expanded into a full standalone tutorial.
The AI-Native Procurement Team Training Playbook, coming soon, the 90-minute training format from Section 12, with materials.

★

Get the free starter pack

We've published a free bundle with everything you need to put this playbook into practice on Codex:

The full procurement AGENTS.md template from Section 2
Eight ready-to-use skill files: Supplier Research Brief, RFP Draft Generator, RFP Response Evaluator, Supplier Scorecard, Category Strategy Brief, Contract Renewal Tracker, Negotiation Playbook Generator, and Supplier News Monitor
The recommended Codex project folder structure
A one-page governance checklist

Download the Procurement Codex Starter Pack →


★

Sources and further reading

This playbook draws on OpenAI's official Codex documentation, public statements from the OpenAI Developers team, and our own 60 days of running procurement workflows through Codex in Q2 2026. Key references:

Molecule One is an AI-native procurement consultancy helping organizations transform their procurement operations. For implementation help, custom skill development, or enterprise rollout support, email Sandeep directly at sk@moleculeone.ai, or use the form at moleculeone.ai/contact.