Toward AGI: 7 Critical Milestones Before “General Intelligence” in 2026 - and the Startup Opportunities They Create

February 21, 2026

Over the past few days, the debate around “the road to AGI” has reignited. On February 5, 2026, OpenAI announced GPT-5.3-Codex, while Anthropic unveiled Claude Opus 4.6. The common thread across both releases is striking: models no longer just generate answers; they perform tasks on computers, sustain long-horizon workflows, and plan and execute multi-step actions.

In the same week, according to Bloomberg’s compiled projections, Alphabet, Microsoft, Amazon, and Meta are planning capital expenditures approaching $650 billion in 2026. This massive budget is flowing into the “invisible layers” of the road to AGI: data centers, chips, networking, and energy infrastructure.

Put these two pictures side by side, and the main story of 2026 becomes clear: AGI will not arrive overnight. It is the result of engineering, productization, safety, regulation, data and infrastructure, and business model transformation converging. As we move toward “general intelligence,” we are in fact passing through a series of critical inflection points.

At Boğaziçi Ventures, as we officially activated our AI-focused BV Growth II fund at the end of 2025, our core thesis was this: value will be created not primarily at the model layer, but in the secure and measurable integration of models into real business processes. Reading the seven milestones below through that lens makes the startup opportunities of 2026 much clearer.

The Economic Threshold: The Unit Cost of Intelligence Falls, the Bottleneck Shifts to Infrastructure

One of the quietest yet most decisive inflection points on the road to AGI is economic. According to Stanford HAI's 2025 AI Index, the cost of querying a GPT-3.5-level model dropped from $20 per million tokens in November 2022 to $0.07 in October 2024, a more than 280-fold decrease in under two years. The same report notes that, depending on the task, LLM inference prices have fallen between 9× and 900× year over year.

As prices fall, the market expands in two ways:
(i) More companies can experiment with more use cases.
(ii) Previously uneconomical edge workflows—multi-step analysis across multiple data sources with tool usage—cross the cost threshold.

However, by 2026 the bottleneck has shifted: to energy and compute infrastructure.

Drawing on IEA data, Pew Research Center reports that U.S. data centers consumed 183 TWh of electricity in 2024, with projections reaching 426 TWh by 2030. As AI becomes cheaper and usage explodes, the energy impact becomes visible. The projected ~$650 billion in Big Tech CAPEX for 2026 is therefore no coincidence—the next leap is no longer just about better models, but about the ecosystems (data centers, networks, chips, energy) that power them.

Startup opportunity:

  1. Efficiency layer (AI FinOps): model selection, compression, caching, latency/cost optimization, task-based routing.

  2. Energy-aware infrastructure: power management, peak shaving, intelligent workload scheduling, hybrid edge-cloud architectures.

In 2026, competitive advantage will not come solely from using a better model—but from accomplishing the same task at lower cost and with less energy.
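The efficiency-layer idea above can be made concrete with a minimal routing sketch. The model names, prices, and quality scores below are hypothetical placeholders, not real provider pricing; a production router would also account for caching, latency, and per-task evals.

```python
# Hypothetical model catalog: price per million tokens and a rough
# quality estimate. Real prices and scores vary by provider and task.
MODELS = {
    "small":    {"price": 0.07,  "quality": 0.6},
    "mid":      {"price": 1.50,  "quality": 0.8},
    "frontier": {"price": 15.00, "quality": 0.95},
}

def route(task_complexity: float, quality_floor: float) -> str:
    """Pick the cheapest model whose quality estimate meets the
    higher of the task's complexity and the caller's quality floor."""
    threshold = max(task_complexity, quality_floor)
    candidates = [
        (spec["price"], name)
        for name, spec in MODELS.items()
        if spec["quality"] >= threshold
    ]
    if not candidates:
        return "frontier"  # nothing qualifies: fall back to the strongest model
    return min(candidates)[1]  # cheapest qualifying model
```

Simple tasks fall through to the cheap tier, so `route(0.5, 0.5)` returns `"small"`, while a demanding task like `route(0.9, 0.5)` escalates to `"frontier"`. The same pattern extends naturally to energy-aware scheduling by adding an energy cost term to the selection.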

The Model Layer Is Commoditizing: Multi-Model Strategy and the Open-Weight Wave

In 2024–2025, the central question was “Which model is better?” In 2026, for many institutions, the more critical question is:
“How do we manage multiple models without being locked into a single provider?”

Two factors are accelerating this shift.
First, the maturation of open-weight models and more efficient architectures. Meta’s Llama 4 family (April 2025), with its MoE architecture and natively multimodal design, brought capabilities such as extended context (10M for Scout) to a broader developer base.
Second, enterprise risk and cost optimization: within a single product, one workflow may call a premium model, another a lower-cost model, and yet another an on-premise deployment.

This commoditization makes the application layer more valuable. A dynamic “model control plane” that routes decisions based on cost-quality tradeoffs, data privacy requirements, latency targets, and regulatory constraints is becoming critical infrastructure.
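A model control plane of this kind can be sketched as a small set of routing rules over request attributes. The target names (`on_prem_open_weight`, `frontier_api`, and so on) and the thresholds are illustrative assumptions, not any vendor's API.

```python
from dataclasses import dataclass

@dataclass
class Request:
    contains_pii: bool       # data privacy / regulatory constraint
    latency_budget_ms: int   # latency target
    premium: bool            # cost-quality tradeoff flag

def control_plane(req: Request) -> str:
    """Illustrative routing rules for a multi-model control plane.
    Privacy constraints dominate, then latency, then cost-quality."""
    if req.contains_pii:
        return "on_prem_open_weight"  # regulated data stays in-house
    if req.latency_budget_ms < 300:
        return "small_hosted"         # fast, lower-cost tier
    if req.premium:
        return "frontier_api"         # best quality for high-value workflows
    return "mid_tier_api"             # default cost-balanced choice
```

The design choice worth noting is that the policy lives outside any single provider's SDK, which is precisely what keeps the organization from being locked in.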

Startup opportunity:
Multi-model orchestration, vendor risk management, model procurement and compliance workflows, continuous benchmarking infrastructure. From a BV perspective, this is a horizontal layer needed across industries—but the strongest products will emerge where horizontal capabilities merge with deep vertical domain understanding.

Context and Memory: Long Context Is Not Enough — The Rise of the Enterprise Knowledge Layer

One of 2026’s most visible leaps is in context windows and memory. Anthropic’s Claude Opus 4.6 introduced a 1-million-token context window (beta) for the Opus class. At this scale, massive codebases, long contracts, and multi-page reports can be handled in a single session.

But long context alone does not solve the accuracy problem.

For enterprises, what matters most is calling the right source at the right time, surfacing citations, and ensuring data freshness. Retrieval-augmented generation (RAG) and enterprise search layers remain central.

The real shift: LLMs are evolving from “knowledge producers” into “interfaces to knowledge.” This requires organizations to transform their internal data (documents, email, CRM, ERP, logs) into a well-structured knowledge layer with tagging, access control, and lifecycle management.

Startup opportunity:
Data connectors, permission and access management, enterprise search + RAG infrastructure, citation-aware response systems, knowledge graphs, and data quality layers.
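The citation-aware pattern can be sketched in a few lines. The keyword-overlap retriever below is a deliberate toy standing in for a real embedding index, and the document names are invented; the point is that citations are returned as machine-checkable identifiers alongside the context.

```python
def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query.
    A real system would use an embedding index instead."""
    q = set(query.lower().split())
    ranked = sorted(
        docs, key=lambda d: len(q & set(docs[d].lower().split())), reverse=True
    )
    return ranked[:k]

def answer_with_citations(query: str, docs: dict[str, str]) -> dict:
    """Assemble context for the model and keep the source IDs attached,
    so every response can surface verifiable citations."""
    sources = retrieve(query, docs)
    context = " ".join(docs[s] for s in sources)
    # A real system would call an LLM with `context`; here we just
    # return the grounded context plus its citations.
    return {"context": context, "citations": sources}
```

Access control slots in naturally: filter `docs` by the caller's permissions before retrieval, so the knowledge layer never surfaces content the user could not open directly.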

In short, we are entering an era where winners are defined not by “strong models,” but by “strong knowledge layers.”

Agents: From Assistants to Autonomous Workflows (and the Limits of Autonomy)

The most tangible reason 2026 feels closer to AGI is agentic systems.

OpenAI’s GPT-5.3-Codex highlights long-horizon task execution, research capability, tool usage, step-by-step computer interaction, and interactive user steering. These capabilities extend beyond code writing into the full software lifecycle—debugging, deployment, testing, metrics, documentation—and into knowledge work surfaces like spreadsheets and presentations.

Anthropic similarly emphasizes planning, long-running agentic tasks, and reliability in large codebases.

However, a sober note is necessary. For most enterprises, agents remain risky in production. Gartner predicts that by the end of 2026, 40% of enterprise applications will include task-specific AI agents—but also that a significant share of agentic AI projects will be canceled by 2027 due to cost, risk, or lack of value.

2026 is not the year of “autonomize everything.” It is the year of autonomizing the right tasks at the right level.

Startup opportunity:
High-frequency, measurable ROI workflows (finance close processes, IT operations, procurement, compliance, customer support), agent orchestration, and human-in-the-loop design.

The best products will treat autonomy not as a binary toggle, but as a controllable, inspectable spectrum.
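Treating autonomy as a spectrum rather than a toggle can be sketched as a per-action policy with approval gates. The action names and the three-level scale are hypothetical; real deployments would tie levels to risk assessments and audit requirements.

```python
from enum import IntEnum

class Autonomy(IntEnum):
    SUGGEST = 0   # agent drafts, human executes
    APPROVE = 1   # agent executes only after human approval
    AUTO = 2      # agent executes and logs for later review

# Hypothetical per-action policy: the riskier the action,
# the lower the permitted autonomy level.
POLICY = {
    "read_db":     Autonomy.AUTO,
    "send_email":  Autonomy.APPROVE,
    "deploy_code": Autonomy.SUGGEST,
}

def can_execute(action: str, approved: bool) -> bool:
    """Gate an agent action against the policy. Unknown actions
    default to the most restrictive level."""
    level = POLICY.get(action, Autonomy.SUGGEST)
    if level == Autonomy.AUTO:
        return True
    if level == Autonomy.APPROVE:
        return approved
    return False  # SUGGEST: never auto-executes
```

Raising a workflow's autonomy then becomes a reviewable one-line policy change rather than a code rewrite, which is what makes the spectrum inspectable.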

Reliability Engineering: Evals, Observability, and “Agent QA” Become a Category

On the road to AGI, we often focus on intelligence—but in real production systems, reliability engineering is decisive.

An agent can send emails, deploy code, process invoices, query databases. Each action requires validation beyond classical software testing.

Stanford HAI’s AI Index notes that reported AI incidents reached 233 in 2024, a 56.4% increase year over year. Deployment velocity is outpacing control mechanisms.

In 2026, system cards, evaluation reports, red-team exercises, live observability (telemetry, audit logs, decision traces) are no longer optional. OpenAI’s publication of a dedicated System Card for GPT-5.3-Codex reflects rising transparency expectations in enterprise markets.

Startup opportunity:
Evaluation platforms (scenario testing, regression testing, comparative benchmarks), agent observability (tool calls, data usage, confidence scoring), automated policy enforcement and approval workflows.

This is foundational infrastructure for agentic transformation—and therefore a deep, durable market.
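The observability primitives above reduce, at minimum, to an append-only decision trace plus a policy check over it. This is a minimal sketch with invented field names; production systems would ship traces to a telemetry backend and sign them for audit purposes.

```python
import time

def trace(log: list, agent: str, tool: str, args: dict, result: str) -> None:
    """Append one auditable decision record to an agent's trace log."""
    log.append({
        "ts": time.time(),   # when the action happened
        "agent": agent,      # which agent acted
        "tool": tool,        # which tool it called
        "args": args,        # with what inputs
        "result": result,    # and what came back
    })

def audit(log: list, forbidden_tools: set) -> list:
    """Policy check over the trace: return every record where the
    agent called a tool outside its allowlist."""
    return [record for record in log if record["tool"] in forbidden_tools]
```

Regression evals then become assertions over traces from scenario runs, which is how "agent QA" turns from a manual review into a repeatable test suite.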

Security and Compliance: Trust Is Now a Competitive Strategy

As agents begin performing real actions, the threat model expands.

The OWASP Top 10 for LLM Applications (2025) highlights prompt injection, data leakage, and excessive agency as LLM-specific risks layered on top of classical application security.

On governance, frameworks such as NIST’s AI Risk Management Framework and Generative AI Profile provide operational guidance across “govern-map-measure-manage.”

Europe is moving early on regulation. Under the EU AI Act, obligations for general-purpose AI providers began applying on August 2, 2025. Subsequent simplification proposals (“Digital Omnibus on AI”) signal that compliance is a moving target.

In 2026, compliance is not just legal overhead—it is a differentiator in enterprise sales.

Startup opportunity:
Compliance-as-code, LLM security testing, data classification and access control, incident monitoring and reporting infrastructure.
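Compliance-as-code, in its simplest form, means expressing policy rules as data and evaluating them automatically at deploy time. The rule names and config keys below are illustrative assumptions, not any regulator's schema.

```python
# Illustrative compliance-as-code: each rule is a named predicate
# over a deployment config, evaluated before the system ships.
RULES = [
    ("logging_enabled",
     lambda cfg: cfg.get("audit_log") is True),
    ("no_pii_to_third_party",
     lambda cfg: not (cfg.get("pii") and cfg.get("external_api"))),
]

def check(cfg: dict) -> list[str]:
    """Return the names of all rules this config violates;
    an empty list means the deployment passes."""
    return [name for name, rule in RULES if not rule(cfg)]
```

Wiring `check` into CI turns compliance from a quarterly document review into a gate that blocks non-conforming deployments automatically.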

Trust is becoming productized.

Multimodal and Physical Expansion: AI Beyond the Screen

Another critical dimension on the road to AGI is multimodality and embodiment.

AI is no longer limited to text—it increasingly processes image, video, audio, and sensor data, and operates in physical environments (robotics, industrial automation, logistics, healthcare).

As we noted in our BV Insights CES 2026 review, AI is moving beyond screens into perceiving, deciding, and acting systems. This shift is driven not just by better models, but by more efficient edge hardware, cheaper sensors, mature connectivity/device management, and enterprise readiness to move from pilots to deployments.

Physical AI raises the bar for safety and accountability: a wrong action can be costlier than a wrong answer.

Startup opportunity:
Robotics/edge AI software stacks, simulation and digital twin testing environments, sensor fusion, safety-critical agent systems, and field monitoring with continuous learning.

Markets with strong engineering talent—such as Turkey—have a real opportunity to iterate rapidly through real-world pilots.

Reading AGI as a System

When we connect these seven milestones, a more grounded view of AGI in 2026 emerges. “General intelligence” should not be understood as a single model release, but as the convergence of:

  • Scalable infrastructure

  • Multi-model strategy

  • Reliable knowledge layers

  • Inspectable, controllable agents

  • Reliability engineering (evals and observability)

  • Security and compliance frameworks

  • Multimodal and physical productization

At Boğaziçi Ventures, our applied AI investment lens is shaped by this systems perspective. With BV Growth II, we prioritize teams that go beyond impressive demos—those that embed security, data integration, measurable value creation, and scalable go-to-market strategies.

A final note for founders:
In 2026, winning products will not be defined by “smarter models,” but by “better systems.” Products that reduce customer risk perception, measure business outcomes, and gradually increase autonomy in a controlled manner will create outsized value—even before AGI fully arrives.

Let me close with a practical mini-checklist we use in the field:

When evaluating a product idea, ask:

  1. Is there a clear before-and-after metric within a single workflow?

  2. Are data access, permissions, and citations built into the design?

  3. What actions will the agent take, and what approval layers are required?

  4. Are evaluation, logging, rollback, and incident management systems in place?

  5. Can compliance and AI security be productized as a “trust package” in the sales process?

Teams that can answer these well can create tremendous value—AGI or not. Because transformative shifts are rarely about big words; they are about disciplined engineering and the right market timing.