Enterprise AI – Enablers for the next wave of infrastructure cloud

The acquisitions of MosaicML and Neeva, by Databricks and Snowflake respectively, prompted this blog. Some definitions first, and then let’s zoom out.

Consumer AI: heypi.com, ChatGPT and Bard are consumer AI services.

Enterprise AI: Any SaaS/SW product powered by neural nets that addresses a workplace/B2B or B2C use case, e.g. Jasper.ai, Databricks, Snowflake.

We all have seen and heard about this.

From Nov’22 to now (Aug’23), in one human pregnancy cycle (9 months), technology companies have given birth to their new child(ren): ChatGPT, Bard, LLaMa-1 and 2, Dolly and many more LLMs (open and closed models). With the recent announcement of Inflection (www.heypi.com) raising $1.3B, consumer AI is in full swing. Combined, they have raised >$10B.

Reflecting back to the 2000s, the consumer use cases emerged first, followed by enterprise. Search (Google), e-commerce (Amazon), and social media (Yahoo, MySpace) were the precursors to the rise of enterprise SaaS (CRM via Salesforce as the initial one). While Salesforce (founded in February 1999) was a cloud-first company, the “SaaSification” or “cloudification” of enterprises did not happen until AWS S3 and EC2 were announced in 2007, almost 8 years later. To be clear, Salesforce had SaaS as a business model from day 1, but it was not a cloud-native or cloud-resident SaaS until the past few years. To get some data on Enterprise vs Consumer, I turned to our new Oracles…ChatGPT and Bard.

I gave ChatGPT and Bard the following prompts. ChatGPT failed, but Bard came through.

  1. What is the total accumulated revenue for consumer SaaS?
  2. What is the total accumulated revenue for enterprise SaaS? (it gave market TAM instead)
  3. What is the total VC investments in consumer SaaS between 2003 and 2022?
  4. What is total COGS for consumer SaaS companies between 2003 and 2022?
  5. What is the total COGS for enterprise SaaS companies between 2003 and 2022?
  6. What is the accumulated revenue of AWS, GCP and Azure between 2007 and 2022?
  7. Can you now summarize the above data into Enterprise SaaS and Consumer SaaS by revenue, COGS and investments between 2003 and 2022 in a simple table? (It put the summary in Google Sheets.)

Metric | Enterprise SaaS | Consumer SaaS | Source Data
--- | --- | --- | ---
Market TAM (2021) | $128.5 billion | $113.1 billion | Gartner
COGS (2003-2022) | $483.1 billion | $359.3 billion | Statista
VC investments (2003-2022) | $330.3 billion | $243.8 billion | Pitchbook
Top 5 companies by revenue (2022) | Salesforce, Workday, ServiceNow, Adobe, Microsoft | Zoom, Slack, Atlassian, Adobe, Microsoft |
Top 5 companies by COGS (2022) | Salesforce, Workday, ServiceNow, Adobe, Microsoft | Zoom, Slack, Atlassian, Adobe, Microsoft |
Top 5 companies by VC investments (2022) | Salesforce, Workday, ServiceNow, Adobe, Microsoft | Zoom, Slack, Atlassian, Adobe, Microsoft |

Source Bard: Consumer vs Enterprise SaaS

Cloud Platform | Accumulated Revenue (2007-2022)
--- | ---
AWS | $900 billion
Azure | $350 billion
GCP | $150 billion

Source Bard: Accumulated Revenue – $1.4T

A few observations from this high-level, and perhaps inconsistent, set of data sources.

  1. A significant portion of the ~$800B in COGS was funded by venture investments (possibly a spurious correlation: total VC investment over the same period is close to $570B).
  2. While the data here shows enterprise SaaS is bigger than consumer SaaS, if you include FB, GOOG, AWS, consumer SaaS is a lot bigger but also a lot more concentrated than enterprise SaaS, where there are 150+ public companies and perhaps hundreds of startups.
  3. AWS was the biggest beneficiary of both the VC investment dollars and the open source community (AWS resold open source compute cycles for many years).
  4. Although Salesforce (CRM) was early with enterprise SaaS in 1999, alongside Google and Amazon in consumer SaaS, the enterprise SaaS business inflection happened circa 2007, i.e. the knee of the curve was 2007, a 7-8 year lag behind consumer SaaS (using Salesforce revenue as a proxy).

A significant portion of the ~$570B of VC investment went into funding CACC (Cost of Customer Acquisition). I don’t have the actual percentage spent on CACC, but I would hazard a guess that it was >30%, and maybe as high as 50%, for many companies.

Assuming a conservative 20%, that is over $100B of venture capital going into CACC, and perhaps >$500B of cloud spend (COGS) with AWS, GCP and Azure during the same years. The total revenue of AWS, GCP and Azure between 2007 and 2022 is $1.4 trillion.
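
The back-of-the-envelope numbers here can be reproduced from the Bard figures quoted in the tables. The sketch below only checks the arithmetic; it does not validate the underlying data. The variable and function names are hypothetical, and the 20%, 30% and 50% CACC shares are simply the guesses stated in the text.

```python
# Figures in billions of USD, as quoted from Bard in the tables above.
# They are unverified inputs; this only checks the arithmetic.
cogs = {"enterprise": 483.1, "consumer": 359.3}          # COGS, 2003-2022
vc = {"enterprise": 330.3, "consumer": 243.8}            # VC investments, 2003-2022
cloud_revenue = {"AWS": 900, "Azure": 350, "GCP": 150}   # revenue, 2007-2022

total_cogs = sum(cogs.values())
total_vc = sum(vc.values())
total_cloud = sum(cloud_revenue.values())


def cacc_spend(total_investment, share):
    """Portion of investment going to customer acquisition (hypothetical helper)."""
    return total_investment * share


print(f"Total COGS:          ${total_cogs:,.1f}B")
print(f"Total VC investment: ${total_vc:,.1f}B")
print(f"Total cloud revenue: ${total_cloud:,.0f}B")
for share in (0.20, 0.30, 0.50):
    print(f"CACC at {share:.0%} of VC: ${cacc_spend(total_vc, share):,.1f}B")
```

At a 20% share this gives roughly $115B of CACC, in line with the ~$100B estimate, and the three cloud revenue figures do add up to the $1.4T total.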

Hindsight is 20/20, but projecting forward, consumer AI and enterprise AI are going to be bigger. The big shifts now are from cost of customer acquisition (CACC) to Cost of Compute (CoC), and the collapse in time between Consumer AI and Enterprise AI. Let’s look at both, and perhaps speculate or project forward.

1. Cost of Compute (CoC)

A significant portion of VC and corporate investment in the past 9 months has gone to cost of compute, largely for consumer AI companies. The winner (to date) in this round is Nvidia, as SUNW was in the last round until 2003. Back in 2001, the Internet ran on Sun boxes, until open source, Linux and distributed systems became the new infrastructure stack.

As Elad Gil points out on Twitter: “At this point LPs in venture funds should just wire money direct to NVIDIA & skip the LP->VC->Startup->NVIDIA step”. Not so fast: the fish were falling into the boat at Sun back in 2001, and soon after it had to go fishing. More on this later.

And then there is Martin’s tweet: “If a company is going to train a $5B model, wouldn’t it make sense to use 5% of that to build a custom ASIC for that model? The benefits are certain to dwarf the 5% investment. At these scales, even ASIC design costs become marginal”. 5% of $10B is enough for a new systems company to be built, either from within the traitorous 8 or via a VC/corporate investment (more on this in a future blog).
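
To put numbers on the 5% argument, here is a minimal sketch of the arithmetic, using the $5B model cost from the tweet and the $10B figure used in this post. The function name is hypothetical, and the sketch says nothing about what a custom ASIC actually costs to design.

```python
def asic_budget(training_spend_usd, share=0.05):
    """Budget carved out of training spend for a custom ASIC (illustrative only)."""
    return training_spend_usd * share


BILLION = 1_000_000_000

for spend in (5 * BILLION, 10 * BILLION):
    budget = asic_budget(spend)
    print(f"5% of ${spend / BILLION:.0f}B = ${budget / 1_000_000:,.0f}M")
```

That is $250M against a single $5B model and $500M against the $10B pool, which is the scale of capital being discussed for a new systems company.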

The wake of the SaaS tsunami (since 2001) left us with AWS as the winner, leaving behind the big SMP OEMs (SUNW, SGI, DEC, IBM). 20 years later, the traditional enterprise is served by 3 OEMs (CSCO, Dell, HPE) and 2+ enterprise software companies (Oracle, SAP and a distant third in IBM). Now they (the traditional enterprise infrastructure companies, both hardware and software) face a new tsunami with AI. A multi-trillion COGS spend is ahead of us, and we see new cloud investments (Coreweave, LambdaLabs) as well as AI+SaaS companies (the list above) potentially becoming the new AWS or GCP as they grow. Nvidia is spreading its newfound shareholder capital to blanket this space. The silicon guys have woken up, as they control the other end of the IP/value chain in a cloud stack.

Given that $10B+ of venture and corporate dollars have gone to CoC in the past 7 months, doesn’t it make sense to fund startups that address CoC and challenge the incumbents? The time seems to be now: in terms of perception, AWS has gone from the #1 to the #3 player in AI compute overnight, with Nvidia (and perhaps its compatriots) thinking about how to enter this space.

AI, unlike the past two decades of SaaS, wants to be everywhere: in the cloud, on-prem, in between, at the edge, and so on.

2. Enterprise AI

It took 7 years, between 2000 and 2007, and >$10B of investment to transform Enterprise SaaS with a new set of players. This time, instead of 7 years, we are moving from Consumer AI to Enterprise AI in 7 months, at breakneck speed. With the acquisition of Neeva by Snowflake and MosaicML by Databricks, the first set of rockets has been launched to take on the traditional enterprise companies (SAP, Oracle, IBM).

Back in 2007, the winners in infrastructure were the new cloud companies. This time the winners in new infrastructure are likely to include silicon players (e.g. Nvidia, Intel, AMD, AVGO), and potentially some losers. Unlike 2007, with AI there are the issues of data (sovereignty, privacy, security) and CoC. Enterprise SaaS (e.g. Salesforce) started with on-prem infrastructure but eventually moved to the cloud. With AI, there is a real possibility of reverse migration (moving closer to the data). Snowflake is already positioning itself as the DATAI cloud.

What is becoming a possibility is that, for the first time in perhaps 40-50 years, the incumbent enterprise software infrastructure companies (SAP and Oracle) are going to be challenged, by Databricks and Snowflake respectively. I say respectively because SAP is to Databricks as Oracle is to Snowflake if you follow Jerry Chen’s framework (systems of intelligence and systems of record), which applies here once more.

The systems of engagement are the new chatbots like ChatGPT and Bard. In the 1990s client-server era, SAP took pole position in the App Tier (“Systems of Intelligence”) while Oracle powered the enterprises (“Systems of Record”), with Sun, HP and IBM being the infrastructure providers. I had a front row seat between 1991 and 2001 (Sun, SPARC etc.), both in enabling SAP and Oracle via SMP platforms and in building the multi-core/threaded processors to match the emerging App Tier, which was built with Java as the programming layer to take advantage of the thread parallelism from the silicon world.

With the onset of AI, the agility and pace with which Databricks and Snowflake are co-opting AI (both are in the cloud today) can challenge the traditional enterprise duo (SAP and Oracle). While they both think they are competing with each other, they are actually offering different value to enterprises; over time they will overlap, but not today. The bigger opportunity for both is not to compete with each other but to use the new arrow in the quiver (“GenAI”) to upend SAP and Oracle with sheer velocity of execution and engagement value (SQL vs natural language). Both have to address data sovereignty if they want to take their fight to the SAP and Oracle world. Addressing that means that, just as enterprise SaaS jumped to the cloud back in 2007, enterprise AI has to extend from the cloud to be closer to the customer.

The success of Snowflake and Databricks, despite competing with and layering on top of current cloud infrastructure, is a proof point that they can compete against well-capitalized and well-tooled companies (AWS with Redshift/EMR and Google with Spanner/BigQuery….). If they choose to extend their offerings to on-prem enterprises, they can win if they can address the data problem, i.e. they need to jump from the current cloud to emerging new cloud offerings. They can enable and consume new infrastructure players who can solve or address the data proximity issue.

Given the $1.4 trillion scale of spend in the cloud, there is at least $1 trillion of spend ahead of us in Enterprise AI away from the big three (AWS, GCP, Azure) in the next 15 years, potentially led by the new duo of Databricks and Snowflake. On-prem Enterprise AI will continue to thrive and could well seed the new infrastructure players.

Let’s revisit this in 3 years to see whether new infrastructure providers emerge, addressing CoC in new and differentiated ways.

Intelligrated ……

Ben Thompson of Stratechery, in his recent blog on the Intel split, prompted me to coin the word “Intelligrated”, which is a counterpoint to his thesis. No, it’s not in the dictionary. Before we get to that, let’s start with one topic he brings up, as it is near and dear to me and many of my old fellow chip nerds from that time (1987-2003), which I would call the EDA 1.0 era.

EDA changed the microprocessor roadmap starting circa 1987 and continuing through the late 1990s: Ben references Pat’s paper on Intel’s EDA methodology, which scaled design methodology to track Moore’s law. Intel invested heavily in EDA internally, as the industry was immature. Around the same time Sun Microsystems, which built its business selling EDA/MCAD workstations, was changing the industry EDA landscape (methodology and eco-system). [An aside: I would not be surprised if x86 CPUs up to the Pentium IV were designed using Sun workstations.] Both companies had parallel but different approaches.

EDA 1.0: Intel vs Sun approach: Sun’s approach was to use industry tools and, where they did not exist, to enable the industry eco-system to build them. It perhaps started in 1989 when a motley crew of 25 engineers (including yours truly) built the first CMOS SPARC SoC (Tsunami – referenced here) with no prior experience in custom VLSI. We all came out of a cancelled ECL SPARC microprocessor project, and none of us had done any custom VLSI design. The CAD approach was…

Necessity is the mother of invention. Sunil Joshi captured the EDA methodology of the time in the MicroSPARC HotChips presentation. The 486 (Pat’s chip) had 1.2M transistors (with perhaps 100+ engineers), while the more integrated MicroSPARC, a full SoC, had 800K transistors and came 2 years later (at the same time as the Pentium, which had 200+ engineers). As noted, we had only 2 mask designers; every engineer had to write RTL, synthesize, verify, and do their own P&R and timing analysis, and two of us built the CAD flow so that it could be push-button. Auto-generated standard cells were automatically placed and routed using compiler tools. That was not the norm for ‘custom VLSI’ at that scale circa 1991.

That eventually got the name ‘Construct by correction vs Correct by construction’. Throughout the 1990s this evolved and scaled; it made the processor design competitive and raised a lot of boats in the EDA industry, which eventually got Intel to adopt industry tools with a healthy mix of in-house tools. With no in-house EDA teams, we creatively partnered (Synopsys, Mentor), invested (Magma, mid 1990s), helped M&A (Gateway Design – Verilog, Cooper and Chyan – Cadence), and spun out (Pearl timing analyzer, to replace Motive). At the same time, IBM’s EDA tools were superior to both Sun’s and Intel’s approaches, but they were locked up inside IBM until later in the decade, when it was too late.

In parallel, there was a great degree of systems innovation (SoCs, glue-less SMP, VIS, Ethernet, graphics accelerators, multi-core+threading) that was enabled by EDA 1.0 and CMOS custom VLSI by the industry at large with Sun leading the parade. Allow me to term it as ISM 1.0 (Integrated Systems).

Now, IDM 1.0 is what enabled Intel to beat all the RISC vendors. We (Sun and compatriot RISC vendors) could beat Intel in architecture, design methodology and even engineering talent, and some, like Sun with its OS and platform, built a strong moat. But we could not beat Intel on manufacturing technology. Here’s a historical view of the tech roadmap.

Tech History (litho nm only)

Caution: Given the dated history, some data from the 1990s could be incorrect. Please notify me of corrections.

In a prior blog I have called out how Intel caught up with TI and IBM on process technology by 1998 (Intel was the manufacturing leader but not the technology leader w.r.t. xtor FOM or metal litho until 1998). TI process technologists used to complain, ‘At Intel design follows fab and you folks are asking fab to follow design’, as we demanded xtor FOM and metal litho more than Moore in the 1990s. By 1998, with coppertone, Intel raced ahead on both litho and xtor FOM (a 60% improvement at 180nm with coppertone to boost Pentium MHz). So Intel was not the xtor FOM leader in the early 1990s, but it pulled ahead by 1-2 generations by the late 1990s. It has been done by Intel. When Intel did go ahead (starting 1998), IBM and TI became non-competitive for high-end microprocessors; that was the beginning of our end and of my departure from microprocessors (unless I went to Intel). (Side note: Both TI and Intel bet on BiCMOS till 650nm.) It is unlikely that history will repeat itself, as the dynamics are different today with consolidation, but it has been done before by Intel.

Intel’s leadership with IDM 1.0, and its co-opting of ISM 1.0 architectural elements (by 2002: multi-core, glue-less SMP, MMX, integrated DRAM controllers) into its processors, made it difficult for fabless CPU companies to thrive, even with a systems business to fund them, which was no longer sufficient by 2002. Even 500K CPUs/year (Sun/SPARC) was not economically justifiable. IBM, SGI, HP and many more dropped out as the cost of silicon design and technology went up. [Side note: I am not sure that, on a standalone basis, Graviton is economically viable for Amazon. If 500K CPUs was not viable in 2000, 50K is certainly not viable in 2020. They can surely wrap other elements in the TCO stack to justify it for a few generations, but that is not sustainable over 3-5 generations.] Regardless, 20 years later…

IDM 2.0 is a necessary first step, and it is good that Pat and Intel are putting that focus back. But IDM 2.0 needs ISM 2.0 and the same chutzpah of design EDA innovation as in the late 1980s, this time perhaps also owning the ‘silicon systems platform SW stack’.

ISM 2.0 is ‘Integrated Systems 2.0’. If SoCs were the outcome of Moore’s law in the 1990s, the SoP (System on Package) is the new substrate on which to re-imagine systems, as the platform is becoming heterogeneous for compute (CPU, DPU, IPU, xPU), memory (DRAM, 3D XPoint, CXL memories) and networks (Ethernet and CXL). There will always be a CPU (x86 or ARM), but increasingly we will find a DPU/IPU/xPU in the system that sweeps up all the new workloads. The IPU/DPU/GPU/xPU will increasingly be an SoP with a diversity of silicon types to meet the diversity of workload needs. But it will need a common and coherent software model to be effective in enabling platforms and workloads, including low-level VMs or run-times with standard APIs (e.g. P4/SONiC, PyTorch+TF and others).

On the economies of scale that killed the RISC revolution (amongst other reasons), I have written about SoC vs SoP in a different blog here, but it’s important to consider the diversity of customers, from OEM platforms to cloud platforms, emerging telco/edge service providers, and emerging ML/AI or domain-specific service providers that have a large TAM. Each one needs customization, i.e. no more one-size-fits-all platforms: it’s multiple chips going into multiple SoPs, with different ways to package and deliver them to these new channels of delivery, namely OEM (single system), cloud (distributed systems) and the emerging decentralized cloud. To retain the economies of scale of chip manufacturing while delivering customized solutions to the old and new categories of customers, we are moving towards disaggregation at the component level and aggregation at the platform software level.

To get a better sense of the varied sales motions: Nvidia is a chip company, a box company (Mellanox switches and DGX) as well as a cloud company (delivering ML/AI as a service w/Equinix).

This is more than the multi-core and virtualization shift that happened circa 2003 (VMware). An entirely new layer of software will be a key enabler for imagining the new ‘integrated systems’ and delivering them. For lack of a better TLA, let me call it EDA 2.0. Assembling this variety of SoP solutions requires new design tooling to enable customization of the ‘socket’. The old mantra was to sell 1 chip SKU in the millions. That is still true. But we will have multiple SoP SKUs using the same multi-million-unit chip SKU. The design tooling to assemble these SoPs must offer programmability not only at manufacturing but also in the field, as there will be some FPGA elements in the SoP as well as low-level resource management functionality.

Hijacking the OSI model as a metaphor to represent the infrastructure stack…..

7 layers of Infrastructure Stack
Bottom 3 layers form the emerging silicon systems plane

The homogeneous monolithic CPU has now become a heterogeneous CPU+DPU/IPU/XPU/FPGA. Memory has gone from homogeneous DDR to DDR + 3D XPoint on the DDR bus and the CXL bus.

So ‘Integrated Systems’ is an assembly of these integrated chips based on target segments and customers, but it has to manage the three axes of flexibility vs performance vs cost. While silicon retains one mask set for most target segments (economics), customization at the packaging level enables the new ‘integrated system’ (the bottom three layers in the visual above). This new building block will become complex over time (hardware and software), which creates room for value extraction (simplifying complexity results in value extraction), but that requires capital and talent. Ironically, both exist with either the chip vendor or the cloud service provider, the two ends of the current value chain.

The pendulum is starting to swing back: from the vertically integrated era (1980-2000), when OEMs owned silicon (Sun, HP, IBM) or chip companies owned fabs (Intel), to horizontalization with the success of fabless chip companies (Nvidia, Broadcom……) plus TSMC (2000-2020), and now again to vertical integration at the sub-system level for certain segments or markets.

Back to Intel Split vs Intel Integrated. If there is any lesson learnt from the EDA 1.0 era, it is that in this era it would be smarter to build the new tooling (EDA 2.0 and BIOS 2.0) through a smart combination of build, buy and partner, expanding into invest and open source, and to build a moat around that eco-system that competing chip companies will find hard to cross. EDA 2.0 is not the same as EDA 1.0: it’s both design tools pre-manufacturing and low-level programming and resource management frameworks. Directionally, some of it is captured here by Chris Lattner (MLIR & CIRCT). We have a chicken-and-egg situation: to create that layer we will need new silicon constructs, but to get to the right silicon we will need the new layer (akin to how Unix+C enabled RISC, and RISC eventually accelerated Unix and C, referenced here).

Coming back to Intel v TSMC and splitting: TSMC is good at manufacturing but has not (yet) built eco-systems and platforms at higher levels; that is left to its (fabless) customers. Intel knows that and has done that many times over. I make the case that Intel Integrated, with IDM 2.0 and ISM 2.0 and with flexibility in delivering SoC, SoP and even rack-level products to emerging decentralized-edge cloud providers, will be the emerging opportunity.

Splitting the company will make the sum of the parts worth less than the whole in this case. While there is a point to the split (a separate fab driving accountability, customer focus and serviceability), there are perhaps other ways to achieve the same without a formal split, while retaining the value of integrating the parts.

Smart integration of the various assets and delivery: creatively combining design engineering, EDA 2.0 and BIOS 2.0, and linking them to SoP and SoC manufacturing, including field-level customization, will be a huge value extraction play. Think of the Apple vs Android model of market share vs margin share. IDM 2.0 with ISM 2.0 will get market share and margin share for dominant compute platforms.

A POV: Intel has to do 3 things: IDM 2.0 (under way), ISM 2.0 (I will elaborate on that in a future note) and something else (aligned with Intel’s manufacturing DNA) truly out of the box before 2030, when it’s going to be economically and physically hard to get more from semiconductors as we know them. That has to be put in place between now and 2030…

————————————————————-

References to EDA at Sun…

Historical CPU/Process Node Data..
