The “Terza” Wave….

History never repeats itself, but it rhymes

The rhymes of the bard are drowned out by the chatter of GPT; the black swan has been sighted, a reference my good friend Partha Ranganathan called out last year at Stanford. It is a moment I have been waiting for 20 years, and in reflection it feels like I have been in training for it…

A black swan moment has three attributes: it's an outlier, it has extreme impact, and we humans connect the dots only after the fact. Terza is the number three in Italian. Three is past, present and future; three are the vertices of a triangle; and a perfect storm is typically three colluding events.

2023 and ChatGPT are perhaps like the events of 1998 (see below) that caused a perfect storm: new workloads, new platforms and new business models. The enablers are vision models (diffusion, SAM) and NLP (transformers); these two parallel AI trains are taking us past the AI inflection point, a bend in the exponentials of adoption that even Google (the source of the key ideas) did not anticipate. We have landed on the third (Terza) wave of modern computing. The entire stack will get re-engineered, and I am reminded once more of this kind of shift from the past.

To support the ‘entire stack will be re-engineered‘ claim above, allow me to use the history-rhymes analogy to build collective intuition for this thesis.

Going back 20 years (circa 2001-2003), we faced a similar ‘black swan’ moment. At that time it was the overnight shift from proprietary Unix ‘scale-up‘ platforms (Solaris, HP-UX, AIX) and their programming models to distributed systems, that is, a shift from coherent shared memory to distributed memory. Search, and SaaS in general, was in effect a distributed memory (and data) programming problem solved at the application tier. That was not an obvious, let alone ubiquitous, application or workload model back in 2003. By 2013 (10 years) it was a settled matter.
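To make that contrast concrete, here is a toy sketch (my illustration, not from any platform of that era) of the same aggregation written both ways: once against coherent shared memory with threads and a lock, and once against partitioned "nodes" that share nothing and only exchange partial results, the map-reduce style that search and SaaS settled on at the application tier.

```python
from collections import Counter
from threading import Lock, Thread

docs = ["the quick fox", "the lazy dog", "the fox jumps"]

# Scale-up (coherent shared memory): threads mutate one shared structure;
# the hardware keeps caches coherent, the programmer adds a lock.
shared_counts = Counter()
lock = Lock()

def count_shared(doc):
    local = Counter(doc.split())
    with lock:                      # correctness relies on shared, coherent memory
        shared_counts.update(local)

threads = [Thread(target=count_shared, args=(d,)) for d in docs]
for t in threads: t.start()
for t in threads: t.join()

# Scale-out (distributed memory): each "node" sees only its partition and
# returns a partial result; the merge happens at the application tier.
def node(partition):
    return Counter(" ".join(partition).split())    # no shared state at all

partials = [node(docs[i::2]) for i in range(2)]    # two simulated nodes
merged = sum(partials, Counter())

assert merged == shared_counts
print(merged["the"])   # prints 3
```

The point of the sketch is that the second version needs no coherent memory at all, which is exactly what let it run across racks of commodity boxes.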

The tsunami that led to it was built on two technologies (processors and operating systems), while an economic downturn (the dot-com bubble bursting) accelerated the business model shift (Capex to Opex) as well. One was Linux maturing over Solaris/Unix as the software infrastructure platform; the other was multi-core x86 winning over multi-core RISC. That whole transition occurred between 1995 and 2002.

source: Tech Shifts Talk at Intel

In 1995, Intel was lagging on processors for infrastructure (it was the peak of 64-bit RISC machines from Sun, HP, MIPS/SGI and Power). Intel was using BiCMOS (Pentium) until the mid-1990s and was behind the CMOS technology curve (IBM, TI..), as I have written here – (Intelligrated). Between 1998 and 2002, Intel prioritized engineering the transistor and postponed the metal transition (Cu) until later. (Side note: we [SPARC @ TI] followed IBM’s lead on the copper metal transition and the industry norm for FOM improvements. At Intel, the fab dictated the PDK that the CPU designers should use; with us it was the other way around.) The chart below shows how Intel came back from behind.

In that period (1995-2002), open source (Linux) reached an inflection point.

Source: Wikipedia.

By 1995, Linux was expanding to other ISAs (SPARC and PPC); Red Hat went public in 1999; and the release of 2.4 was a key marker of its maturity and of its adoption by the internet companies of the day as the operating system for their infrastructure.

Side note: at the same time, in 1998, three companies amongst many were founded: VMware, Google and Equinix. We will get back to them later in this blog.

Linux gaining momentum, and thereby sedimenting the value of operating systems, was the perfect setup for the shift.

Tectonic shifts are punctuated by the margin vs market share dilemma of incumbents. The OEMs of those days (HP, IBM, Sun) were beholden to their customer base and to workloads like Oracle (a shared-everything database) that preferred SMPs, and thus missed the shift. While NUMA was a way to extend the scale of shared memory platforms, the ‘enterprise app’ programmers were beholden to the shared memory (SMP) model.

At the same time, Google, Amazon, Akamai and Yahoo decided to take a clean sheet, work backwards from the application or use cases (SaaS), and build systems for massive scale problems like CDN, search, shopping and social media. Marginal cost and the sedimentation of value were at work, and they are again in play 20 years later.

Looking further back, the scale-up era itself was initiated by the “attack of the killer microprocessors” (RISC), Unix (the open source of its day) and a recession (1982). So this has occurred twice in the past, and the thesis is that we are onto the third (i.e. Terza).

Between 1998 and 2008 (the origins of EC2 and S3 at AWS), the entire infrastructure for the new use cases was flipped over. With the cloud forming, we went from 3-tier to SaaS. The era of distributed ‘memory’ systems was punctuated by the CAP theorem (Eric Brewer), the move from strongly consistent to variable consistency models for data (Cassandra), reliable-by-design vs recovery-oriented computing (Dave Patterson/Berkeley), and the distributed computing manifesto (Werner Vogels).
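The "variable consistency" idea can be made concrete with a toy quorum sketch (my illustration; the class and its parameters are hypothetical, though the R + W > N overlap rule is the one Cassandra-style stores expose per operation): with N replicas, a write acknowledged by W of them and a read consulting R of them are guaranteed to overlap, and thus see the latest value, only when R + W > N.

```python
import random

class Replicated:
    """A toy N-replica key-value store with per-operation W and R."""
    def __init__(self, n=3):
        self.n = n
        self.replicas = [{} for _ in range(n)]   # n independent copies

    def write(self, key, value, version, w):
        # ack after only w replicas apply the write (the others lag behind)
        for rep in random.sample(self.replicas, w):
            rep[key] = (version, value)

    def read(self, key, r):
        # consult r replicas, return the highest-versioned value seen
        hits = [rep[key] for rep in random.sample(self.replicas, r) if key in rep]
        return max(hits)[1] if hits else None

store = Replicated(n=3)
store.write("x", "v1", version=1, w=2)
store.write("x", "v2", version=2, w=2)

# Quorum read: R + W = 2 + 2 = 4 > N = 3, so the read set must intersect
# the write set and "v2" is guaranteed.
assert store.read("x", r=2) == "v2"

# R = 1: R + W = 3 <= N, so a stale "v1" (or even a miss) is possible.
maybe_stale = store.read("x", r=1)
```

Dialing R and W per request is exactly the scale vs resilience vs performance tradeoff the post describes; the strongly consistent SMP world had no such dial.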

An interesting side note is the emergence of virtualization at the peak of scalable shared memory platforms: its value was eventually captured as a pure software play (VMware), and a free or open version (marginal cost going to zero) enabled the cloud.

Platform Transitions

Back to the future

A down market for SaaS; the rise of AI (an emergent use case) driving the need for new infrastructure; the rise of the xPU (Nvidia GPU, Google TPU); and the inadequacy of current cloud providers to satisfy the demand while meeting cost, performance and data sovereignty needs. The same margin vs market share dilemma is being played out (Google vs ChatGPT-MSFT), and it is going to give rise to new players (a few upstarts – OpenAI, Cohere, Anthropic, Stability, Character, Adept….) that will take on the Goliaths with a new David strategy. They are going to drive the new stack. Cost of compute is their #1 problem, as witnessed by their capital raises of millions (as opposed to the cost of customer acquisition in the early SaaS era).

The rise of the xPU: over the past 6 months we have seen the fortunes of Nvidia rise meteorically, while Google has been working with TPUs for the last 7+ years; meanwhile, more than $2B of venture capital has been invested in AI silicon, from the large (SambaNova, Cerebras, Graphcore) to the medium (Habana) and a smattering of others differentiating on cost, power, form factors and end deployment models. While Nvidia GPUs and Google TPUs lead the pack in deployment scale, the game has only begun, yet it is consolidating fast.

At scale, inferencing dominates the total processing cycles (4:1), and inferencing can be done across a range of hardware from CPUs to xPUs, demanding price/performance, i.e. rapid commoditization of cost. Training is dominated by accelerators: a performance-worth-the-price design point. Inferencing at scale is compute #1, memory #2, network #3; training is network #1, memory #2 and then compute #3, an inverted model relative to serving. Architecturally it is becoming a CPU + xPU silicon duo, especially for serving/inferencing, whereas training is GPU first and then CPU. The ground truth here is that CPUs and the huge software infrastructure around them are here to stay and serve a well understood purpose. The system design is once again constrained by the memory model, its scale, and the additional compute (linear algebra). The ratio of compute to memory, and coherency vs consistency, are the tradeoffs while keeping the programming model simple. I distinguish coherency from consistency to call out hardware-based SMP (1-4 sockets) vs a combination of hardware and software with variable consistency to meet the demands of scale vs resilience vs performance for the new serving and training workloads.

That is driving the system architecture toward a dual network, largely to bring compute and memory closer (“memory consistent”): in Google’s case their OCS, in Nvidia’s case NVLink, plus other industry initiatives like CXL and custom designs like Dojo’s. The recognition here is that the base system needs 10x-100x higher bisection bandwidth, driven by large models touching a large chunk of memory (100s of TB, perhaps PBs) with associated compute. Hence the need for a new memory model between hardware-based coherence and application-software-driven coherence (i.e. distributed memory); I term it ‘consistent memory’.

This bifurcates the networking model of the past 20 years, which led to the success of ethernet as a single flat (CLOS) network type, into a two-level network, since memory has to be accessible within a given span of compute (defined by training and serving clusters): a high bandwidth and perhaps lower latency ‘local network’, spanning perhaps an aisle as in Dojo, the Google TPU cluster or an Nvidia SuperPOD, plus ethernet for its ubiquity and for traditional networking needs, e.g. storage.
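A back-of-envelope sketch shows why the high bandwidth local tier exists. The model size, the asymptotic 2x ring all-reduce traffic factor, and the link speeds below are my assumptions for illustration, not figures from any vendor:

```python
# Rough time to all-reduce the gradients of a large model at two link speeds.
def allreduce_seconds(model_params, bytes_per_param, link_GBps):
    payload = model_params * bytes_per_param          # gradient bytes per step
    # a ring all-reduce moves ~2x the payload over each link (asymptotically)
    return 2 * payload / (link_GBps * 1e9)

params = 175e9                                        # a GPT-3-scale model
ethernet = allreduce_seconds(params, 2, 12.5)         # 100GbE ~ 12.5 GB/s
nvlink   = allreduce_seconds(params, 2, 450)          # NVLink-class ~ 450 GB/s

print(f"{ethernet:.1f}s vs {nvlink:.1f}s per step")   # 56.0s vs 1.6s per step
```

Tens of seconds per step over flat ethernet versus a second or two over an NVLink/OCS-class fabric is the 10x-100x bisection bandwidth argument in miniature.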

Finally, where Linux and the Unix system calls were the ‘API’ between the application developer and their chosen platform, we have now moved one level up: the graph model and PyTorch/TensorFlow/cuDNN are the API contract between the developer and the underlying infrastructure. Just as Linux sedimented the value of operating systems, an open source framework will win here (e.g. PyTorch). Revisiting the key elements of the stack…
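A minimal sketch of the "graph as API contract" point, in the spirit of (but far simpler than) PyTorch autograd: the developer composes operations, the framework records a graph it can later differentiate, compile or place on whatever hardware sits below. Everything here is my own toy illustration, not any framework's real API.

```python
class Node:
    """One value in a recorded computation graph, with a backward rule."""
    def __init__(self, value, parents=(), grad_fn=None):
        self.value, self.parents = value, parents
        self.grad_fn, self.grad = grad_fn, 0.0

    def __mul__(self, other):
        out = Node(self.value * other.value, (self, other))
        out.grad_fn = lambda g: [(self, g * other.value), (other, g * self.value)]
        return out

    def __add__(self, other):
        out = Node(self.value + other.value, (self, other))
        out.grad_fn = lambda g: [(self, g), (other, g)]
        return out

    def backward(self, g=1.0):
        # naive recursive backprop; enough for this tree-shaped example
        self.grad += g
        if self.grad_fn:
            for parent, pg in self.grad_fn(g):
                parent.backward(pg)

# y = w*x + b, the "hello world" of the framework contract:
# the developer writes the expression, the graph below is what the
# infrastructure actually sees and is free to optimize.
w, x, b = Node(3.0), Node(2.0), Node(1.0)
y = w * x + b
y.backward()
print(y.value, w.grad)   # 7.0 2.0
```

The contract is the graph, not the machine: the same recorded structure can be lowered to a CPU, a GPU or a TPU pod without the developer changing a line, which is exactly the sedimentation argument.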

The visual above is an over-simplified view of the evolving layers of the infrastructure stack, but the point is that each one of these boxes resulted in new companies, or in companies re-engineering the entire stack. While one cannot predict the end state, the change event has happened and the race is on to build the best Open AI Cloud (not to be confused with OpenAI or the Cloud).

Finally, on to the new new business models: like the transition from Capex to Opex to _______ (left as an exercise to the reader).

In closing……

The new infrastructure stack will be driven by a new memory programming model (from coherent, to distributed, to now consistent) and by the API and unit of workload defined by runtime systems (Unix, then VMs, now models), finally moving from exploiting instruction-level parallelism (ILP), to threading (TLP), to now memory/data-level parallelism (MLP).

At the same time, the business models have also evolved at various layers, as depicted by the visual below. In 2023, are we at 1998 (the beginning) or 2003 (the peak) of the transition? I cannot say, but we are somewhere on a similar timescale of transition.

1998 was an eventful year.

  1. Launch of the lowest cost Unix workstation and server ($1K) – the Ultra 5
  2. Launch of a more scalable SMP (the E10K)
  3. Linux: Red Hat on the road to its 1999 IPO
  4. 0.18uM CMOS (Coppertone) process from Intel (getting ahead in CMOS tech)
  5. Founding of VMware, Google and Equinix
  6. Amazon’s distributed computing manifesto
  7. Windows 98

Reference to Terms in this blog:

  1. UMA: Uniform Memory Architecture
  2. NUMA: Non-Uniform Memory Architecture
  3. UNA: Uniform Network Architecture
  4. NUNA: Non-Uniform Network Architecture
  5. Scale-up: shared (coherent) memory systems or programming model
  6. Scale-out: distributed systems, i.e. memory across nodes is not coherent and communication is via IP, Infiniband or any non-coherent network fabric
  7. Scale-in: a new term introduced here; a hybrid between scale-up and scale-out

Pizza to Dojo…(Open *)

30+ years back (April 1989 to be exact) came the launch of the pizza box form factor for servers (which eventually became the 1U/2U nomenclature). It was a simple mechanical form factor that became the basic server building block, but it was also an approach to design we learnt from Andy Bechtolsheim, one imprinted in memory. As the Wikipedia article notes, Andy “specified that the motherboard would be the size of a sheet of paper and the SBus expansion cards would be the size of index cards, resulting in an extremely compact footprint“.

But there was more to that simple statement than just form factor. The system design, i.e. the main silicon components, had to be designed to fit. That was his view, and that is where I got my chops thinking about silicon in the context of the system. It was the beginning of a journey of more silicon integration (floating point, then cache, followed by the MMU, then the memory and IO controllers, all within 3 years) and better IO (SBus, when compared with the AT bus in PCs). That outside-in thinking was carried forward: Andy continued to lead the industry in designing 2-way and 4-way SMPs in small form factors, which forced silicon integration, re-thought interfaces and innovation in packaging. We were in the ‘green zone’ of Moore’s law. It was a full stack solution, as we had control over the silicon, the platform hardware, the operating system and perhaps even the user experience.

Silicon, and thus hardware, was increasingly getting more integrated, while software started its journey down that decade to disintegrate: starting with domains in big SMPs (the E10K), then virtualization, then the massively distributed systems we have come to accept as the modern mainframe. Reflecting, the core form factor and the components inside it have not changed in the last 30 years. More integration, smaller coherence domains; DRAM and processors followed Moore’s law, and IO evolved from SBus -> PCI -> PCIe. EDA, MCAD and HPC codes loved pizza boxes, i.e. they were the favourites of the then-emerging high performance computing workloads. But within 10 years it was no longer a recipe just for those esoteric EDA engineers or HPC folks. It evolved to grow big, in the form factor of the E10K, to run the world’s enterprises; eventually a separate stream took those pizza boxes (Akamai first, before Google, Amazon and Azure) and did step-and-repeat to build the cloud as we know it today.

The ‘system’ in retrospect was fairly simple, and it was built bottom-up: from a simple form factor to more complex and scalable platforms.

30 years of systems innovation has been focused on integrating more into silicon and using software to both scale and disintegrate that silicon (virtualization, distributed system), fueled in large part by the congruent evolution of ethernet and networking.

At Hot Chips this year, the folks at Tesla took an approach that has not been taken since 1989. Like then, they took a form-factor view and defined the key attributes that had to be solved for (cost of communication, memory/compute/IO ratios, latency across the span, and many more). The form factor was an aisle in a data center.

Source: Beyond Compute

Dojo engineers re-imagined all the layers of this hierarchy, instead of being beholden to the current sources of silicon and platforms (because they can), into this.

Source: Beyond Compute

With that they got back to this.

Source: Beyond Compute

Distracting note: at the other end of this spectrum, Apple with its M1 is pursuing a different full-stack optimization.

While there have been tremendous innovations in ‘distributed systems’ over the last 20 years, as manifested by the clouds at Google, Amazon and Microsoft, the basic hardware building block, the hierarchies and the metrics for platform design all followed the same rules of thumb, as the Moore exponential was still in vogue.

Looking back, progress has been incremental in large part because new workloads like ML training and inferencing, which demand heterogeneity in compute, did not appear until recently. That need for heterogeneity drives the need to deal with disaggregated memory and with differing ratios of compute, memory and IO depending on the model/app. There is another reason beyond workloads to look at this from a new perspective: the same Moore curve is no longer ahead of us. So: flatten the hierarchy in hardware and software, re-imagine the network/communication pathways, disaggregate the system, but ‘integrate’ at the platform software level.

Ganesh Venkataramanan calls out the following software stack opportunities: new APIs for ML (the Unix system call was the API between software and hardware back then); thinking massively parallel; more compiler, less kernel; a renewed focus on distributed compilers; reduced OS roles; flexible ratios of memory, compute and IO (we had fixed ratios for 30 years); and disaggregated memory.

source: Beyond Compute

This time, we have to do the reverse: instead of starting from the pizza box to build the data center, work backwards from the data center system design to derive the new pizza box, or perhaps pizza stack.

An at-scale disaggregated system has to be ‘shrunk’ to meet the demands of deployment everywhere, not just as a training supercomputer, just as the pizza box of 1989 evolved to run all workloads, eventually everywhere.

While Dojo might look like a training supercomputer for a select few, taking the approach of shrinking from the data center down to the pizza stack as the new basic building block could well expand it from the ML training data center to every enterprise workload, past, present and future. Perhaps this is akin to building the Roadster, then the Model S and Model 3, and perhaps throwing in a Cybertruck along the way!

The only way to do that is to open up the platform for others to innovate, or to build the same with open standards, open source and the modern version of Open Systems. Both will happen. The era of the Open Cloud is opening. We have reached an inflection point like that of 1989.

We cannot predict what is possible in 10 years…but we can imagine and realize it.

Tesla’s Ganesh Venkataramanan shared the rationale behind their approach.

Computing Epochs

Moore (Intel) and Cycles of Re-inventions

A talk I gave (at Intel) a year and a half back on technology cycles, Moore’s law, and how semiconductors, the fountainhead of technology, have shaped the cycles of re-invention.

It took more than a year to post a more verbose version of the talk in this format, so it is broken down into four parts, roughly coinciding with four different epochs. We have entered the fourth epoch of the modern computing era that began with Intel, Shockley, Arthur Rock and the Fairchild 8.

Intel has been part of each transition in both good and not so good ways…

The Pre-Moore, or pre-Cambrian explosion, is here (More Memory). Memory was the instigator.

That was followed by the Cambrian explosion of the Web 1 or Internet 1 era, ending with the dot-com bust. Logic was the pivot. That summary is here (Logical Moore).

Then came Many Moore: multi-core, virtualization, the cloud; that is here (Many Moore).

Finally, speculation on the future is here (Moore No More).

But more to come on the future as we have entered the next disruptive phase.

Moore Memory – Epoch I

This is the ‘pre-Cambrian’ explosion of semiconductors that created many categories and industries, starting with the invention of the first diode and running to the shift from memory to logic as the driver of semiconductor processes (the famous Andy Grove pivot).

As shown in the visual below, it was the era of bipolar (pre-CMOS) and of the creation of the first calculator based on the 4004, which punctuated the beginning of the modern semiconductor era and Intel’s thumbprint on its evolution.

Going down memory lane: it is 1968 and the creation of the first semiconductor memory, followed soon by Federico Faggin and the 4004 architecture, the beginning of an instruction set lineage (x86) that has remained (largely) upwards compatible to date. Amazing that it has withstood the test of processor evolution and competition for almost 50 years. It was the height of the IBM mainframe with its 360 and 370 architectures, of DEC with its minicomputers, and of Sperry Univac and Burroughs challenging IBM for business/commercial computing.

It was also the beginning of the modern version of venture capital, with Arthur Rock making his first investment in Intel. 1972 saw the Vietnam endgame and the oil crisis, while quietly ARPANET made its first pings between two locations across the country. The seeds of ethernet were sown then, and it too has survived the onslaught of other ideas.

The perfect mix of ethernet, PNP devices (the flavors changed with time), CISC/x86 and, by the late 1970s, Microsoft and BSD Unix: all were given birth in this era, and they lifted so many companies, so many entrepreneurs, so many industries, and countries too (Taiwan eventually, which is in the news lately).

1976-1980, with the 8086, saw the birth of the modern microprocessor, BSD Unix and DOS. Looking back, as in Epoch III with ARM, the iPhone and AWS, it is striking how coincident (within years) technologies that are seemingly unlinked (Unix and Windows) eventually play a role in new categories of platforms.

That era ended circa 1984, roughly 16 years in, with Intel dropping memory for logic (a critical decision by Andy Grove), the birth of RISC with two competing professors and institutions (Berkeley and Stanford, Patterson and Hennessy), and the emergence of the Mac.

Little did I know, reflecting back or googling back through history, that this era was also punctuated by magnetic drives giving birth to the modern hard disk, by compilers, and most important of all by CMOS: complementary metal oxide semiconductor. CMOS was not Intel’s initial choice, but it won by the sheer nature of its characteristics, beating out NMOS, PMOS and, more importantly, bipolar/emitter-coupled logic; not for power density (which was the reason claimed for it), but for the scalability it showed over time.

Software as an industry/category was effectively created by Microsoft (mainframes and minicomputers had closed software ecosystems). Oracle and SAP were founded, and the big enterprise software market was effectively created at the same time as the client software companies (Microsoft and perhaps Apple). Coincidence? Databases were a separate category/company, or were bundled by the big mainframe makers.

1984: Big Brother did not appear; instead it was the dawn of the open and free internet, open systems (Sun), a new microprocessor approach (RISC), and compilers that became good enough to beat most hand-written code.

Little did I know that 1984 was also the year chip IP law was formulated, thanks to intense lobbying by Intel to fend off the Japanese, who were competing rather well against Intel on memory (a business which is no longer in Japan either now).

A rising tide lifts all boats. The chart below shows the boats lifted by the confluence of MOS, Unix, Windows, the PC, the server and 10Mbps ethernet (which is now 10,000 times faster).

It was also the birth and quick death of AI (expert systems) within the span of a few years. The categories founded were databases, personal computing (software and hardware), technical workstations (which led to client-server later), modern networking, and the OEM business (the separation of selling hardware from software); plus the internet, and the beginning of the end of the vertically integrated mainframe, of Intel in memory, and of NMOS, PMOS and bipolar.

My professional life in technology and semiconductors (not the original plan, as for most) started at the end of this era, competing with Intel via RISC, Sun and CMOS, and in hindsight it carried through Epoch II (1987-2003), the exponential part of Moore’s law.

How lucky I was, in retrospect, to enter the semiconductor industry in Epoch II. That’s the next blog.

Logic(al) Moore – Epoch II

The Cambrian explosion era: this part of the Moore’s law evolution is the most interesting and exciting, as it simultaneously saw the rise of various companies, categories, technologies, business models and everything in between. It is more relevant for me personally, as it overlapped with my professional life and growth, so the impressions left are stronger and deeper.

But, let’s start with a recap of the Moore growth curve and timelines..

This was Internet 1 or Web 1.

CMOS established itself as the dominant process technology. RISC established itself as the dominant microprocessor architectural model (eventually co-opted by x86/CISC). The OEM emerged as a category, not just for compute but for networking (Cisco) and, more relevant here, for storage (EMC).

The creation and formalization of software (enterprise and consumer) and of compute, network and storage OEMs, and the connecting of them all, created the network effects whose results we see today.

In 1985, the 386 and the 5″ HDD helped accelerate the PC revolution. The pizza box Sun workstation, which later gave way to the 1U server, drove the client-server architectural model, fueled by the killer micros taking on the VAX and the IBM 360 in an onslaught on both big enterprise technical and commercial computing.

The $75B OEM business we see today was created from nothing by EMC, Cisco, HP, Sun, SGI, IBM and many more.

1991-1995 and later 1999-2003 are two interesting segments of this era.

1991-1995: The emergence of the Mosaic browser was coincident with the creation of Linux to challenge the proprietary Unix operating systems, which eventually led to the LAMP stack, a rudimentary version of the ‘cloud stack’, all open source. It also helped consolidate x86, as x86 caught up with RISC by the late 1990s on performance, MHz marketing and enterprise RAS capabilities. It was also the beginning of 64-bit computing, led by MIPS and SPARC and eventually PPC and x86. But the modern cloud distributed systems design was created by Akamai, founded in response to a challenge posed by Tim Berners-Lee back in 1995. It is interesting to look back and see how technology evolutions, like biological evolutions, are both a random walk and survival of the fittest, constrained (in this case) by faster, cheaper, better, more scale.

1998 is interesting looking back: the founding of Google, VMware, Equinix and many more. Today, in 2022, Google epitomizes the leading edge of systems; VMware has followed the Sun trajectory of living and breathing for 20 years before eventually being acquired by another company; and Equinix, a real estate company, is moving up the value chain to become (potentially) the Switzerland of physical infrastructure.

1998 is also interesting from an Intel standpoint. Intel had won client-side computing but had failed to gain significant market share in enterprise and technical computing, and it was not yet the leader in semiconductor logic (it had compatriots in IBM and TI at a minimum).

But rising to the challenge of beating all the RISC vendors, with MHz marketing as an additional weapon, the technology (fab) guys at Intel took it upon themselves to push ahead, starting with 0.18uM (Coppertone), advancing transistor performance so hard that we at TI (then), and soon IBM, felt it was not an economically competitive game to play.

The above chart (not a totally fair comparison, using one metric, MHz, across two different microprocessor design styles) is still the best way to highlight what Intel and their fab did back in 1998 to separate the men from the boys, so to speak.

1998 was also the beginning of the end of big SMP, as witnessed by a 64-processor, 64GB SPARC SMP selling for $1M (today 64GB costs $250 or less). It was the dinosaur (looking back) that had to give way to the rising tide of small 1U distributed system design, initiated by Akamai and exploited by Google and eventually everybody doing cloud and SaaS. It was of course fueled by the emergence of 1G and soon 10G ethernet.

CMOS and multi-layer metal (10+) became the norm. Multi-core was at the nascent stages of creation and evolution. Apple was in the dumps, but the iPod was saving its bacon and eventually paved the way to the iPhone and the emergence of ARM, and eventually of Samsung and TSMC as competitive foundries to Intel.

Personally, looking back, my technical and professional learning and growth followed the same exponential as Moore’s law (a chart below shows this). From 1991 to 2003: from the single chip SoC microprocessor (an industry first, and in CMOS at that) to 8 generations of SPARC culminating in the first dual core, spanning 0.8uM to 90nm over roughly a decade.

I pinch myself for the luck of having been at the right time, right place and right part of the Moore evolution.

This era was ended by the dot-com bust, with the next wave of architectural simplification soon to follow (as RISC had done to prior design styles back in 1984).

A few other technology innovations should not be left out. FN tunneling, i.e. flash, was created as a technology in the late 1980s but found its way first into consumer portable media storage (initially the iPod) and eventually into the enterprise.

A few notable things in this era.

EDA as a category and industry was formed (led today by Cadence and Synopsys).

The OEM model was the dominant business model, with licensed software as a business that depended on the OEM hardware: the separation of hardware and software as businesses and go-to-markets.

Networking became a category, as demonstrated by Cisco.

Storage became a category, as seen with EMC and NetApp, with flash finding its way over a 20-year period into a key component of computing.

The browser as the presentation layer and the creation of systems of engagement, systems of intelligence and systems of record as the 3-layers of the enterprise stack.

Broadband witnessed hyper-growth and innovation in DSL and WiFi (OFDM).

It also saw the end of minicomputers (VAX/Digital), of BiCMOS, GaAs and ECL/bipolar, and the end of Japan’s dominance in memory.

Open source came into existence to disrupt the prior models of software distribution.

In closing… Intel’s market cap rose and fell like most other dot-com companies.

It was peak Intel by 2003 and that 20 years later is being challenged.

Many (core) Moore (Part III computing Epoch)

Back to the past: this is part III of the four-part story of the computing epochs as punctuated by Moore’s law, in which Intel had its imprint for obvious reasons.

This is the 2003-2020 era, in which multi-core, open source, virtualization, cloud infrastructure and social networks all blossomed. Its onset was the end of MHz computing (the Pentium 4) and the move to multi-core and throughput computing.

It was also the beginning of my exit from semiconductors for a period (20 years), until I decided it was time to get back in in the 2020s. The shift was punctuated by the first mainstream multi-core CPUs that Sun enabled, famously known as the Niagara family, and of course the lesser known UltraSPARC IIe, an interesting contrast to Intel’s Banias (back to the Pentium).

Some would call it the Web2 or Internet 2 era. The dot-com bubble, which blew up a number of companies of the prior (OEM) era, paved the way for new companies to emerge, thrive and establish the new stack. Notably, at the infrastructure level, Moore was well ahead: the first multi-core CPUs enabled virtualization and accelerated the decline of the other processor companies (SPARC, MIPS) and of the system OEMs, as the market shifted from buying capital gear to cloud and opex.

Semiconductor investments started to go out of fashion as Intel dominated and other fabs (TI, National, Cypress, Philips, ST and many more) withered, leaving Intel and TSMC, with GlobalFoundries an also-ran. In the same period, architectural consolidation around x86 happened along with Linux, and ARM emerged as the alternative for a new platform (mobile) via Apple. Looking back, value shifted from vertical integration (fab + processors) to the SoC, and thus IP (ARM) became dominant despite many attempts by the processor companies to get into mobile.

Convergent with the emergence of iPhone/Apple/ARM were AWS EC2 and S3, and thus the beginning of the cloud, with opex as the new buying pattern instead of capex. This had significant implications, as a decade later that very shift to commodity servers and opex came full circle via Graviton and TPU, with the cloud providers going vertical and investing in silicon. Intel’s lead in technology had enabled x86 to dominate; when that lead slowed, thanks to Moore’s law and TSMC, the shift towards vertical integration by the new system designers (Amazon, Google, Azure) followed.

Simultaneously, the emergence of ML as a significant workload demanded new silicon types (GPU/TPU/MPU/DPU/xPU) and programming middleware (TensorFlow and PyTorch), breaking the shackles of Unix/C/Linux in favor of new frameworks and a new hardware and software stack at the system level.

Nvidia happened to be at the right place at the right time (one can debate whether the GPU is the right architectural design), but certainly the seeds of the new category – a system built from CPU + xPU – were sown by the mid 2010s…

All of the shift towards hyperscale distributed systems was fueled by open source. Some say that Amazon made all the money by reselling open source compute cycles. Quite true. Open source emerged and blossomed with the cloud, and eventually the cloud went vertical, raising the question – is open source a viable investment strategy, especially for infrastructure? The death of Sun Microsystems, hastened by open source, and the purchase of Red Hat by IBM formed the bookends of open source as the dominant investment thesis of the venture community. While open source is still viable and continues to thrive, it was no longer front and center as a disruptor or primary investment thesis by the end of this era, as many more SaaS applications took the oxygen.

We started the era at 130nm with 10 layers of metal and Intel taking the lead over TI and IBM, and ended it with TSMC’s 10nm taking the lead over Intel. How did that happen? Volumes have been written on Intel’s missteps, but clearly the investment in 3D XPoint – trying to innovate with new materials and new devices to bridge the memory gap – did not materialize. It was a good idea addressing an important technology gap, but picking the wrong material stack distracted.

The companies that emerged and changed the computing landscape were VMware, open source (many), Facebook, Apple (mobile) and China (as a geography). The symbiotic relationship between VMware and Intel is best depicted in the chart below.

Single core to dual socket multi-core evolution…

On the networking front, the transition from 10Gbps to 100Gbps (10x) over the past decade is one of the biggest transformations, driven by networking’s adoption of custom silicon design principles.

The chart above shows the flattening of the OEM business while the cloud made the pie larger. OEMs consolidated around the big 6 (Dell, HPE, Cisco, Lenovo, NetApp, Arista) and the rest withered.

GPU/xPU emerged as a category, along with a resurgence in semiconductor investments (50+ startups with $2.5+B of venture dollars). The generalization of the xPU into a dual heterogeneous socket (CPU + xPU) is becoming the new building block for a system, thanks to CXL as well. The associated evolution and implications for the software layer were discussed here.

We conclude this era with the shift from the 3-tier enterprise (‘modern mainframe’) stack serviced by OEMs to distributed systems as implemented by the cloud providers, where use cases (e-commerce, search, social) drove the system design, whereas technology (Unix/C/RISC) drove the infrastructure design in the prior era (a note on that is coming…).

In summary – Moore’s Law enabled multi-core, virtualization and distributed systems, but its slowdown opened the gates for new systems innovation, and thus new companies and a new stack, including significant headwinds for Intel.

Let’s revisit some famous laws by famous people…

  1. Original Moore’s Law (cost, density) – Bill Joy changed it to performance scaling. It is certainly slowing down, and performance has shifted to throughput over latency. It needs an update for the ML/AI era, which demands both latency and throughput.
  2. Metcalfe’s Law – Still around. See the networking section.
  3. Wright’s Law (demand and volume) – This predates Moore’s Law and now applies to many more domains – batteries, biotech, solar etc…
  4. Elon’s law (a new one…) – Optimal alignment of atoms, and how close your error is to that. We are approaching it.
  5. Dennard Scaling – Power limits are being hit. Liquid cooling is coming down the cost curve rapidly.
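Wright’s Law above is the easiest to put numbers on. A minimal sketch, assuming a hypothetical 20% cost decline per doubling of cumulative volume (the learning rate and $100 first-unit cost are illustrative assumptions, not data):

```python
import math

def wrights_law_cost(first_unit_cost, cumulative_units, learning_rate=0.20):
    """Cost of the Nth unit: falls by `learning_rate` per doubling of volume."""
    b = math.log2(1 - learning_rate)  # progress exponent (negative)
    return first_unit_cost * cumulative_units ** b

# With a 20% decline per doubling, a $100 first unit costs
# roughly $80 at unit 2, $64 at unit 4, $51.2 at unit 8.
costs = [wrights_law_cost(100.0, n) for n in (1, 2, 4, 8)]
```

The same power-law form is why the law transfers across domains (batteries, solar): only the learning rate changes.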

Intelligrated ……

Ben Thompson of Stratechery, in his recent blog on an Intel split, prompted me to coin the word “Intelligrated“, which is a counterpoint to his thesis. No, it’s not in the dictionary. Before we get to that, let’s start with one topic he brings up, as it is near and dear to me and many of my old fellow chip nerds from that time (1987-2003), which I would call the EDA 1.0 era.

EDA changed the microprocessor roadmap starting circa 1987 and continuing through the late 1990s: Ben references Pat’s paper on Intel’s EDA methodology, which scaled design methodology to track Moore’s Law. Intel invested heavily in EDA internally as the industry was immature. Around the same time, Sun Microsystems, which built its business selling EDA/MCAD workstations, was changing the industry EDA landscape (methodology and eco-system). [An aside: I would not be surprised if x86 CPUs up to Pentium IV were designed using Sun workstations.] Both companies had parallel but different approaches.

EDA 1.0: Intel vs Sun approach: Sun’s approach was industry tools, and where a tool did not exist, enable the industry eco-system to build it. It perhaps started in 1989 when a motley crew of 25 engineers (including yours truly) built the first CMOS SPARC SoC (Tsunami – referenced here) with no prior experience in custom VLSI. We all came out of a cancelled ECL SPARC microprocessor where none of us had done any custom VLSI design. The CAD approach was…

Necessity is the mother of invention. Sunil Joshi captured the EDA methodology of the time in the MicroSPARC Hot Chips presentation. The 486 (Pat’s chip) had 1.2M transistors (100+ engineers, perhaps) vs the more integrated MicroSPARC at 800K transistors, which came 2 years later (at the same time as Pentium, which had 200+ engineers) but was a full SoC. As noted, we had only 2 mask designers, and every engineer had to write RTL, synthesize, verify, and do their own P&R and timing analysis; two of us built the CAD flow so that it could be push-button. Auto-generated standard cells were automatically placed and routed using compiler tools. That was not the norm for ‘custom VLSI’ at that scale circa 1991.

That eventually got the name ‘construct by correction vs correct by construction’, and throughout the 1990s this evolved, scaled and made the processor design competitive, as well as raising a lot of boats in the EDA industry – which eventually got Intel to adopt industry tools with a healthy mix of in-house tools. With no in-house EDA teams, we creatively partnered (Synopsys, Mentor), invested (Magma, mid 1990s), helped M&A (Gateway Design – Verilog; Cooper & Chyan – Cadence) and spun out (the Pearl timing analyzer, to replace Motive). IBM’s EDA tools were superior to both Sun’s approach and Intel’s, but were locked up inside IBM until later in the decade, when it was too late.

In parallel, there was a great degree of systems innovation (SoCs, glue-less SMP, VIS, Ethernet, graphics accelerators, multi-core + threading) enabled by EDA 1.0 and CMOS custom VLSI across the industry at large, with Sun leading the parade. Allow me to term it ISM 1.0 (Integrated Systems).

Now, IDM 1.0 is what made Intel successful in beating all the RISC vendors. We (Sun and compatriot RISC vendors) could beat Intel in architecture, design methodology and even engineering talent – and in some cases, like Sun with its OS and platform, build a strong moat. But we could not beat Intel on manufacturing technology. Here’s a historical view of the tech roadmap.

Tech History (litho nm only)

Caution: Given the dated history, some data from the 1990s could be incorrect – please notify me of corrections.

In a prior blog I called out how Intel caught up with TI and IBM on process technology by 1998 (Intel was the manufacturing leader, but not the technology leader w.r.t. transistor FOM or metal litho until 1998). TI process technologists used to complain ‘At Intel, design follows fab, and you folks are asking fab to follow design’, as we demanded transistor FOM and metal litho more than Moore in the 1990s. By 1998, with coppertone, Intel raced ahead in both litho and transistor FOM (a 60% improvement at 180nm with coppertone to boost Pentium MHz). So Intel was not the transistor FOM leader in the early 1990s, but pulled ahead by 1-2 generations by the late 1990s. When they did pull ahead is when IBM and TI became non-competitive (starting 1998) for high-end microprocessors – the beginning of our end, and of my departure from microprocessors (unless I went to Intel). (Side note: both TI and Intel bet on BiCMOS until 650nm.) It is unlikely history will repeat itself, as the dynamics are different today with consolidation, but it has been done before by Intel.

Intel’s leadership with IDM 1.0, and its co-opting of ISM 1.0 architectural elements into its processors (by 2002 – multi-core, glue-less SMP, MMX, integrated DRAM controllers), made it difficult for fabless CPU companies to thrive despite having systems businesses to fund them – which was not sufficient by 2002. Even 500K CPUs/year (Sun/SPARC) was not economically justifiable. IBM, SGI, HP and many more dropped out as the cost of silicon design and technology went up. [Side note: I am not sure Graviton is economically viable for Amazon on a standalone basis – if 500K CPUs was not viable in 2000, 50K is certainly not viable in 2020. Sure, they can wrap other elements in the TCO stack to justify it for a few generations, but that is not sustainable over 3-5 generations.] Regardless, 20 years later…

IDM 2.0 is a necessary first step, and it is good that Pat & Intel are putting that focus back. But IDM 2.0 needs ISM 2.0 – the same chutzpah as the design/EDA innovation of the late 1980s, but this time perhaps owning the ‘silicon systems platform SW stack’.

ISM 2.0 is ‘Integrated Systems 2.0’. If the SoC was the outcome of Moore’s Law in the 1990s, the SoP (System on Package) is the new substrate for re-imagining systems, as the platform becomes heterogeneous in compute (CPU, DPU, IPU, xPU), memory (DRAM, 3D XPoint, CXL memories) and networks (Ethernet and CXL). There will always be a CPU (x86 or ARM), but increasingly we will find a DPU/IPU/xPU in the system that sweeps up all the new workloads. The IPU/DPU/GPU/xPU will increasingly be an SoP with a diversity of silicon types to meet the diversity of workload needs. But it will need a common and coherent software model to be effective in enabling platforms and workloads, including low-level VMs or run-times with standard APIs (e.g. P4/SONiC, PyTorch + TF and others).

On the economies of scale that killed the RISC revolution (among other reasons), I have written about SoC vs SoP in a different blog here, but it’s important to consider the diversity of customers – from OEM platforms to cloud platforms, emerging telco/edge service providers, and emerging ML/AI or domain-specific service providers – that form a large TAM. Each one needs customization, i.e. no more one-size-fits-all platforms; it’s multiple chips going into multiple SoPs, packaged and delivered to these new channels – OEM (single system), cloud (distributed systems) and the emerging decentralized cloud. To retain the economies of scale of chip manufacturing while delivering customized solutions to both the old and new categories of customers, we are moving towards disaggregation at the component level and aggregation at the platform software level.

Just to get a sense of the varied sales motion – Nvidia is a chip company, a box company (Mellanox switches and DGX) as well as a cloud company (delivering ML/AI as a service w/Equinix).

This is more than the multi-core and virtualization shift of circa 2003 (VMware). An entire new layer of software will be a key enabler for imagining the new ‘integrated systems’ and their delivery. For lack of a better TLA, let me call it EDA 2.0. Assembling this variety of SoP solutions requires new design tooling to enable customization of the ‘socket’. The old mantra was: sell 1 chip SKU in the millions. That is still true. But now we will have multiple SoP SKUs built from the same multi-million-unit chip SKU. The tooling to assemble these SoPs must support programmability not only at manufacturing time but also in the field, as there will be FPGA elements in the SoP as well as low-level resource management functionality.
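To make the ‘one chip SKU, many SoP SKUs’ point concrete, here is a toy sketch; every part name and cost below is invented for illustration, not an actual product list:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chiplet:
    name: str
    unit_cost: float  # amortized over the full multi-million-unit run

# One mask set per chiplet, shared across every package-level SKU.
CPU  = Chiplet("cpu-core-die", 40.0)
XPU  = Chiplet("xpu-die", 55.0)
HBM  = Chiplet("hbm-stack", 25.0)
FPGA = Chiplet("fpga-tile", 15.0)

def sop_sku(name, parts):
    """An SoP SKU is a bill of chiplets assembled at the package level."""
    return {"sku": name, "parts": [p.name for p in parts],
            "cost": sum(p.unit_cost for p in parts)}

# The same chip SKUs compose into different SoP SKUs per segment:
cloud_sku = sop_sku("cloud-inference", [CPU, XPU, HBM, HBM])
edge_sku  = sop_sku("edge-gateway", [CPU, FPGA, HBM])
```

The economics live in the shared mask sets; the customization lives in the bill of chiplets – which is exactly the disaggregate-at-component, aggregate-at-platform pattern described above.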

Hijacking the OSI model as a metaphor to represent the infrastructure stack…..

7 layers of Infrastructure Stack
Bottom 3 layers form the emerging silicon systems plane

The homogeneous monolithic CPU has now become a heterogeneous CPU + DPU/IPU/xPU/FPGA. Memory has gone from homogeneous DDR to DDR plus 3D XPoint on the DDR bus and the CXL bus.

So an ‘Integrated System’ is an assembly of these integrated chips based on target segments and customers, but one that has to manage the three axes of flexibility vs performance vs cost. While the silicon retains one mask set for most target segments (economics), customization at the packaging level enables the new ‘integrated system’ (the bottom three layers in the visual above). This new building block will become more complex over time (hardware and software), and thus ripe for value extraction (simplification of complexity results in value extraction) – but that requires capital and talent. Ironically, both exist at the two ends of the current value chain: the chip vendor and the cloud service provider.

The pendulum is starting to swing back. From the era when OEMs owned silicon (Sun, HP, IBM) or chip companies owned fabs (Intel) – vertical integration (1980-2000) – we moved to horizontalization with the success of fabless chip companies (Nvidia, Broadcom……) + TSMC (2000-2020), and now back again to vertical integration at the sub-system level for certain segments or markets.

Back to Intel split vs Intel integrated. If there is any lesson learnt from the EDA 1.0 era, it would be smarter to build the new tooling (EDA 2.0 and BIOS 2.0) through a smart combination of build, buy and partner – expanding into invest and open source – and to build a moat around that eco-system that will be hard for competing chip companies to match. EDA 2.0 is not the same as EDA 1.0 – it’s both design tools pre-manufacturing and low-level programming and resource management frameworks. Directionally, some of it is captured here by Chris Lattner (MLIR & CIRCT). We have a chicken-and-egg situation: to create that layer we will need new silicon constructs, but to get to the right silicon, you will need the new layer (akin to how Unix + C enabled RISC, and RISC in turn accelerated Unix and C – referenced here…).

Coming back to Intel vs TSMC and splitting: TSMC is good at manufacturing, but has not (yet) built eco-systems and platforms at higher levels – its fabless customers do that. Intel knows this and has done it many times over. I make the case that Intel integrated – with IDM 2.0 and ISM 2.0, and flexibility in delivering SoC, SoP and even rack-level products to emerging decentralized-edge cloud providers – will be the emerging opportunity.

Splitting the company would make the sum of the parts worth less than the whole in this case. While the point of a split (fab) driving accountability, customer focus and serviceability stands, there are perhaps other ways to achieve the same without a formal split, while retaining the value of integrating the parts.

That means smart integration of the various assets and delivery. Creatively combining design engineering, EDA 2.0 and BIOS 2.0, and linking them to SoP and SoC manufacturing – including field-level customization – will be a huge value extraction play. It’s the Apple vs Android model of margin share vs market share; IDM 2.0 with ISM 2.0 could get both market share and margin share for dominant compute platforms.

A POV – Intel has to do 3 things: IDM 2.0 (under way), ISM 2.0 (I will elaborate on that in a future note), and something else (aligned with Intel’s manufacturing DNA) that is truly out of the box before 2030, when it is going to be economically and physically hard to get more from semiconductors as we know them. That has to be put in place between now and 2030…


References to EDA at Sun…

Historical CPU/Process Node Data..

Moore No More – (Part IV – Computing Epochs)

The next Tsunami….

We overestimate what we can accomplish in 2 years and under-estimate what we can in 10 years

Bill Gates

“We are at a time of enormous transition and opportunity, as nearly all large-scale computing is moving to cloud infrastructure, classical technology trends are hitting limits, new programming paradigms and usage patterns are taking hold, and most levels of systems design are being restructured. We are seeing wholesale change with the introduction of new applications around ML training and real-time inference to massive-scale data analytics and processing workloads fed by globally connected edge and cellular devices. This is all happening while the performance and efficiency gains we’ve relied on for decades are slowing dramatically from generation to generation. And while reliability is more important than ever as we deploy societally-critical infrastructure, we are challenged by increasing hardware entropy as underlying components approach angstrom scale manufacturing processes and trillions of transistors.”

Amin Vahdat of Google on the SRG creation

We are at the cusp of another systems evolution and rebuild of the entire stack. We have entered the 4th epoch of computing at roughly 20 year intervals, with

  • Epoch 1: 1965-1985 (mainframe, CRT terminal (VT100))
  • Epoch 2: 1985-2005 ([Workstation, Unix Servers], [PC, Mac])
  • Epoch 3: 2005-2025 (Cloud, SmartPhone)
  • Epoch 4: 2025- (TBD, TBD)

It roughly follows Moore’s law in terms of impact to systems as shown in the visual below.

The onset of the next wave is both obvious and non-obvious, depending on your vantage point. It was not obvious in 1980 that Ethernet, NFS, RISC and Unix would lead to Sun (and perhaps PCs), that big SMP boxes would eventually be the end game, and along with it Sun’s demise, starting with the dot-com bust. The technology stack drove the use cases, but together they drove out the mini and the mainframe. So monolithic to disaggregated and back to monolithic (or perhaps modular) within 20 years!!! The business model then was OEM (capex). Unix/C/RISC was the tech stack at the infrastructure level, and the use cases or killer apps were HPC, EDA and eventually shared-everything databases (Oracle) and 3-tier enterprise apps (SAP). In a prior SoC to SoP blog I mentioned the emergence of cheap servers from both Sun and ODMs – but Sun failed to capitalize on that early lead and trend, as it was beholden to margin over market share – a point Martin Casado makes about Dr. Reddy’s Labs. A classic reference.

By 2000 we saw search (Google, 1998), e-commerce (Amazon, 1995), the hypervisor (VMware, 1998), distributed systems (1U rack-mount servers), 10G networks, and distributed storage (S3, 2006), and then the cloud happened with all that and more. This time use cases drove the infrastructure stack, and the incumbents, i.e. the OEMs (Sun, HPE, DEC etc.), missed the transition; some eventually disappeared, with the last man standing effectively being Dell – through consolidation, a low-cost supply chain (market share over margin) and financial engineering (well executed!). Again disaggregated, and now back to monolithic (the hyperscaler clouds of today). The business model of the cloud is opex; (Linux + VM / Java, JavaScript, C, … / x86) is the tech stack, supporting all the new consumer and enterprise SaaS apps.

The tide of complexity and simplification is on its march again – but this time use cases and technology are coming from both ends, and there will be new winners and many losers. Martin’s blog on the trillion dollar paradox is a leading indicator of this shift, and the pendulum is swinging back from the hyperscale cloud opex model towards a new middle between it and the old OEM/on-prem model. I am guessing there is a healthy mix of agreement and disagreement with this shift inside the hyperscalers. Just when you think they are too big to disappear, think again – every cycle eventually leaves a consolidated entity, and new players emerge. To quote Clayton’s innovator’s dilemma:

The third dimension [is] new value networks. These constitute either new customers who previously lacked the money or skills to buy and use the product, or different situations in which a product can be used — enabled by improvements in simplicity, portability, and product cost…We say that new-market disruptions compete with “nonconsumption” because new-market disruptive products are so much more affordable to own and simpler to use that they enable a whole new population of people to begin owning and using the product, and to do so in a more convenient setting.

Clayton Christensen

The questions to answer are –

  • What is the new infrastructure stack
  • What is the new business model
  • What are the new use case drivers
  • Who are these new customers?

It’s fascinating to watch Cloudflare, which reminds me of Akamai and Inktomi back in 1998 – the demand that led to the 1U/small server (@Sun), only to be ignored by the $1M-sale GMs. They took the company down with their margin-over-market-share mentality, drunk on the fish falling into the boat during the pre-dot-com days. Strangely, 20 years later, replace Inktomi or Akamai with Cloudflare: the emergence of a new category of consumption and deployment is the rising tide that is going to lift some new boats and leave others behind.

Fast forward: it’s near impossible to predict which technologies and companies will be the torch bearers by 2030, but we can start with what we’ve got. A few things to keep in mind.

  • Storage: The cloud was enabled by foundational under-pinnings in distributed storage (GFS, S3). The 1980s systems era, anchored by Sun Microsystems, was successful because of NFS. So a common pattern in the emergence of a new stack is that storage is a critical tier, and those who get it right tend to rule the roost (Sun, AWS).
  • Network: Over the two epochs, networks (TCP/IP) dominated, with Ethernet winning by Epoch 1 and riding the silicon curve in the 2nd epoch; now we are at the cusp of breaking that. It’s no longer a unitary world. The network was the computer in Epoch 2, it was even more important in the cloud era, and it’s going to be much more critical in the coming epoch – though not with the networking model you would think.
  • Compute: ILP (instruction-level parallelism) was the enabler in Epoch 2 via RISC. TLP (thread-level parallelism) was the enabler for Epoch 3 (multi-core/threading), which poses the question – what will this wave present itself with? IPU, DPU and TPU are all names for coherent ‘datapath’ accelerators, and that boat is sailing.
  • OS: Unix was the hardware abstraction layer that gave way to hypervisors, which are giving way to K8s and, I would say, more like Serverless/Lambda – i.e. you need both a HW abstraction and a resource management layer.
  • Developer Abstraction:
  • Asset utilization: Turning cost line items in your P&L into a business is Amazon’s claim to fame – the asset-light vs asset-heavy argument (fabless vs fab). Taking one use case paid for by one class of customers and leveraging that asset investment into another business is common practice, and Cloudflare is playing that game. This is an important element of disruptions and disruptors. Famously, Scott (Sun’s CEO) used to implore some of us: the investment in SPARC is already paid for – why not open source it or give it away? Looking back 20 years later he was right, but he was not taken seriously, nor was it executed.
  • Marginal value of technology and value extraction: The cost of a transistor going to near zero led to the entire computing platform of the 1980s and 1990s. The cost of transporting a byte going to zero led to the internet and other upper-level services and value extraction. The cost of storage (HDD) going to zero led to the cloud (Yahoo offering mail storage, Google Drive and S3). In the next wave, the cost of a transaction or of inferencing going to zero will lead to the next set of disruptors and value extraction.

What is the new infrastructure Stack?

Network is the computer starts with the rack: This has been around for some time, but finally a confluence of technologies and use cases – a perfect storm – is brewing. Like the beginning of every epoch, we are starting with ‘disaggregated’ (or perhaps modular) systems: the computer and the computing platform start with refactoring the computer as we know it into simpler forms, assembled at upper layers of the stack so that the lower-level components can evolve. The physical rack as a computer is not new, but the software layers to make it viable are more needed – and more feasible – now than before. It starts with basic building blocks which are now the SoP (System on a Package), CXL (Compute Express Link), (x)PUs and emerging memory components.

At the height of the Moore exponential (Epoch 2), the CMOS roadmaps drove higher performance, lower cost and lower power – all three. Like the CAP theorem, we now get only 2 out of 3, and to stay on the same Moore curve, SoP is one way to achieve the same performance trajectory. Given more data-parallel workloads, performance is achieved over more silicon area, power is distributed over a larger surface (density), and cost is managed by re-use of many sub-components. It is expected that by the end of the decade the share of a single system’s cost and power dedicated to ‘control path’ execution will shrink as the ‘data path’ component grows, driven by ML/AI and other emerging verticals.

The System on a Package is the enabler of disaggregation of the system. It starts out addressing performance, cost and power, but when combined with emerging memory tiering and CXL at the rack level, it’s a key enabler of disaggregating the single node while aggregating at the rack level.

Three trends: memory is getting disaggregated; computing is going more datapath-oriented and disaggregated to meet scale requirements (Nvidia with NVLink and A100, Google with TPUs and IB); and the network is going 2-tier to handle the latency vs bandwidth dichotomy, while new frameworks are being established as the contract with application developers. Let’s explore each one in a little more detail. The physical disaggregation has to be aggregated at the software level.

Memory vs Storage: While memory has been tracking Moore’s Law, data processing happens at a much larger scale – thus the emergence of Hadoop and other models that embed distributed computation within storage. But we are reaching a point where interconnects like CXL enable coherent tiered memory with manageable latencies (1-3 NUMA hops) and a significant increase in capacity. Combine that with new workloads that demand not infinite storage space (consumer SaaS) but moderate capacity (some enterprise SaaS, edge, ML, ADN to name a few) at much higher performance (latency), and we can re-imagine systems at the rack level with memory-only access modes, with the associated failure domains handled entirely in software.

A visual of that is shown below.

By mid-decade one can imagine 2+ PB of ‘memory’ semantics with tiers of memory enabled by the CXL interconnect. Any page in such a locally distributed system (hyperconverged is the wrong word, so I am not using it) can be sub-µs away. There is tremendous investment in PCIe, but the economic motivations of these new apps are the drivers for the memory-centric computing model. At moderate scale, handling (tail) latency is also of value. (Ref: Attack of the Killer Microseconds)
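As a back-of-envelope illustration of why sub-µs access across a rack is plausible, here is a sketch of average access latency across tiers; all latencies and access fractions below are illustrative assumptions, not measurements of any real system:

```python
# (tier name, access latency in ns, assumed fraction of accesses served)
tiers = [
    ("local DRAM",          100, 0.70),
    ("CXL, 1 NUMA hop",     250, 0.20),
    ("CXL, 2-3 NUMA hops",  600, 0.08),
    ("far pooled memory",  1500, 0.02),
]

# Weighted average access latency across the tiers.
avg_ns = sum(latency * frac for _, latency, frac in tiers)
assert avg_ns < 1000  # stays sub-microsecond with this assumed mix
```

The point of the sketch: as long as the hot working set is served from the near tiers, even petabyte-scale far tiers keep the average access well under a microsecond – which is what makes the memory-semantic rack interesting.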

Coherent Co-Processors: Historically, accelerators have lived in their own non-coherent space (starting with GPUs, ever since graphics became a category), and of course NICs, FPGAs and some specialized HPC accelerators. Coherency and ‘share-able’ memory simplify the programming model significantly, and the great unlock is CXL – until now, in the mainstream market, Intel did not open up its coherent interface to other providers, for some very good technical and perhaps business reasons. With the new workloads (ML in particular) demanding more compute, the mapping of ML algorithms to sparse linear algebra (over-generalized, but it has remained true for the past 3-5 years) reflects the shift in both time and power spent in compute cycles.

The 1-2µs and 10µs marks are interesting numbers in the memory-storage visual above, demarcating the split between the memory model and the storage/IO model. That lends itself to two types of networks – in this case synergistic, though one can subsume the other: one solves the needs below the sync-barrier or RDMA barrier, while the other provides reach and throughput and carries 40 years of investment. The beauty of CXL is that, in one of its operating modes, it can transport IP packets within the same basic payload, providing compatibility while offering unique capability for rack-scale needs. The rack will be the new unit of compute and the new unit of aggregation at the software level, and is a good demarcation point for the foreseeable future. 2030 is 10 years away, and a lot can change in 10 years (whole new categories of companies will emerge).

By 2030 these three shifts will enable us to create the ‘fungible computer’, at least at the rack level.

Disaggregation of silicon into component parts while aggregating them at upper layers of the stack is the key shift, enabled by new workloads that are more data-parallel, re-imagined processing pipelines, and memory-centric computational models at specific cost and power plateaus. SoP, CXL, (x)PU (x being your favourite IPU, NPU, DPU, MPU… the alphabet soup of ideas).


The natural value shift and sedimentation is on its march once again. Unix was the high-value layer at the dawn of RISC, and the OS (Solaris, Linux) sedimented down in value over a 20-year period. Multi-core and cloud reset that with the hypervisor and scale-management frameworks, and the natural sedimentation has happened again. The emerging MLIR-style layers that map, aggregate, schedule and resource-manage these disaggregated components will hold the most value, and will take their due 20-year cycle to sediment until the next wave of innovation happens.

All of this is technical direction – reality might not follow technology visions. Going back to the proposition stated up front: in this epoch we are going to see new efforts that drive both technology (bottom-up) and use cases (top-down). To enable that, a new delivery and consumption model is needed.

Enter the bottom feeder once more… The upstream push by a select few global colo providers, when married with the right open and/or proprietary management stacks and many elements of the system design above used by some of the new players (e.g. Cloudflare, crypto companies – where the decentralized model is a first-class property), will ride the next Tsunami to challenge current incumbents with a new technology stack (edge native) and new business models. The business model is interesting and will require a blog of its own at the right time, but it is starting to emerge. To remind ourselves, we shifted from capex to opex to now……..(left as an exercise – or perhaps share your ideas with me (DM), I would love to hear them…).

In summary, I am reminded of Jerry Chen’s framework: systems of engagement, systems of intelligence, systems of record. As things change, it continues to fit within the overall framework.

Tier 1 System | Web      | Engagement   | NLP, CV, ML?
Tier 2 System | App Tier | Intelligence | Serverless?
Tier 3 System | Database | Record       | Crypto?

The pendulum is starting to swing away from the current cloud, either to a new middle (cloud exchanges) or perhaps all the way to the other end of its swing (edge). We will know for sure by 2025, long before 2030…

Back in 2001, I led the charge of building the basic blocks for the emerging distributed systems world. Alas, I failed to see the use-case-driven approach, and that led to innovation happening elsewhere. Lessons learnt – you will see more from me on this over the decade, and if you are interested, let’s talk, engage and collaborate.

In closing, what prompted me to pen this was Amin’s note. While the ‘cloud’ has innovated a lot in hardware systems, most of it derived from HW innovations kicked off in the prior epoch. The coming decade will be unlike both prior epochs: we don’t have more of Moore, computing is highly distributed and perhaps decentralized, and yet more hardware innovation is at play than in the prior (cloud) epoch.

SoC to SoP

A reflection on Moore's law, personal history and the coming Tsunami of Systems

This blog was prompted by Pat Gelsinger's recent keynote talking about Systems on Package (SoP). That brought back memories of Systems on a Chip (SoC) – back to circa 1991. While this term is common in the lingua franca of chip nerds these days, it was not the case back in 1991. Perhaps one of the first SoCs on the planet was one I was lucky to be involved with, which also helped bootstrap my professional life in silicon and systems. It was MicroSPARC-I (aka Tsunami), done while at Sun, and it had a few firsts: all CMOS, first SoC, and a TAB package. All-in-one.

MicroSPARC-I in a TAB package (circa 1991)

This chip was in the system pictured. Good to know it's in the Computer History Museum archives.


The Sun 386i label was a joke. We used to have the Sun 386i platform, and the joke was that this machine was faster and cheaper than any PC at the time.

MicroSPARC-1 on the board

That was the beginning of the semiconductor run in my professional life. It started with an ECL machine for SPARC we did back in 1987-1990, which eventually got shelved as it was going to be hard to manufacture and sustain in volume production. Some of us, left without a job, were asked to work on a 'low cost' SPARC with TI on their 0.8uM CMOS process – while the rage then was BiCMOS (SuperSPARC for Sun, and the Intel Pentium). It showed that Intel, despite being a tech and manufacturing powerhouse, has made mistakes in the past, not just recently… We will come to that.

The First SoC (Microprocessor SoC) had many firsts back in 1991.

  1. It was all CMOS (when BiCMOS and ECL were still ruling the roost).
  2. It was all integrated: Integer Unit, Floating Point Unit, Icache, Dcache, MMU/TLBs, DRAM controller (SDRAM) and Sbus controller (pre-PCI).
  3. It was in 0.8 uM CMOS (TI) and in a TAB package (as seen above).
  4. It had an entirely software-driven tool chain – the physical layout was done with Mentor GDT tools, programmatically assembling the entire chip from basic standard cells with GDT P&R tools, Synopsys synthesis, and Verilog. All SW-driven silicon – a first. There is a reference to it here. This led to the entire EDA industry rallying around the way Sun designed microprocessors, and a whole slew of companies formed around that (Synopsys, Ambit, Pearl, CCT->Cadence and many many more).
  5. It was the beginning of the low-cost workstation (and server) – approaching $1,000 – and the 'fastest' by clock rate (MHz – the primary performance driver in the early years).
  6. From 1991 through 2003, when I left Sun, I was involved in 8 different generations/versions of SPARC chips. Looking back, the Sun platform/canvas not only let me be part of the team that changed the microprocessor landscape; we changed the EDA industry and, by the late 1990s, brought ODM manufacturing to traditional vertically integrated companies to completely outsource systems manufacturing.

A visual of the height of Moore's law growth and the success I rode with that Tsunami (coincidentally, the first chip for me was named Tsunami). From 0.8 uM 2LM CMOS to 0.065 uM 10LM CMOS. From 50 MHz to 2 GHz, 0.8M xtors to 500M xtors.

1991-1994 – MicroSPARC: the first CMOS SoC microprocessor, which extended Sun workstations and servers to the 'low end' and drove technology leadership with the EDA companies named above in 'SW-driven VLSI'. We built the chip with a philosophy of 'construct by correction' vs the prevailing methodology of 'correct by construction'. In the modern parlance of cloud, it's DevOps vs ITOps.

1995-1998 – UltraSPARC II and IIe: with the introduction of 64-bit computing, we continued to lead on architectural performance (IPC), semiconductor technology (leading-edge CMOS at TI, along with IBM, until Intel took control of that by 1998), clock rate, and many system-level innovations (at-scale symmetric multiprocessing, glue-less SMP at low cost, media instructions). This was the Ultra family of compute infrastructure that was the backbone of the internet until the dot-com bust (2001-2003)!

1998-2001 – UltraSPARC IIi & IIe series: created two product families, both of which drove new business ($1B+) for Sun. The Telco/CompactPCI business went from $0 to $1B in no time; workstations and servers were extended down to $1K; and glue-less SMP (4-way) for <$20K was another industry first. This was the beginning of NUMA issues and a precursor to the dawn of the multi-core era. UltraSPARC IIi (codenamed Jalapeno) was the highest lifetime-volume CPU for the entire lifetime of SPARC.

Clock Rate (SPARC vs x86)

While clock rate is not a good representation of actual device technology improvements, it's the best first-order relative metric I can share here given the dated history. Suffice it to say, as you can see, we had good technology (CMOS) FOM improvements per node until 0.18uM (Intel Coppermine) in 1998, when Intel decided to boost its performance by 60% while the industry average was 30%. That was the beginning of the end on two fronts: Sun + TI no longer having enough capital and skills to keep up with the tech treadmill against Intel (although we introduced copper metal ahead of Intel), and the decision to start shifting architecture from pure IPC and clock to multi-core threading. Recognizing this, I started the multi-core effort circa 1998, but it took another 5 years to bear fruit. I digress.

As a side note: look at Intel's technology improvement performance lately. Never in my wildest imagination would I have thought this would happen.

2001-2003 – Dawn of multi-core and threading: while the results arrived in 2001-2003, the seeds were sown earlier, in the form of the dual-core UltraSPARC IIe and eventually Niagara (the UltraSPARC T series).

The next 10 years are going to be as dramatic as the 1990s at the system level, for completely different reasons. While Moore's law has slowed down, SoP is an important and critical technology shift for keeping up the effective Moore curve. With Moore you got performance, power and cost at the same time. We won't get all three anymore, but we can strive for two out of three – i.e. performance at constant cost or constant power.

SoP (Systems on Package) is an important milestone, and I am glad to see Intel leading it, along with AMD and the rest – it can be a compelling way to construct the new system. In the next blog we will explore why the next 10 years are going to be disruptive at the system level; SoP, like SoC and CMOS + Moore's law before it, is the Tsunami wave that will raise a lot of boats – the earlier wave raised my career and many companies' success, and changed the industry and computing stack in a fundamental way.

I expect many firsts, changes and disruptions – from design methodology to customer-driven customization of various heterogeneous silicon components (CPU, IPU, FPGA, memory elements and a lot more). Associated with that will be tools to assemble these, but also tools to make them look like one 'monolithic' fungible computing element to the end user.

Virtualization to date has been dominated by leveraging multi-core and improving utilization by spawning many VMs that subdivide the machine into smaller chunks. New software layers, either above or below standard frameworks like Lambda (serverless), PyTorch/TF (ML/AI) or crypto, will drive new ways to effectively use the dramatic increase in total silicon real estate: tiering of memory, scheduling code chunks to accelerators in coherent space (via CXL), new intra-rack and intra-node connectivity models via CXL, and many more to come. Strap in for that ride/discussion. HW is getting more disaggregated – from the aggregation that started back in 1991 via SoC to now with SoP – and software will have to do the 'aggregation'.

As I sign off, some more images from the 25-year anniversary of SPARC are captured in the montage below.

$TSLA – Marching towards $10T by 2030……

First Trillionaire and 10 Trillion dollar company.

This is my 4th post on the topic of $TSLA, and I never thought I would do one in 2021. My prediction was a valuation of $1T by 2030. That will come to pass rather soon.

My first post on $TSLA was back in June 2017, where I argued the core long-term value was Chemistry (battery) and Intelligence (full self-driving/autonomy). That continues to be the case after Elon's Battery Day (Sep '20) and Tesla Autonomy Day (April 2019).

So why $10T? That seems even more ridiculous than the $1T. From Feb '20 to now, the stock has gone up 4x to a ~$600B market cap. While there are lots of bears, there are lots of bulls as well for the TSLA case.

Bull Case #1: The bull case is presented by Ark Invest (source: Ark Invest). Having crossed 500K cars in 2020 (1M+ cumulative), with 2 additional factories (Austin and Berlin) yet to come online, getting to 1-2M by 2025 is highly likely; approaching 5M might be difficult, but then Elon has beaten the odds before and the market is expecting him to, given the demand.

                                               | 2020          | Example Bear Case 2025 | Example Bull Case 2025
Cars Sold (millions)                           | 0.5           | 5                      | 10
Average Selling Price (ASP)                    | $50,000       | $45,000                | $36,000
Electric Vehicle Revenue (billions)            | $26           | $234                   | $367
Insurance Revenue (billions)                   | Not Disclosed | $23                    | $6
Human-Driven Ride-Hail Revenue (net, billions) | $0            | $42                    | $0
Autonomous Ride-Hail Revenue (net, billions)   | $0            | $0                     | $327
Electric Vehicle Gross Margin (ex-credits)     | 21%           | 40%                    | 25%
Total Gross Margin                             | 21%           | 43%                    | 50%
Total EBITDA Margin*                           | 14%           | 31%                    | 30%
Enterprise Value/EBITDA                        | 162           | 14                     | 18
Market Cap (billions)                          | $673          | $1,500                 | $4,000
Share Price**                                  | $700          | $1,500                 | $4,000
Free Cash Flow Yield                           | 0.4%          | 5%                     | 4.2%

Ark Invest Projections
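The 2025 market caps in the Ark scenarios above can be roughly sanity-checked from the other rows: segment revenues times the EBITDA margin times the EV/EBITDA multiple. A minimal back-of-envelope sketch of my own, assuming enterprise value is approximately equal to market cap (net debt ignored):

```python
# Back-of-envelope check of the Ark Invest 2025 scenarios.
# Assumption: enterprise value ~= market cap (net debt ignored).

def implied_market_cap_bn(segment_revenues_bn, ebitda_margin, ev_to_ebitda):
    """Sum segment revenues ($B), apply EBITDA margin, then the EV/EBITDA multiple."""
    ebitda_bn = sum(segment_revenues_bn) * ebitda_margin
    return ebitda_bn * ev_to_ebitda

# Bear case: EV $234B + insurance $23B + human ride-hail $42B, 31% margin, 14x
bear = implied_market_cap_bn([234, 23, 42], 0.31, 14)
# Bull case: EV $367B + insurance $6B + autonomous ride-hail $327B, 30% margin, 18x
bull = implied_market_cap_bn([367, 6, 327], 0.30, 18)

print(f"Bear case ~ ${bear:,.0f}B")  # ~$1,298B, in the ballpark of the $1,500B shown
print(f"Bull case ~ ${bull:,.0f}B")  # ~$3,780B, close to the $4,000B shown
```

The residual gap versus the published figures presumably reflects net cash and rounding in Ark's own model.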

What's interesting is that TSLA has single-handedly taken over the $35K-$100K market, which the Germans dominate and Toyota tried hard to penetrate with incremental engineering and marketing. TSLA changed the game and will perhaps go as low as $25K, but not lower, is my guess. TSLA will license IP (Chemistry and Intelligence) and let others make the cars. The entire $35K to $100K range is now 'owned' by TSLA, and it's going to be hard for most makers other than BMW or Mercedes, who will be supported by ardent fans latched onto those brands. 2018 data for segmentation of the various categories is shown here.



As you can see from the above, 62.8% of the market can be covered by Tesla with Model 3, Model Y, Model S, Cybertruck and perhaps the upcoming new China-sourced $25K model. That includes the SUV, Midsize, MPV, Pickup, Executive, Sport and Luxury segments. The total market size is 54M cars, and Tesla getting 20% of that category is actually possible (we are in a winner-take-all world these days, as with Amazon, Google and Apple, where it's tech-driven), conventional wisdom about a highly fragmented and splintered automobile market notwithstanding.
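The volume implied by that share is worth making explicit. A quick sketch using only the figures from the paragraph above (54M addressable cars, a hypothesized 20% share):

```python
# Implied Tesla volume at a winner-take-most share of the addressable segments.
total_market_m = 54    # million cars/yr across SUV, Midsize, MPV, Pickup, etc.
tesla_share = 0.20     # hypothesized share, per the paragraph above

tesla_volume_m = total_market_m * tesla_share
print(f"Implied Tesla volume: {tesla_volume_m:.1f}M cars/yr")  # 10.8M cars/yr
```

That is roughly 20x Tesla's 2020 deliveries, which is why the factory ramp matters so much to this case.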

Bull Case #2: It's what I mentioned last year – one has to look at TSLA as a business of businesses. Expect that in the next 5 years, either via the Alphabet (GOOG) route or via other routes (spin-offs, M&As, SPACs…), derivative businesses will emerge and stand on their own. To recap:

  1. A car company
  2. A Battery company at planet scale
  3. An AI/ML company (machine vision in particular)
  4. An electric storage company
  5. An Electric Utility company (low value but at scale gets interesting)
  6. An energy distribution company
  7. A potential Cloud or computer company (if a book store turns into cloud computing, an autonomous car can have the right assets for becoming a cloud company)
  8. A big data/mapping/navigation company
  9. A carless car company (i.e. Uber/Lyft killer robotaxi)
  10. A machine vision driven robotics company
  11. and more to come….(more than letters in the Alphabet)

Elon himself quoted a version of this back in Oct 2020.

Bull Case #3: Chemistry and Intelligence. Every tech category goes through vertical integration and horizontal stratification. I speculate Elon will build down to a $25K car; below that, he will 'license' IP (Chemistry and Intelligence, i.e. battery tech and autonomy tech) to get worldwide reach. It would not make sense to have factories all over the world for all geos, but a strong IP revenue model ($1-$2K/car) could be had, which would also enable new players in various countries to become car companies – i.e. more local manufacturing and distribution. And it's not limited to cars: this applies to all kinds of transportation and perhaps the energy sector. From the chart above, the remaining 35% of the segments (Compact, Sub-Compact, City-car) would belong in this category. Of the 86M cars sold in 2018 (I suspect it's less in 2021), 30M cars would be in this category. If 20% of those manufacturers pay TSLA $1K-$2K per car – let's assume $1K – that is $6B of pure profit, subsidized by the higher end. The TSLA brand will be more valuable and trusted than VW, Mercedes, BMW or Toyota by 2025, to the point that most people would buy a car that is 'Tesla powered'. At this rate of battery cost decline (see below), manufacturers who cannot afford R&D or manufacturing at scale would do well to buy it off TSLA. This is akin to INTC holding onto x86 and not having an IP model, which let ARM onto its turf. Imagine if INTC had had both a vertically integrated model for CPUs and a licensing model for some components – AAPL would be in Intel's camp, and so would the big three hyperscalers. Pat Gelsinger is trying to get Intel back into that game in 2021 (which we will address in a different blog post). If TSLA were to choose both models – vertically integrated for some categories and IP or sub-component sales for others – they could cover the entire spectrum and make the brand even more ubiquitous, with a higher moat.
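The licensing arithmetic above is simple enough to write down explicitly. All inputs are the illustrative figures from the paragraph (30M cars a year in the Compact/Sub-Compact/City-car segments, 20% of them 'Tesla powered', $1K royalty at the low end of the $1K-$2K range), not Tesla data:

```python
# Back-of-envelope TSLA IP licensing revenue from the paragraph above.
cars_in_segments = 30e6      # cars/yr in Compact, Sub-Compact and City-car segments
tesla_powered_share = 0.20   # fraction built on licensed TSLA IP
royalty_per_car = 1_000      # $ per car (low end of the $1K-$2K range)

revenue_bn = cars_in_segments * tesla_powered_share * royalty_per_car / 1e9
print(f"Implied licensing revenue: ${revenue_bn:.0f}B/yr")  # $6B/yr, as in the text
```

At the $2K end of the royalty range the same math doubles to $12B/yr, which is what makes the IP model interesting even without owning the factories.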

TSLA is handicapped relative to VW and Toyota in manufacturing scale and distribution reach. The aggressive manufacturing ramp plus the IP model will broaden its reach and create other businesses (robotaxi, energy storage/distribution, cloud computing for AI, and many more to come).

Bill Gates famously said, "We overestimate what we can do in 2 years and underestimate what humanity will achieve in 10." One has to do a version for Elon: he over-promises what's coming in 2-3 years, but delivers on the 10-year vision. If you look back at what he said in 2011/2012 and see what has been accomplished, it's not that far off.

We will revisit this blog in 2025 to see if we have crossed the Ark Invest marker and if TSLA is barrelling past $3-$4T on its march towards being the first $10T company on the planet (or maybe a collection of companies).

wrong tool

You are finite. Zathras is finite. This is wrong tool.
