Many (core) Moore (Part III, Computing Epochs)

Back to the past – this is Part III of a four-part story of the computing epochs as punctuated by Moore's law, on which Intel left its imprint for obvious reasons.

This is the 2003-2020 era, in which multi-core, open source, virtualization, cloud infrastructure, and social networks all blossomed… Its onset was the end of MHz computing (Pentium 4) and the shift to multi-core and throughput computing.

It was also the beginning of my end in semiconductors for a brief period (20 years), until I decided it was time to get back in the 2020s… That onset was punctuated by the first mainstream multi-core CPUs that Sun enabled – famously known as the Niagara family, and of course the lesser known UltraSPARC IIe, which makes an interesting contrast with Intel's Banias (back to Pentium).

Some would call it the Web 2.0 era or the Internet 2 era… The dot-com bubble, which blew up a number of companies in the prior (OEM) era, paved the way for new companies to emerge, thrive, and establish the new stack. Notably, at the infrastructure level, Moore was well ahead: the first multi-core CPUs enabled virtualization and accelerated the decline of other processor companies (SPARC, MIPS) and of system OEMs, as the market shifted from buying capital gear to cloud and opex.

Semiconductor investments started to go out of fashion as Intel dominated and other fabs (TI, National, Cypress, Philips, ST, and many more) withered, leaving Intel and TSMC, with GlobalFoundries an also-ran. In the same period, architectural consolidation happened around x86 along with Linux, while ARM emerged via Apple as the alternative for a new platform (mobile). Looking back, value shifted from vertical integration (fab + processors) to the SoC, and thus IP (ARM) became dominant despite many attempts by processor companies to get into mobile.

Convergent with the emergence of iPhone/Apple/ARM were AWS EC2 and S3, and thus the beginning of cloud, with opex as the new buying pattern instead of capex. This had significant implications: a decade later, that very shift to commodity servers and opex came full circle via Graviton and TPU, with the cloud providers going vertical and investing in silicon. Intel's lead in technology enabled x86 to dominate; when that lead slowed, thanks both to Moore's law and to TSMC, the shift towards vertical integration by the new system designers (Amazon, Google, Azure) began.

Simultaneously, the emergence of ML as a significant workload demanded new silicon types (GPU/TPU/MPU/DPU/xPU) and programming middleware (TensorFlow and PyTorch), breaking the shackles of Unix/C/Linux and opening the way to new frameworks and a new hardware and software stack at the system level.

Nvidia happened to be at the right place at the right time (one can debate whether the GPU is the right architectural design), but certainly the seeds of the new category – the new system of CPU + xPU – were sown by the mid-2010s…

All of this shift towards hyperscale distributed systems was fueled by open source. Some say that Amazon made all its money by reselling open source compute cycles. Quite true. Open source emerged and blossomed with the cloud, and eventually the cloud would go vertical, raising the question: is open source a viable investment strategy, especially for infrastructure? The death of Sun Microsystems, hastened by open source, and the purchase of Red Hat by IBM form the bookends of open source as the dominant investment thesis for the venture community. While open source is still viable and continues to thrive, it was no longer front and center as a disruptor or primary investment thesis by the end of this era, as many more SaaS applications took the oxygen.

We started this era at 130nm with 10 layers of metal, with Intel taking the lead over TI and IBM, and ended it at 10nm, with TSMC taking the lead over Intel. How did that happen? Volumes have been written on Intel's missteps, but clearly the investment in 3D XPoint – a bet on new materials and new devices to bridge the memory gap – did not materialize. It was a good idea addressing an important technology gap, but picking the wrong material stack proved a distraction.

The companies that emerged and changed the computing landscape were VMware, open source (many), Facebook, Apple (mobile), and China (as a geography). The symbiotic relationship between VMware and Intel is best depicted in the chart below.

Single core to dual socket multi-core evolution…

On the networking front, the transition from 10Gbps to 100Gbps (10x) over the past decade is one of the biggest transformations, driven by networking's adoption of custom silicon design principles.

The chart above shows the flattening of the OEM business while the cloud made the pie larger. OEMs consolidated around the big 6 (Dell, HPE, Cisco, Lenovo, NetApp, Arista) and the rest withered.

GPU/xPU emerged as a category, along with a resurgence in semiconductor investments (50+ startups with $2.5B+ of venture dollars). The generalization of the xPU in a dual heterogeneous socket (CPU + xPU) is becoming the new building block for a system, thanks in part to CXL. The associated evolution and implications for the software layer were discussed here.

We conclude this era with the shift from the 3-tier enterprise ('modern mainframe') stack serviced by OEMs to distributed systems as implemented by the cloud providers, where use case (e-commerce, search, social) drove the system design, whereas technology (Unix/C/RISC) drove the infrastructure design in the prior era (a note on that is coming…)

In summary – Moore's law enabled multi-core, virtualization, and distributed systems, but its slowing growth opened the gates for new systems innovation, and thus new companies and a new stack, including significant headwinds for Intel.

Let's revisit some famous laws by famous people…

  1. Original Moore's law – (cost, density). (Shorthand formulas for these laws follow after the list.)

Bill Joy changed it to performance scaling. That is certainly slowing down, and the locus of performance has shifted to throughput over latency. It needs an update for the ML/AI era, which demands both latency and throughput.

  2. Metcalfe's Law – still around. See the networking section.

  3. Wright's Law (cost and cumulative volume) – https://ark-invest.com/articles/analyst-research/wrights-law-2/ – this predates Moore's law and now applies to many more domains: batteries, biotech, solar, etc…

  4. Elon's law (a new one…) – the optimal alignment of atoms, and how far your error is from it. We are approaching that limit.

  5. Dennard scaling – power limits are being hit. Liquid cooling is coming down the cost curve rapidly.
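For reference, here is my shorthand for three of these laws (back-of-the-envelope formulations, not the canonical statements):

```latex
% Moore (original): transistor count/density doubles roughly every two years
N(t) = N_0 \cdot 2^{(t - t_0)/2}

% Metcalfe: network value grows as the square of connected users
V(n) \propto n^2

% Wright: unit cost falls as a power law of cumulative volume x
C(x) = C_1 \cdot x^{-b}, \qquad C(2x)/C(x) = 2^{-b}
```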

Intelligrated…

Ben Thompson of Stratechery, in his recent blog on the Intel split, prompted me to coin the word "Intelligrated", a counterpoint to his thesis. (No, it's not in the dictionary.) Before we get to that, let's start with one topic he brings up, as it is near and dear to me and many of my old fellow chip nerds from that time (1987-2003), which I would call the EDA 1.0 era.

EDA changed the microprocessor roadmap starting circa 1987 and continuing through the late 1990s. Ben references Pat's paper on Intel's EDA methodology, which scaled design methodology to track Moore's law. Intel invested heavily in EDA internally, as the industry was immature. Around the same time, Sun Microsystems, which built its business selling EDA/MCAD workstations, was changing the industry EDA landscape (methodology and ecosystem). [An aside: I would not be surprised if x86 CPUs up through the Pentium 4 were designed on Sun workstations.] Both companies had parallel but different approaches.

EDA 1.0, Intel vs Sun: Sun's approach was to use industry tools, and where they did not exist, to enable the industry ecosystem to build them. It perhaps started in 1989, when a motley crew of 25 engineers (including yours truly) built the first CMOS SPARC SoC (Tsunami – referenced here) with no prior experience in custom VLSI. We all came out of a cancelled ECL SPARC microprocessor where none of us had done any custom VLSI design. The CAD approach was…

Necessity is the mother of invention. Sunil Joshi captured the EDA methodology of the time in the MicroSPARC Hot Chips presentation. The 486 (Pat's chip) had 1.2M transistors (perhaps 100+ engineers) vs the more integrated MicroSPARC at 800K transistors, which came 2 years later (at the same time as the Pentium, which had 200+ engineers) but was a full SoC. As noted, we had only 2 mask designers, and every engineer had to write RTL, synthesize, verify, do their own P&R, and run timing analysis; two of us built the CAD flow so it could be push-button. Auto-generated standard cells were automatically placed and routed using compiler tools. That was not the norm for 'custom VLSI' circa 1991 at that scale.

That eventually got the name 'construct by correction' vs 'correct by construction', and throughout the 1990s this approach evolved, scaled, and made our processor design competitive. It also raised a lot of boats in the EDA industry and eventually got Intel to adopt industry tools alongside a healthy mix of in-house ones. With no in-house EDA teams, we creatively partnered (Synopsys, Mentor), invested (Magma, mid-1990s), helped M&A (Gateway Design – Verilog; Cooper & Chyan – Cadence), and spun out (the Pearl timing analyzer, to replace Motive). At the same time, IBM's EDA tools were superior to both Sun's approach and Intel's, but they were locked up inside IBM until later in the decade, when it was too late.

In parallel, there was a great deal of systems innovation (SoCs, glue-less SMP, VIS, Ethernet, graphics accelerators, multi-core + threading) enabled by EDA 1.0 and CMOS custom VLSI across the industry at large, with Sun leading the parade. Allow me to term it ISM 1.0 (Integrated Systems).

Now, IDM 1.0 is what made Intel successful enough to beat all the RISC vendors. We (Sun and compatriot RISC vendors) could beat Intel in architecture, design methodology, and even engineering talent, and in some cases – like Sun, which had the OS and platform – build a strong moat. But we could not beat Intel on manufacturing technology. Here's a historical view of the tech roadmap.

Tech History (litho nm only)

Caution: given the dated history, some data from the 1990s could be incorrect – please notify me of corrections.

In a prior blog I called out how Intel caught up with TI and IBM on process technology by 1998 (Intel was the manufacturing leader, but not the technology leader w.r.t. transistor FOM or metal litho until then). TI process technologists used to complain 'At Intel design follows fab, and you folks are asking fab to follow design', as we demanded transistor FOM and metal litho more than Moore in the 1990s. By 1998, with Coppertone, Intel raced ahead on both litho and transistor FOM (a 60% improvement at 180nm with Coppertone to boost Pentium MHz). So Intel was not the transistor FOM leader in the early 1990s, but it pulled ahead by 1-2 generations by the late 1990s. When Intel pulled ahead is when IBM and TI became non-competitive (starting 1998) for high-end microprocessors – the beginning of our end, and of my departure from microprocessors (unless I went to Intel). (Side note: both TI and Intel bet on BiCMOS until 650nm.) History is unlikely to repeat itself, as the dynamics are different today with consolidation, but it has been done before by Intel.

Intel's leadership with IDM 1.0, co-opting ISM 1.0 architectural elements into its processors (by 2002: multi-core, glue-less SMP, MMX, integrated DRAM controllers), made it difficult for fabless CPU companies to thrive despite having systems businesses to fund them – which was not sufficient by 2002. Even 500K CPUs/year (Sun/SPARC) was not economically justifiable. IBM, SGI, HP, and many more dropped out as the cost of silicon design and technology went up. [Side note: I am not sure Graviton is economically viable for Amazon on a standalone basis – if 500K CPUs was not viable in 2000, 50K is certainly not viable in 2020. Sure, they can wrap other elements of the TCO stack around it to justify it for a few generations, but that is not sustainable over 3-5 generations.] Regardless, 20 years later…

IDM 2.0 is a necessary first step, and it is good that Pat and Intel are putting that focus back. But IDM 2.0 needs ISM 2.0, and the same chutzpah of late-1980s EDA design innovation – but this time, perhaps, owning the 'silicon systems platform SW stack'.

ISM 2.0 is 'Integrated Systems 2.0'. If the SoC was the outcome of Moore's law in the 1990s, the SoP (System on Package) is the new substrate on which to re-imagine systems, as the platform becomes heterogeneous across compute (CPU, DPU, IPU, xPU), memory (DRAM, 3D XPoint, CXL memories), and networks (Ethernet and CXL). There will always be a CPU (x86 or ARM), but increasingly we will find a DPU/IPU/xPU in the system that sweeps up all the new workloads. The IPU/DPU/GPU/xPU will increasingly be an SoP with a diversity of silicon types to meet the diversity of workload needs. But it will need a common and coherent software model to be effective in enabling platforms and workloads, including low-level VMs or runtimes with standard APIs (e.g. P4/SONiC, PyTorch + TF, and others).

On the economies of scale that killed the RISC revolution (amongst other reasons): I have written about SoC vs SoP in a different blog here, but it's important to consider the diversity of customers – from OEM platforms to cloud platforms, emerging telco/edge service providers, and emerging ML/AI or domain-specific service providers – that together have a large TAM. Each one needs customization; there are no more one-size-fits-all platforms. It's multiple chips going into multiple SoPs, packaged and delivered in different ways to these new channels of delivery – OEM (single system), cloud (distributed systems), and the emerging decentralized cloud. But to retain the economies of scale of chip manufacturing while delivering customized solutions to old and new categories of customers, we are moving towards disaggregation at the component level and aggregation at the platform software level.

Just to get a better sense of the varied sales motions: Nvidia is a chip company, a box company (Mellanox switches and DGX), as well as a cloud company (delivering ML/AI as a service with Equinix).

This is bigger than the multi-core and virtualization shift that happened circa 2003 (VMware). An entirely new layer of software will be a key enabler for imagining the new 'integrated systems' and delivering them. For lack of a better TLA, let me call it EDA 2.0. Assembling this variety of SoP solutions requires new design tooling to enable customization of the 'socket'. The old mantra was to sell one chip SKU in the millions. That is still true, but we will now have multiple SoP SKUs built from the same multi-million-unit chip SKU. The design tooling to assemble these SoPs must provide programmability not only in manufacturing but also in the field, as there will be FPGA elements in the SoP as well as low-level resource management functionality.
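The SKU economics are easier to see in a toy model. Below is a purely illustrative Python sketch (all names, fields, and values are hypothetical, not any vendor's actual catalog) of one high-volume chip SKU fanning out into multiple SoP SKUs:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChipSKU:
    name: str      # hypothetical die name
    mask_set: str  # one mask set, amortized across every SoP below

@dataclass
class SoPSKU:
    name: str
    chiplets: list    # ChipSKUs placed on the package substrate
    memory: str       # package-level memory choice
    field_fpga: bool  # field-reprogrammable element on the package?

# One multi-million-unit chip SKU...
xpu = ChipSKU(name="xpu-die-a0", mask_set="rev3")

# ...assembled into several package-level SKUs for different channels.
sop_catalog = [
    SoPSKU("cloud-accel", chiplets=[xpu, xpu], memory="HBM",         field_fpga=False),
    SoPSKU("edge-telco",  chiplets=[xpu],      memory="DDR via CXL", field_fpga=True),
    SoPSKU("oem-server",  chiplets=[xpu],      memory="DDR",         field_fpga=False),
]

for sop in sop_catalog:
    print(sop.name, len(sop.chiplets), sop.memory, sop.field_fpga)
```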

Hijacking the OSI model as a metaphor for the infrastructure stack…

7 layers of Infrastructure Stack
Bottom 3 layers form the emerging silicon systems plane

The homogeneous monolithic CPU has now become a heterogeneous CPU + DPU/IPU/xPU/FPGA. Memory has gone from homogeneous DDR to DDR + 3D XPoint, on the DDR bus and the CXL bus.

So an 'integrated system' is an assembly of these integrated chips based on target segments and customers, but one that has to manage the three axes of flexibility vs performance vs cost. While the silicon retains one mask set for most target segments (the economics), customization at the packaging level enables the new 'integrated system' (the bottom three layers in the visual above). This new building block will become complex over time (hardware and software), and simplifying that complexity is where value gets extracted – but it requires capital and talent. Ironically, both exist either with the chip vendor or with the cloud service provider, the two ends of the current value chain.

The pendulum is starting to swing back: from vertical integration, when OEMs owned silicon (Sun, HP, IBM) or chip companies owned fabs (Intel) (1980-2000), to horizontalization with the success of fabless chip companies (Nvidia, Broadcom…) + TSMC (2000-2020), and now to vertical integration again, this time at the sub-system level for certain segments or markets.

Back to Intel split vs Intel integrated. If there is any lesson from the EDA 1.0 era, it is that it would be smarter to create the new tooling (EDA 2.0 and BIOS 2.0) through a smart combination of build, buy, and partner – expanding into invest, open source – and build a moat around that ecosystem that will be hard for competing chip companies to match. EDA 2.0 is not the same as EDA 1.0 – it spans both pre-manufacturing design tools and low-level programming and resource management frameworks. Directionally, some of it is captured here by Chris Lattner (MLIR & CIRCT). We have a chicken-and-egg situation: to create that layer we will need new silicon constructs, but to get to the right silicon, you will need the new layer (akin to how Unix + C enabled RISC, and RISC eventually accelerated Unix and C, referenced here…)

Coming back to Intel vs TSMC and splitting: TSMC is good at manufacturing, but has not (yet) built ecosystems and platforms at higher levels – its customers (the fabless companies) have. Intel knows that territory and has done it many times over. I make the case that an Intel Integrated – with IDM 2.0 and ISM 2.0, flexible in delivering SoC, SoP, and even rack-level products to emerging decentralized-edge cloud providers – is the emerging opportunity.

Splitting the company would make the sum of the parts worth less than the whole in this case. While there is a point to a split (the fab) driving accountability, customer focus, and serviceability, there are perhaps other ways to achieve the same without a formal split, while retaining the value of integrating the parts.

Smart integration of the various assets and delivery: creatively combining design engineering, EDA 2.0, and BIOS 2.0, and linking them to SoP and SoC manufacturing, including field-level customization, will be a huge value extraction play. Think of the Apple vs Android model of margin share vs market share. IDM 2.0 with ISM 2.0 can get both market share and margin share for the dominant compute platforms.

A POV: Intel has to do 3 things. IDM 2.0 (under way), ISM 2.0 (I will elaborate on that in a future note), and something else (aligned with Intel's manufacturing DNA) that is truly out of the box before 2030, when it is going to be economically and physically hard to get more from semiconductors as we know them. That has to be put in place between now and 2030…

————————————————————-

References to EDA at Sun…

Historical CPU/Process Node Data..

SoC to SoP

A reflection on Moore's law, personal history, and the coming tsunami of systems

This blog was prompted by Pat Gelsinger's recent keynote talking about System on Package (SoP). That brought back memories of System on a Chip (SoC) – back to circa 1991. While the term is common in the lingua franca of chip nerds these days, that was not the case back in 1991. Perhaps one of the first SoCs on the planet was one I was lucky to be involved with, and it also helped bootstrap my professional life in silicon and systems. It was the MicroSPARC-I (aka Tsunami), while at Sun, and it had a few firsts: all CMOS, a first SoC, and a TAB package. All-in-one.

MicroSPARC-I in a TAB package (circa 1991) – image: https://www.computerhistory.org/collections/catalog/102626774

This chip was in the system below. Good to know it's in the Computer History Museum archives.

SPARCstation

The Sun 386i label was a joke. Sun used to have the Sun 386i platform, and the joke was that this machine was faster and cheaper than any PC of the day.

SPARCstation
MicroSPARC-I on the board

That was the beginning of my semiconductor run. It started with an ECL machine for SPARC that we did back in 1987-1990, which eventually got shelved as it was going to be hard to manufacture and sustain in volume production. Some of us, suddenly without a job, were asked to work on a 'low cost' SPARC with TI on their 0.8µm CMOS process, while the rage at the time was BiCMOS (SuperSPARC for Sun, and the Intel Pentium). It shows that Intel, despite being a tech and manufacturing powerhouse, has made mistakes in the past, not just recently… We will come to that.

This first SoC (a microprocessor SoC) had many firsts back in 1991:

  1. It was all CMOS (when BiCMOS and ECL were still ruling the roost).
  2. It was fully integrated: integer unit, floating point unit, I-cache, D-cache, MMU/TLBs, DRAM controller (SDRAM), and SBus controller (pre-PCI).
  3. It was in 0.8µm CMOS (TI) and in a TAB package (as seen above).
  4. It had an entirely software-driven tool chain – the physical layout was done with Mentor GDT tools, programmatically assembling the entire chip from basic standard cells with GDT P&R tools, Synopsys synthesis, and Verilog. All software-driven silicon – a first. There is a reference to it here. This led to the entire EDA industry rallying around the way Sun designed microprocessors, and a whole slew of companies formed around that (Synopsys, Ambit, Pearl, CCT->Cadence, and many more).
  5. It was the beginning of the low-cost workstation (and server) – approaching $1,000 and the 'fastest' by clock rate (MHz – when that was the primary performance driver in the early years).
  6. From 1991 through 2003, when I left Sun, I was involved in 8 different generations/versions of SPARC chips. Looking back, the Sun platform/canvas not only let me be part of the team that changed the microprocessor landscape; we also changed the EDA industry and, by the late 1990s, brought ODM manufacturing to traditionally vertically integrated companies, completely outsourcing systems manufacturing.

A visual of the height of Moore's law growth and the success I rode with that tsunami (coincidentally, my first chip was named Tsunami): from 0.8µm 2LM CMOS to 0.65µm 10LM CMOS, from 50 MHz to 2 GHz, from 0.8M transistors to 500M transistors.
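A quick sanity check of that run against Moore's cadence (my arithmetic on the figures above):

```latex
% 0.8M -> 500M transistors over the ~12 years of that run
\frac{500\text{M}}{0.8\text{M}} = 625 \approx 2^{9.3}
\quad\Rightarrow\quad
\frac{12\ \text{years}}{9.3\ \text{doublings}} \approx 1.3\ \text{years per doubling}
```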

1991-1994 – MicroSPARC – the first CMOS SoC microprocessor, which extended Sun workstations and servers to the 'low end' and drove technology leadership with the EDA companies named above in 'SW-driven VLSI'. We built the chip with a philosophy of 'construct by correction' vs 'correct by construction', the latter being the prevailing methodology. In the modern parlance of cloud, it's DevOps vs ITOps.

1995-1998 – UltraSPARC II and IIe – with the introduction of 64-bit computing, we continued to lead on architectural performance (IPC), semiconductor technology (leading CMOS at TI, along with IBM, until Intel took control of that by 1998), clock rate, and many system-level innovations (at-scale symmetric multi-processing, glue-less SMP at low cost, media instructions). This was the Ultra family of compute infrastructure that was the backbone of the internet until the dot-com bust (2001-2003)!

1998-2001 – UltraSPARC I & E series: created 2 product families, both of which drove new business ($1B+) for Sun. The telco/CompactPCI business went from $0 to $1B in no time, alongside the extension of workstations and servers to $1K and glue-less SMP (4-way) for <$20K, another industry first. This was the beginning of NUMA issues and a precursor to the dawn of the multi-core era. The UltraSPARC IIi (codenamed Jalapeno) had the highest lifetime volume of any CPU across the entire lifetime of SPARC.

Clock Rate (SPARC vs x86)

While clock rate is not a good representation of actual device technology improvements, it's the best first-order relative metric I can share here given the dated history. Suffice it to say, as you can see, we had good technology (CMOS) FOM improvements per node until 0.18µm (Intel Coppertone), when Intel decided to boost its performance by 60% while the industry average was 30%. That was the beginning of the end on two fronts: Sun + TI having enough capital and skill to keep up with the tech treadmill against Intel (although we introduced copper metal ahead of Intel), and the decision to start shifting the architecture from pure IPC and clock to multi-core threading. Recognizing this, I started the multi-core effort circa 1998, but it took another 5 years to bear fruit. I digress.
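To see why that one move compounded so brutally, run the two per-node gains forward (illustrative arithmetic, not measured data):

```latex
% 60% vs 30% improvement per node, compounded over n nodes
\frac{1.6^{\,n}}{1.3^{\,n}} = \left(\tfrac{1.6}{1.3}\right)^{n} \approx 1.23^{\,n}
\qquad\text{e.g. } n = 3 \;\Rightarrow\; 1.23^{3} \approx 1.9\times\ \text{gap}
```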

As a side note: look at Intel's technology improvement performance lately. In my wildest imagination, I would never have thought this would happen.

2001-2003 – dawn of multi-core and threading: while the results appeared in 2001-2003, the seeds were sown earlier, in multi-core in the form of the dual-core UltraSPARC IIe and eventually Niagara (the UltraSPARC T series).

The next 10 years are going to be as dramatic as the 1990s, for completely different reasons, at the system level. While Moore's law has slowed down, the SoP is a critical technology shift that lets one keep up with the effective Moore curve. With Moore you got performance, power, and cost improvements at the same time. We won't get all three anymore, but we can strive for 2 out of 3 – i.e., performance at constant cost, or at constant power.

SoP (System on Package) is an important milestone, and I am glad to see Intel leading there, along with AMD and the rest – it can be a compelling way to construct the new system. In the next blog we will explore why the next 10 years are going to be disruptive at the system level. SoP, like SoC and CMOS + Moore's law before it, is the tsunami wave that will raise a lot of boats – the earlier wave raised my career and many companies' success, and changed the industry and the computing stack in a fundamental way.

I expect many firsts, changes, and disruptions, from design methodology to customer-driven customization of the various heterogeneous silicon components (CPU, IPU, FPGA, memory elements, and a lot more). Associated with that will be tools to assemble these parts, but also tools to make them look like one 'monolithic', fungible computing element to the end user.

Virtualization to date has been dominated by leveraging multi-core and improving utilization by spawning many VMs that subdivide the machine into smaller chunks. New software layers, either above or below standard frameworks like Lambda (serverless), PyTorch/TF (ML/AI), or crypto, will drive new ways to effectively use the dramatic increase in total silicon real estate: tiering of memory, scheduling code chunks onto accelerators in a coherent space (via CXL), new intra-rack and intra-node connectivity models via CXL, and many more to come. Strap in for that ride/discussion. Hardware is getting more disaggregated – from the aggregation that started back in 1991 with the SoC to now with the SoP – and software will have to do the 'aggregation'.

As I sign off, here are some more images from the 25-year anniversary of SPARC, captured in the montage below.

Open Systems to Open Source to Open Cloud.

"I'm all for sharing, but I recognize the truly great things may not come from that environment." – Bill Joy (Sun founder, BSD Unix hacker), commenting on open source back in 2010.

In March at Google Next '17, Google officially called their cloud offering the 'Open Cloud'. That prompted me to pen this note and reflect upon the history of Open (Systems, Source, Cloud) and the properties that make them successful.

A little-known tidbit of history is that the earliest open source effort of the modern computing era (1980s to present) was perhaps Sun Microsystems. Back in 1982, each shipment of the workstation was bundled with a tape of BSD – i.e., a modern-day version of open source distribution (the BSD license?). In many ways, much earlier than Linux, open source was initiated by Sun, and in fact driven by Bill Joy. But Sun is associated with 'Open Systems' rather than open source. Had the AT&T lawyers not held Sun hostage, the trajectory of open source would be completely different – i.e., Linux might look different. While Sun and Scott McNealy tried to go back to those open source roots, the second attempt (20 years later) was not rewarded with success.

In my view, any successful open source model requires the following 3 conditions to be viable as a sustainable, standalone business and distribution model.

  • Ubiquity: everybody needs it, i.e., it is ubiquitous with a large user base.
  • Governance: it requires a 'benevolent dictator' to guide, shape, and set the direction. Democracy is anarchy in this model.
  • Support: a well-defined and credible support model. Throwing code over the wall will not work.

Back to open systems: early in its life, Sun shifted rather effectively to a marketing message of open systems – publish the APIs, interfaces, and specs, and compete on implementation. It was powerful storytelling that resonated with the customer base, and to a large extent Sun was all about open systems. Sun used it to take on Apollo and effectively out-market and outsell Apollo workstations. The open systems mantra was Sun's biggest selling story through the 1980s and 1990s.

In parallel, around 1985, Richard Stallman pushed free software, and the evolution of that model led to the origins of open source as a distribution model before it became a business model, starting with Linus and thus Linux. It's ironic that 15+ years after the initial sale of open systems, open source via Linux came to impact Sun's Unix (Solaris).

With Linux, the open source era was born (perhaps around 1994, with the first full release of Linux). A number of companies were formed, notably Red Hat, which exploited it and remains by far the largest and longest-viable standalone open source company.

 


Open systems in the modern era perhaps began with Sun in 1982 and continued for 20-odd years, with open source becoming a distribution and business model between 1995 and 2015 (and likely continuing for another decade). Twenty years on, we see the emergence of the 'open cloud' – or at least the marketing term, from Google.

In its 20 years of existence, open source has followed the classic bell curve of interest, adoption, hype, exuberance, disillusionment, and the beginning of decline. There is no hard data to assert that open source is in decline, but it is obvious from simple analysis that with the emergence of the cloud (AWS, Azure, and Google), the consumption model for open source infrastructure software has changed. The big three in cloud have effectively killed the model, as the consumption and distribution of infrastructure technologies is rapidly changing. Several open source products in vogue today have reasonable traction but are struggling to find a viable standalone business model: Elastic, Spark (Databricks), OpenStack (Mirantis, SUSE, RHAT), and Cassandra (DataStax), amongst others. Success requires all three conditions: ubiquity, governance, and support.

Talk to the venture community and the open source model for infrastructure is effectively in decline. While it was THE model until perhaps 2016 – open source was the 'in thing' – the decline is accelerating with the emergence of the public cloud consumption model.

Quoting Bill (circa 2010), which says a lot about the viability of the open source model: "The Open Source theorem says that if you give away source code, innovation will occur. Certainly, Unix was done this way. With Netscape and Linux we've seen this phenomenon become even bigger. However, the corollary states that the innovation will occur elsewhere. No matter how many people you hire. So the only way to get close to the state of the art is to give the people who are going to be doing the innovative things the means to do it. That's why we had built-in source code with Unix. Open source is tapping the energy that's out there." The smart people now work at one of the big three (AWS, Azure, and Google), and that is adding to the problems for both innovation and consumption of open source.

That brings us to the Open Cloud – what is it? Google announced they are the open cloud, but what does that mean? Does it mean Google is going to open source all the technologies it uses in its cloud? Does it mean it is going to expose the APIs and enable one to move any application from GCP to AWS or Azure seamlessly, i.e., compete on the implementation? It has certainly done a few things: open-sourced Kubernetes, and opened up TensorFlow (its ML framework). But the term 'open cloud' is not clear, even if the marketing message is out there. Like yin and yang, for every successful 'closed' platform there has been a successful 'open' platform. If there is an open cloud, what is a closed cloud? The what and the who need to be defined and clarified in the coming years. From Google we have seen a number of ideas and technologies that eventually ended up as open source projects. From AWS we see a number of services becoming de facto standards (much like the open systems thesis – S3, to name one).

Kubernetes is the most recent ubiquitous open source software that seems well supported. It is still missing the 'benevolent dictator' – a personality like Bill Joy, Richard Stallman, or Linus Torvalds to drive its direction. Perhaps it's 'Google' rather than a single person? By the same criteria, OpenStack has the challenge of missing that 'benevolent dictator'. Extending beyond Kubernetes, it will be interesting to watch the evolution and adoption of containers + Kubernetes vs new computing frameworks like Lambda (functions, etc.). Is it time to consider an 'open' version of Lambda?

Regardless of all these framework and API debates, open vs closed, one observation:

Is open cloud really 'open data'? Data is the new oil that drives a whole new category of computing in the cloud. Algorithms and APIs will eventually open up, but data can remain 'closed', and that remains a key source of value, especially in the emerging ML/DL space.

Time will tell…..

On Oct 28th, IBM announced the acquisition of Red Hat. This marks the end of open source as we know it today. Open source will thrive, but not in the form of a large standalone business.

Time will tell…

OEM -> ODM -> OCM?

The OEM supply chain model has existed in multiple industries, including computing, for a long time. In the computing industry, the Original Equipment Manufacturer (OEM) model was perhaps formally kickstarted in the 1980s with the emergence of the PC and Intel with its processors. Prior to the PC, and perhaps the Apple Mac, computing in the '70s was delivered by vertically integrated companies – notably IBM, DEC, Prime, ICL (England), Wang, Sperry, Burroughs, etc. The OEM model led to the separation of the various layers of the delivery chain. Specifically, the chip (or processor) business came into full force, and the separation of the processor and software (Microsoft), with the delivery of the two as an integrated platform, led to the emergence of the OEM business.

Over the past 30 years, the OEM model has been supplanted by ODM (Original Design Manufacturer) companies (like Quanta, Tyan) from Taiwan and China. That model was perhaps kickstarted in the late 1990s, driven by Intel and the emergence of Taiwan/China manufacturing capabilities, and it exploded from 2000 onwards with the emergence of the cloud companies as the end customer.

The value in the OEM model is the integration of silicon (engineered by the OEM) and/or software – typically both (as demonstrated by companies like Sun, Cisco, EMC, SGI, and many more). Over time, the consolidation of silicon (for processors it was Intel; for switches it is Broadcom), combined with the emergence of open source software (Linux to start with, plus a whole lot of other components found at apache.org), has eroded that key value proposition. After 30 years, with the consolidation in the industry (EMC/Dell as an example), has the OEM model run its course?

The value in the ODM model is in cost-effective manufacturing and scale. To some extent the ODM model eroded one of the key capabilities of the OEM, given the consolidation of key semiconductor components (processors, switch/networking ASICs, storage controllers). But given their inability to move up the value chain (owning either the silicon or the key SW IP), the ODMs have reached a plateau with nowhere to go but to continue manufacturing at scale, cost-effectively. The notion that an ODM could disrupt the OEMs has not materialized. Sure, they have had an impact on many companies, but the 70/30 rule applies: the OEMs with strong brand equity have retained their position, and only the smaller OEMs have lost their business to ODMs.

Here’s a simple visual of the value chain.

OEM vs ODM value chain

But is it now time for a new model to emerge? The OEM model is facing a perfect storm. One component of that storm is the cloud as a business. The second disruptor is the emergence of Software Defined X (compute, storage, network), in many cases tied to open source. The third and main element of the disruption is the value shift to the component, i.e., the semiconductor component. This I would term the emergence of the OCM model.

The OEM, ODM, and OCM supply chain models

OCM stands for Original Component Manufacturer, typified by companies like Intel and Broadcom, though the more interesting ones are Seagate, Western Digital, Micron, and Samsung. The visual above shows the three different supply chain models. The OEM model also relies on the ODM to deliver the end system to the customer. The OCM model is typified by component companies (one good example is Mellanox, which sells both chips and switches) leveraging either third-party or open source software to deliver system-level solutions to the same target customers the OEMs have addressed. While there are significant challenges in evolving an OCM to have the same capabilities as an OEM, the OCMs already have customers like the big cloud providers (AWS, Google, Microsoft), with a significant portion of their business (soon to be 40%) coming from direct sales to these cloud providers – a share that will grow, even while profit margins potentially shrink. This has two implications for the OCMs: they have to find alternative, higher (absolute) margin models, and they must be able to challenge the OEMs and ODMs, as a good percentage of their business is already shifting to the major cloud providers.

So, will these OCMs emerge? It is back to the Wintel model of value shifting to the component and the software, but in this case the OCM becomes the integrator of the software along with the component to deliver a complete system. Unlike the ODM, the OCM has both the financial and technical capabilities to move up the value chain.

Let's revisit this in 2020 and see if it happens.

Update – March 2019 – Nvidia to acquire Mellanox. Both companies design and sell chips and make boxes: Nvidia with the DGX-2 (ML boxes) and Mellanox with switches.

https://nvidianews.nvidia.com/news/nvidia-to-acquire-mellanox-for-6-9-billion

Cloud and Fabs – different but similar

With all the buzz about cloud, multi-cloud, and the ongoing consolidation in the cloud, I was reminded of a conversation with Ryan Floyd a couple of years back. We were comparing and contrasting the viability of cloud as a business; to me the cloud was rapidly looking like the fab business, while Ryan felt differently. The conversation centered on the capital-intensive nature of cloud as a business and the analogies with semiconductor fabs. There are some interesting similarities and differences. Let's contrast the two…

Fabs: a view of the semiconductor/fab business in the context of this thread. It has taken 30 years of Moore's law and consolidation to arrive at perhaps 3 companies that have the capital, capability, and platform stack. In logic, it's Intel, TSMC, and GlobalFoundries; in memory, it's Samsung, Toshiba, Micron, and perhaps Hynix. 1985 was the year of the modern CMOS logic fab (with Intel shifting from memory to logic). What is interesting is where the top 3 stand in their approach. Intel is vertically integrated (fabs + products) and trying to move upstream. TSMC has taken a horizontal 'platform' approach. GlobalFoundries has had a mix (x86 and now Power processors) and is still trying to find its way across the horizontal vs vertical integration chasm.

Cloud: by all accounts, the cloud business kicked off circa 2006 with AWS's launch of web services. It has taken roughly 10 years to arrive at the same stage, with the big three – AWS, Google, and Microsoft – consolidating the category. All three have reached a scale, capacity, and technology stack that will be hard for others to recreate. Sure, there may be Tier-2 clouds (eBay, Apple, SAP, Oracle, IBM, etc.), or geo-specific (China) or compliance-specific cloud operations, but these three will drive consolidation and adoption. It's no longer just capital; it's the technology stack as well.

What's more interesting is to compare and contrast the top 3 semiconductor fabs and the 3 cloud companies: their approaches, and where they go from here.

Let's start with the fabs, focusing on the logic side of the business, as it's the fountainhead of all compute infrastructure. Their combined revenue is close to $100B ($22B in capital spend in 2016). Apart from being capital intensive, there is now a complex technology stack required to deliver silicon (design rules, libraries, IP, packaging, and even software/tooling) to make effective use of billions of transistors. Similarly, in the cloud, value is moving up the stack to the platform level. It is no longer about logging in and renting compute, or dumping data into S3 via a simple get/put API. It is about how to use infrastructure at scale with Lambda, Functions, or PaaS/platform-level features and APIs specific to that cloud. You can now query your S3 data in place. That is API/vendor lock-in.
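To make the 'query your S3 data in place' point concrete, here is a minimal sketch using boto3's S3 Select call (bucket, key, and schema are hypothetical). The SQL-over-objects interface is itself cloud-specific – exactly the kind of lock-in being described:

```python
import boto3

# S3 Select pushes a SQL filter down into the storage service itself --
# no cluster to rent, but the query API is specific to AWS.
s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="example-logs",   # hypothetical bucket
    Key="2017/usage.csv",    # hypothetical object
    ExpressionType="SQL",
    Expression="SELECT s.user_id, s.bytes FROM S3Object s WHERE s.region = 'us-east-1'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"JSON": {}},
)

# The response is an event stream; 'Records' events carry query results.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"))
```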

The tooling required to deliver a chip product is not specific to a fab, but the optimizations are. Same with the cloud: the tooling might be generic (VMs, containers, e.g.) or open source, but increasingly it is proprietary to the cloud operator, and that is the way it will stay.

In technology stack, competency, and approach to market, Google looks more like Intel, AWS like TSMC, and Microsoft like GlobalFoundries (not in the sense of today's market leadership). Intel is vertically integrated, and Google shows more of that. Intel has deep technology investments and leads the sector, as does Google/GCP by far when contrasted with AWS or Microsoft. Every fabless semiconductor company's first stop is the TSMC foundry, and the same holds for any cloud-based business (AWS first). Cloud infrastructure, unlike fabs, was not the primary business to start with: all three leveraged their internal needs (Google for search, AWS for books/shopping, Microsoft for Bing and its enterprise apps) and their initial or primary business to fund the infrastructure.

Can one build one's own cloud, i.e., "Real men have fabs" [as TJ Rodgers, CEO of Cypress Semiconductor, famously quipped]? While building and developing semiconductors is capital intensive and needs deep technology and operational experience, a cloud can seemingly be built from all the open source code that is available. While that is true, despite the plethora of open source tools, it's the breadth and depth of the tooling that is difficult to pull off. Sure, we can assemble one with CoreOS, Mesosphere, OpenStack, KVM, Xen, Grafana, Kibana, Elasticsearch, etc. But as the stack becomes deep and broad, it's going to be hard for any one company (including the big Tier-2 clouds named above) to pull it off at scale and gain operational efficiency. Sure, one could build a cloud in 1 or 2 locations – but how do you step-and-repeat and make it available around the globe, at scale? Intel and TSMC eventually excelled at operational efficiency. Sure, Dropbox might find it cheaper to build its own, but the value is shifting from just storing data to making it available for compute. That level of integration will force the swing back to the big three.

Cloud arbitrage, multi-cloud vs multi-fab: the rage today is to go multi-cloud. How great it would be to move from AWS to GCP to Azure at the click of a button. We tried the same in the 1990s at Sun, for processors: we wanted to be multi-fab. TI was the main fab, we wanted TSMC and UMC, and we had engagements with AMD. The reality is that each platform stack has unique features that the solution will naturally gravitate to. It's more expensive to pursue multi-cloud as a strategic direction than to pick a cloud partner and drive deep integration. Yes, for business continuity and leverage of spend, one would want multi-cloud. The reality is that Netflix is with AWS, not Microsoft or Google, and they are doing fine as a business. Perhaps you don't have to run the entire application stack in every cloud. You are better off picking specific categories, with lines of business running them in specific clouds. That brings diversity, and perhaps continuity of business, while leveraging each cloud's unique properties. E.g., for developing machine-learning-type apps, GCP is better than AWS; for video streaming, maybe AWS is just fine (although Google will tell you they have more POPs and capability for this, due to YouTube).

Where do we go from here in the cloud? I will most certainly be wrong if you come back and read this blog in 2020. But there are some truisms (not a complete list, but a start…)

Vertical integration: the current $18B cloud business will be >$100B between 2020 and 2025. That is a seismic shift that will impact everyone, including businesses down the infrastructure stack (semiconductor companies, e.g.), as the big three show signs of more vertical integration in their stack, including having their own silicon. Intel likewise is trying to get more vertically integrated, and Microsoft is trying to find its way there. Maybe the exception is going to be TSMC, staying truly horizontal. The big three cloud operations are, and will be, more vertically integrated. There is also a culture or gene pool aspect to this.
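For a sense of scale, the growth rate implied by those figures (my compounding of the numbers above):

```latex
% Implied CAGR from $18B (2017) to $100B, depending on the year reached
\left(\tfrac{100}{18}\right)^{1/3} \approx 1.77 \;\;(\sim 77\%/\text{yr if by 2020}),
\qquad
\left(\tfrac{100}{18}\right)^{1/8} \approx 1.24 \;\;(\sim 24\%/\text{yr if by 2025})
```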

Service vs services (or IaaS vs PaaS): despite all the technology chops at Intel, they have had mixed results in the fab service business. While AWS has excelled at the IaaS part, its ability to build a compelling ecosystem around its platform strategy will be tested. Likewise for Google: while it traditionally has strong in-house platform assets, building a strong developer community (e.g., Android) while delivering a great customer-centric experience will be the challenge. Microsoft, by nature of its strong enterprise apps footprint, can and could get both the service and the services right. It goes back to the gene pool, the service mentality vs the services mentality. AWS has excelled at the service aspect (as TSMC did in the '90s) and leads in services as well. GCP (akin to Intel) has the platform strengths, but has to supplement them with a modern customer engagement model to gain market share. This will require a cultural/organizational shift to become service-oriented, not just a technology or business model shift.

Lock-in: lock-in is a reality. You have to be careful which lock and key you want, but it will be real, so go in with eyes wide open. It's now at the API level and moving up the stack.

Data gravity: increasingly, data will be the differentiator. Each of the big three will hoard data. Broadly speaking there are three types of data (private, shared, and global), and applications will use all of them. This creates a gravitational pull to use a specific cloud for specific applications. IBM has started the trend of acquiring data (weather.com). Expect the big three to acquire the data applications need as part of their offering. This will be another form of lock-in.

Cloud-native programming (Lambda, Functions, TensorFlow…): a similar holy grail has been attempted on the silicon side with ESL and high-level synthesis. What is interesting for the comparison is that generic application development is approaching what the hardware folks have been doing for decades – event-driven (sync, async) programming, or a dataflow approach. This is an obvious trend that kicked off in 2016. It is a chasm for the generic programmer to cross (quite apart from the crumminess of Verilog, it's a hard model to program). This is where each of the big three will take different approaches, differentiate, and create the next lock-in.
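To make the event-driven comparison concrete, here is a minimal sketch of an AWS Lambda handler in Python (the S3-trigger event shape is assumed; names are illustrative). There is no main loop; the platform invokes the function per event, much like logic reacting to a signal edge:

```python
import json

# A Lambda handler is pure event-driven code: no server, no main loop.
# The platform calls this function once per event (an S3 upload, an
# API request, a queue message).
def handler(event, context):
    # The 'event' shape depends on the trigger; an S3 notification is assumed here.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"object written: s3://{bucket}/{key}")

    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```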

In summary, today (2017) AWS looks more like TSMC, GCP more like Intel, and Microsoft somewhere in between (GF?). We will revisit in 2020 to see what they look like – more similar or more different?

wrong tool

You are finite. Zathras is finite. This is wrong tool.

