1987 – 2017: SPARC Systems & Computing Epochs

 

This month marks 30 years of SPARC systems: the Sun 4/260 was launched in July 1987. A month before that, I started my professional career at Sun – June 15, 1987, to be exact. 16+ years of my professional life were shaped by Sun, SPARC, systems and, more importantly, the whole gamut of innovations Sun delivered, from chips to systems to operating systems to programming languages, covering the entire spectrum of computing architecture. I used to pinch myself for getting paid to work at Sun. It was one great computer company that changed the computing landscape.

 

[Image: Sun 4 launch]

 

The SPARC story starts with Bill Joy, without whom Sun would not exist (even though Bill was the fourth founder), as he basically drove the re-invention of computing systems at Sun and thus in the world at large. Bill set the technical direction of computing at Sun and initiated many efforts – Unix/Solaris, programming languages and RISC, to name a few. David Patterson (UC Berkeley, now at Google) influenced the RISC direction. [David advised students who changed the computing industry, and it seems he is involved with the next shift with TPUs @ Google – more later.] I call out Bill among the four founders (Andy Bechtolsheim, Scott McNealy and Vinod Khosla being the others) in this context because without Bill, Sun would not have pulled in the talent – he was basically a big black hole that sucked in talent from across the country and the globe. Without that talent, the innovations and the 30-year history of computing would not have been possible. A good architecture is one that lives for 30 years. This one did. Not sure about the next 30 – more on this later. Back then, I dropped my PhD in AI (I was getting disillusioned with that era's version of AI) for a Unix machine on my desktop and Bill Joy. The decision was that simple.

From historical accounts, the SPARC system initiative started in 1984. I joined Sun when the Sun 4/260 (Robert Garner was the lead) was about to be announced that July. It was a VME-backplane machine, built both as a pedestal computer (12 VME boards) and as a rack-mount system, replacing the then-current Sun 3/260 and Sun 3/280. It housed the first SPARC chip (Sunrise), built out of gate arrays with Fujitsu.

 

This was an important product in the modern era of computing (198X-201X). 1985-87 was the beginning of the exploitation of instruction-level parallelism (ILP) with RISC implementations from Sun and MIPS. IBM/Power followed later, although RISC was incubated within IBM earlier than at either. The guiding principle was that compilers can do better than humans and can generate optimal, simpler code against an orthogonal instruction set. The raging debate then was "can compilers beat humans in generating code for all cases?" It was settled with the dawn of the RISC era. This was the era when C became the dominant programming language (Fortran, Pascal and Ada were the other candidates) and compilers were growing leaps and bounds in capability. Steve Muchnick led the SPARC compilers, while Fred Chow did the same for MIPS. Recall the big debates about register windows (Bill to this day argues about the decision on register windows) and code generation. In brief, it was the introduction of pipelining, orthogonal instruction sets and compilers (in some sense compilers were that era's version of today's rage in "machine learning", where machines started to outperform the human ability to program and generate optimized code).

There were many categories enabled by Sun and the first SPARC system.

  1. The first chip was implemented in a gate array, which was more cost effective as well as faster in TTD (Time to Design). The fabless semiconductor model was born out of this gate-array approach and eventually exploited by many companies. A new category emerged in the semiconductor business.
  2. The EDA industry, starting with Synopsys and its Design Compiler, was enabled and driven by Sun. Verilog was formalized as a language for hardware. It was an event-driven evaluation model – today's reactive program is yesterday's Verilog (not literally, but the point is that HDLs have always been event-driven programming; see the sketch after this list).
  3. Sun created an open ecosystem around an (effectively free) licensable architecture. It was later followed by open source for hardware (OpenSPARC), which was a miserable failure.
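
To make the event-driven point in item 2 concrete, here is a minimal sketch (illustrative Python, not any real simulator's API) of the evaluation model Verilog formalized: logic blocks are callbacks that fire only when a signal in their sensitivity list changes, which is exactly the shape of today's reactive programs.

```python
# Minimal sketch of event-driven (Verilog-style) evaluation. All names here
# are illustrative; this is not a real HDL simulator API.
class Signal:
    def __init__(self, value=0):
        self.value = value
        self.watchers = []          # blocks "sensitive" to this signal (always @(sig))

    def set(self, value, queue):
        if value != self.value:     # only a *change* generates an event
            self.value = value
            queue.extend(self.watchers)

def simulate(stimulus):
    queue = []
    for sig, value in stimulus:     # apply external events one by one
        sig.set(value, queue)
        while queue:                # evaluate until no further events fire
            queue.pop(0)(queue)

# Example: c = a AND b, re-evaluated only when a or b changes
a, b, c = Signal(), Signal(), Signal()
def and_gate(queue):
    c.set(a.value & b.value, queue)
a.watchers += [and_gate]
b.watchers += [and_gate]

simulate([(a, 1), (b, 1)])
print(c.value)                      # -> 1
```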

The first system was followed by the pizza box (SPARCstation 1), with its own SPARC implementation (Sunray, per the list below). A series of system innovations was delivered, with associated innovations in SPARC.

  1. 1987 Sun4/260 Sunrise – early RISC (gate array)
  2. 1989 SPARCstation 1 (Sunray) – Custom RISC
  3. 1991 Sun LX  (Tsunami) – First SoC
  4. 1992 SPARCstation 10 (Viking) – Desktop MP
  5. 1992 SPARCserver (Viking) – SMP servers
  6. 1995 UltraSPARC 1, Sunfire (Spitfire) – 64 bit, VIS, desktop to servers
  7. 1998 Starfire (Blackbird), Sparcstation 5 (Sabre) – Big SMP
  8. 2001 Serengeti (UltraSPARC III) – Bigger SMP
  9. 2002 Ultra 5 (Jalapeno) – Low Cost MP
  10. 2005 UltraSPARC T-1 (Niagara) – Chip Multi-threading
  11. 2007 UltraSPARC T-2 – Encryption co-processor
  12. 2011 SPARC T4
  13. 2013 SPARC T5, M5
  14. 2015 SPARC M7 (Tahoe)
  15. 2017 M8…

The systems innovations were driven by both SPARC and Solaris (SunOS back then). There have been two key punctuations in that innovation, and we have now entered the third era in computing. The first two were led by Sun, and I was lucky to be part of that innovation and to help shape it as well.

1984-1987 was the dawn of the ILP era, which continued for the next 15 years until thread-level parallelism became the architectural thesis, thanks to Java, the internet and throughput computing. A few things Sun did were very smart. They include:

  1. Sun took a quick, fast approach to implementing the chip by adopting gate arrays. This surprised IBM, and perhaps MIPS, with its speed of execution. Just two engineers (Anant Agrawal and Masood Namjoo) did the integer unit. No floating point. MIPS, meanwhile, was designing a custom chip.
  2. It was immediately followed by a full custom chip (Sunray) done with Cypress (for the integer unit) and TI (for the floating point unit). Of all the RISC chips designed in that era, SPARC stood out along with MIPS (and eventually Power).
  3. That was the one-two punch, enabled by Sun owning the architectural paradigm shift (C/Unix/RISC) as the compute stack of the time.

Industry firsts in pipelining and superscalar execution (more than one instruction per clock) became the drivers of performance. Sun innovated both at the processor level (with compilers) and at the system level with symmetric multiprocessing and the operating system, driving the 'attack of the killer micros'. A number of successes and failures followed the initial RISC (Sunrise-based) platform.

  1. Suntan was an ECL SPARC chip that was built but never taken to market, for two reasons. [I still have an ECL board up in my attic.] The CMOS vs. ECL debate was ending, with CMOS rapidly approaching the speed-power ratio of ECL; more importantly, continuing with ECL would have drained the company relative to the value of the high end of the market. MIPS carried on and perhaps drained significant capital and focus by doing so.
  2. SuperSPARC, the first superscalar SPARC processor, came out in 1991. Working with Xerox, Sun delivered the first glueless SMP (M-bus and X-bus).
  3. 1995 brought a 64-bit CPU (MIPS beat it to market – but was too soon) with integrated VIS (media SIMD instructions).

After that, the next big architectural shift was multi-core and multi-threading. It was executed with mainstream SPARC but accelerated with the acquisition of Afara and its Niagara family of CPUs. If there is a 'hall of fame' for computer architects, a shout-out goes to Les Kohn, who led two key innovations – UltraSPARC (64-bit, VIS) and UltraSPARC T-1 (multi-threading). The seeds of that shift were sown in 1998, and a family of products exploiting multi-core and threading was brought to market starting in 2002/2003.

1998, in my view, is the dawn of the second wave of computing in the modern era (1987-2017), and again Sun drove the transition. The move to web- or cloud-centric workloads and the emergence of Java as a programming language for enterprise and web applications enabled the shift to TLP – thread-level parallelism. In short, this is the classical space-time tradeoff: clock rate had diminishing returns, and the shift to threading and multi-core began with the workload shift. Here again SPARC was the innovator with multi-core and multi-threading. The results of this shift started showing in systems around 2003 – roughly 15 years after the first introduction of SPARC with the Sun 4/260. In those 15 years, computing power grew by 300+ times and memory capacity grew by 128 times, roughly following Moore's law.
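
A quick back-of-the-envelope check on those growth figures (a sketch using only the numbers as stated above):

```python
# 300x compute and 128x memory growth over the 15 years from 1987 to ~2002/03
# implies the following doubling periods - consistent with Moore's law's
# roughly 18-24 month cadence.
import math

years = 15
compute_doubling = years / math.log2(300)   # ~1.8 years per doubling
memory_doubling = years / math.log2(128)    # exactly 15/7, ~2.1 years per doubling
print(round(compute_doubling, 1), round(memory_doubling, 1))   # 1.8 2.1
```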

The first 15 years were the ILP era; the second 15 were about multi-core and threading (TLP). What is the third? We are at the dawn of that era. I phrase it as 'MLP' – memory-level parallelism. Maybe not. But we know a few things now. The new computing era is more 'data path' oriented – be it GPU, FPGA or TPU – some form of high compute throughput matched to emerging ML/DL applications. A number of key issues have to be addressed.

 

[Image: Computing trends]

Every 30 years, technology, businesses and people have to re-invent themselves, otherwise they stand to wither away. There is a pattern there.

There is a pattern here with SPARC as well. SPARC and SPARC-based systems have reached their 30-year life, and this looks to be the beginning of the end, while a new generation of processing is emerging.

Where do we go from here? Definitely, applications and technologies are the drivers, and ML/DL is the obvious one. The technologies range from memory, coherent 'programmable datapath accelerators' and programming models (event driven?) to user-space resource managers/schedulers and lots more. A few key meta-trends:

  • Over the past 30 years, hardware (silicon and systems) aggregated resources (e.g. SMP) while software disaggregated them (VMware). I believe the reverse will be true for the next 30: disaggregated hardware (e.g. accelerators or memory) while software aggregates (e.g. vertical NUMA, heterogeneous schedulers).
  • Separation of control-flow-dominated code from data-path-oriented code will happen (early signs with TPUs); a small sketch follows the bullets below.

 

[Image: Control flow vs data flow]

  • Event-driven (e.g. reactive) programming models will become more prevalent. The 'serverless' trend will only accelerate this model, as traditional (procedural) programmers have to be retrained to do event-driven programming (hardware folks have been doing it for decades).
  • We will build machines that will be hard for humans to program (not in the sense of complexity – more in the sense of scale).
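
To illustrate the control-flow vs. data-path split called out above, here is a minimal sketch (illustrative NumPy, the variable names are my own): the same computation written as a branchy scalar loop that a CPU's control logic must steer, and as a single bulk data-path operation of the kind GPU/TPU-class hardware is built to execute.

```python
# Same computation, two styles. The loop is control-flow dominated (a branch
# per element); the second form is one whole-array data-path operation.
import numpy as np

x = np.random.randn(1_000_000)

out_loop = np.empty_like(x)
for i in range(x.size):                 # control flow: loop + branch per element
    out_loop[i] = x[i] if x[i] > 0 else 0.0

out_vec = np.maximum(x, 0.0)            # data path: one bulk operation, no branches

assert np.allclose(out_loop, out_vec)
```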

CMOS, RISC, Unix and C were the mantra of the 1980s. The next one is going to be built around memory (some form of resistive memory structure), and a re-thinking of the stack needs to happen. Unix, too, is 30+ years old.

[Image: Then vs now]

 

Just when you thought Moore's law was slowing, the amalgamation of these emerging concepts and ideas into a simple but clever system will give rise to the next 30 years of innovation in computing.

Strap yourself for the ride…

 

 

30 (15 and 7) Year Technology Cycles

[Image: 30-year technology cycles]

We have all heard of the 7-year itch. Have you ever asked why it's 7? Why not 10 or 4? Looking back, I have had those 7-year itches.

Have you had the urge to leave your job or change your role significantly at around 15 years? Think about this very carefully, or ask around!! I left my first job after 16 years.

What about 30 years? Perhaps time to retire. Or to seek a completely new profession?

Now let's look at businesses. Let's review two examples.

  1. Apple: From 1977 to 1984 – it was the Apple II (and IIe). 7 years after founding, 1984 (Mac). 7 years later (1991 – PowerBook). 7 years later (Steve is back with the iMac). 30 years after founding (iPhone). New product, new business model. Apple re-invented. (http://applemuseum.bott.org/sections/history.html)
  2. Sun: 1982-1989 – Motorola + BSD Unix. 1989-1996 – Solaris and SPARC. 1996 onwards – UltraSPARC I and Java. 2009 (27 years in) – acquired by Oracle. http://devtome.com/doku.php?id=timeline_of_sun_microsystems_history

Why 7? Why 15? Why 30? The 7-year itch feeds into the 15-year peak, which feeds into the 30-year retire/re-invent cycle. Since businesses are made of people, especially in the technology sector, business cycles follow the same 7-, 15- and 30-year pattern. At 7, one becomes proficient in a domain (10,000+ hours) and change is in the air. At 15, you feel like you have reached a peak. At 30, you become obsolete (retire) or re-invent yourself.

At the core of these 30-year transitions, the DNA has to stay relevant or one 'retires'. Just as biological DNA defines us, businesses have a DNA that defines their value proposition. Apple has the core DNA of designing eye-candy products (materials + UI); this has carried over from the desktop to mobile. IBM has the DNA of the general computing business, unchanged over decades. Sun had the DNA of open source and open systems. Intel has the core DNA of device engineering and manufacturing (Moore's law).

The similarity has to do with human productivity cycles. While this assertion is not statistically rigorous, there is enough data to suggest a linkage between business cycles and human productivity cycles.

First, let's visit some 30-year cycles… In most cases the business model changes, not just the technology. Usually the incumbent sees the technology shift and can handle it, but fails to see or react to the business model change.

Survivors:

  1. IBM: Strangely, IBM has revived itself every 30 years. There was the mainframe from the 1950s to the 1980s. Then there was the PC and Global Services from the 1980s to the 2010s. Now it is on a re-invention cycle with Watson and enterprise cloud. IBM has done this repeatedly over its 100-year history, and is perhaps the only one to have done so. But it is facing that issue once more… Time will tell…
  2. Apple: The same 7-year beats as in the timeline above – Apple II from 1977, the Mac in 1984, the PowerBook in 1991, Steve back with the iMac, and the iPhone 30 years after founding. New product, new business model, but the same DNA. 1992 was a critical year, as we all know. 2014 is 7 years after the first iPhone – another change cycle? The smart move by Apple was the shift from shrink-wrapped software to the App Store. New business model. Fantastic! The iPhone, now 10 years old, will reach its peak in the next few years. Apple will have to have its next big act before 2021?
  3. HP: Another survivor that has gone through two 30-year cycles and is in the middle of the third. It remains to be seen how it will evolve.

Recycled:

  1. Sun: My alma mater. Started in 1982 and died, or got gobbled up, in 2010. Perhaps it lives inside Oracle – not really. Again, 28 years and done. Founding DNA = open source + interactivity and big memory. It lost its way when Linux took over the open-source mantle (around 1998 – the 15-year mark!!!).
  2. DEC: 1957-1987 (remember the VAX 9000?). That was its 30-year run; it got gobbled up and could not handle the next business transition.
  3. Many other companies fall into this category (SGI, …).

Challenged:
  1. Cisco: 1984-2014 (30 years). The networking category is being challenged in a way similar to how VMware (and Xeon) altered the server market: commodity switches + software, among other problems. The good news for Cisco is that it has lots of cash. It needs to find significant new growth categories, and it is trying, with a new CEO and a move upstream.
  2. Intel: Intel got out of the DRAM business in 1984. Now it is facing challenges in its core, having missed the whole handheld and tablet market. Again, another 30-year transition cycle for Intel.
  3. Microsoft: 1997-2007. Windows Vista, Windows 7, Windows 8 – not a growth engine. It missed the phone and failed to grasp the new business model. With a change in CEO and now the shift to the cloud, albeit late, it seems to have made, or to be making, the transition. It's a work in progress.
  4. Oracle: The 30-year-old database market has run its course – the shift to in-memory and, more importantly, to the cloud (AWS/Aurora and Google/Spanner), combined with the sedimentation of the plumbing layer (look at how many database companies are out there). The challenge for Oracle is not technology (which it can address by acquiring the next viable business); the challenge is the enterprise's move from licensed software to the cloud model.
There is a pattern in this 30-year cycle. Typically a technology shift and a business model shift occur together, and companies struggle with the latter. As you can see, very few have made the 30-year transition. If you look deeper, there seems to be a 15-year sub-cycle and a 7-year sub-cycle.
Guess what: Google is past the 15-year cycle. Amazon, founded in 1994, had its second act starting around 2008 (AWS).
The 1980s were the golden era for many tech companies:
  • Computing (Intel, Sun, Apple, Dell…)
  • Networking (Cisco, 3COM)
  • Storage (EMC, NetApp later)
Common theme: a business model of selling integrated systems (the OEM model). 30 years later that model has turned upside down… We are into the next 30-year cycle with the cloud.
I touched upon this topic at one of the UC Berkeley ASPIRE retreats (2014). An image of a handwritten slide is shown below. This time, as in 1984 when DRAM gave way to logic (Intel), new memory technology will drive the new technology stack and thus new business opportunities. Maybe – or at least I am hopeful of it. This will be the topic of the next post…
[Image: 30-year technology re-invention (handwritten slide)]
Meanwhile – what comes after 7, 15, 30?
Coming back to 30-year technology cycles and individuals – is there a link?
I would like to hear more about it from others…

Tesla – Chemistry & Intelligence

My 2030 prediction: Tesla will be valued at $1 trillion, and it will be valued for two key technologies that carry a significant barrier relative to others – chemistry and intelligence (machine learning). While Tesla has a motor in its logo today and will continue to build motors, cars and other electric automotive and transportation vehicles, I see its future having less to do with electric motors, or even solar, as the key driver of the company's core value/IP and thus its business proposition.

At the core of what Elon is building – and what is 'hard to do' – are battery chemistry and, more interestingly, machine learning or intelligence.

Let me start with the simpler and more obvious one – chemistry.

It is well known that Tesla's $5B factory in Nevada will be the driver of electric vehicles as we know them, and Elon and Tesla will continue to master battery chemistry at scale for cars, trucks, homes and anything, anywhere, that needs storage. While there are more efficient forms of energy storage, and while the supply of lithium, the cost of batteries and efficiency growth of only 5-7%/year are real issues, the sheer momentum and size of the car and truck market will force Tesla to engineer better batteries (i.e. chemistry), as that is the core of the value proposition of all electric vehicles. There are a couple of other side effects. First, the oil glut will get worse; I expect it to start around 2018 and be in full swing by 2020. The big shift in the entire value chain of the internal combustion ecosystem will start around the same time. We will need fewer gas stations. We will need fewer auto service shops. With Tesla's Apple-style model of selling direct, it will have an impact on the overall employment workforce as well.

The second, more interesting one is intelligence, or machine learning, or AI as it is called today. We are in the Cambrian-explosion era of machine intelligence, and clearly the autonomous car is the driver for it. What is unique about Elon's and Tesla's approach is that the car is the best vehicle for riding the exponential curve of this technology's evolution. Consider the parallels with biological evolution: survival required better auditory and, later, visual perception. The advent of mobility (one inflection point among many) in biological organisms led to the rapid evolution of the visual cortex, which holds the highest percentage of the brain's volume. This evolutionary growth, while driven by the need for survival, was a big trial-and-error experiment over perhaps a few billion years, and out of that sensory path all kinds of decision-making processes developed.

So the assertion here is that the autonomous car is going to drive machine and deep learning tools and techniques more than any other platform. That will include chips, hardware platforms, software platforms and, more importantly, algorithms and decision processes that need to meet Level 5 standards – approaching and perhaps exceeding human capability.

Tesla has a lead here over others, including Google, as it now has >100K vehicles on the road, is iterating on them, and by 2018 will be going up to 500K and soon after 1M. That will be a key inflection point – analogous to the million eyeballs or developers that were key for platform success in the 1980s and 1990s (Windows, Mac and Solaris were #1, #2 and #3, and all others dropped off), or the 100M for the web/social era; the same will hold for the autonomous car driving the platform capability for ML/DL. Now let's contrast Tesla's approach with Google's. Tesla is taking an iterative, real-world approach. Waymo, i.e. Google, is trying to build the best autonomous system first. We know the latter is harder, and the likelihood of market success is linked to multiple factors over time (Steve Jobs, who knew how to build the perfect platform before releasing it, being the exception).

So the Tesla approach is to solve the hard ML/DL problems using the car as the 'driver', while the Google approach is a platform approach (which of course also serves Google's other applications). What's interesting is that in biological evolution 'intelligence' exploded when mobility became linked to survival, and mobility thus became a critical 'driver'. There are many other drivers of 'intelligence', but for the near term this is the key 'killer' use case to drive platform evolution with significant business value.

So if we take the evolution of battery chemistry over the next 14 years and the evolution of the autonomous system – including a Level 5 decision-making system that is mobile, making real-time decisions at speed, with the criticality of the 'goods' it carries – I have to posit that the core value of Tesla in 2030 will be its battery chemistry knowledge and the ML/DL platform it will build. Behind the ML/DL platform will be silicon for processing in the car, the data (and energy) storage, the intelligence at the car level, the big-data hub or data center that each car will feed, and on and on… The Android vs. iOS analogy applies: one took a platform approach to market, the other took a vertical-integration approach.

So, while Google will drive developer adoption of ML/DL, quietly but surely Mr. Musk (and thus Tesla) could well emerge as the leader in delivering the industry-leading ML/DL platform. It won't be restricted to automobiles or transportation. Like biological evolution, it's a winner-takes-all game – Homo sapiens killed off all other forms. Expect Tesla to drive this platform, and that might well be its monetization model.

I thus speculate that battery chemistry and platform intelligence will be at the core of Tesla's drive to a $1 trillion valuation by 2030.

Block Device is dead. Long Live the block device

Matias Bjorling, in a paper he co-wrote in 2012, calls for the necessary death of the block device interface in the Linux kernel as we know it. Flash was just emerging as a storage tier in the enterprise and infrastructure I/O stack back then.

Going back to the era when the block device abstraction was created in Unix (late 1970s to early 1980s), POSIX (the IEEE standardization effort) also published file and directory access APIs that are OS independent. Around the same time, the 3.5″ HDD (circa 1983) came into existence, enabling both the PC and workstation form factors. The operating-system-level abstraction and the IEEE standardization process allowed storage to be separated behind a set of well-defined APIs, resulting in storage as an industry – one that, over the past 30 years, has grown to more than $50B in size.

30 years later, flash entered the enterprise and infrastructure segment. Around the same time, a number of KV stores emerged that tried to map application use cases (NoSQL, databases, messaging, to name a few) onto flash, using a variety of KV abstraction APIs to improve the integration of flash in the platform. In the same period we have also seen object stores emerge, and in the cloud S3 has become a de facto standard object store, particularly for users of AWS.

With the emergence of NVRAM (or 3D XPoint), the reasons and rationale outlined in the paper are even more obvious. Until recently, I believed that a well-defined, well-designed KV store is the new 'block device'. While that remains true, without a standardization process it will never gain wide acceptance or become the new 'block device'. As in the late 1970s, three things are forming the storm clouds that posit the new block device. They are:

  1. The emergence of 3D XPoint or SCM as an intermediate tier between memory and flash, with both memory semantics and storage (persistence) semantics.
  2. The emergence of S3 as a dominant API for application programmers to leverage cloud-based storage – and, in general, as a dominant API for today's programmer.
  3. The need for a POSIX-like, OS-independent (today you would call it cloud-independent) 'KV store' that addresses the new stack (SCM + flash) and exposes the latency and throughput attributes these new media offer, which would otherwise be limited by the old block interface.

It's obvious that the new storage API will be some variant of a KV store.

It's obvious that the new storage will be 'memory centric' in the sense that it has to treat the SCM and flash tiers as the primary storage tiers and thus adhere to their latency, throughput and failure-mode requirements.

If the new interface is necessarily KV-like, why not make an 'S3-compatible' interface for the emerging persistence tier (SCM and flash)? Standardization is key, so why not co-opt the ever-popular S3 API?
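
As a thought experiment, here is a minimal sketch of what such an 'S3-flavored' KV interface over an SCM + flash tier might look like. This is entirely hypothetical – the class and tier names are mine, not an existing API – and it ignores the hard parts (durability, failure modes, namespaces), but it shows the get/put/delete shape being argued for.

```python
# Hypothetical sketch only: an S3-flavored KV interface fronting two media
# tiers (stand-ins for SCM/3D XPoint and for flash). Not a real library.
import hashlib

class KVStore:
    def __init__(self, scm_capacity_bytes=64 * 1024 * 1024):
        self.scm = {}                      # fast, byte-addressable tier (SCM stand-in)
        self.flash = {}                    # denser, slower tier (flash stand-in)
        self.scm_capacity = scm_capacity_bytes
        self.scm_used = 0

    def put(self, bucket, key, value: bytes):
        self.delete(bucket, key)                       # drop any previous version first
        etag = hashlib.md5(value).hexdigest()          # S3-style content hash
        if self.scm_used + len(value) <= self.scm_capacity:
            self.scm[(bucket, key)] = value            # hot/small objects land in SCM
            self.scm_used += len(value)
        else:
            self.flash[(bucket, key)] = value          # everything else spills to flash
        return {"ETag": etag}

    def get(self, bucket, key) -> bytes:
        if (bucket, key) in self.scm:                  # SCM hit: memory-like latency
            return self.scm[(bucket, key)]
        return self.flash[(bucket, key)]               # flash hit: storage-like latency

    def delete(self, bucket, key):
        self.scm_used -= len(self.scm.pop((bucket, key), b""))
        self.flash.pop((bucket, key), None)

# Usage: the application sees only buckets, keys and values
store = KVStore()
store.put("logs", "2017/07/01", b"hello")
assert store.get("logs", "2017/07/01") == b"hello"
```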

AWS has a unique opportunity to re-imagine the new memory stack (SCM, flash), propose a 'high-performance' S3-compatible API, and offer it as the new 'POSIX' standard.

 

 

Leadership Principles

This is my personal diary of leadership attributes that I have learnt and accumulated over my professional career. I thought I would share it as a reference for who I am, or might want to be. On many of these attributes I have real experience of being tested and tried, and I believe that while I have gained some muscle memory on them, many are still a work in progress.

This is also a summary of two letters I wrote to my son, one when he finished middle school and one as he was finishing high school, in which I had to pen some 'lessons learnt, from parent to child'. I find the list is also a list of the key attributes I care about as a leader, and I wanted to pass it along to my immediate family and, if the opportunity presents itself, to my friends as well.

My role models, or people from whom I have learnt important lessons that I remind myself of constantly, are Scott McNealy, Steve Jobs, Colin Powell, Ronald Reagan, Jeff Bezos, Ed Zander, Elon Musk, Vinod Khosla and George Pavlov. With that, what are the key attributes?

Conviction: A strong belief in something leads to conviction, which takes care of fear, of the worry of failure, and even of failure itself. Steve Jobs is the role model here.

Change agent: Leaders tend to change the status quo. By that I mean: if you want to be a leader, expect change and drive change. Scott McNealy was a change agent, and he is/was a role model for this.

Responsibility: Being a leader means being responsible. With the assumption of responsibility, a lot of other skills can be enhanced or acquired. Almost all of the people above have this key attribute.

Communication: Observe everything. Take in all the inputs and be able to synthesize. All of the names above are good at it – Ronald Reagan and Colin Powell, to start with.

Taking risks: As the famous saying goes, no risk, no reward. To me, risk is not just about reward; it's about experimentation and learning early enough. It's important to fail early and soon rather than later, which is harder. So take risks. The person who epitomizes this for me is Vinod Khosla.

Perseverance: Conviction + perseverance is what makes one successful (just make sure you have people to tell you about your blind spots, and listen to them). Almost everybody above epitomizes this for me, but two in particular – Elon Musk and George Pavlov, a colleague at a former company.

Think big, but act small: It's actually TBASIF – Think Big, Act Small, Invest Frugally. Dream the big things, but learn to find the simple, easy starting or insertion point. The people who epitomize this attribute are Elon Musk and Vinod Khosla.

Sell, sell, sell: An entrepreneur is selling all day long, to everybody – to friends, colleagues, investors, board members, employees, customers and, most importantly, to himself as well. We all need to be sold to, or reminded, as well. Most of the names above are great at this – notably Jeff Bezos, Elon Musk and Steve Jobs.

Complainer vs. solver: If you complain all the time, you will never be able to lead. If you lead a group of people and are constantly solving their problems, you are not a leader either. Ask them to come up with solutions, not problems, and become more of a facilitator. Perhaps Colin Powell epitomizes this.

Believe in your gut: As Ed Zander, the former COO of Sun Microsystems, said – life is all about refining your gut. I think nature has built our chemistry such that complex pro/con analyses resolve into a simple answer, and that answer is reflected in the hormones in your stomach (perhaps from the survival instincts of our early evolution). When you go with your gut, you feel good. So when you come to a crossroads in your decisions, ask your gut, and occasionally check how you did against it.

Decisions: The first decision is the right decision. The second decision is the worst decision. The third decision is no decision – Scott McNealy.

OEM -> ODM -> OCM?

The OEM supply-chain model has existed in multiple industries, including computing, for a long time. In the computing industry, the Original Equipment Manufacturer (OEM) model was perhaps kickstarted in a formal way in the 1980s with the emergence of the PC and of Intel with its processors. Prior to the PC, and perhaps the Apple Mac, computing in the 70s was delivered by vertically integrated companies – notably IBM, DEC, Prime, ICL (England), Wang, Sperry, Burroughs, etc. The OEM model led to the separation of the various layers in the delivery chain. Specifically, the chip (or processor) business came into full force, and the separation of the processor and the software (Microsoft), with the delivery of the two as an integrated platform, led to the emergence of the OEM business.

Over the past 30 years, the OEM model has been supplanted by ODM (Original Design Manufacturer) companies (like Quanta and Tyan) from Taiwan and China. That model was perhaps kickstarted in the late 1990s, driven by Intel and the emergence of Taiwan/China manufacturing capabilities, and it exploded from 2000 onwards with the emergence of the cloud companies as the end customer.

The value in the OEM model is the integration of silicon (engineered by the OEM) and/or software – typically both, as demonstrated by companies like Sun, Cisco, EMC, SGI and many more. Over time, the consolidation of silicon (for processors it was Intel; for switches it is Broadcom), combined with the emergence of open source software (Linux to start with, plus a whole lot of other components found at apache.org), has eroded that key value proposition. After 30 years, and with the consolidation in the industry (EMC/Dell as an example), has the OEM model run its course?

The value in the ODM model is in cost-effective manufacturing and scale. To some extent the ODM model eroded one of the key capabilities of the OEMs, given the consolidation of key semiconductor components (processors, switch/networking ASICs, storage controllers). But because of their inability to move up the value chain (owning either the silicon or the key software IP), the ODM companies have reached a plateau, with nowhere to go but to continue manufacturing at scale and doing it cost effectively. The notion that an ODM could disrupt the OEMs has not played out. Sure, they have had an impact on many companies, but the 70/30 rule applies: the OEMs with strong brand equity have retained their position, and only the smaller OEMs have lost their business to ODMs.

Here’s a simple visual of the value chain.

[Image: OEM vs ODM value chain]

 

But is it time now for the emergence of a new model? The OEM model is facing a perfect storm. One component of the storm is the cloud as a business. The second disruptor is the emergence of Software Defined X (compute, storage, network), in many cases tied to open source. The third and main element of the disruption is the value shift to the component, i.e. the semiconductor component. This I would term the emergence of the OCM model.

[Image: OEM, ODM and OCM supply-chain models]

OCM stands for Original Component Manufacturer, as typified by companies like Intel and Broadcom, though the more interesting ones are Seagate, Western Digital, Micron and Samsung. The visual above shows the three different supply-chain models. The OEM model relies on the ODM as well to deliver the end system to the customer. The OCM model is typified by component companies (one good example is Mellanox, which sells both chips and switches) leveraging either third-party or open source software to deliver system-level solutions to the same target customers the OEMs have addressed. While there are significant challenges in OCMs evolving to match the capabilities of an OEM, the OCMs already sell directly to the big cloud providers (AWS, Google, Microsoft); a significant portion of their business (soon to be 40%) is protected by those direct relationships with cloud providers who will keep growing, even as profit margins potentially shrink. This has two effects for the OCMs: they have to find alternative, higher (absolute) margin models, and they are positioned to challenge OEMs and ODMs, since a good percentage of their business is already shifting to the major cloud providers.

So, will these OCMs emerge? It is back to the Wintel model of value shifting to the component and the software, but in this case the OCM becomes the integrator of the software along with the component to deliver a complete system. Unlike the ODM, the OCM has both the financial and the technical capability to move up the value chain.

Let's revisit this in 2020 and see if it happens.

Cloud and Fabs – different but similar

With all the buzz about cloud, multi-cloud and the ongoing consolidation in the cloud, I was reminded of a conversation with Ryan Floyd a couple of years back. We were comparing and contrasting the viability of the cloud as a business. To me the cloud was rapidly starting to look like the fab business, while Ryan felt differently. The conversation then was about the capital-intensive nature of the cloud as a business and the analogies with semiconductor fabs. There are some interesting similarities and differences. Let's contrast the two…

Fabs: A view of the semiconductor/fab business in the context of this thread: it has taken 30 years of Moore's law and consolidation to arrive at perhaps three companies that have the capital, capability and platform stack. In logic it's Intel, TSMC and GlobalFoundries; in memory it's Samsung, Toshiba, Micron and perhaps Hynix. 1985 was the year of the modern CMOS logic fab (with Intel shifting from memory to logic). What is interesting is where the top three are in terms of their approach. Intel is vertically integrated (fabs + products) and trying to move upstream. TSMC has taken a horizontal 'platform' approach. GlobalFoundries has had a mix (processors – x86 and now Power) and is still trying to find its way across the horizontal-vs-vertical integration chasm.

Cloud: By all accounts, the cloud business kicked off circa 2006 with AWS's launch of web services. It has taken roughly 10 years to arrive at the same stage, with the big three – AWS, Google and Microsoft – consolidating the category. All three have reached a scale, capacity and technology stack that will be hard for others to recreate. Sure, there may be Tier-2 clouds (eBay, Apple, SAP, Oracle, IBM, etc.), geo-specific (China) or compliance-specific cloud operations, but these three will drive consolidation and adoption. It's no longer just capital; it's the technology stack as well.

What's more interesting is to compare and contrast the top three semiconductor fabs and the three cloud companies in their approach, and where they go from here.

Let's start with the fabs, and focus on the logic side of the business, as it is the fountainhead of all compute infrastructure. Their combined revenue is close to $100B (with $22B in capital spend in 2016). Apart from being capital intensive, there is now a complex technology stack required to deliver silicon (design rules to libraries to IP, packaging and even software/tooling) to make effective use of billions of transistors. Similarly, in the cloud the value is moving up the stack to the platform aspects. It is no longer about logging in and renting compute, or dumping data into S3 via a simple get/put API. It is about how to use infrastructure at scale with Lambda, Functions or PaaS/platform-level features and APIs that are specific to that cloud. You can now query your S3 data in place. That is API/vendor lock-in.
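
To make the 'simple get/put vs. platform features' point concrete, here is a short sketch using boto3. The plain put/get calls are the portable, lowest-common-denominator usage; the in-place query (S3 Select here) is the kind of provider-specific, value-up-the-stack feature that creates the lock-in. The bucket and key names are made up, and the exact parameter shapes are from my reading of the boto3 API, so treat them as approximate.

```python
import boto3

s3 = boto3.client("s3")

# Portable, "dumb storage" usage - any object store can do this
s3.put_object(Bucket="example-bucket", Key="events/2017-07.csv",
              Body=b"user_id,country\n1,US\n2,DE\n")
body = s3.get_object(Bucket="example-bucket", Key="events/2017-07.csv")["Body"].read()

# Provider-specific, value-up-the-stack usage - query the object where it lives
resp = s3.select_object_content(
    Bucket="example-bucket",
    Key="events/2017-07.csv",
    ExpressionType="SQL",
    Expression="SELECT s.user_id FROM S3Object s WHERE s.country = 'US'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)
```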

The tooling required to deliver a chip product is not specific to a fab, but the optimizations are. Same with the cloud: the tooling might be generic (VMs, containers, for example) or open source, but increasingly it is proprietary to the cloud operator, and that is the way it will stay.

In terms of technology stack, competency and approach to market, Google looks more like Intel, AWS like TSMC, and Microsoft like GlobalFoundries (not in the sense of today's market leadership). Intel is vertically integrated, and Google shows more of that. Intel has deep technology investments and leads the sector, and so does Google/GCP when contrasted with AWS or Microsoft. Every fabless semiconductor company's first stop is the TSMC foundry, and the same is true of any cloud-based business (AWS first). Cloud infrastructure, unlike fabs, was not the primary business to start with: all three leveraged their internal needs (Google for search, AWS for books/shopping, Microsoft for Bing and its enterprise apps) and their initial or primary business to fund the infrastructure.

Can one build one's own cloud? i.e. 'Real men have fabs', as T.J. Rodgers, CEO of Cypress Semiconductor, famously quipped. While building and developing semiconductors is capital intensive and needs deep technology and operational experience, a cloud can seemingly be built with all the open source code that is available. While that is true, despite the plethora of open source tools it is the breadth and depth of the tooling that is difficult to pull off. Sure, we can assemble one with CoreOS, Mesosphere, OpenStack, KVM, Xen, Grafana, Kibana, Elasticsearch, etc. But as the stack becomes deeper and broader, it is going to be hard for any one company (including the big Tier-2 clouds named above) to pull it off at scale and gain operational efficiency. Sure, one could build a cloud in one or two locations; how do you step-and-repeat and make it available around the globe, at scale? Intel and TSMC eventually excelled at operational efficiency. Sure, Dropbox might find it cheaper to build its own, but the value is shifting from just storing data to making it available for compute. That level of integration will force the swing back to the big three.

Cloud arbitrage, multi-cloud vs. multi-fab: The rage today is to go multi-cloud. How great it would be to move from AWS to GCP to Azure at the click of a button. We tried the same thing in the 1990s at Sun for processors – we wanted to be multi-fab. TI was the main fab, and we wanted TSMC and UMC and had engagements with AMD. The reality is that each platform stack has unique features that the solution will naturally gravitate to. It is more expensive to pursue multi-cloud as a strategic direction than to pick a cloud partner and drive deep integration. Yes, for business continuity and leverage of spend one would want multi-cloud. The reality is that Netflix is with AWS and not with Microsoft or Google, and it is doing fine as a business. Perhaps you don't have to run the entire application stack in every cloud; you are better off picking specific categories, and lines of business can run them in specific clouds. That brings diversity, and perhaps business continuity, while leveraging each cloud's unique properties. For example, for developing machine-learning-type apps, GCP is better than AWS; for video streaming, maybe AWS is just fine (although Google will tell you it has more POPs and capability for this thanks to YouTube).

Where do we go from here in the cloud? I will most certainly be wrong if you come back and read this blog in 2020, but there are some truisms (not a complete list, just a start):

Vertical integration: The current $18B cloud business will be >$100B between 2020 and 2025. That is a seismic shift that will impact everyone, including businesses down the infrastructure stack (semiconductor companies, for example), as the big three show signs of more vertical integration in their stack, including having their own silicon. Intel likewise is trying to get more vertically integrated, and Microsoft is trying to find its way there. Maybe the exception is going to be TSMC, staying truly horizontal. The big three cloud operations are, and will be, more vertically integrated. There is also a culture or gene-pool aspect to this.

Service vs. services (or IaaS vs. PaaS): Despite all the technology chops at Intel, it has had mixed results in the fab service business. While AWS has excelled at the IaaS part, its ability to build a compelling ecosystem around its platform strategy will be tested. Likewise for Google: while it traditionally has strong in-house platform assets, building a strong developer community (as it did with Android) while delivering a great customer-centric experience will be the challenge. Microsoft, by nature of its strong enterprise apps footprint, can and could get both the service and the services right. It goes back to the gene pool, or the service mentality vs. the services mentality. AWS has excelled at the service aspect (as TSMC did in the 90s) and is the leader in services as well. GCP (akin to Intel) has the platform strengths and has to supplement them with a modern customer-engagement model to gain market share. This will require a cultural/organizational shift to being service oriented, not just a change in technology or business model.

Lock-in: Lock-in is a reality. You have to be careful which lock and key you want, but it will be real, so go in with eyes wide open. It is now at the API level and moving up the stack.

Data gravity: Increasingly, data will be the differentiator. Each of the big three will hoard data. Broadly speaking there are three types of data (private, shared and global), and applications will use all of them. This creates a gravitational pull to use a specific cloud for specific applications. IBM has started the trend of acquiring data (weather.com). Expect the big three to acquire data needed by applications as part of their offering. This will be another form of lock-in.

Cloud-native programming (Lambda, Functions, TensorFlow…): A similar holy-grail approach has been attempted on the silicon side with ESL and high-level synthesis. What is interesting for comparison is that generic application development is approaching what the hardware folks have been doing for decades – event-driven (sync, async) programming, or a data-flow approach. This is an obvious trend that kicked off in 2016. It is a chasm for the generic programmer to cross (even setting aside the crumminess of Verilog, it is hard to program this way). This is where each of the big three will take different approaches, differentiate, and create the next lock-in.
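
As a small illustration of that event-driven, cloud-native style, here is a sketch of an AWS-Lambda-style handler that fires only when an S3 'object created' notification arrives – much like a hardware block sensitive to a signal edge. The bucket and the processing step are made up; the event fields follow the documented S3 notification shape.

```python
import json
import urllib.parse

def lambda_handler(event, context):
    # Runs only when the platform delivers an event - there is no main loop to own.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # React to the event: index it, transform it, kick off the next stage...
        print(f"object created: s3://{bucket}/{key}")
    return {"statusCode": 200, "body": json.dumps("ok")}
```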

In summary, today (2017) AWS looks more like TSMC, GCP more like Intel, and Microsoft is somewhere in between (GF?). We will revisit this in 2020 to see what they look like – more similar, or different?