Achieving Better Platform Quality
by Mark W. Olson
Value-added developers are highly dependent on the quality and security of the platform on which they build. As the developer of an automotive IoT product, I am concerned about the liability I may share with the platform vendor should our devices be hacked and damages incurred. Platform vendors do not publish objective metrics about the quality of their software, making it impossible to gauge baseline quality or subsequent improvement. By definition, security holes are bugs, and many product bugs are security holes. At the same time, testing can only show the presence of bugs, never their absence, so it is difficult to prove the quality level of any product. Judging by the number of updates and the many reported successful hacking incidents, there is likely significant room to improve the quality and security of today's software products.
The US semiconductor industry survived a near-death experience in the late 1970s when Japanese competitors used much higher product quality to differentiate their products [1]. While US companies were marketing their products' functionality and not disclosing quality levels, Japanese companies were actively promoting their quality advantage. The threat peaked in the early 1980s when HP published data showing that Japanese Dynamic Random Access Memory (DRAM) vendors had defect rates of 160 PPM (Parts Per Million), compared with average US vendor defect rates of 780 PPM, a difference of almost 5X [1]. Once quality was objectively measured and communicated, its commercial value to customers became readily apparent.
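PPM figures are simply failure counts normalized to a million units; a quick sketch (the raw lot sizes and failure counts here are hypothetical, chosen only to reproduce the cited rates) shows the arithmetic behind the nearly 5X gap:

```python
def defects_ppm(failures: int, units: int) -> float:
    # Normalize an observed failure count to parts per million.
    return failures / units * 1_000_000

# Hypothetical lot sizes and failure counts chosen to match the cited rates.
japan_ppm = defects_ppm(16, 100_000)   # 160 PPM, Japanese DRAM vendors
us_ppm = defects_ppm(78, 100_000)      # 780 PPM, average US vendors

gap = us_ppm / japan_ppm
print(f"US defect rate is {gap:.1f}x the Japanese rate")  # 4.9x
```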
By adopting the Total Quality Control (TQC) concept used by their Japanese competitors, US chip manufacturers eliminated the quality gap by the end of the 1980s. They changed their culture to ensure that quality optimization permeated every step of the product life cycle. Semiconductor volumes had grown to many millions of units as chip technology became pervasive in many aspects of life, and customers' quality expectations rose dramatically as well. Consequently, the potential cost of a semiconductor quality problem was astronomical. Intel's infamous Pentium flaw in 1994 cost the company over $400M. And in 2011, Intel announced a bug in a supporting chip for its Sandy Bridge platform that cost the company $1B: $300M in lost sales and $700M in repairs [2]. "A billion here, a billion there, pretty soon you're talking real money" [3].
Due to the enormous costs of semiconductor quality problems, extreme measures were institutionalized into the fibers of the culture in order to minimize the risk of such problems. Methodologies such as module reuse, extensive testing at all module levels and the use of modern statistical methods have driven the high levels of quality in today’s semiconductors.
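The "modern statistical methods" mentioned above include techniques such as statistical process control; a minimal sketch (the sample data is invented for illustration) of the classic Shewhart three-sigma control-limit check used to flag a process that has drifted:

```python
import statistics

# Hypothetical daily defect measurements (PPM) from a production line.
samples = [152, 161, 148, 170, 155, 163, 149, 158]

mean = statistics.mean(samples)
sigma = statistics.stdev(samples)

# Classic three-sigma control limits: points outside these bounds
# signal a process that may have drifted out of control.
ucl = mean + 3 * sigma
lcl = mean - 3 * sigma

out_of_control = [x for x in samples if not (lcl <= x <= ucl)]
print(f"mean={mean:.1f} UCL={ucl:.1f} LCL={lcl:.1f} outliers={out_of_control}")
```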
The software industry of today is reminiscent of the semiconductor industry of the 1970s. To my knowledge, measurement data on software product quality is not publicly available, which strongly suggests it is kept secret. There is also a dearth of data on the number and frequency of discovered security holes. Today, software products are differentiated only by features, not by quality. We have all experienced the constant stream of new software releases, many of which contain numerous bug and security fixes. The software industry evidently prefers not to differentiate on product quality, just as the semiconductor industry did back in the '70s. That reluctance is likely rooted in fear that moving to a TQC model will require significant work, demand increased discipline, and impact time-to-market.
The lack of published software quality metrics causes me to lump all software into the same low-quality bucket. Since I have little experience with open-source software platforms such as Linux, I can only assume they are equally bad.
Software developers and their customers have been lulled into accepting less-than-optimal software quality, including many security holes. The only direct cost to a developer of a software bug fix (including a security patch) is some cursory testing and an automated update release. Compared with the cost of a hardware remediation, this cost is minuscule. Poor software quality exposes the increasingly pervasive technologies in our lives to threats from bad luck and from bad actors around the world.
A comparison of the development processes for semiconductor products and software products shows that they have evolved along parallel paths to become almost indistinguishable. Semiconductor Electrical Computer Aided Design (ECAD) tools have turned semiconductor development into a massive software project, very much like any other large software project. The product's electrical functionality is accurately simulated, as is its logic-level functionality. Design rules are checked and regression tests are run against every change to the design. Everything is done in software until the last two steps, when the design is instantiated into masks that are then used to fabricate the semiconductor devices. By contrast, a large software product is finally instantiated as a new release, often distributed by web download with no physical delivery vehicle. Consequently, incremental update releases cost the developer nothing.
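The regression-testing discipline described for chip design maps directly onto software: known-good outputs are locked in before any change, and a mismatch afterward flags a regression. A minimal sketch (the circuit function and golden values are invented for illustration):

```python
# Golden reference outputs captured before a change; any mismatch is a regression.
GOLDEN = {
    (5.0, 1000, 1000): 2.5,
    (9.0, 2000, 1000): 3.0,
}

def voltage_divider(v_in, r1, r2):
    """Toy stand-in for a simulated circuit block: resistor-divider output."""
    return v_in * r2 / (r1 + r2)

def run_regression():
    # Re-evaluate every golden case; return the inputs that no longer match.
    return [args for args, expected in GOLDEN.items()
            if abs(voltage_divider(*args) - expected) > 1e-9]

print("regressions:", run_regression())  # an empty list means the change is safe
```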
Modern software development methodologies such as Agile enable rapid incremental cycles of change, delivering the nimbleness needed to respond to sudden shifts in the market. (Agile software development refers to a group of methodologies based on iterative development, where requirements and solutions evolve through collaboration between self-organizing, cross-functional teams [4].) However, Agile also permits less discipline in the product definition and development process, and that lack of discipline often manifests itself in lower product quality and security holes. Agile could be paraphrased as, "Never enough time to do it right, always enough time to do it over."
Semiconductors make extensive reuse of modules that have been tested and improved for years, frequently based on feedback from an installed base of hundreds of millions of units. While modern object-oriented languages encourage module reuse, it is not without irony that software designers often prefer to reinvent their own modules. Such code forfeits the benefits of a proven library module hardened by that same enormous installed base.
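As a small illustration of the reuse point (the example is our own, not from the article), a reinvented module can carry a subtle bug that a proven standard-library module shook out long ago:

```python
import statistics

def homegrown_median(values):
    """A reinvented median -- subtly wrong for even-length inputs."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]  # ignores the lower middle element

data = [1, 2, 3, 4]
print(homegrown_median(data))       # 3 -- biased high on even-length input
print(statistics.median(data))      # 2.5 -- the proven library answer
```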
From a manufacturing perspective, software duplication is comparatively simple: it has none of the process variations inherent in semiconductor device manufacturing, which simplifies duplication and distribution. Applying semiconductor quality methodologies and disciplines, combined with rigorous module testing and feedback from large installed bases, should yield similarly high-quality results for the software industry.
The motivations for software platform vendors to dedicate significant efforts to improve quality by applying semiconductor development methodologies are:
- Product differentiation – The software industry is ripe for major competitors to distinguish their offering by improving quality and using the resulting quality data as a product differentiator. Differentiation losers will see their market share erode. This is a big carrot that should motivate innovative software product developers to make such a commitment.
- Fear of damages – The threat of huge legal liabilities should also work as a stick to prod the software industry into taking strong actions to improve their product quality and security. Major security breaches can be prohibitively expensive and have untold consequences for consumers, companies and governments. Should the courts hold software vendors liable for not employing proven development methodologies and disciplines, software product developers may be deeply hurt financially, directly from awarded damages and indirectly from damaged brand value.
Today, people who understand semiconductor product development methodologies have little reason to interface with the people who drive software product development methodologies. As a result, there is little chance that the software industry will, on its own, benefit from what the semiconductor industry has learned. This represents an opportunity for the IEEE to take a leadership role in driving the cross-pollination needed to improve platform quality, reducing risk for all involved. The first step could be publication of software platform quality metrics across the industry.
[1] Dr. William F. Finan, "Matching Japan in Quality: How the Leading U.S. Semiconductor Firms Caught Up with the Best in Japan," The MIT Japan Program, https://dspace.mit.edu/bitstream/handle/1721.1/17109/JP-WP-93-01-27732004.pdf?seq
[2] Dean Takahashi, "A billion-dollar mistake: Intel recalls a supporting chip for popular Sandy Bridge platform," VentureBeat, Jan. 31, 2011, https://venturebeat.com/2011/01/31/a-billion-dollar-mistake-intel-recalls-a-supporting-chip-for-popular-sandy-bridge-platform/
[3] Commonly attributed to Senator Everett Dirksen, https://en.wikiquote.org/wiki/Everett_Dirksen
[4] "What is Agile? What is SCRUM?" https://www.cprime.com/resources/what-is-agile-what-is-scrum/