Monday, February 16, 2009

Moore’s Law: the End Game

A lot of premature predictions have been made about the end of Moore’s Law. Most of them have been proven wrong as the result of clever engineering. But clever engineering can’t defeat the laws of physics, and we’re almost there. The economic constraints on Moore’s Law loom just as large, even though the current technology roadmap takes us past 2022. My purpose here is not to predict when Moore’s Law will end. Instead, I’d like to discuss the technical and economic constraints and their consequences as the end game approaches.

Moore’s Law (or observation) states that transistor density doubles every two years. It also encompasses transistor cost as a function of die size and defect density (such as foreign matter or particles). A number of factors determine the ultimate cost per transistor, most notably yield. More transistors and larger die size increase the chance of a bad chip. The industry has recently coined the term “More Moore,” which complements conventional scaling with the concept of “equivalent scaling.” The most visible example of equivalent scaling is the introduction of multi-core processors.

Before addressing the cool technical stuff, let’s talk about the fundamental economic constraints underlying Moore’s Law – capital investment and R&D expense. The global electronic systems market is forecast to be $1.2 trillion, or around 2% of global GDP, in 2009. This market grows at single-digit rates closely tied to global GDP growth. The semiconductor market is about $250 billion, or 20% of the electronic systems market; historically, the semiconductor content of electronic systems has varied between 15% and 25%. Semiconductor companies spend 10-15% of revenue on R&D and 15-20% on capital, 80% of which goes towards fab equipment, about $30-40 billion. The capital equipment industry can afford to spend 15% of its revenue on R&D to sustain Moore’s Law, $5-6 billion annually. Photolithography, the most critical and expensive tool to design and build, gets about 20% of the total equipment industry’s R&D, roughly $1 billion to split up among the three top suppliers (ASML, Nikon and Canon) and a handful of second-tier suppliers.
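
To make that funding cascade concrete, here is a rough back-of-the-envelope calculation in Python using the mid-range figures above. The percentages are the approximations quoted in this paragraph, not precise industry data.

    # Approximate funding cascade behind Moore's Law (2009 figures from the text)
    electronic_systems = 1.2e12              # global electronic systems market, ~$1.2 trillion
    semiconductors = 250e9                   # semiconductor market, ~20% of electronic systems
    chip_capex = 0.175 * semiconductors      # chip makers spend ~15-20% of revenue on capital
    fab_equipment = 0.80 * chip_capex        # ~80% of that capital buys fab equipment
    equipment_rnd = 0.15 * fab_equipment     # equipment industry R&D, ~15% of its revenue
    litho_rnd = 0.20 * equipment_rnd         # photolithography's share of that R&D

    print(f"Fab equipment market: ${fab_equipment / 1e9:.0f}B")   # ~$35B
    print(f"Equipment R&D budget: ${equipment_rnd / 1e9:.1f}B")   # ~$5B
    print(f"Lithography R&D:      ${litho_rnd / 1e9:.1f}B")       # ~$1B for ASML, Nikon and Canon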

Optical scaling has enabled Moore’s Law more than any other chip manufacturing or design technology. The current state of the art is 193nm, which refers to the wavelength of the light being used to expose the pattern on the wafer. That translates into minimum line widths on the order of 50nm (yes, shorter than the wavelength of the source). The wafer is exposed one die (chip) at a time for each layer. Memory has 20-25 layers while complicated processors can have more than thirty. Cost is proportional to the number of layers and the size of the die. Memory density, not processors, has driven lithography for the past few generations. Intel’s 45nm Penryn Core 2 quad-core processors have 820 million transistors packed into 107mm² of Silicon, while SanDisk/Toshiba’s 43nm 16 Gbit flash fits on 120mm². Each new technology node fits the same number of transistors or bits in one half of the die area as the previous node. This results in a scaling factor of 0.7X (actually 0.707) between nodes. The next node will be 32nm (0.707*45). The most recent roadmap actually shows half-nodes every year that scale by a factor of 0.9X through 2022. From a production perspective, though, it takes five years for a given node to reach peak volume production, and a very long tail of output follows after that.
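
The 0.7X factor is pure geometry: doubling the transistor count in the same area means each linear dimension shrinks by one over the square root of two. A quick sketch (the node names are just the rounded results of that scaling, not an official roadmap):

    import math

    shrink = 1 / math.sqrt(2)   # halve the area per transistor -> ~0.707 linear shrink
    node = 45.0                 # start from the 45nm node
    for _ in range(4):
        node *= shrink
        print(f"{node:.1f}nm")  # prints roughly 31.8, 22.5, 15.9, 11.2 - the 32, 22, 16 and 11nm nodes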

Let’s get small and look at things on the scale of Silicon atoms. The spacing between two silicon atoms in their lattice is about 0.5nm. The 50nm linewidth used in 45nm devices is therefore about 100 Silicon atoms wide. The gap between lines is about the same as the lines themselves. The problem with scaling interconnects is that as their cross section approaches the mean free path of electrons in the material, their resistivity shoots up dramatically above that of the bulk material (Copper for example). This is because the mean free path of electrons (the distance an electron travels on average before bumping into something) in Copper is about 40nm, almost the current linewidth. High resistivity is bad because it generates heat and contributes to switching delays. This is called RC (Resistance x Capacitance) delay. It varies linearly with the interconnect’s resistance and capacitance per unit length, and with the square of its length. The reason the industry switched from Aluminum to Copper interconnects was to address the resistance part of the problem. The capacitance problem has been whittled at steadily with the introduction of low-κ materials. The “κ”, or permittivity, of the best low-κ materials is around 2.2. Dry air has a permittivity of 1.0, so the best low-κ material turns out to be air. But dry air is a terrible heat conductor, so it is impractical. The only reason RC delay has not been a problem so far is the scaling of the interconnect length, the squared term of the equation.
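
To see why the squared length term dominates, here is a toy parallel-plate model of a single copper line. The dimensions and the elevated resistivity are illustrative assumptions for a roughly 50nm-wide wire, not extracted numbers from any real process.

    # Toy RC-delay model for a copper interconnect (parallel-plate approximation)
    EPS0 = 8.854e-12   # F/m, permittivity of free space
    RHO = 3.0e-8       # ohm*m, assumed effective Cu resistivity at ~50nm (above the 1.7e-8 bulk value)
    K = 2.2            # relative permittivity of a good low-k dielectric

    def rc_delay(length, width=50e-9, thickness=100e-9, spacing=50e-9):
        r = RHO * length / (width * thickness)        # wire resistance grows with length
        c = K * EPS0 * thickness * length / spacing   # capacitance to a neighboring line also grows with length
        return r * c                                  # so delay grows with the square of the length

    print(f"{rc_delay(1e-3) / rc_delay(0.5e-3):.1f}x")  # 4.0x: halving the wire length quarters the delay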

Moore’s Law is transistor-centric. However, it’s Flash memory cells that face the toughest challenges. While you don’t need fast transistors for Flash, they still have to be physically smaller. Memory adds the storage capacitor to the equation; these are actually built on top of the transistors. There is a common challenge in scaling transistors and capacitors: the necessity for dielectric layers. In the past, the most common dielectric material for transistor gates and capacitors was Silicon Dioxide (glass). Until recently, the Silicon Dioxide layers were thick enough to provide sufficient insulation in the gate and charge retention in the capacitors. Silicon Dioxide ran out of gas (due to leakage) as a gate dielectric when it was around 1.5nm thick, only three Silicon atoms thick. That’s why the industry developed high-κ dielectrics (Hafnium-based). High-κ dielectrics breathed new life into conventional transistor and capacitor designs. The industry also has designs that rearrange the building blocks of transistors vertically rather than horizontally, which will allow higher densities and the use of existing materials.
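
The payoff of high-κ can be expressed with the standard “equivalent oxide thickness” relation: a physically thick high-κ film behaves, capacitively, like a much thinner layer of Silicon Dioxide. A minimal sketch, assuming a typical published κ of about 22 for Hafnium oxide rather than any specific process value:

    # Equivalent oxide thickness (EOT) of a high-k gate dielectric
    K_SIO2 = 3.9    # relative permittivity of Silicon Dioxide
    K_HFO2 = 22.0   # assumed typical value for Hafnium-oxide-based dielectrics

    def eot_nm(physical_thickness_nm, k_high):
        return physical_thickness_nm * K_SIO2 / k_high

    # A 3nm Hafnium-oxide layer is electrically equivalent to ~0.5nm of SiO2,
    # while remaining thick enough to keep tunneling leakage under control.
    print(f"{eot_nm(3.0, K_HFO2):.2f}nm")  # ~0.53nm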

Lithography drives conventional scaling, but there are several other critical processes involved in making a chip, and with some of those we have already reached limits. For example, thin films can be deposited and controlled one atomic (or molecular) layer at a time. Thermal processes can be controlled to within 1ºC across a 300mm wafer, which is almost unbelievable precision; for some films this translates into thickness variations smaller than the distance between Silicon atoms. Other critical processes, such as ion implantation and etch, are controlled with similar precision. Even quantum effects have been steadily creeping into chip design. There’s still plenty of room at the bottom, as Richard Feynman put it; we just don’t know how to get there. Quantum computing, spintronics, and other “way out” approaches are just that. They are not going to be commercialized anytime soon, certainly not in my lifetime even if I live to be 100.

I mentioned “More Moore” and multi-core chips earlier. When was the last time that Intel touted clock speed and techies eagerly awaited the next major milestone? Remember when the GHz barrier was broken? How about the two and three GHz barriers? There’s a reason that Intel dropped the emphasis on clock frequency. It’s cheaper to put two cores on a chip than to double chip performance in terms of pure transistor switching speed. Plus it’s pretty easy to market to consumers: two cores are better than one, four cores are better than two, and so on. The implication is that performance scales with the number of cores, but that’s not the case. As it turns out, there is a law that applies to multi-processor systems called Amdahl’s Law. Gene Amdahl was an IBM designer who worked on some of IBM’s earliest mainframes and then went on to found his own mainframe company, Amdahl Corporation. Amdahl’s Law states that as the fraction of work that can be processed in parallel diminishes, so does the equivalent improvement in performance. For example, in a two-processor system, if only 50% of the work can be processed in parallel, the equivalent performance is 1.33X. In the same example, if we push the number to 99%, we achieve equivalent performance of 1.98X. Now let’s take a ridiculous example and build a 10,000-core chip. At 50% it would achieve equivalent performance of 2X, while at 99% it would only achieve 99X. What’s worse is that most computing tasks do not lend themselves to parallel computing. (As an aside, developing a tool that can take conventional code and execute it in parallel without having to design it that way is one of the holy grails of computing. If you can solve this problem, you’ll be golden.)
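
Amdahl’s Law fits in a few lines of code; the examples below reproduce the numbers in the paragraph above.

    def amdahl_speedup(parallel_fraction, cores):
        """Overall speedup when only part of the workload can run in parallel."""
        serial_fraction = 1.0 - parallel_fraction
        return 1.0 / (serial_fraction + parallel_fraction / cores)

    print(f"{amdahl_speedup(0.50, 2):.2f}X")      # 1.33X: two cores, 50% parallel work
    print(f"{amdahl_speedup(0.99, 2):.2f}X")      # 1.98X: two cores, 99% parallel work
    print(f"{amdahl_speedup(0.50, 10000):.2f}X")  # 2.00X: 10,000 cores, 50% parallel
    print(f"{amdahl_speedup(0.99, 10000):.1f}X")  # 99.0X: 10,000 cores, 99% parallel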

A rule of thumb for the cost of chips is that one third of the cost is in the die, one third in the package, and one third in test. By integrating two chips in the same package, you can effectively cut the package cost in half. The industry seems to be settling on a handful of ways of accomplishing this, and more than two die can be stacked together. Think of this as a 3-D expansion of Moore’s Law: instead of transistors per unit area, think about transistors, bits or functions per unit volume. There are some technical problems to be solved, but these are more in the category of manufacturability than technology. In a nutshell, the problem is drilling holes through the die, aligning them, and then filling the holes with conductive material. The die can be ground very thin – the state of the art is about one tenth of a millimeter, the thickness of a sheet of paper. About 15% of chips being produced today use some sort of die-stacked packaging.
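
As a rough illustration of that rule of thumb, assuming die and test costs stay the same when two die share a package (a simplification, since testing stacked die has its own costs):

    # One-third die, one-third package, one-third test (rule of thumb from the text)
    die, package, test = 1/3, 1/3, 1/3

    two_separate_packages = 2 * (die + package + test)   # 2.00 cost units for two chips
    two_die_one_package = 2 * die + package + 2 * test   # 1.67: the package cost per die is halved
    print(f"{two_die_one_package / two_separate_packages:.0%}")  # ~83% of the original cost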

Another key aspect of staying on Moore’s Law has been the use of increasingly larger wafers. Wafer size has grown from 25mm in the early days of the industry to 300mm, which entered production in 2000. 300mm was the first wafer size transition to be funded by the equipment manufacturers. In other words, the equipment manufacturers paid for the development cost of the tool set; previous conversions were funded by the chip manufacturers. IBM funded the conversion to 150mm and Intel the conversion to 200mm. Not only did the chipmakers push the development cost of the tools back onto the equipment makers, they also told them that the new tools could not cost more than 1.2X as much as the 200mm tools. So in effect the industry wanted tools that could process 2.25X as many die of a given size per wafer for only 20% more money. That would have been roughly a 50% decrease in the capital cost per die and a substantial reduction in the waste of chemicals (which mainly go up the fab exhaust – more than 95% of some) and utilities. The equipment industry balked at the 1.2X “rule” and all critical tools exceeded the target, most notably lithography. Even so, the industry was still able to achieve enough productivity improvements to make the conversion worthwhile for the highest volume manufacturers. Another characteristic of wafer size transitions is that they take even longer than technology nodes to reach peak production. In the eight years since the first production 300mm fab was switched on, a total of one hundred or so have been built. They account for 25% of global wafers out while 200mm still accounts for 50%. Old fabs don’t die; they are just dismantled and shipped to China. Semico Research predicts that the recently flooded but usually attractive used equipment market will be $8 billion in 2009, or about 25% of the total market. Well-maintained fab equipment can easily have a ten-year useful life.
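
The 2.25X and roughly-50% figures fall straight out of the wafer geometry (ignoring edge loss and yield, which this sketch does):

    import math

    area_300 = math.pi * (300 / 2) ** 2   # 300mm wafer area in mm^2
    area_200 = math.pi * (200 / 2) ** 2   # 200mm wafer area in mm^2
    die_ratio = area_300 / area_200       # 2.25X as many die of a given size per wafer

    tool_cost_ratio = 1.2                 # the "1.2X rule" the chipmakers asked for
    capital_per_die = tool_cost_ratio / die_ratio
    print(f"{die_ratio:.2f}X more die per wafer")                   # 2.25X
    print(f"{1 - capital_per_die:.0%} lower capital cost per die")  # ~47%, i.e. roughly half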

The next wafer size transition will be to 450mm, and it is one of the hottest debates in the industry. Intel, Samsung and TSMC are pushing equipment makers to develop the new tools. There is no widely accepted estimate for the actual cost of the 300mm transition; my estimate as an ex-insider is that it was more than $50 billion. Once again, the equipment industry will have to cover the development cost of the next wafer size transition, which will likely exceed $100 billion. The transition to 300mm was anything but smooth either, and the initial milestones were missed by years. The development cycle for a new tool is typically longer than for a new technology node. This was painfully obvious in the 300mm transition. Most equipment makers had to scrap or improve the first set of tools they developed for 300mm in order to meet the technology requirements of the next node. This Catch-22 situation is unavoidable, though, given the time needed to develop the equipment and to qualify it for production, another costly and lengthy process. The industry has wisely pushed out the next wafer size transition past 2020 and is focusing instead on improving the productivity of new and future 300mm fabs with a program dubbed “300 Prime.” This is an acknowledgement that the economics to support the next wafer size are not there – at least not in the next few greenfield fab investment cycles. Some even argue that 300mm will be the last wafer size.

The cost of capital equipment in a fab is about 70% of the total capital investment. In the mid-90’s you could build a 200mm / 250nm fab that would crank out 20,000 wafers per month for about $1 billion. Today, a 300mm / 45nm fab with the same output costs $3 billion, and many are predicting $10 billion fabs in the not-too-distant future. I mentioned earlier that the chip industry spends about 20% of revenue on capital. At today’s volume, the whole industry would only be able to afford five 450mm fabs per year; in reality it would be able to support one or two. That compares with five to ten 300mm fabs per year today.
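
The fab-count arithmetic works out as follows, using the round numbers above and assuming the entire capital budget went into new fabs (it never does, since much of it goes to upgrading existing ones):

    semiconductor_revenue = 250e9                    # ~$250B market
    capital_budget = 0.20 * semiconductor_revenue    # ~20% of revenue on capital, ~$50B per year

    fab_450mm = 10e9   # a projected $10B 450mm fab
    fab_300mm = 3e9    # today's ~$3B 300mm / 45nm fab

    print(capital_budget / fab_450mm)   # 5.0 -> at most five 450mm fabs a year, realistically one or two
    print(capital_budget / fab_300mm)   # ~16.7 -> comfortable room for five to ten 300mm fabs a year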

The highest-cost labor in a fab is the team of engineers that supports and maintains the manufacturing technology and the tool set. Moving fabs to Asia opened up a supply of well-educated scientists and technicians who could be developed at a lower cost than in Silicon Valley. This economic driver of Moore’s Law is also under pressure as salaries globalize and the cost advantage narrows. Direct labor has for the most part been eliminated, partly because the physical weight of the wafer carriers themselves – 10 kilograms, or 22 pounds – was considered a safety and ergonomic problem. Modern fabs are designed to run “lights-out,” although I don’t know of one that literally turns them off.

Another strategy the industry has employed to cope with the increasing cost of developing a new node is the formation of “fab clubs.” IBM’s technology is co-developed with AMD, Toshiba, Sony, and four others. Each company then builds its own devices using a common process technology. The best process technology thus becomes available to companies that could not afford to develop it at all, and at the same time as the traditional technology leaders get it. In the end, the gap between the technology haves and have-nots disappears.

Moore’s Law fits the classic technology life cycle S-curve. The first modern transistor patent was filed in 1925. The team at Bell Labs built the first transistor in 1947. It probably had a few dollars’ worth of materials and hours of fabrication time, versus 1/10,000 of a cent today. It took another seven years for TI to ship the first commercial Silicon transistors. A different way to look at Moore’s Law is that the technology that enabled it had a roughly 35-year incubation period before producing the first integrated circuit. That technology is in its rapid growth phase now. So far it has lasted 45 years, but inevitably it will reach its technological and economic limits.

4 comments:

  1. Julio, why is Moore's Law the doubling of transistor density every two years? That seems arbitrary. Why not every 18 months or 3 years? Why not tripling? Is there math behind it or is that just what Gordon decided on and the engineers took their marching orders?

  2. My understanding is Moore just observed past results and used this as a rule of thumb for future results. The "law" then became a target, or a self-fulfilling prophecy. "How long should we budget to make these improvements?" "Oh, two years." "Alright then."

    If I recall, Moore might have been the one to give us in high tech the casual dress work environment. I think he might have been one of the original engineers that left William Shockley's transistor company start-up in Silicon Valley. Shockley was not only a eugenicist but I guess a real ogre to work for. His chief engineers grew tired of his tin-pot dictator ways and left in droves to set up their own transistor companies. They all hated his formal work attire rule and mandated casual dress at their new companies.

    But don't cite me in any academic papers on this.

  3. That is a complicated answer on one level and a simple one on another. The simple answer is that Moore observed in 1965 what the industry had demonstrated since 1959. Like Newton's observation of gravity. Once the treadmill was going, you had to stay on to survive. Moore's Law also states that there is an optimal cost point per transistor determined by a number of factors including density, yield, and die size. There are many more "knobs" than just the process technology. The industry works all the knobs to stay on Moore's Law. Turns out that was two years. The beauty of Moore's Law is that it's exponential.

    This is what Moore had to say about it in 2005.

    Moore on Moore's Law

  4. "Inside Intel" is a pretty interesting account of their early days. I have a friend who was one of their memory customers. He said Andy Grove personally apologized to him for the quality of their DRAM's. Imagine that!
