Challenges 3D IC heterogeneous assemblies need to overcome to ensure reliability
Reliability has always been a challenge in the integrated circuit (IC) space, but it was manageable with design rule checking (DRC). Now it's harder: a die can be DRC-clean and layout-versus-schematic (LVS) clean and still fail on the manufacturing floor. Worse, it can work fine on the test bench and then fail unexpectedly and prematurely once it's sold and used in the field. This failure risk leaves IC designers uncertain, particularly at the more advanced nodes.
Problems seen in the IC space are now starting to appear at a higher level, in heterogeneous 3D IC assemblies. Unsurprisingly, they arise from how devices and wires interact, and that doesn't change whether you're connecting things within a chip or across multiple chips or chiplets. The problems are still there, particularly electrostatic discharge (ESD). Other concerns include electrical overstress, especially if you're targeting automotive or military-aerospace applications where reliability is critical. You don't want to be on the road when the car suddenly dies in the middle of traffic. Latch-up is another: parasitic structures turn on and inadvertently short two nodes together, typically power and ground. It is usually triggered by a voltage larger than expected, and such events can and do happen.
When we think of ESD, we think of a human finger: you walk around scuffing your feet (especially as the weather gets colder), pick up a charge, and touch a chip somewhere, dumping that charge into the chip and blowing up its transistors. Yikes, that's not a good thing, and unfortunately human contact isn't the only source. From chip manufacturing to use in an application, there are many opportunities for an unexpected charge to appear and cause a catastrophic event.
In system-on-chip (SoC) designs, we know how to find these situations. Even in the more complex scenarios, tools like Calibre PERC can find them, and we can apply known solutions. For latch-up, you need good guard rings and a sufficient number of taps and ties. For ESD, it's about ensuring adequate protection at the pads, so that a sudden, unexpected charge has a fast path to ground instead of flowing through the transistors.
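The ESD requirement above is essentially a topological check: every pad must reach ground through a protection device, not only through functional transistors. The following is a minimal Python sketch of that idea, not Calibre PERC's rule language; the netlist format, device type names, and net names are hypothetical.

```python
from collections import deque

# Hypothetical flattened netlist: (device_type, net_a, net_b).
# "esd_clamp" marks a protection device; "mos" marks a functional transistor.
DEVICES = [
    ("esd_clamp", "PAD_A", "VSS"),
    ("mos", "PAD_A", "core_in"),
    ("mos", "PAD_B", "core_in2"),
    ("mos", "core_in2", "VSS"),
]

def has_protected_path(pad, ground, devices, allowed=("esd_clamp",)):
    """Breadth-first search from pad to ground that only traverses protection devices."""
    adjacency = {}
    for dev_type, net_a, net_b in devices:
        if dev_type in allowed:
            adjacency.setdefault(net_a, []).append(net_b)
            adjacency.setdefault(net_b, []).append(net_a)
    seen = {pad}
    queue = deque([pad])
    while queue:
        net = queue.popleft()
        if net == ground:
            return True
        for nxt in adjacency.get(net, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

for pad in ("PAD_A", "PAD_B"):
    ok = has_protected_path(pad, "VSS", DEVICES)
    print(pad, "protected" if ok else "VIOLATION: no clamp path to VSS")
```

PAD_B is flagged because its only route to VSS runs through functional transistors. In a 3D IC, the same search could run on a netlist merged across dies, where a pad and its clamp may sit on different dies.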
3D IC design differs from SoC design, where everything is in a single process and characterized upfront. In a 3D IC, designers connect dies from multiple sources whose behavior can differ slightly from one die to the next, and design teams must make decisions to manage those differences. For an ESD path that spans multiple dies, the designer must decide which die to place the protection in, or how to split it across dies. The size of the protection devices may also change depending on each chiplet's process. Many users already handle these kinds of decisions in SoC design. The bigger challenge lies in areas we don't spend much time thinking about today but will need to: thermal and stress.
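The sizing point can be illustrated with a back-of-the-envelope calculation. A common starting point is the human-body model (HBM), in which the peak discharge current is roughly the test voltage divided by 1.5 kΩ. The per-process clamp current ratings, the 1.5x margin, and the process names below are hypothetical placeholders; the sketch only shows that one ESD target can lead to different device sizes on different chiplet processes.

```python
import math

HBM_SERIES_OHMS = 1500.0  # human-body model: 100 pF discharged through 1.5 kOhm

def hbm_peak_current_a(voltage_v):
    """Approximate peak HBM discharge current, V / 1.5 kOhm."""
    return voltage_v / HBM_SERIES_OHMS

# Hypothetical clamp current capability per um of device width for two chiplet
# processes. Real numbers come from each foundry's ESD characterization.
CLAMP_A_PER_UM = {
    "chiplet_process_a": 0.010,
    "chiplet_process_b": 0.006,
}

def clamp_width_um(voltage_v, process, margin=1.5):
    """Smallest integer clamp width (um) that carries the peak current with margin."""
    current_a = hbm_peak_current_a(voltage_v) * margin
    return math.ceil(current_a / CLAMP_A_PER_UM[process])

for process in CLAMP_A_PER_UM:
    print(process, clamp_width_um(2000.0, process), "um")
```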
3D heterogeneous integration and thermal-related reliability issues
For a single-process SoC, design rules typically address thermal concerns. By following them, the design stays within a well-characterized performance envelope and should behave as expected. But transistor behavior depends on operating temperature, so introducing heat changes it.
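One first-order way to see this temperature dependence: when lattice scattering dominates, carrier mobility in silicon falls roughly as T^-3/2, with T in kelvin. The sketch below applies that textbook approximation; the exponent is an assumption, and production device models capture many more effects.

```python
def mobility_at(temp_c, mu_ref=1.0, ref_temp_c=25.0, exponent=1.5):
    """Scale a reference mobility with temperature using mu ~ T^-exponent (T in kelvin).

    exponent=1.5 is the textbook lattice-scattering value; real silicon and
    real device models deviate from it.
    """
    t_k = temp_c + 273.15
    t_ref_k = ref_temp_c + 273.15
    return mu_ref * (t_k / t_ref_k) ** (-exponent)

for temp in (25, 85, 125):
    print(f"{temp:>4} C: relative mobility {mobility_at(temp):.3f}")
```

Lower mobility means less drive current, so the same circuit runs slower as it heats up.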
Every time a transistor toggles, you push electrons and holes through it. That current generates heat, which dissipates into the neighboring environment. The faster you toggle the transistor, or the more current you push through it, the hotter it gets.
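The link between toggle rate and heat can be made concrete with the standard CMOS switching-power relation P = αCV²f, combined with a single lumped thermal resistance. The activity factor, capacitance, supply voltage, and thermal resistance below are hypothetical values chosen only to show the trend.

```python
def dynamic_power_w(activity, capacitance_f, vdd_v, freq_hz):
    """Switching power of CMOS logic: P = alpha * C * V^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz

def temperature_rise_c(power_w, r_th_c_per_w):
    """Steady-state rise above ambient through one lumped thermal resistance."""
    return power_w * r_th_c_per_w

# Hypothetical block: 20% activity, 2 nF switched capacitance, 0.75 V supply,
# and 10 C/W from the block to ambient.
for freq_ghz in (1.0, 2.0, 3.0):
    p = dynamic_power_w(0.2, 2e-9, 0.75, freq_ghz * 1e9)
    print(f"{freq_ghz} GHz: {p:.3f} W, +{temperature_rise_c(p, 10.0):.2f} C")
```

Power, and therefore temperature rise, scales linearly with frequency and quadratically with supply voltage in this model.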
Transistors placed close to each other tend to heat up together. In an SoC, design rules prevent this by not allowing transistors to be placed too close together, which helps designers avoid the mutual-heating problem. With multi-die chiplets and heterogeneous processes, things are different, especially in floor planning.
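Because steady-state heat conduction is linear to first order, one simple way to reason about neighbors heating each other is superposition: each block's temperature rise is its own self-heating plus contributions from every other block, weighted by a mutual thermal resistance. The resistance matrices and power values below are hypothetical; in practice they would come from thermal simulation of the actual floor plan.

```python
# Hypothetical thermal-resistance matrices (C/W): R[i][j] is the temperature rise
# at block i per watt dissipated in block j. Diagonal terms are self-heating;
# off-diagonal coupling shrinks as blocks move apart.
BLOCKS = ["cpu", "gpu", "sram"]
R_NEAR = [
    [2.0, 1.2, 0.6],
    [1.2, 2.0, 0.6],
    [0.6, 0.6, 2.0],
]
R_FAR = [
    [2.0, 0.3, 0.6],
    [0.3, 2.0, 0.6],
    [0.6, 0.6, 2.0],
]
POWER_W = [5.0, 6.0, 1.0]

def temperature_rise(r_matrix, power):
    """Linear superposition: each block's rise sums the contributions of all heat sources."""
    return [sum(r * p for r, p in zip(row, power)) for row in r_matrix]

for label, r in (("adjacent", R_NEAR), ("separated", R_FAR)):
    rises = temperature_rise(r, POWER_W)
    print(label, {b: round(t, 1) for b, t in zip(BLOCKS, rises)})
```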
For an SoC, floor planning means deciding where on the plane to put the different components to achieve the desired behavior with optimal settings. For a 3D IC, it's not floor planning in the traditional sense, because not everything is on the same floor, but the process uses many of the same concepts.
For example, you want to avoid placing blocks that toggle fast and are used frequently next to similar blocks. Designers avoid putting a GPU and CPU close together because both get hot. In 3D designs, designers also avoid stacking a GPU on a CPU, since heat must then be dissipated up and down through the stack as well as across the horizontal plane.
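A placement rule like "don't put the GPU on the CPU" can be expressed as a simple geometric screen over a 3D floor plan. This sketch uses hypothetical block coordinates, a hypothetical `hot` flag, and an arbitrary 1 mm spacing threshold; it flags hot blocks that sit too close on the same tier or overlap on adjacent tiers. It is a screening heuristic, not a substitute for thermal analysis.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Block:
    name: str
    tier: int     # 0 = bottom die in the stack
    x0: float     # footprint corners in mm
    y0: float
    x1: float
    y1: float
    hot: bool     # high-activity block, for example a CPU or GPU

def lateral_gap(a, b):
    """Edge-to-edge distance between two footprints (0 if they overlap)."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return (dx ** 2 + dy ** 2) ** 0.5

def overlaps(a, b):
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1

def hot_spot_conflicts(blocks, min_gap_mm=1.0):
    """Flag hot blocks side by side on one tier or stacked on adjacent tiers."""
    issues = []
    for a, b in combinations([blk for blk in blocks if blk.hot], 2):
        if a.tier == b.tier and lateral_gap(a, b) < min_gap_mm:
            issues.append((a.name, b.name, "too close on the same tier"))
        elif abs(a.tier - b.tier) == 1 and overlaps(a, b):
            issues.append((a.name, b.name, "stacked on adjacent tiers"))
    return issues

plan = [
    Block("cpu", 0, 0.0, 0.0, 4.0, 4.0, True),
    Block("gpu", 1, 2.0, 2.0, 6.0, 6.0, True),
    Block("sram", 1, 0.0, 5.0, 2.0, 8.0, False),
]
print(hot_spot_conflicts(plan))
```

Moving the GPU so its footprint no longer overlaps the CPU's clears the flag.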
In a traditional single-die SoC, the transistors sit on the silicon plane, and there's a clear path for heat to dissipate: through the substrate. When multiple thin dies are stacked, a hot spot in one die may have nowhere to go. In particular, for face-to-face, copper-bonded dies with a silicon top tier, the heat moves downward; it has no place to go except into the transistors on the die below. With a longer path to an actual thermal escape, such as a heat sink, temperatures run hotter and stay hot longer.
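A one-dimensional series-resistance model, with R = t / (kA) per layer, shows why a longer path to the heat sink matters. Silicon's conductivity of about 150 W/(m·K) near room temperature is a textbook approximation; the effective conductivity and thickness of the back-end-of-line (BEOL) stacks are hypothetical. The model ignores lateral heat spreading, so it overstates absolute temperatures while still showing the trend.

```python
SILICON_K = 150.0  # W/(m*K), approximate bulk silicon near room temperature
BEOL_K = 2.0       # W/(m*K), hypothetical effective vertical value for a metal/dielectric stack

def layer_resistance(thickness_um, k_w_per_mk, area_mm2):
    """1D conduction resistance R = t / (k * A), in degrees C per watt."""
    return (thickness_um * 1e-6) / (k_w_per_mk * area_mm2 * 1e-6)

def hotspot_rise_c(power_w, layers, area_mm2):
    """Temperature rise for heat crossing (thickness_um, k) layers in series."""
    return power_w * sum(layer_resistance(t, k, area_mm2) for t, k in layers)

AREA_MM2 = 1.0
# Single die: heat leaves through its own 100 um substrate.
single_die = [(100.0, SILICON_K)]
# Face-to-face stack: heat from the top die crosses its own BEOL, the lower die's
# BEOL, and then the lower die's 100 um substrate before reaching the sink.
stacked = [(10.0, BEOL_K), (10.0, BEOL_K), (100.0, SILICON_K)]

for label, path in (("single die", single_die), ("face-to-face stack", stacked)):
    print(f"{label}: +{hotspot_rise_c(1.0, path, AREA_MM2):.2f} C per watt")
```

In this toy model, the low-conductivity BEOL layers along the longer path dominate the total resistance.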
All of these new variables bring many new considerations for 3D IC design. Mitigations such as through-silicon vias (TSVs) that dissipate heat faster, or through-insulator vias that act as chimneys to channel heat away, take up space and introduce stress of their own. Temperature also affects electrical behavior. The big challenge is that working all of this out requires collaboration.
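The space-versus-heat trade-off of TSVs can be sketched by treating copper TSVs and the surrounding silicon as parallel vertical heat paths. The conductivities are approximate bulk values; the region size, die thickness, TSV diameter, and counts are hypothetical, and the model ignores TSV liners, keep-out zones, and lateral spreading.

```python
import math

COPPER_K = 400.0   # W/(m*K), approximate bulk copper
SILICON_K = 150.0  # W/(m*K), approximate bulk silicon

def region_resistance(area_mm2, thickness_um, n_tsv, tsv_diameter_um):
    """Vertical thermal resistance (C/W) of a die region: copper TSVs in parallel with silicon.

    Ignores the TSV dielectric liner, keep-out zones, and lateral spreading.
    """
    area_m2 = area_mm2 * 1e-6
    t_m = thickness_um * 1e-6
    cu_m2 = n_tsv * math.pi * (tsv_diameter_um * 1e-6 / 2.0) ** 2
    if cu_m2 >= area_m2:
        raise ValueError("TSVs cannot occupy the whole region")
    conductance = (COPPER_K * cu_m2 + SILICON_K * (area_m2 - cu_m2)) / t_m
    return 1.0 / conductance

for n in (0, 100, 1000):
    r = region_resistance(1.0, 50.0, n, 10.0)
    cu_mm2 = n * math.pi * (10.0 / 2.0) ** 2 * 1e-6  # copper cross-section in mm^2
    print(f"{n:>4} TSVs: {r:.3f} C/W, {cu_mm2:.3f} mm^2 of silicon replaced by copper")
```

In this simplified model, 1,000 TSVs lower the resistance through bulk silicon by only about 12 percent, because silicon already conducts heat reasonably well, while the area given up grows linearly with the TSV count.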
3D IC designers need to understand the whole system from a multi-physics perspective to understand its behavior as the design evolves. The best way to design a given 3D IC usually isn't known upfront; with the new dimensions, there are many more options for how to build it. On the flip side, you now have many more opportunities to match your particular design needs, though evaluating the options takes more work than it used to.
Thermally-induced mechanical stress and 3D IC design reliability
Mechanical and physical stress is like putting a heavy piece of concrete on what you thought would stay a straight board: the board bends, or maybe breaks. Stress and temperature are interrelated, but that's easy to forget.
Consider the basics, starting with the ideal gas law: PV = nRT. Pressure, volume, and temperature are all interrelated, and at constant volume, pressure rises and falls with temperature. We're not talking about a gas, but the same basic concept applies: in a constrained structure, stress increases as temperature increases and decreases as temperature comes down. And as the stress on a transistor increases, the transistor behaves differently.
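For solids, a first-order analog of "pressure rises with temperature at constant volume" is the stress in a bar that is fully prevented from expanding: σ = −EαΔT, where E is Young's modulus and α is the coefficient of thermal expansion (CTE). Like pressure in the gas law, it scales linearly with temperature change. The material values below are approximate room-temperature bulk numbers, and the model is purely elastic.

```python
# Approximate room-temperature bulk properties; real values vary with
# crystal orientation, processing, and temperature.
MATERIALS = {
    "copper": {"E_gpa": 120.0, "cte_ppm_per_k": 17.0},
    "silicon": {"E_gpa": 150.0, "cte_ppm_per_k": 2.6},
}

def constrained_thermal_stress_mpa(material, delta_t_k):
    """Stress in a bar fully prevented from expanding: sigma = -E * alpha * dT.

    Negative means compressive (a heated, constrained bar pushes on its supports).
    Purely elastic: yielding and creep are ignored.
    """
    props = MATERIALS[material]
    e_mpa = props["E_gpa"] * 1e3
    alpha_per_k = props["cte_ppm_per_k"] * 1e-6
    return -e_mpa * alpha_per_k * delta_t_k

for material in MATERIALS:
    for delta_t in (50.0, 100.0):
        stress = constrained_thermal_stress_mpa(material, delta_t)
        print(material, f"dT={delta_t:.0f} K", f"{stress:.0f} MPa")
```

The higher CTE of copper is why metal structures embedded in silicon pick up much more thermal stress than the silicon around them.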
For a traditional SoC, in the LVS and post-layout simulation flow, we identify each transistor in context to evaluate the stresses acting on it. We capture those stresses as properties on the transistors and feed them into the transistor models for electrical analysis. Because temperature and stress are coupled, we want to identify the stress a device picks up from thermal effects. The most significant thermal impact comes when the temperature is highest, which for an IC is typically during manufacturing.
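The annotate-then-simulate idea can be sketched as follows: look up the local stress at each device's location, store it as an instance property, and let the device model read it. Everything here is hypothetical, including the stress field around a TSV, the `Transistor` class, and the linear threshold-voltage sensitivity. Real flows use foundry-supplied stress parameters and models.

```python
from dataclasses import dataclass, field

@dataclass
class Transistor:
    name: str
    x_um: float
    y_um: float
    vth0_v: float                              # nominal threshold voltage
    props: dict = field(default_factory=dict)  # per-instance properties read by the model

def tsv_stress_mpa(x_um, y_um, tsv_radius_um=5.0, edge_stress_mpa=200.0):
    """Hypothetical stress field that decays as (radius / r)^2 from a TSV at the origin."""
    r = max((x_um ** 2 + y_um ** 2) ** 0.5, tsv_radius_um)
    return edge_stress_mpa * (tsv_radius_um / r) ** 2

def annotate_stress(devices, stress_at):
    """Store the local stress at each device location as an instance property."""
    for dev in devices:
        dev.props["stress_mpa"] = stress_at(dev.x_um, dev.y_um)

def effective_vth(dev, mv_per_100_mpa=5.0):
    """Hypothetical linear model: threshold shift proportional to the annotated stress."""
    shift_v = dev.props.get("stress_mpa", 0.0) / 100.0 * mv_per_100_mpa * 1e-3
    return dev.vth0_v + shift_v

devices = [Transistor("M1", 10.0, 0.0, 0.35), Transistor("M2", 50.0, 0.0, 0.35)]
annotate_stress(devices, tsv_stress_mpa)
for dev in devices:
    print(dev.name, f"{dev.props['stress_mpa']:.1f} MPa", f"Vth {effective_vth(dev) * 1e3:.2f} mV")
```

Keeping stress as a per-instance property decouples the extraction step (where stress comes from) from the model step (how it changes behavior), so either side can be refined independently.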
In typical advanced-node processes, we perform a rapid thermal anneal (RTA), heating the wafer to a high temperature for a short time. RTA should improve the crystallinity of the polysilicon and other design elements, but the high temperature also induces stress, and that stress remains. Think of it as little fissures or cracks you didn't want there. Those weak points get worse over time until they eventually fail. For chiplets, if you understand the manufacturing process and can get information about the maximum RTA temperature, that's a good starting point.
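One simple way to turn a maximum RTA temperature into a stress estimate is the classic thermal-mismatch formula for a thin film on a thick substrate, treating the anneal peak as the temperature at which the film is stress-free. That stress-free assumption, the choice of an oxide film on silicon, and the 1000 °C peak are all hypothetical; the material constants are approximate textbook values held constant over temperature.

```python
# Approximate properties; silicon's CTE actually rises with temperature,
# and oxide relaxes at high temperature, so treat results as order-of-magnitude.
FILM = {"name": "SiO2", "E_gpa": 70.0, "poisson": 0.17, "cte_ppm_per_k": 0.5}
SUBSTRATE_CTE_PPM_PER_K = 2.6  # silicon near room temperature

def film_stress_mpa(stress_free_c, temp_c, film=FILM, substrate_cte=SUBSTRATE_CTE_PPM_PER_K):
    """Biaxial thermal-mismatch stress in a thin film on a thick substrate.

    sigma = E_f / (1 - nu_f) * (alpha_s - alpha_f) * (T - T0); negative = compressive.
    T0 is the temperature at which the film is assumed stress-free.
    """
    biaxial_modulus_mpa = film["E_gpa"] * 1e3 / (1.0 - film["poisson"])
    d_alpha = (substrate_cte - film["cte_ppm_per_k"]) * 1e-6
    return biaxial_modulus_mpa * d_alpha * (temp_c - stress_free_c)

rta_peak_c = 1000.0  # hypothetical maximum RTA temperature from the chiplet's process
for temp in (25.0, 125.0):
    print(f"{temp:.0f} C: {film_stress_mpa(rta_peak_c, temp):.0f} MPa")
```

Because the oxide expands less than silicon, cooling from the anneal leaves it in compression, and the residual stress is largest at the coldest operating point.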
Material properties are another challenge for both thermal and stress analysis. How much stress something can take depends on what it's made of: a piece of steel is stronger than a popsicle stick. For 3D IC design, understanding the materials and manufacturing process is essential to understanding the stress history each die carries.
The other part comes from the heterogeneous environment itself: stresses from what's around the die. Within a silicon die, you create transistors with metallization and oxides on top, all well characterized. In a three-dimensional design, there are different processes and connections: metal through a package, TSVs, or microbumps. One of the biggest contributors is ball grid arrays (BGAs). If you're connecting the package to a board, the BGA balls are large and create a lot of stress. We're adding stress in ways that aren't relevant in an SoC, and design teams need to account for these new stress points and handle them appropriately. These stresses affect system behavior and performance, and system behavior in turn affects stress and temperature depending on the operating frequency: higher frequencies run hotter than lower ones. Combining this electrical information with thermal and physical information helps predict the thermal impact on the system and, ultimately, the stresses. Temperature and stress both change electrical behavior, which can require multiple cycles to reach closure.
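The need for multiple cycles until closure can be illustrated with a toy electro-thermal loop: power sets temperature through a thermal resistance, and temperature feeds back into power through leakage. The sketch iterates to a fixed point. The leakage rule of thumb (doubling every 10 °C), the power values, the thermal resistances, and the 400 °C cutoff are assumptions; dynamic power would rise with operating frequency, and a real flow would also feed stress back into the electrical models.

```python
def leakage_w(temp_c, leak_ref_w=0.5, ref_temp_c=25.0, doubling_c=10.0):
    """Assumed rule of thumb: leakage doubles every `doubling_c` degrees."""
    return leak_ref_w * 2.0 ** ((temp_c - ref_temp_c) / doubling_c)

def solve_operating_point(dynamic_w, r_th_c_per_w, ambient_c=25.0, tol_c=0.01, max_iter=100):
    """Fixed-point iteration: power sets temperature, temperature sets leakage power."""
    temp = ambient_c
    for iteration in range(1, max_iter + 1):
        power = dynamic_w + leakage_w(temp)
        new_temp = ambient_c + r_th_c_per_w * power
        if abs(new_temp - temp) < tol_c:
            return new_temp, power, iteration
        temp = new_temp
        if temp > 400.0:  # arbitrary cutoff: treat as diverging
            break
    raise RuntimeError("no converged operating point (possible thermal runaway)")

temp, power, iterations = solve_operating_point(dynamic_w=5.0, r_th_c_per_w=2.0)
print(f"converged: {temp:.1f} C at {power:.2f} W after {iterations} iterations")
try:
    solve_operating_point(dynamic_w=5.0, r_th_c_per_w=5.0)
except RuntimeError as err:
    print("higher thermal resistance:", err)
```

With the lower thermal resistance the loop settles quickly; with the higher one, each temperature increase raises leakage enough to push the temperature up further, and no operating point exists.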
When semiconductor design teams should start to ensure reliability in 3D IC design
To ensure reliability, you have to start early. But the earlier you start, the less you know, so it's an improve-as-you-go process. Deciding what kind of heterogeneous 3D IC to build is a helpful starting point. If that's still unknown, you can start by treating individual dies as homogeneous structures and placing them with their TSVs or other connections to assess potential thermal and stress problems. From there, you can place components on top of or next to others. You can explore trade-offs at this early stage, even though you're missing some of the details.
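At the early, homogeneous-die stage, even a crude model lets you compare stacking options. The sketch below treats each die as a single uniform power source, assumes all heat flows up a 1D stack to a heat sink on top, and tries every ordering. The die names, powers, per-interface resistance, and sink temperature are hypothetical.

```python
from itertools import permutations

def tier_temperatures(order, power_w, r_c_per_w=0.5, sink_c=45.0):
    """1D stack model: tier 0 is the bottom, and the heat sink sits above the top tier.

    All heat is assumed to flow upward to the sink; the resistance between each
    tier and the next (or the sink) is the same hypothetical value.
    """
    n = len(order)
    flow = []       # heat crossing the interface above each tier
    total = 0.0
    for name in order:
        total += power_w[name]
        flow.append(total)
    return [sink_c + sum(r_c_per_w * flow[k] for k in range(i, n)) for i in range(n)]

def best_stack(power_w, **kwargs):
    """Try every ordering and return (order, peak temperature) with the lowest peak."""
    return min(
        ((order, max(tier_temperatures(order, power_w, **kwargs)))
         for order in permutations(power_w)),
        key=lambda item: item[1],
    )

DIES = {"compute": 25.0, "memory": 5.0, "io": 2.0}
order, peak = best_stack(DIES)
print("bottom to top:", order, f"peak {peak:.1f} C")
# bottom to top: ('io', 'memory', 'compute') peak 65.5 C
```

Unsurprisingly, the model puts the hottest die next to the heat sink. Real decisions also depend on connectivity, TSV placement, and the stress effects discussed above, which is why this kind of estimate is only a first pass.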
For the details, you need to know everything about the chiplets and the connections. As you add more information, you make decisions, then continue through the cycle with multiple rounds of analysis. Of course, you can find these problems at the end as you put the design together, but the further along you are in the design, the more expensive it is to go back.
Designing a reliable package requires the design team to focus on analysis as early as possible, at the prototype and planning stage, and to understand how design choices could drive mechanical stress and electrical failures. It's more work than in the past, but the extra work comes with a benefit: you have many more options. Semiconductor design teams have more room to be clever and find ways to do things that haven't been done before.
Authored Article by: Heather George, Siemens