# Noriyuki Ito

DA Development Department, Fujitsu Limited Nakahara-ku, Kawasaki 211, Japan

# ABSTRACT

This paper presents a system which automatically incorporates testability circuits into ECL chips. This system incorporates three types of circuit: (1) random access scan circuit, (2) clock suppression circuit for delay fault testing, and (3) pin scan-out circuit for chip I/O pin observation in board testing. Fanout destinations of each gate in the testability circuits are localized on a chip to keep the logical net length within the limit. This system was used to develop the new Fujitsu VP-2000 supercomputer.

## 1. Introduction

As the gate density and processing speed of chips have increased, many kinds of testability circuits have been introduced to make testing easier [1]. Scan design, which makes latches controllable and observable, is one of the most widely accepted. Several recent improvements in testing have been based on scan design. Latch-to-latch delay fault testing can be performed by controlling and observing the logic values of latches [2, 3], and this testing becomes much easier with an additional testability circuit. Boundary scan, which controls and observes all the primary inputs and outputs of chips on a board, is also based on scan design [4, 5]. Both scan design and these additional testability circuits are indispensable for testing the high-density, high-speed chips and boards used in very large-scale computers.

However, it is not practical for logic designers to design these circuits by hand in a short time without error. Several systems that automate the design of testability circuits have been reported in the literature [6-10]. Although these systems are very informative, they do not impose severe physical constraints on the testability circuits. In ECL chips, for example, the line length of each net must be within a certain limit; otherwise, even a testability circuit will not work because of the voltage drop in the signal line.

This paper presents a system that automatically incorporates on-chip testability circuits subject to two physical constraints: the fanout limit and the net length limit. The layout is carried out based on the generated floorplan information. The technique presented here is also applicable to the automatic incorporation of clock and/or reset signal distribution circuits.

## 2. Design for Testability

The testability circuits for chips in the Fujitsu VP-2000 series are based on the random access scan design [11]. This section outlines this scan design, and describes two other types of testability circuit which are also incorporated to perform the delay fault testing and board testing.

### 2.1 Random Access Scan Design

As detailed in the literature [11], the latch shown in Figure 1 is used as one of the scannable elements in this scan design. This type of latch has four scan signals: scan-in PR, scan addresses XADR and YADR, and scan-out -SDO. A latch is selected when both XADR and YADR are set to 1. The -SDO of a latch outputs 1 when the latch is not selected, and outputs the inverted value of the state held by the latch when it is selected. Prior to a scan-in operation, all the scannable latches are cleared by supplying a negative pulse to the clear lines -CL. The latch holds 0 when it is cleared. A 1 is scanned into the latch by supplying a positive pulse to the scan-in line PR while the latch is selected.
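The scan behavior just described can be summarized in a small behavioral model. This is only a sketch under the Figure 1 polarity; the class and method names (`ScanLatch`, `scan_in_pulse`, `sdo_n`) are hypothetical, and the normal data and clock inputs of the latch are left out.

```python
class ScanLatch:
    """Behavioral sketch of the scan side of the latch (Figure 1 polarity)."""

    def __init__(self):
        self.state = 0

    def clear(self):
        # A negative pulse on -CL clears the latch, so it holds 0.
        self.state = 0

    def scan_in_pulse(self, xadr, yadr):
        # A positive pulse on PR writes 1, but only while the latch is selected.
        if xadr == 1 and yadr == 1:
            self.state = 1

    def sdo_n(self, xadr, yadr):
        # -SDO is 1 when unselected, and the inverted state when selected.
        if xadr == 1 and yadr == 1:
            return 1 - self.state
        return 1


latch = ScanLatch()
latch.clear()
unselected_out = latch.sdo_n(1, 0)   # not selected: -SDO stays at 1
latch.scan_in_pulse(1, 0)            # ignored, the latch is not selected
state_after_ignored_pulse = latch.state
latch.scan_in_pulse(1, 1)            # selected: 1 is scanned in
selected_out = latch.sdo_n(1, 1)     # inverted value of the held 1
```

Because an unselected latch always drives 1 on -SDO, the scan-out lines of many latches can be combined while only the addressed latch determines the result.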



Figure 1 Latch with scan function

Figure 2 is the configuration of the random access scan design that controls scan-in and scan-out operations of scannable elements in a chip. When the latch shown in Figure 1 is implemented by the ECL technology, it is made of OR/NOR gates. In this case, the scan signal polarity is different from the one in Figure 1. However, the following discussions assume the latch shown in Figure 1 to avoid confusion.

27th ACM/IEEE Design Automation Conference®



Figure 2 Random access scan design

### 2.2 Clock Suppression Logic

Delay fault testing is important for high-speed chips. A delay fault between latches can be detected by transmitting a signal transition between latches in a clock cycle of normal operation [2, 3]. In Figure 3, the path from a data-out line of latch L1 to a data-in line of latch L2 is sensitized, and the paths from a clock primary input to clock lines of L1 and L2 are sensitized. Then, 0 is scanned into L1 and L2. Next, required values are scanned into the related latches and placed on the related primary inputs so that 1 is set on the data-in line of L1. Under these preparations, a clock is issued to L1 in the first cycle, and to L2 in the second cycle. Then L2 is scanned out. If its value is 0, an over-delay fault exists between L1 and L2.
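The two-cycle procedure can be illustrated with a toy timing model. This is a sketch only: it assumes a non-inverting sensitized path, a single number for the path delay, and a hypothetical function name `delay_test`; none of these come from the actual test system.

```python
def delay_test(path_delay, clock_period):
    """Return 'fault' if the transition from L1 misses the capture at L2."""
    l1, l2 = 0, 0          # both latches are first scanned to 0
    l1_data_in = 1         # preparation: 1 is set on the data-in line of L1

    # First cycle: only L1 is clocked, launching a 0 -> 1 transition.
    l1 = l1_data_in

    # Second cycle: only L2 is clocked, one clock period later.
    # The transition is captured only if it has arrived by then.
    arrived = path_delay <= clock_period
    l2 = l1 if arrived else 0

    # L2 is scanned out; a remaining 0 indicates an over-delay fault.
    return "fault" if l2 == 0 else "pass"


fast_path = delay_test(path_delay=0.8, clock_period=1.0)
slow_path = delay_test(path_delay=1.2, clock_period=1.0)
```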



Figure 3 Delay testing between latches

If clocks are issued only to the sending latch (L1) and the receiving latch (L2), the automatic test pattern generation for this testing becomes much easier. This is because the value of each latch related to the path sensitization can be kept unchanged during this delay fault testing. The circuit shown in Figure 4 is adopted for VLSI chips used in the VP-2000 series. In the X-direction, the clock enable lines of latches are controlled by a scan only latch (SOL). In the Y-direction, the clock enable line of each clock chopper that chops a clock pulse to a latch is controlled by an SOL. When the SOLs are reset, all clock enable lines are enabled. When clocks are to be supplied to only two latches in delay fault testing, all the SOLs except the X- and Y-direction SOLs corresponding to those two latches must be scanned in.
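As a rough illustration of this selection rule, the sketch below assumes a simplified grid model in which every latch is controlled by exactly one X-direction SOL and one Y-direction SOL, and in which a scanned-in SOL disables its clock enable line. The function names and the example latch map are hypothetical.

```python
def sols_to_scan_in(latches, targets, x_sols, y_sols):
    """Return the X- and Y-direction SOLs that must be scanned in."""
    keep_x = {latches[name][0] for name in targets}
    keep_y = {latches[name][1] for name in targets}
    # Every SOL except those corresponding to the target latches is set.
    return set(x_sols) - keep_x, set(y_sols) - keep_y


def clocked_latches(latches, set_x, set_y):
    """Latches whose X and Y clock enable lines both remain enabled."""
    return {name for name, (x, y) in latches.items()
            if x not in set_x and y not in set_y}


# Hypothetical map: latch name -> (X-direction SOL, Y-direction SOL).
latches = {"L1": (0, 1), "L2": (2, 1), "L3": (0, 3), "L4": (1, 1)}
set_x, set_y = sols_to_scan_in(latches, ["L1", "L2"], range(3), range(4))
clocked = clocked_latches(latches, set_x, set_y)
```

In this example only L1 and L2 keep receiving clocks, so the values scanned into L3 and L4 for path sensitization stay unchanged during the test.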



Figure 4 Clock suppression circuit

### 2.3 Pin Scan-Out Logic

To develop a high-speed computer, not only must the speed of chips be increased, but chips must also be densely mounted on a board. Dense mounting makes direct probing of chip I/O pins difficult in board testing. To solve this problem, the I/O pins of chips on a board should be controllable and observable without direct probing. This capability is called boundary scan, for which a standard has been proposed by JTAG/IEEE P1149.1 [4, 5].

The pin scan-out circuit in Figure 5 is adopted for chips used in the VP-2000 series. This circuit differs from the proposed standard in that it is controlled under the random access scan design. It also has no scan-in capability. Adding scan-in capability would cause two gate delays in the data path through a chip I/O pin because of a multiplexer. Adding only scan-out capability does not cause this performance deterioration.



Figure 5 Pin scan-out circuit

## 3. Formulation of Automatic Incorporation

The process to incorporate testability circuits is divided into three phases, which are the generation, backannotation, and layout of testability circuits. The first phase is further divided into two steps. This section describes these two steps which formulate the automatic incorporation.

### 3.1 Determination of Scan Points

Scan points are categorized into three types according to the testing purposes. The first type is a set of scan points which make internal latches controllable and observable mainly for the static functional testing. The second type is a set of scan points which suppress clocks to latches for the delay fault testing. The third type is a set of scan points which make chip I/O pins observable for the board testing.

Ideally, the first type of scan points should be determined based on the result of the adequate testability analysis. However, a very high fault coverage is required for VLSI chips used in the VP-2000 series. Therefore, it is assumed that all the latches except regularly configured registers have the scan function. Actually, the scan points for this type are determined by logic designers when the system logic is designed. They add a non-scan attribute to a latch when it is desired not to have the scan function. Until the completion of the testability circuit incorporation, all the scan-related lines of scan latches are hidden from the logic designers logically. However, the delay characteristic, gate size, and physical shape which logic designers view for a scan latch are the same as the ones of an actual scan latch. This strategy is useful in that logic designers do not have to worry about the scan logic suppression in simulation.

Scan points of the second type can be determined straightforwardly by adding SOLs to control the clock enable lines of clock choppers and latches. This is done after the placement of the system logic is completed, because the fanout destination latches of each X-direction SOL tend to be spread all over the chip surface. Unless the system logic has been placed, the output of each X-direction SOL cannot be distributed so that the fanout destinations of each distribution gate are localized on the chip surface. The output of each Y-direction SOL can always be connected directly to the clock enable line of a clock chopper, without a repeater gate.

Scan points of the third type can also be determined straightforwardly. Pin scan-out circuits are added to all the primary inputs and outputs of a chip except the scan address pins and the scan-out pin. This exclusion avoids self-contradiction and unnecessary self-loops.

After all these scan points are determined, the scan capability is assumed at each scan point. The physical locations of scan points are used to make the logical net length be within the limit when the scan control circuit itself is laid out. The placement of the latches as the first type of scan points must already be completed. The placement of both the SOLs as the second type and the related distribution gates should be tentatively done, and their final placement can be done with the scan control circuit later. For the third type, pin scan-out gates are assumed to be placed near the corresponding I/O pins.



Figure 6 Formulated scan control circuit

### 3.2 Generation of Scan Control Circuit

The scan control circuit interfaces scan-related signals between the I/O pins of a chip and the scan points. It is the circuit shown inside the dashed line in Figure 6. In its maximal form, this scan control circuit has the following input signals: a scan-in SI from a chip input, scan addresses $ADDR_i$ ($i = 0, \ldots, n-1$) from chip inputs, and scan-outs $SDO_j$ ($j = 0, \ldots, 2^n-1$) from scan points. Its output signals are a scan-out SO to a chip output, and scan-ins $PR_j$, X-addresses $XADR_j$, and Y-addresses $YADR_j$ to scan points.

More precisely, this maximal scan control circuit must satisfy the following boolean equations for the above input and output variables. Equations (1), (2), and (3) define the scan operation when a scan point is addressed by  $ADDR_i$ . Equation (4) is to guarantee that  $ADDR_i$  cannot address more than one scan point at a time.

When $\sum_{p=0}^{n-1} (ADDR_p \cdot 2^p) = j$:

$$XADR_j \cdot YADR_j = 1 \tag{1}$$

$$PR_j = SI \tag{2}$$

$$SO = SDO_j \tag{3}$$

When $\sum_{p=0}^{n-1} (ADDR_p \cdot 2^p) \neq j$:

$$PR_j \cdot XADR_j \cdot YADR_j = 0 \tag{4}$$
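To make the four conditions concrete, the following sketch pairs a trivial reference model of a maximal scan control circuit with an exhaustive check of equations (1) through (4). The model is an assumption made for illustration (a flat decoder with no fanout or net length constraints), and the function names are hypothetical; it is not the circuit of Figures 8 and 9.

```python
def scan_control(addr_bits, si, sdo):
    """Flat reference model: addr_bits[p] is ADDR_p, sdo[j] is SDO_j."""
    n = len(addr_bits)
    selected = sum(bit << p for p, bit in enumerate(addr_bits))
    size = 1 << n
    xadr = [1 if j == selected else 0 for j in range(size)]
    yadr = list(xadr)
    pr = [si if j == selected else 0 for j in range(size)]
    so = sdo[selected]
    return xadr, yadr, pr, so


def check_equations(n, si, sdo):
    """Check equations (1) to (4) for every address value."""
    for value in range(1 << n):
        addr_bits = [(value >> p) & 1 for p in range(n)]
        xadr, yadr, pr, so = scan_control(addr_bits, si, sdo)
        for j in range(1 << n):
            if j == value:
                if xadr[j] & yadr[j] != 1:      # equation (1)
                    return False
                if pr[j] != si:                 # equation (2)
                    return False
                if so != sdo[j]:                # equation (3)
                    return False
            elif pr[j] & xadr[j] & yadr[j] != 0:  # equation (4)
                return False
    return True


sdo_values = [0, 1, 1, 0, 1, 0, 0, 1]
ok_with_si_1 = check_equations(3, 1, sdo_values)
ok_with_si_0 = check_equations(3, 0, sdo_values)
```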

This scan control circuit generation might be reduced to the automatic logic synthesis of a general logic circuit. However, it is not easy for automatic logic synthesis to conform to the net length constraint. To overcome this difficulty, this automatic incorporation system uses a technique based on a sort of circuit reduction after clustering scan points.

This technique assumes that  $2^n$  scan points are uniformly distributed on a chip. A quarter of the scan address space is assigned to each quadrant (Q0 to Q3 shown in Figure 7). Under these assumptions, one scan control circuit which will keep the fanout limit and logical net length limit is prepared manually as an "ideal scan control circuit". This circuit must be verified by simulation in advance. After these preparations, the scan addresses are assigned to the scan points after clustering scan points into several groups on each chip. The necessary part is then extracted from the ideal scan control circuit. Once an ideal scan control circuit is designed, it is applicable to all chips of the same type. The next section describes the detail of this process.



Figure 7 Quadrant definition on a chip

## 4. Incorporation of Random Access Scan Circuit

In this section, an example of random access scan circuit incorporation is described for a chip with an 8-bit scan address. The concept can easily be generalized for a chip with an n-bit  $(n \ge 8)$  scan address by regarding it as a chip that consists of  $2^{n-8}$  sub-chips with an 8-bit scan address.

### 4.1 Ideal Scan Control Circuit

Figures 8 and 9 show an example of the ideal scan control circuit that satisfies equations (1) through (4) for an 8-bit scan address. The circuit in Figure 8 collects scan-out data from scan points, and distributes scan-in data to scan points. The circuit in Figure 9 generates X- and Y-addresses, and distributes them to scan points. In this example, an 8-bit scan address (ADDR7 to ADDR0) is partitioned from the most significant bit ADDR7 into three groups: a 2-bit group and two 3-bit groups. The decoded values of the three groups can be represented by the 4-valued Q, and the 8-valued X and Y variables. Using these variables, the 8-bit scan address is represented as $Q \cdot 2^6 + X \cdot 2^3 + Y$. Q selects all scan points in one of the quadrants (Q0 to Q3) by controlling the propagation of scan-in and scan-out data. The combination of X and Y selects one of the scan points in each quadrant by controlling the XADR and YADR inputs of latches.
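The partition of the 8-bit address into the Q, X, and Y fields can be written out as a short sketch. The function names `split_address` and `join_address` are hypothetical.

```python
def split_address(addr):
    """Split an 8-bit scan address into the fields (Q, X, Y)."""
    if not 0 <= addr < 256:
        raise ValueError("scan address must fit in 8 bits")
    q = addr >> 6             # ADDR7..ADDR6: quadrant, 4 values
    x = (addr >> 3) & 0b111   # ADDR5..ADDR3: X field, 8 values
    y = addr & 0b111          # ADDR2..ADDR0: Y field, 8 values
    return q, x, y


def join_address(q, x, y):
    """Rebuild the scan address Q * 2^6 + X * 2^3 + Y."""
    return q * 2**6 + x * 2**3 + y


fields = split_address(0b10_011_101)
rebuilt = join_address(*fields)
```

Each quadrant therefore covers 64 consecutive addresses, and within a quadrant each X value covers a group of 8 addresses distinguished by Y.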

(): To or from chip external pin

O: To or from chip internal pin

Figure 8 Scan-in and scan-out circuit

Each gate of the ideal scan control circuit has floor information as a layout attribute. It indicates the floor on which the gate must be placed. QUAD0 to QUAD3, BOUND0 to BOUND7, CTR, and ANY are the floor names, and these definitions are shown in Figure 10. Floors BOUND0 to BOUND7 are for pin scan-out circuits. To get a better layout result, overlapped floors are allowed here. These layout attributes are assigned to gates based on the signal flow when the circuit is prepared. Assume that a scan signal is distributed from a primary input of a chip to a scan point in quadrant Q0 through three gates of the scan control circuit. In this case, layout attributes ANY, CTR, and QUAD0 are assigned to the first, second, and third gates, respectively. In Figure 8, layout attributes QUAD3, QUAD3, CTR, ANY, ANY, ANY, CTR, CTR, and QUAD0 are assigned to gates G1 to G9, respectively. If the scan addresses are assigned to the scan points adequately, each gate can be laid out according to the layout attribute so that the logical net length is within the limit. This adequate scan address assignment is done based on scan point clustering.

### 4.2 Scan Point Clustering

Figure 9 Scan address circuit

Figure 10 Floor definition on a chip

Scan addresses are assigned after clustering scan points according to their physical locations on a chip surface. This clustering is to determine how to distribute a scan signal to scan points and how to collect scan signals from them. The fanout destinations which each output of the scan control circuit drives are made closer by clustering. The fanin sources are also made closer by clustering. By this locality of destinations and sources, the logical net length can be within the limit. If scan points are clustered hierarchically, the fanout destinations of the preceding gate are also made closer. In the example of an 8-bit scan address, the line length constraint of each net in the scan control circuit can be observed by the following clustering algorithm.

    algorithm CLUSTER
      S ← { s_j | s_j is a scan point. };
      Q_0, Q_1, Q_2, Q_3 ← CLUSTER1(S);
      for (q = 0,1,2,3) begin
        x ← 0;
        while (Q_q ≠ ∅) begin
          y ← 0;
          A_x ← { s_{q,x,0} | s_{q,x,0} is a new label of the s_j (s_j ∈ Q_q)
                  that is closest to the outside corner of quadrant Q_q. };
          Q_q ← Q_q − A_x;
          while (y < fanout limit) begin
            T ← { s_{q,x,y+1} | s_{q,x,y+1} is a new label of the s_j (s_j ∈ Q_q)
                  that is closest to s_{q,x,0} (s_{q,x,0} ∈ A_x). };
            r ← STEINER(A_x ∪ T);
            if (r + α ≤ net length limit) begin
              A_x ← A_x ∪ T; Q_q ← Q_q − T; y ← y + 1;
            end
            else break; /* break the inner while */
          end
          x ← x + 1;
        end
        C ← { all A_x };
        G_0, G_1 ← CLUSTER2(C);
      end
      return({ all scan points s_{q,x,y} }, G_0, G_1);
    end

Procedure CLUSTER1 reduces or enlarges the boundary definition of each quadrant so that the number of scan points in each quadrant is at most a quarter of the scan address space. It returns the scan point groups $Q_0$, $Q_1$, $Q_2$, and $Q_3$ contained in each quadrant. Procedure STEINER calculates the length of the Steiner tree containing the scan points in a given group. Procedure CLUSTER2 calculates the center coordinates of each group $A_x$ based on the physical locations of the scan points in the group. Then, this procedure partitions { all $A_x$ } into two super groups $G_0$ and $G_1$ by clustering the adjacent groups. $\alpha$ in the algorithm is the expected average logical line length from the distribution gate to the Steiner tree of the fanout destinations. (Note that the scan control circuit itself is not laid out yet at this point.) The "fanout limit" is 8 in this example, assuming that each gate can drive 8 gate inputs. The circuits in Figures 8 and 9 are designed considering this fanout limit. Figure 11 shows the scan point clustering in Q2 of a chip with an 11-bit scan address. This map shows the scan points (except pin scan-out points) contained in the groups $A_x$ in the algorithm. Scan points that are represented by the same digit or character are included in the same group $A_x$.
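The inner part of the clustering, for a single quadrant, might be sketched as follows. Several points here are assumptions rather than details of the actual system: distances are Manhattan distances, the Steiner tree length is replaced by the half-perimeter of the bounding box (a simple lower-bound estimate, not a real STEINER procedure), each group is capped at `fanout_limit` members, and CLUSTER1 and CLUSTER2 are omitted. All function names are hypothetical.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def half_perimeter(points):
    """Stand-in for STEINER: half-perimeter of the bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def cluster_quadrant(points, corner, fanout_limit, net_length_limit, alpha):
    """Group the scan points of one quadrant; groups[x][y] gets label (x, y)."""
    remaining = list(points)
    groups = []
    while remaining:
        # Seed each group with the point closest to the outside corner.
        seed = min(remaining, key=lambda p: manhattan(p, corner))
        remaining.remove(seed)
        group = [seed]
        while remaining and len(group) < fanout_limit:
            # Candidate: the remaining point closest to the seed.
            candidate = min(remaining, key=lambda p: manhattan(p, seed))
            length = half_perimeter(group + [candidate])
            if length + alpha <= net_length_limit:
                group.append(candidate)
                remaining.remove(candidate)
            else:
                break  # start a new group
        groups.append(group)
    return groups


points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
groups = cluster_quadrant(points, corner=(0, 0), fanout_limit=8,
                          net_length_limit=5, alpha=1)
column = [(0, i) for i in range(10)]
sizes = [len(g) for g in cluster_quadrant(column, (0, 0), 8, 100, 1)]
```

In the first example the net length limit splits the points into two groups; in the second, the fanout limit splits ten collinear points into groups of 8 and 2.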

Figure 11 Scan point clustering in Q2

As a result of this clustering, a label $s_{q,x,y}$ is assigned to each scan point, and super groups $G_0$ and $G_1$ are produced. Here, a new label $s_{q,x,y,g}$ replaces each scan point $s_{q,x,y}$ which is contained in the super group $G_g$. Based on this new labeling, the scan address $q \cdot 2^6 + x \cdot 2^3 + y$ is assigned to each scan point $s_{q,x,y,g}$. Once the scan address is determined, the connection of the scan control circuit to each scan point is uniquely determined. This connection leads to the localization of fanout destinations and fanin sources. In Figures 8 and 9, the I/O pins of the scan control circuit are connected to the input and output lines of scan point $s_{q,x,y,g}$ as follows: a Qq.SOx.y pin of the scan control circuit to a scan-out output of a scan point $s_{q,x,y,g}$, a Qq.SIx pin to a scan-in input, a Qq.Xx pin to a scan address X input, and a Qq.Yy.g pin to a scan address Y input. This correspondence is used to backannotate the scan control circuit into the original chip.
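The correspondence between a label and its connections can be tabulated with a small helper. This is an illustrative sketch; the function name `connections` and the dictionary keys are hypothetical, while the pin name patterns follow the text.

```python
def connections(q, x, y, g):
    """Scan address and scan control circuit pins for scan point s_{q,x,y,g}."""
    return {
        "address": q * 2**6 + x * 2**3 + y,
        "scan_out": f"Q{q}.SO{x}.{y}",   # to the scan-out output
        "scan_in": f"Q{q}.SI{x}",        # to the scan-in input
        "x_address": f"Q{q}.X{x}",       # to the scan address X input
        "y_address": f"Q{q}.Y{y}.{g}",   # to the scan address Y input
    }


example = connections(q=2, x=3, y=5, g=1)
```

Note that the super group index g affects only the Y-address pin, not the scan address itself.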

### 4.3 Circuit Extraction and Optimization

After the scan address of each scan point is determined, the necessary part of the ideal scan control circuit is identified and extracted. To identify it, the scan control circuit is traced based on the scan point label set  $\{s_{q,x,y,g}\}$  acquired by clustering the scan points. Then, the circuit part having trace-pass marks is extracted. The tracing algorithm is as follows:

- Step 1: Forward tracing is done from each input pin other than ADDR0 to ADDR7 in the scan control circuit. Non-address marks are assigned to the passing gate input pins.
- Step 2: Forward tracing is done from each input pin ADDR0 to ADDR7 in the scan control circuit. Address marks are assigned to the passing gate input pins. When the tracing passes an input pin of a gate, another input pin of the same gate may have a non-address mark. In this case, this tracing backtracks after an address mark is assigned to the current passing gate input pin.
- Step 3: The following tracings are repeated for each scan point of $\{s_{q,x,y,g}\}$ iteratively. Backward tracing is done from each output pin Qq.SIx, Qq.Xx, and Qq.Yy.g. Trace-pass marks are assigned to the passing pins and gates. Forward tracing is done from the input pin Qq.SOx.y, and trace-pass marks are assigned to the passing pins and gates. When the forward tracing passes an input pin of a gate, another input pin of the same gate may have an address mark. In this case, backward tracing is started concurrently from this input pin to assign trace-pass marks to the passing pins and gates.

When the tracing according to the above algorithm terminates, the pins and gates having trace-pass marks are identified as the necessary circuit part. Before the circuit is extracted, some gates with unused pins may be replaced by gates with a smaller size. Then, the circuit part having trace-pass marks is extracted with the floor name assigned to each gate. This extracted scan control circuit is backannotated into the chip, and its layout is submitted to an automatic placement based on the floorplan.
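A much simplified sketch of the marking idea is given below. It covers only the backward tracing of Step 3 from the required output pins; the address and non-address marks of Steps 1 and 2 and the forward tracing from scan-out pins are left out. The netlist representation (a dictionary from each gate output signal to its input signals), the gate names, and the function name are all hypothetical.

```python
def backward_trace(drivers, start_signals):
    """Mark every signal reachable backward from the given output pins."""
    marked = set()
    stack = list(start_signals)
    while stack:
        signal = stack.pop()
        if signal in marked:
            continue
        marked.add(signal)
        # Primary inputs have no driver entry, so tracing stops there.
        stack.extend(drivers.get(signal, []))
    return marked


# Tiny hypothetical fragment of an ideal circuit: two X-address outputs.
ideal = {
    "Q0.X0": ["dec_x0"],
    "Q0.X1": ["dec_x1"],
    "dec_x0": ["ADDR3", "ADDR4", "ADDR5"],
    "dec_x1": ["ADDR3", "ADDR4", "ADDR5"],
}

# Suppose clustering produced scan points that use only Q0.X0.
marked = backward_trace(ideal, ["Q0.X0"])
kept_gates = sorted(gate for gate in ideal if gate in marked)
```

Gates without a mark, such as the drivers of Q0.X1 here, are the part of the ideal circuit that is not extracted.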

When the automatic placement is completed, the redundant gates must be deleted from the scan control circuit. For each no-logic-inversion repeater gate, the possibility of deleting the gate is checked. Assuming that the gate is deleted and that the input net is connected to the output net directly, the new net is checked for design constraints such as fanout limit and logical line length limit. If the net does not violate any constraint, the above gate is deleted as a redundant gate. If the net violates any constraint, the gate is not regarded as redundant and it is not deleted.
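The redundancy check for one repeater gate might look like the sketch below. It assumes that pins are given as placed (x, y) coordinates and that the net length is estimated by the half-perimeter of the bounding box, which is a stand-in for whatever length measure the layout system actually uses. The function names are hypothetical.

```python
def half_perimeter(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def try_delete_repeater(driver, in_sinks, repeater, out_sinks,
                        fanout_limit, length_limit):
    """Return the sinks of the merged net, or None if the gate must stay."""
    # Connect the input net directly to the output net of the repeater.
    merged = [s for s in in_sinks if s != repeater] + list(out_sinks)
    if len(merged) > fanout_limit:
        return None                      # fanout limit violated
    if half_perimeter([driver] + merged) > length_limit:
        return None                      # net length limit violated
    return merged                        # the repeater is redundant


driver = (0, 0)
in_sinks = [(1, 0), (2, 0)]              # (2, 0) is the repeater input
out_sinks = [(3, 0), (3, 1)]
deleted = try_delete_repeater(driver, in_sinks, (2, 0), out_sinks, 8, 10)
too_long = try_delete_repeater(driver, in_sinks, (2, 0), out_sinks, 8, 3)
too_many = try_delete_repeater(driver, in_sinks, (2, 0), out_sinks, 2, 10)
```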

## 5. Results

This testability circuit incorporation system was applied to ECL chips used in the VP-2000 series. There are four types of ECL chips: 15K-gate logic, 3.5K-gate logic with 64K-bit RAM, and two others. In most cases, an 11-bit scan address is assigned to each type of chip. Table 1 lists the average counts per chip for each chip type. SORG is the average number of gates used per chip before testability circuits are incorporated. SSCANFF is the average number of scannable latches or flip-flops per chip before testability circuits are incorporated. SDELAY is the average gate count of the automatically incorporated testability circuit for clock suppression. SSCAN is the average gate count of the automatically incorporated scan control circuit including the pin scan-out circuit and reset distribution circuit. Since one scannable latch or flip-flop has a two-gate overhead (see Figure 1), the percentage of overhead due to the testability circuits can be calculated as follows:

For the $L_{15K}$ type in Table 1, this overhead is about 29%, of which about 2% is due to the reset distribution circuit. In other words, about 29% of the logic design of a chip, in terms of gate count, is automated. It takes less than three minutes to generate and backannotate the testability circuits into a chip on a Fujitsu M-780 computer.

| Chip type          | SORG | SSCANFF | SDELAY | SSCAN |
|--------------------|------|---------|--------|-------|
| L <sub>15K</sub>   | 9487 | 621     | 636    | 1392  |
| Logic + RAM64K bit | 1871 | 25      | 147    | 1016  |
| Chip type 3        | 5341 | 394     | 471    | 1307  |
| Chip type 4        | 4894 | 266     | 380    | 1177  |

Table 1 Testability circuit size per chip

## 6. Conclusion

This paper has described how to automatically incorporate testability circuits so that fanout limit and logical net length limit are satisfied. This system was used to develop ECL chips for the new Fujitsu VP-2000 series.

Automatic EC (engineering change) processing of the testability circuits is the next problem to be solved. For example, when scannable latches are added to the system logic, scan addresses must be assigned to these new latches. The scan addresses of the existing latches should remain unchanged, and the scan control circuit must be expanded to control the newly added scan addresses. The fanout limit and logical net length limit must also be satisfied. Automatic EC processing that satisfies these requirements is the subject of the author's ongoing research.

### ACKNOWLEDGEMENTS

The author would like to thank Mr. Hirofumi Hamamura for his support and encouragement, and DA Department of Amdahl Corporation for providing an excellent work environment in Sunnyvale. The author is also grateful to Mr. Toshihiko Tada, logic designers, and Technology Development Department for their helpful discussions and valuable suggestions during this work.

### REFERENCES

- [1] T. W. Williams and K. P. Parker, "Design for Testability - A Survey," Proceedings of the IEEE, Vol. 71, No. 1, January 1983, pp.98-112.
- [2] E. P. Hsieh, R. A. Rasmussen, L. J. Vidunas, and W. T. Davis, "Delay Test Generation," Proceedings of the 14th DA Conference, June 1977, pp.486-491.
- [3] K. Kishida, F. Shirotori, Y. Ikemoto, S. Isiyama, and Y. Hayashi, "A Delay Test System for High-speed Logic LSI's," Proceedings of the 23rd DA Conference, June 1986, pp.786-790.
- [4] JTAG Boundary-Scan Architecture Standard Proposal Version 2.0, 30 March 1988.
- [5] J. Turino, "IEEE P1149 Proposed Standard Testability Bus — An Update with Case Histories," Proceedings IEEE International Conference on Computer Design, October 1988, pp.334-337.
- [6] V. D. Agrawal, S. K. Jain, and D. M. Stinger, "Automation in Design for Testability," Proceedings of Custom Integrated Circuit Conference, May 1984, pp.159-163.
- [7] T. Hayashi, K. Hatayama, Y. Kunitomo, and S. Kuboki, "An Approach to Design Automation for Highly Testable Logic Circuits," Proceedings of ICCAD, November 1986, pp.98-101.
- [8] C. E. Stroud, "An Automated BIST Approach for General Sequential Logic Synthesis," Proceedings of the 25th DA Conference, June 1988, pp.3-8.
- [9] K. Kim, J. G. Tront, and D. S. Ha, "Automatic Insertion of BIST Hardware Using VHDL," Proceedings of the 25th DA Conference, June 1988, pp.9-15.
- [10] C. H. Gebotys and M. I. Elmasry, "VLSI Design Synthesis with Testability," Proceedings of the 25th DA Conference, June 1988, pp.16-21.
- [11] H. Ando, "Testing VLSI with Random Access Scan," Digest COMPCON 1980, February 1980, pp.50-52.