TTCAN - Time Triggered Controller Area Network


Abstract
As early as the 1950s, electronic elements started to appear in passenger vehicles. Over the years, the electronic content and complexity of vehicles continued to grow. In 1983, it was formally stated at Robert Bosch GmbH that a real-time communication link was required between three electronic control units: engine control, automatic transmission control, and the anti-skid braking system. 

Despite the existence of a number of proprietary automotive multiplexing protocols, a new serial communications protocol emerged from Bosch's endeavor, the Controller Area Network (CAN). In mid-1987 the first working silicon for CAN became available. In 1993 CAN was standardized by the International Organization for Standardization (ISO).
In time-triggered CAN, the exchange of messages is controlled essentially by the progression of time. During time-triggered operation of the protocol, a specific message may only be exchanged at a predefined point in time relative to a common reference. This benchmark in time, to which all other communication transactions are related, is defined by the start of frame (SOF) bit of a specific message known as the reference message. 

The reference message is transmitted either periodically (in time-triggered mode) or on the occurrence of a specific external event (in event-triggered mode). The reference message is recognized by all nodes participating in the TTCAN network by virtue of its CAN frame identifier. Each node synchronizes to the reference message, which provides a reference point in the temporal domain for the static schedule of the message transactions. 

The static schedule is based on a time division multiple access (TDMA) scheme whereby message exchanges may only occur during specific time slots or time windows.

When the nodes are synchronized, any message can be transmitted in its specific time slot without competing with other messages for the bus. Thus the loss of arbitration is avoided and the latency time becomes predictable. 

The TTCAN protocol is being standardized by ISO TC22/SC3/WG1/TF6, which describes TTCAN at the system level. This paper describes the implementation of TTCAN features in a CAN module and the evaluation of a TTCAN network.

Introduction
CAN is the dominating network for automotive applications. New concepts in automotive control systems require a time triggered communication. This is provided by TTCAN, ISO 11898-4. The main features of TTCAN are the synchronization of the communication schedules of all CAN nodes in a network, the possibility to synchronize the communication schedule to an external time base, and the global system time.

TTCAN nodes are fully compatible with CAN nodes, both in the data link layer (ISO 11898-1) and in the physical layer; they use the same bus line and bus transceivers. Dedicated bus guardians are not needed in TTCAN nodes, bus conflicts between nodes are prevented by CAN’s non-destructive bitwise arbitration mechanism and by CAN’s fault confinement (error-passive, bus-off). 

Existing CAN controllers can receive every message in a TTCAN network, TTCAN controllers can operate in existing CAN networks. A gradual migration from CAN to TTCAN is possible. The minimum additional hardware that is required to enhance an existing CAN controller to time triggered operation is a local time base and a mechanism to capture the time base, the capturing triggered by bus traffic. 

Based on this hardware, which is already existent in some CAN controllers, it is possible to implement in software a TTCAN controller capable of TTCAN level 1. A TTCAN controller capable of TTCAN level 2, providing the full range of TTCAN features like global time, time mark interrupts, and time base synchronization, has to be implemented in silicon. 

A TTCAN controller can be seen as an existing CAN controller (e.g. Bosch’s C_CAN module) enhanced with a Frame Synchronization Entity FSE and with a trigger memory containing the node’s view of the system matrix.

The TTCAN test chip (TTCAN_TC) is a standalone TTCAN controller that was produced as a solution to the chicken-and-egg problem of hardware availability versus tool support and research. The TTCAN_TC supports both TTCAN level 1 and TTCAN level 2; its time triggered communication does not depend on software control.

The need for serial communication in vehicles
Many vehicles already have a large number of electronic control systems. The growth of automotive electronics is the result partly of the customer’s wish for better safety and greater comfort and partly of the government’s requirements for improved emission control and reduced fuel consumption. 

Control devices that meet these requirements have been in use for some time in the areas of engine timing, gearbox and carburetor throttle control, and in anti-lock braking systems (ABS) and acceleration skid control (ASC). The complexity of the functions implemented in these systems necessitates an exchange of data between them. 

With conventional systems, data is exchanged by means of dedicated signal lines, but this is becoming increasingly difficult and expensive as control functions become ever more complex. In the case of complex control systems, the number of connections cannot be increased much further.

Moreover, a number of systems are being developed which implement functions covering more than one control device. For instance, ASC requires the interplay of engine timing and carburetor control in order to reduce torque when drive wheel slippage occurs. Another example of functions spanning more than one control unit is electronic gearbox control, where ease of gear changing can be improved by a brief adjustment to ignition timing.

Controller Area Network (CAN), an overview
CAN (Controller Area Network) is a serial bus system, which was originally developed for automotive applications in the early 1980s. The CAN protocol was internationally standardized in 1993 as ISO 11898-1 and comprises the data link layer of the seven-layer ISO/OSI reference model. 

CAN, which is by now available in hardware from around 40 semiconductor manufacturers, provides two communication services: the sending of a message (data frame transmission) and the requesting of a message (remote transmission request, RTR). All other services, such as error signaling and automatic re-transmission of erroneous frames, are user-transparent, which means the CAN chip performs them automatically.

The equivalent of the CAN protocol in human communication would be, for example, the Latin characters: a CAN controller is comparable to a printer or a typewriter. CAN users still have to define the language/grammar and the words/vocabulary for communication. 

CAN provides:

  • A multi-master hierarchy, which allows building intelligent and redundant systems. If one network node is defective, the network is still able to operate. 
  • Broadcast communication. A sender of information transmits to all devices on the bus. All receiving devices read the message and then decide if it is relevant to them. This guarantees data integrity as all devices in the system use the same information. 
  • Sophisticated error detecting mechanisms and re-transmission of faulty messages. This also guarantees data integrity.


TTCAN Definition
The TTCAN (time-triggered communication on CAN) protocol is a higher layer protocol on top of the CAN (Controller Area Network) data link layer as specified in ISO 11898-1. It may use standardized CAN physical layers such as specified in ISO 11898-2 (high-speed transceiver) or in ISO 11898-3 (fault-tolerant low-speed transceiver).
Time-triggered communication means that activities are triggered by the elapsing of time segments. In a time-triggered communication system all points of time of message transmission are defined during the development of a system. A time-triggered communication system is ideal for applications in which the data traffic is of a periodic nature.

CAN vs. TTCAN
All synchronization mechanisms described in this paper are supported by the TTCAN test chip TTCAN_TC, a standalone TTCAN controller that supports both TTCAN level 1 and TTCAN level 2.

The cyclic message transfer of TTCAN level 1 can be implemented in software, based on existing CAN modules. Depending on the CAN bit rate and on the number of messages in the system matrix, the software approach may result in a high CPU load. For the evaluation of the TTCAN protocol, the hardware approach was chosen.

In parallel to the standardization process, Bosch is developing an IP module that implements the TTCAN protocol. This IP module, the TT_CAN, is based on the existing C_CAN IP module and will be available as VHDL code to be synthesized in FPGAs, supporting the development of CAN-based time triggered communication networks.
The C_CAN consists of the components CAN Core, Message RAM, Message Handler, Control Registers, and Module Interface. The CAN_Core performs communication according to the CAN protocol version 2.0, as defined in ISO 11898-1.

The bit rate can be programmed to values up to 1 Mbit/s, depending on the technology used. For the connection to the physical layer, additional transceiver hardware is required. For communication on a CAN network, individual Message Objects are configured. The Message Objects and the Identifier Masks for acceptance filtering of received messages are stored in the Message RAM.

All functions concerning the handling of messages are implemented in the Message Handler. Those functions are the acceptance filtering, the transfer of messages between the CAN Core and the Message RAM, and the handling of transmission requests as well as the control of the module interrupt. The register set of the C_CAN can be accessed directly by an external CPU via the module interface. These registers are used to control/configure the CAN Core and the Message Handler and to access the Message RAM.
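As an illustration of mask-based acceptance filtering, here is a minimal sketch in Python. The function and constant names are hypothetical; the real Message Object layout is specific to the C_CAN registers.

```python
def accepts(msg_id: int, filter_id: int, mask: int) -> bool:
    """Mask-based acceptance filtering: a received identifier is accepted
    by a Message Object when every identifier bit selected by the mask
    matches the object's filter identifier."""
    return (msg_id & mask) == (filter_id & mask)

# Example: accept the 11-bit identifiers 0x100..0x107 (low 3 bits "don't care").
FILTER_ID, MASK = 0x100, 0x7F8
```

With this example filter, identifier 0x103 would be accepted while 0x108 would be rejected.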

Several Module Interfaces are available, including interfaces to ARM, Motorola and Texas Instruments CPUs. Compared to the C_CAN, the TT_CAN is expanded by two functional blocks, the Trigger Memory and the Frame Synchronization Entity FSE. The Trigger Memory stores the time marks of the system matrix that are linked to the messages in the Message RAM; the data is provided to the Frame Synchronization Entity.

The Frame Synchronization Entity is the state machine that controls the time triggered communication. It synchronizes itself to the reference messages on the CAN bus, controls the cycle time, and generates Time Triggers. It is divided into five blocks: the Time Base Builder TBB, the Cycle Time Controller CTC, the Time Schedule Organizer TSO, the Master State Administrator MSA, and the Application Operation Monitor; in level 2, the Global Time Unit GTU is added.

The Time Base Builder generates the local time from the node’s system clock and the time unit ratio TUR. In TTCAN level 1, the TUR is defined at configuration; in level 2, it is continuously adapted by the GTU. The Cycle Time Controller gets the local time from the TBB, the Frame_Synchronisation events from the CAN_Core, and the reference messages from the Message Handler. It captures the Sync_Mark and the Ref_Mark to generate the cycle time and controls the sequence of the basic cycles in the matrix cycle.

Cycle Count (part of the reference message) identifies the actual basic cycle inside the matrix cycle. Depending on whether the node itself is time master (transmitter of reference messages), Cycle Count is either generated from a cyclic counter or received in the reference message.

The Time Schedule Organizer maintains the message schedule inside a basic cycle and checks for scheduling errors. The schedule is defined by data in the Trigger Memory. Each entry consists of a time mark (measured in cycle time) and a function (trigger for transmission or check of reception), and is linked to a message in the Message RAM. The same time mark may be defined for different messages in different basic cycles of the matrix cycle, selected by Cycle Count. Other time marks are defined for the Ref Trigger and the Watch Trigger. 
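The relation between trigger-memory entries and the Cycle_Time can be sketched as a simplified model in Python. The entry format and names are hypothetical, not the TT_CAN register layout:

```python
# Hypothetical trigger-memory contents: (time_mark, function, message).
TRIGGER_MEMORY = [
    (10, "tx", "msg_A"),    # Tx_Trigger: transmit msg_A at Cycle_Time 10
    (25, "rx", "msg_B"),    # Rx_Trigger: check reception of msg_B at 25
    (40, "tx_ref", None),   # Tx_Ref_Trigger: time master sends the Reference Message
]

def active_triggers(cycle_time, synchronized, time_master):
    """Triggers firing at this Cycle_Time. Transmissions are disabled while
    the node is not synchronized; Ref_Triggers act on time masters only."""
    active = []
    for mark, func, msg in TRIGGER_MEMORY:
        if mark != cycle_time:
            continue
        if func == "tx" and not synchronized:
            continue  # no transmissions when the node is not synchronized
        if func == "tx_ref" and not time_master:
            continue  # Ref_Triggers are used by (potential) time masters only
        active.append((func, msg))
    return active
```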

The TSO compares the time marks with the cycle time and activates the Time Triggers for messages with matching time marks. The function of the TSO depends on the actual operating state; transmissions are disabled when the node is not synchronized to the system. If the node is time master, the Ref Trigger causes the reference message to be transmitted. The Watch Trigger becomes active at the end of a basic cycle, when the expected start of a new basic cycle (completion of a reference message) does not occur. 

This event causes the MSA to change the operating state.

The Master State Administrator controls the FSE’s operating state. The operating state depends on whether the node is synchronized to the network, whether it is (or is trying to become) time master, or whether it is a backup time master. The synchronization state differentiates between synchronizing after initialization, the active mode, the loss of synchronization, and fault recovery.

The function of the other blocks is monitored; in case of errors, transmissions are disabled and the master state is resigned.

The Application Operation Monitor checks the function of the application program. The application controller has to serve the Application Alive input regularly. If the application program fails, the application watchdog causes the MSA to disable all transmissions, preventing invalid data from disturbing the system.

The Global Time Unit (TTCAN level 2 only) generates the node’s view of the global time and controls the drift correction of the local time. When the node is the first time master of the network, the node’s local time is the global time. When the node is not operating as time master, the difference between the local Ref_Mark and the Master_Ref_Mark received in the reference message defines the actual offset between the local time and the global time.

 The actual offset is updated at each start of a basic cycle; when the node becomes time master, the last offset is kept, avoiding a discontinuity in the global time. The Global_Ref_Mark (captured in global time) is provided as Master_Ref_Mark for reference messages to be transmitted. The GTU compensates the drift between the local time and the global time by calibrating the local time. If the node itself is the time master, no calibration is done. Each time a reference message is completed, the actual length of the base cycle is measured in local time (Ref_Mark - previous Ref_Mark) and in global time (Master_Ref_Mark – previous Master_Ref_Mark).

 The difference between the two measured values divided by the length of the base cycle shows the factor by which the local time has to be calibrated in order to compensate the drift. The compensation is performed by adapting the TUR the TBB uses to generate the local time from the node’s system clock. The calibration process is on hold when the node is not synchronized to the system and is (re-)started when it (re-)gains synchronization. Frequent significant changes in the measured drift indicate an unreliable local time base. Time base errors are signaled to the MSA, causing it to stop all TTCAN operations.

In order to synchronize different TTCAN networks, or to provide a physical time base, the global time may be synchronized (via the time master) to an external clock, e.g. GPS. The TTCAN implementation is done in two steps. In the first step, only level 1 is implemented, without the global system time and drift compensation of level 2. In the second step, after the evaluation of the TT_CAN IP module in a TTCAN network, a global time unit will be added to the module. 

Time Bases of the TTCAN Protocol
Each node has its own time base, Local_Time, which is a counter that is incremented each Network Time Unit NTU. The length of the NTU is defined by the TTCAN network configuration; it is the same for all nodes. It is generated locally, based on the local system clock period t_sys and the local Time Unit Ratio TUR. Different system clocks in the nodes require different (non-integer) TUR values. In TTCAN level 1, TUR is a constant and Local_Time is a 16 bit integer value, incremented once each NTU. The NTU is the CAN bit time.

In TTCAN level 2, Local_Time consists of a 16 bit integer value extended by a fractional part of N (at least three) bits. Local_Time is incremented 2^N times each NTU, providing a higher time resolution than in level 1. TUR is a non-integer value and may be adapted to compensate clock drift or to synchronize to an external time base.
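The difference in resolution between the two levels can be illustrated with a small sketch (the function name is illustrative; N = 3 fractional bits is the minimum stated above):

```python
def local_time_count(elapsed_ntu: float, level: int, n: int = 3) -> int:
    """Counter value of Local_Time after a given number of elapsed NTUs.
    Level 1: incremented once per NTU (integer resolution).
    Level 2: incremented 2**n times per NTU (n-bit fractional part)."""
    per_ntu = 1 if level == 1 else 2 ** n
    return int(elapsed_ntu * per_ntu)

# After 2.625 NTUs, level 1 has counted 2 ticks; level 2 with n = 3 has
# counted 21 ticks, i.e. the elapsed time is resolved to 1/8 NTU.
```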

Cycle_Time
In the TTCAN network, the synchronization of the nodes is maintained by so-called Reference Messages that are transmitted periodically by a specific node, the time master. The Reference Message is a CAN data frame, characterized by its identifier.

 Valid Reference Messages are recognized synchronously (disregarding signal propagation time) by all nodes. Each valid Reference Message starts a new basic cycle and causes a reset of each node’s Cycle_Time.

The value of Local_Time is captured as Sync_Mark at the start of frame (SOF) bit of each message. When a message is recognized as a valid Reference Message, this message’s Sync_Mark becomes the new Ref_Mark; Cycle_Time is the actual difference between Local_Time and Ref_Mark, restarting at the beginning of each basic cycle when Ref_Mark is reloaded. 
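The Sync_Mark / Ref_Mark mechanism can be sketched as a small state model; the class and method names below are illustrative only:

```python
class NodeClock:
    """Minimal model of a TTCAN node's local clock and mark capture."""

    def __init__(self):
        self.local_time = 0
        self.sync_mark = 0
        self.ref_mark = 0

    def tick(self, ntus=1):
        self.local_time += ntus

    def start_of_frame(self):
        # Local_Time is captured as Sync_Mark at every SOF bit.
        self.sync_mark = self.local_time

    def valid_reference_message(self):
        # The frame's Sync_Mark becomes the new Ref_Mark.
        self.ref_mark = self.sync_mark

    @property
    def cycle_time(self):
        # Cycle_Time is the difference between Local_Time and Ref_Mark.
        return self.local_time - self.ref_mark
```

Note that Cycle_Time restarts from the SOF of the Reference Message, not from its end: the Sync_Mark captured at the SOF becomes the Ref_Mark only once the message has been recognized as a valid Reference Message.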

Even in a software implementation of TTCAN, the capturing of Local_Time into Sync_Mark at each SOF must be done in hardware (see Figure 1). ISO 11898-1 specifies the necessary hardware interface as an optional feature; it is already implemented in some CAN controllers.

Global_Time
There are two levels of implementation in TTCAN, level 1 and level 2. In TTCAN level 1, the common time base is the Cycle_Time which is restarted at the beginning of each basic cycle and is based on each node’s Local_Time. In TTCAN level 2, there is additionally the Global_Time which is a continuous value for the whole network and is the reference for the calibration of all local time bases. The time master captures its view of Global_Time at each Sync_Mark and transmits that value in the Reference Message, as Master_Ref_Mark. 

For all nodes, Global_Time is the sum of their Local_Time and their Local_Offset, Local_Offset being the difference between their Ref_Mark in Local_Time and the Master_Ref_Mark in Global_Time, received (or transmitted) as part of the Reference Message. The Local_Offset of the current time master is zero if no other node has been the time master since network initialization.
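In code form the relation reads as follows (a sketch; the sign convention is chosen so that Global_Time = Local_Time + Local_Offset holds as stated above):

```python
def local_offset(ref_mark_local: int, master_ref_mark_global: int) -> int:
    """Offset between a node's local time base and the global time,
    derived from the marks of the last Reference Message."""
    return master_ref_mark_global - ref_mark_local

def global_time(local_time: int, offset: int) -> int:
    """A node's view of Global_Time."""
    return local_time + offset

# A node whose Ref_Mark is 1000 (local) while the received Master_Ref_Mark
# is 1234 (global) has a Local_Offset of 234.
```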

The phase drift between Local_Time and Global_Time is compensated at each received Reference Message by updating Local_Offset. Changes in Local_Offset reflect differences between the local node’s NTU and the actual time master’s NTU. 

The actual clock speed difference is calculated by dividing the difference between two consecutive Master_Ref_Marks (measured in global NTUs) by the difference between two consecutive Ref_Marks (measured in local NTUs). The clock speed drift is compensated by adapting the prescaler (TUR) that generates the local NTU from the local system clock.

The factor df by which the local NTU has to be adjusted is calculated according to the formula:

df = (Master_Ref_Mark - previous Master_Ref_Mark) / (Ref_Mark - previous Ref_Mark)
The calibration process is on hold when the node is not synchronized to the system; it is (re-)started when it (re-)gains synchronization. The necessary accuracy of the calibration is defined by the system’s requirements; a plausibility check for the value of df ensures that the length of the NTU remains in a predefined range. This calibration, together with the higher resolution of the NTU, provides a high precision time base.
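A sketch of the drift calculation and the plausibility check follows; the ±0.1 % plausibility bounds are an assumed example, not a value from the specification:

```python
def drift_factor(master_ref_marks, ref_marks):
    """df = (difference of two consecutive Master_Ref_Marks, in global NTUs)
          / (difference of two consecutive Ref_Marks, in local NTUs)."""
    return (master_ref_marks[-1] - master_ref_marks[-2]) / \
           (ref_marks[-1] - ref_marks[-2])

def calibrate_tur(tur: float, df: float, low=0.999, high=1.001) -> float:
    """Adapt the TUR by df; reject implausible values so that the length
    of the NTU stays within a predefined range."""
    if not (low <= df <= high):
        raise ValueError("implausible drift factor: unreliable time base")
    return tur * df
```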

After initialization, before synchronizing to the network, each node sees its own Local_Time as Global_Time, the Local_Offset is zero. The actual time master establishes its own Global_Time as the network’s Global_Time by transmitting its own Global_Sync_Marks in the Reference Message, as Master_Ref_Marks. When a backup time master becomes the actual time master, it keeps its Local_Offset value constant, avoiding a discontinuity of Global_Time.

Synchronizing the Global_Time
When the TTCAN communication is initialized, the actual time master may adjust the phase of Global_Time by adding an offset (Global_Time_Preset, see Figure 5) to the transmitted Master_Ref_Mark value, e.g. to synchronize to an external clock. Any such intended discontinuity of Global_Time is signaled in the Reference Message, by setting the Disc_Bit. Reference Messages with a set Disc_Bit are not used for clock calibration.
The actual time master may adjust the speed of Global_Time by adjusting its TUR value; the other nodes in the TTCAN network will calibrate their own clocks. The external time base used for the synchronization of Global_Time may be a reference clock like GPS or the Global_Time monitored in another TTCAN network. 

Synchronizing the Cycle_Time
TTCAN has the option to synchronize the communication schedule to specific events in the time masters’ nodes. When the communication is to be synchronized, the cyclic message transfer is discontinued after the end of a basic cycle and a time gap may appear between the end of that basic cycle and the beginning of the next, event synchronized basic cycle. The current time master announces the time gap by setting the Next_is_Gap bit in the Reference Message. The time gap ends as soon as the current time master or one of the potential time masters sends a Reference Message to start the following basic cycle of the matrix cycle. The transmission of the Reference Message will be triggered by the occurrence of a specific event or after a maximum waiting time.

Time Schedule Organizer – TSO
This block is a state machine that maintains the message schedule inside a basic cycle. The TSO gets its view of the message schedule from an array of time triggers in the trigger memory. Each time trigger has a time mark that defines at which Cycle_Time the trigger becomes active. A Tx_Trigger specifies when a specific message shall be transmitted. An Rx_Trigger specifies when the reception of a specific message shall be checked.

A Tx_Ref_Trigger (_Gap) triggers the transmission of a Reference Message; it finishes the current basic cycle and starts a new cycle. Ref_Triggers are used by potential time masters only. A Watch_Trigger (_Gap) has a Time_Mark with a higher value than the Tx_Ref_Trigger (_Gap) and checks if the time since the last valid Reference Message has been too long. 

When in the last Reference Message the Next_is_Gap bit was set, the TSO ignores Tx_Ref_Trigger and Watch_Trigger and uses Tx_Ref_Trigger_Gap and Watch_Trigger_Gap instead. In all other cases, Tx_Ref_Trigger and Watch_Trigger are used, and Tx_Ref_Trigger_Gap and Watch_Trigger_Gap are ignored. The maximum time allowed for a time gap is the difference Tx_Ref_Trigger_Gap - Tx_Ref_Trigger.
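The selection rule can be stated compactly as a sketch (function names are illustrative):

```python
def reference_triggers(next_is_gap: bool):
    """Select which trigger pair the TSO evaluates, based on the
    Next_is_Gap bit of the last Reference Message."""
    if next_is_gap:
        return ("Tx_Ref_Trigger_Gap", "Watch_Trigger_Gap")
    return ("Tx_Ref_Trigger", "Watch_Trigger")

def max_time_gap(tx_ref_trigger: int, tx_ref_trigger_gap: int) -> int:
    """Maximum allowed time gap, in Cycle_Time units."""
    return tx_ref_trigger_gap - tx_ref_trigger
```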

Host controlled Synchronization
Figure 7 shows an example of how the host application of the time master can synchronize the TTCAN network’s Cycle_Time. First the host requests the time master to transmit a Reference Message with the Next_is_Gap bit set. The time gap starts when the basic cycle started by that Reference Message is finished. The message schedule is restarted when the host triggers the next Reference Message. If the host fails to trigger the Reference Message within a specified time, the TSO itself triggers the Reference Message when its Cycle_Time reaches Tx_Ref_Trigger_Gap.

Automatic Synchronization
The implementation of TTCAN in hardware allows implementing some additional features (not required by the TTCAN protocol) that cannot be provided in software. An Event Trigger input can be used to trigger Reference Messages. In this mode, the time master transmits each Reference Message with the Next_is_Gap bit set. The input level at the time master’s EVT pin controls the time gap, as described under Time Measurement in TTCAN below.

How the TTCAN network functions
When data are transmitted by TTCAN, no stations are addressed, but instead, the content of the message (e.g. rpm or engine temperature) is designated by an identifier that is unique throughout the network. The identifier defines not only the content but also the priority of the message. This is important for bus allocation when several stations are competing for bus access.

If the CPU of a given station wishes to send a message to one or more stations, it passes the data to be transmitted and their identifiers to the assigned CAN chip ("Make ready"). This is all the CPU has to do to initiate data exchange. The message is constructed and transmitted by the CAN chip. As soon as the CAN chip receives the bus allocation ("Send Message"), all other stations on the CAN network become receivers of this message ("Receive Message"). Each station in the CAN network, having received the message correctly, performs an acceptance test to determine whether the data received are relevant for that station ("Select"). If the data are of significance for the station concerned, they are processed ("Accept"); otherwise they are ignored.

A high degree of system and configuration flexibility is achieved as a result of the content-oriented addressing scheme. It is very easy to add stations to the existing CAN network without making any hardware or software modifications to the existing stations, provided that the new stations are purely receivers. 

Because the data transmission protocol does not require physical destination addresses for the individual components, it supports the concept of modular electronics and also permits multiple reception (broadcast, multicast) and the synchronization of distributed processes: measurements needed as information by several controllers can be transmitted via the network, in such a way that it is unnecessary for each controller to have its own sensor.

Non-destructive bitwise arbitration
For the data to be processed in real time they must be transmitted rapidly. This not only requires a physical data transfer path with up to 1 Mbit/s but also calls for rapid bus allocation when several stations wish to send messages simultaneously.

In real-time processing, the urgency of messages to be exchanged over the network can differ greatly: a rapidly changing quantity (e.g. engine load) has to be transmitted more frequently, and therefore with less delay, than other quantities (e.g. engine temperature) which change relatively slowly. The priority at which a message is transmitted compared with another less urgent message is specified by the identifier of the message concerned. The priorities are laid down during system design in the form of corresponding binary values and cannot be changed dynamically. The identifier with the lowest binary number has the highest priority.

Bus access conflicts are resolved by bitwise arbitration on the identifiers involved, with each station observing the bus level bit for bit. In accordance with the "wired AND" mechanism, by which the dominant state (logical 0) overwrites the recessive state (logical 1), the competition for bus allocation is lost by all those stations with recessive transmission and dominant observation. All "losers" automatically become receivers of the message with the highest priority and do not reattempt transmission until the bus is available again. The method of bitwise arbitration using the identifier of the messages to be transmitted uniquely resolves any collision between a number of stations wanting to transmit, and it does this at the latest within 13 (standard format) or 33 (extended format) bit periods for any bus access period.

Unlike the message-wise arbitration employed by the CSMA/CD method, this non-destructive method of conflict resolution ensures that no bus capacity is used without transmitting useful information. Even in situations where the bus is overloaded, the linkage of the bus access priority to the content of the message proves to be a beneficial system attribute compared with existing CSMA/CD or token protocols: in spite of insufficient bus transport capacity, all outstanding transmission requests are processed in order of their importance to the overall system (as determined by the message priority). The available transmission capacity is utilized efficiently for the transmission of useful data, since "gaps" in bus allocation are kept very small. The collapse of the whole transmission system due to overload, as can occur with the CSMA/CD protocol, is not possible with CAN. Thus, CAN permits implementation of fast, traffic-dependent bus access which is non-destructive because of the bitwise arbitration based on message priority. 
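The wired-AND arbitration described above can be simulated in a few lines; 11-bit standard-format identifiers are assumed in this sketch:

```python
def arbitrate(identifiers):
    """Simulate bitwise CAN arbitration over the identifier field.
    The bus acts as a wired AND: dominant (0) overwrites recessive (1).
    A station transmitting recessive while observing dominant drops out,
    so the lowest identifier (highest priority) wins undamaged."""
    contenders = set(identifiers)
    for bit in range(10, -1, -1):  # 11-bit identifier, MSB first
        bus_level = min((ident >> bit) & 1 for ident in contenders)
        contenders = {i for i in contenders if (i >> bit) & 1 == bus_level}
    (winner,) = contenders
    return winner
```

The losers simply become receivers of the winning frame and retry later; no bit of the winning message is destroyed.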

Time Measurement in TTCAN
In TTCAN level 1, there are two time bases, the Local_Time and the Cycle_Time. In level 2, there is additionally Global_Time. The host application has read access to all time bases; it can store the actual time value read at specific events, e.g. controlled by an interrupt service routine. A hardware implementation of TTCAN permits some features that are not possible in a software implementation, like bus-time-based interrupts, a stop-watch function, and the event trigger EVT.

When EVT is high at the end of a basic cycle, a time gap is started. The Reference Message to end the time gap is triggered at the next falling edge of EVT (see Figure 8). If the falling edge does not occur within a specified time, the TSO itself triggers the Reference Message when its Cycle_Time reaches Tx_Ref_Trigger_Gap. No time gap is started when EVT is low at the end of a basic cycle; only falling edges that occur during a time gap can trigger a Reference Message.

Time Mark Interrupt
Local_Time, Cycle_Time, and Global_Time can be compared to a time mark interrupt register. When the selected time value matches the register value, an interrupt is generated. This event may trigger the CPU’s interrupt line or may be directly connected to an output port. The TMI output(s) can be used to synchronize the application to the TTCAN’s Cycle_Time or Global_Time.

The Reference Message
TTCAN is based on time-triggered, periodic communication clocked by a time master's reference message. The reference message is easily recognized by its identifier. In TTCAN level 1, the reference message holds only one byte of control information; the rest of the CAN message can be used for data transfer. In extension level 2, the reference message holds additional control information, e.g. the global time information of the current TTCAN time master. The level 2 reference message uses 4 bytes for this while guaranteeing downward compatibility; the remaining 4 bytes are likewise available for data communication.

The System Matrix
Practice has shown that applications include many control loops and tasks with different periods. They all need individual sending patterns for their information. 

The TTCAN basic cycle alone would not offer enough flexibility to satisfy this need. The TTCAN specification therefore allows more than one basic cycle to be used to build the communication matrix, or system matrix, that the systems engineer needs. Several basic cycles are connected to build the matrix cycle. Most sending patterns are possible, e.g. sending every basic cycle, sending every second basic cycle, or sending only once within the whole system matrix. The TTCAN specification also allows another useful exception: although the system matrix is highly column oriented, it may make sense to ignore the columns in the case of two or more arbitrating time windows in series.

The most important constraint for this construct is that a spontaneous message may not start within such a merged arbitrating window if it would not fit into the remaining time window: the start of the next periodic time window must be guaranteed. Ensuring this is the task of the off-line design tool used to build TTCAN system matrices. Automatic retransmission within a merged arbitrating time window is allowed as long as the constraint described above is satisfied.
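The fit check above reduces to a one-line predicate that a design tool (or a node deciding whether to start transmitting) might apply. The function name and the use of abstract time units are assumptions for illustration:

```python
# Sketch: may a spontaneous message start inside a merged arbitrating
# window? Only if its worst-case transmission time still fits before the
# next periodic time window begins. Times are in abstract network time
# units (an assumed simplification).

def may_start(now, frame_time, window_end):
    """True if a frame of length frame_time, started at `now`, finishes
    no later than window_end (the start of the next periodic window)."""
    return now + frame_time <= window_end

# Merged arbitrating window ends at t=500; the frame needs 130 units.
print(may_start(300, 130, 500))  # True: finishes at 430, inside the window
print(may_start(400, 130, 500))  # False: would overrun the periodic window
```

The same predicate also governs automatic retransmission inside the merged window: a retry is only started if it, too, passes the check.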

Conclusion
TTCAN is based on the most successful automotive control network to date. It appends a set of new features to the existing CAN protocol through the introduction of a session layer protocol onto the CAN protocol stack. The original CAN protocol may exhibit performance limitations in certain hard real-time applications. The time-triggered solution provided by TTCAN offers improved reliability, determinism, and synchronization quality for current and future hard real-time distributed applications. Many semiconductor manufacturers have recognized the benefits and potential market for TTCAN, and are currently working on their TTCAN compliant devices. TTCAN promises to offer design engineers a new robust solution to hard real-time distributed applications.

Tiger SHARC Processor - Engineering Seminar


Tiger SHARC processor
ABSTRACT
The TigerSHARC processor is the newest and most powerful member of this family, incorporating mechanisms such as SIMD, VLIW and short-vector memory access in a single processor. This is the first time all of these have been combined in a real-time processor.
The TigerSHARC DSP is an ultra-high-performance static superscalar architecture optimized for telecommunications infrastructure and other computationally demanding applications.
The unique architecture combines elements of RISC, VLIW and standard DSP processors to provide native support for 8-, 16- and 32-bit fixed-point, as well as floating-point, data types on a single chip. Large on-chip memory, extremely high internal and external bandwidths, and dual compute blocks provide the capabilities needed to handle a vast array of computationally demanding, large signal processing tasks.

INTRODUCTION
Analog and digital signals
               In many cases, the signal of interest is initially in the form of an analog electrical voltage or current, produced for example by a microphone or some other type of transducer. An analog signal must be converted into digital form before DSP techniques can be applied. An analog electrical voltage signal, for example, can be digitized using an electronic circuit called an analog-to-digital converter or ADC. This generates a digital output as a stream of binary numbers whose values represent the electrical voltage input to the device at each sampling instant.
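As a rough sketch of the conversion step described above (the 3.3 V reference and 10-bit resolution are assumed example values, not taken from the text), an ADC maps each sampled voltage to the nearest binary code:

```python
# Sketch (assumed 3.3 V reference, 10-bit converter): an ADC quantizes
# each sampled analog voltage to the nearest binary code, producing the
# stream of numbers a DSP then operates on.

def adc_sample(voltage, vref=3.3, bits=10):
    full_scale = (1 << bits) - 1            # 1023 codes for 10 bits
    code = round(voltage / vref * full_scale)
    return max(0, min(code, full_scale))    # clamp to the converter range

print(adc_sample(0.0))   # 0: bottom of the input range
print(adc_sample(3.3))   # 1023: full scale
```

Sampling this function at regular instants yields exactly the "stream of binary numbers" the paragraph describes.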

Signal processing
               Signals commonly need to be processed in a variety of ways. For example, the output signal from a transducer may well be contaminated with unwanted electrical "noise". The electrodes attached to a patient's chest when an ECG is taken measure tiny electrical voltage changes due to the activity of the heart and other muscles. The signal is often strongly affected by "mains pickup" due to electrical interference from the mains supply. Processing the signal using a filter circuit can remove or at least reduce the unwanted part of the signal. Increasingly nowadays, the filtering of signals to improve signal quality or to extract important information is done by DSP techniques rather than by analog electronics.

Digital Signal Processing
Digital signal processing (DSP) is the study of signals in a digital representation and the processing methods of these signals. DSP and analog signal processing are subfields of signal processing. Digital signal processing changes the data by mathematical operations. In comparison, word processing and similar programs merely rearrange stored data. This means that computers designed for business and other general applications are not optimized for algorithms such as digital filtering and Fourier analysis. Digital Signal Processors are microprocessors specifically designed to handle digital signal processing tasks. These devices have seen tremendous growth in the last decade, finding use in everything from cellular telephones to advanced scientific instruments. In fact, hardware engineers use "DSP" to mean Digital Signal Processor, just as algorithm developers use "DSP" to mean Digital Signal Processing.

Digital Signal Processors (DSPs)
DSP processors are microprocessors designed to perform digital signal processing: the mathematical manipulation of digitally represented signals. The introduction of the microprocessor in the late 1970s and early 1980s made it possible for DSP techniques to be used in a much wider range of applications. However, general-purpose microprocessors such as the Intel x86 family are not ideally suited to the numerically intensive requirements of DSP, and during the 1980s the increasing importance of DSP led several major electronics manufacturers (such as Texas Instruments, Analog Devices and Motorola) to develop Digital Signal Processor chips: specialised microprocessors with architectures designed specifically for the types of operations required in digital signal processing. (Note that the acronym DSP can variously mean Digital Signal Processing, the term used for a wide range of techniques for processing signals digitally, or Digital Signal Processor, a specialised type of microprocessor chip.) Like a general-purpose microprocessor, a DSP is a programmable device, with its own native instruction code. DSP chips are capable of carrying out millions of floating-point operations per second, and like their better-known general-purpose cousins, faster and more powerful versions are continually being introduced. DSPs can also be embedded within complex "system-on-chip" devices, often containing both analog and digital circuitry.

Architecture of the Digital Signal Processor
                One of the biggest bottlenecks in executing DSP algorithms is transferring information to and from memory. This includes data, such as samples from the input signal and the filter coefficients, as well as program instructions, the binary codes that go into the program sequencer. For example, suppose we need to multiply two numbers that reside somewhere in memory. To do this, we must fetch three binary values from memory, the numbers to be multiplied, plus the program instruction describing what to do.

Von Neumann architecture
Figure 1(a) shows how this seemingly simple task is done in a traditional microprocessor. This is often called a Von Neumann architecture, after the brilliant Hungarian-American mathematician John Von Neumann (1903-1957). Von Neumann guided the mathematics of many important discoveries of the early twentieth century. His many achievements include developing the concept of a stored-program computer, formalizing the mathematics of quantum mechanics, and work on the atomic bomb.
As shown in (a), a Von Neumann architecture contains a single memory and a single bus for transferring data into and out of the central processing unit (CPU). Multiplying two numbers requires at least three clock cycles, one to transfer each of the three numbers over the bus from the memory to the CPU. We don't count the time to transfer the result back to memory, because we assume that it remains in the CPU for additional manipulation (such as the sum of products in an FIR filter). The Von Neumann design is quite satisfactory when you are content to execute all of the required tasks in serial. In fact, most computers today are of the Von Neumann design. When an instruction is processed in such a processor, units of the processor not involved at each instruction phase wait idly until control is passed on to them. Increase in processor speed is achieved by making the individual units operate faster, but there is a limit on how fast they can be made to operate. So we need other architectures when very fast processing is required, and we are willing to pay the price of increased complexity.

Harvard architecture
             This leads us to the Harvard architecture, shown in (b). This is named for the work done at Harvard University in the 1940s under the leadership of Howard Aiken (1900-1973). As shown in this illustration, Aiken insisted on separate memories for data and program instructions, with separate buses for each. Since the buses operate independently, program instructions and data can be fetched at the same time, improving the speed over the single bus design. Most present day DSPs use this dual bus architecture.

Super Harvard Architecture(SHARC)
              Figure (c) illustrates the next level of sophistication, the Super Harvard Architecture. This term was coined by Analog Devices to describe the internal operation of their ADSP-2106x and new ADSP-211xx families of Digital Signal Processors. These are called SHARC® DSPs, a contraction of the longer term, Super Harvard ARChitecture. The idea is to build upon the Harvard architecture by adding features to improve the throughput. While the SHARC DSPs are optimized in dozens of ways, two areas are important enough to be included in Fig. (c): an instruction cache, and an I/O controller.
         
              A handicap of the basic Harvard design is that the data memory bus is busier than the program memory bus. When two numbers are multiplied, two binary values (the numbers) must be passed over the data memory bus, while only one binary value (the program instruction) is passed over the program memory bus. To improve upon this situation, we start by relocating part of the "data" to program memory. For instance, we might place the filter coefficients in program memory, while keeping the input signal in data memory. (This relocated data is called "secondary data" in the illustration). At first glance, this doesn't seem to help the situation; now we must transfer one value over the data memory bus (the input signal sample), but two values over the program memory bus (the program instruction and the coefficient). In fact, if we were executing random instructions, this situation would be no better at all.
                However, DSP algorithms generally spend most of their execution time in loops. This means that the same set of program instructions will continually pass from program memory to the CPU. The Super Harvard architecture takes advantage of this situation by including an instruction cache in the CPU. This is a small memory that contains about 32 of the most recent program instructions. The first time through a loop, the program instructions must be passed over the program memory bus. This results in slower operation because of the conflict with the coefficients that must also be fetched along this path. However, on additional executions of the loop, the program instructions can be pulled from the instruction cache. This means that all of the memory to CPU information transfers can be accomplished in a single cycle: the sample from the input signal comes over the data memory bus, the coefficient comes over the program memory bus, and the program instruction comes from the instruction cache. In the jargon of the field, this efficient transfer of data is called a high memory-access bandwidth.
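The loop behavior described above is easy to see in a plain FIR filter kernel. This sketch only illustrates why the same handful of instructions repeats (which is exactly what the instruction cache exploits); it is not ADSP code:

```python
# Sketch: an FIR filter is one long run of multiply-accumulates. The
# inner loop below is the same few "instructions" executed over and over,
# fetching one sample (data bus) and one coefficient (program bus) per MAC.

def fir(samples, coeffs):
    n_taps = len(coeffs)
    out = []
    for n in range(n_taps - 1, len(samples)):
        acc = 0.0
        for k in range(n_taps):          # this inner loop dominates runtime
            acc += coeffs[k] * samples[n - k]
        out.append(acc)
    return out

print(fir([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))  # 2-tap average: [1.5, 2.5, 3.5]
```

After the first pass, a SHARC pulls the loop body from the instruction cache, so the sample, the coefficient and the instruction all arrive in a single cycle, as the paragraph above describes.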
                     
                   Just as important, dedicated hardware allows these data streams to be transferred directly into memory (Direct Memory Access, or DMA), without having to pass through the CPU's registers. The main buses (program memory bus and data memory bus) are also accessible from outside the chip, providing an additional interface to off-chip memory and peripherals. This allows the SHARC DSPs to use a four Gigaword (16 Gbyte) memory, accessible at 40 Mwords/second (160 Mbytes/second), for 32 bit data.

                           This type of high speed I/O is a key characteristic of DSPs. The overriding goal is to move the data in, perform the math, and move the data out before the next sample is available. Everything else is secondary. Some DSPs have on-board analog-to-digital and digital-to-analog converters, a feature called mixed signal. However, all DSPs can interface with external converters through serial or parallel ports.
             
           At the top of the diagram are two blocks labeled Data Address Generator (DAG), one for each of the two memories. These control the addresses sent to the program and data memories, specifying where the information is to be read from or written to. In simpler microprocessors this task is handled as an inherent part of the program sequencer, and is quite transparent to the programmer. However, DSPs are designed to operate with circular buffers, and benefit from the extra hardware to manage them efficiently. This avoids needing to use precious CPU clock cycles to keep track of how the data are stored. For instance, in the SHARC DSPs, each of the two DAGs can control eight circular buffers. This means that each DAG holds 32 variables (4 per buffer), plus the required logic.
            Some DSP algorithms are best carried out in stages. For instance, IIR filters are more stable if implemented as a cascade of biquads (a stage containing two poles and up to two zeros). Multiple stages require multiple circular buffers for the fastest operation. The DAGs in the SHARC DSPs are also designed to efficiently carry out the Fast Fourier transform. In this mode, the DAGs are configured to generate bit-reversed addresses into the circular buffers, a necessary part of the FFT algorithm. In addition, an abundance of circular buffers greatly simplifies DSP code generation- both for the human programmer as well as high-level language compilers, such as C.
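Both DAG services just described, modulo (circular-buffer) address updates and bit-reversed addressing for the FFT, can be modeled in a few lines of software. The function names are invented for illustration; the point is that the SHARC performs these updates in dedicated hardware, for free, alongside each access:

```python
# Sketch: the two address computations a DAG performs in hardware.

def advance(ptr, step, base, length):
    """Post-modify a circular-buffer pointer with automatic wraparound."""
    return base + (ptr - base + step) % length

def bit_reverse(index, bits):
    """Reverse the low `bits` bits of an index (FFT input reordering)."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (index & 1)
        index >>= 1
    return out

print(advance(ptr=14, step=3, base=8, length=8))  # 9: wrapped past buffer end
print([bit_reverse(i, 3) for i in range(8)])      # [0, 4, 2, 6, 1, 5, 3, 7]
```

Doing either computation in software costs CPU cycles on every sample; putting it in the DAGs is what keeps the MAC pipeline fed.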
                The data register section of the CPU is used in the same way as in traditional microprocessors. In the ADSP-2106x SHARC DSPs, there are 16 general purpose registers of 40 bits each. These can hold intermediate calculations, prepare data for the math processor, serve as a buffer for data transfer, hold flags for program control, and so on. If needed, these registers can also be used to control loops and counters; however, the SHARC DSPs have extra hardware registers to carry out many of these functions.
       
              The math processing is broken into three sections, a multiplier, an arithmetic logic unit (ALU), and a barrel shifter. The multiplier takes the values from two registers, multiplies them, and places the result into another register. The ALU performs addition, subtraction, absolute value, logical operations (AND, OR, XOR, NOT), conversion between fixed and floating point formats, and similar functions. Elementary binary operations are carried out by the barrel shifter, such as shifting, rotating, extracting and depositing segments, and so on. A powerful feature of the SHARC family is that the multiplier and the ALU can be accessed in parallel. In a single clock cycle, data from registers 0-7 can be passed to the multiplier, data from registers 8-15 can be passed to the ALU, and the two results returned to any of the 16 registers.
There are also many important features of the SHARC family architecture that aren't shown in this simplified illustration. For instance, an 80-bit accumulator is built into the multiplier to reduce the round-off error associated with multiple fixed-point math operations. Another interesting feature is the use of shadow registers for all the CPU's key registers. These are duplicate registers that can be switched with their counterparts in a single clock cycle. They are used for fast context switching, the ability to handle interrupts quickly. When an interrupt occurs in traditional microprocessors, all the internal data must be saved before the interrupt can be handled. This usually involves pushing all of the occupied registers onto the stack, one at a time. In comparison, an interrupt in the SHARC family is handled by moving the internal data into the shadow registers in a single clock cycle. When the interrupt routine is completed, the registers are just as quickly restored. This feature allows time-critical events, such as the sample-ready interrupt, to be handled very quickly and efficiently.
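The shadow-register idea can be modeled as a single swap of whole register sets. The class and method names here are invented for illustration; the hardware does the swap in one clock cycle rather than in Python:

```python
# Sketch: fast context switching with shadow registers. Instead of pushing
# every register onto a stack one at a time, the primary and shadow sets
# are exchanged in a single step, and exchanged back on return.

class RegisterFile:
    def __init__(self, n=16):
        self.primary = [0] * n   # registers the running code sees
        self.shadow = [0] * n    # duplicate set for the interrupt handler

    def enter_interrupt(self):
        # One "cycle": swap the whole set instead of n stack pushes.
        self.primary, self.shadow = self.shadow, self.primary

    exit_interrupt = enter_interrupt  # restoring is the same single swap

rf = RegisterFile()
rf.primary[0] = 42
rf.enter_interrupt()     # handler gets a clean register set
rf.primary[0] = 7        # handler clobbers r0 freely
rf.exit_interrupt()
print(rf.primary[0])     # 42: caller state restored instantly
```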

The SHARC has a 32/40-bit floating- and fixed-point core. A DMA controller and dual-ported SRAM move data into and out of memory without wasting core cycles, and the core itself is a high-performance computation unit. Its four buses let it, in a single cycle, fetch the next instruction, access two data values, and perform DMA for an I/O device.

The TigerSHARC Processor
TigerSHARC processors provide the highest performance density for multiprocessing applications, with peak performance well above a billion floating-point operations per second. One-Gbyte/sec multiprocessing link ports gluelessly connect multiple TigerSHARC processors, and versions are available with up to 24 Mbits of integrated on-chip memory.

Keeping pace with the accelerating march of architectural innovation in DSPs, Analog Devices (ADI) unveiled its third-generation floating-point DSP, TigerSHARC.
Its architect, Jose Fridman, described a complex, high-performance VLIW-based design incorporating unusually extensive single-instruction, multiple-data (SIMD) capabilities. Unlike its predecessors, which are primarily aimed at applications demanding floating-point arithmetic, TigerSHARC has excellent fixed-point capabilities and is better described as a 16-bit fixed-point DSP with floating-point support than as a floating-point DSP.
            The TigerSHARC® Processor provides leading-edge system performance while keeping the highest possible flexibility in software and hardware development.
             The TigerSHARC Processor's balanced architecture utilizes characteristics of RISC, VLIW, and DSP to provide a flexible, "all software" approach that adds capacity while reducing costs and bills of material.

FEATURES

  • Static Superscalar Architecture 
  • Two 32-bit MACs per cycle with 80-bit accumulation 
  • Eight 16-bit MACs per cycle with 40-bit accumulation 
  • Two 16-bit complex MACs per cycle 
  • Add-subtract instruction and bit reversal in hardware for FFTs 
  • 64-bit generalised bit manipulation unit 
  • Two billion MACs per second at 250 MHz 
  • 2 billion 16-bit MACs 
  • 500 million 32-bit MACs 
  • 12 GB/s of internal memory bandwidth for data and code 
  • 500 MHz, 2.0 ns instruction cycle rate 
  • 12 Mbits of internal on-chip DRAM memory 
  • Dual computation blocks, each containing an ALU, a multiplier, a shifter and a register file 
  • Dual integer ALUs, providing data addressing and pointer manipulation 
  • Single-precision IEEE 32-bit and extended-precision 40-bit floating-point data formats, and 8-, 16-, 32- and 64-bit fixed-point data formats 
  • Integrated I/O includes a 14-channel DMA controller, external port, programmable flag pins, two timers and a timer-expired pin for system integration 


DESCRIPTION
The TigerSHARC processor is an ultra-high-performance, static superscalar processor optimized for large signal processing tasks and communications infrastructure. The DSP combines very wide memories with dual computation blocks supporting floating-point (IEEE 32-bit and extended-precision 40-bit) and fixed-point (8-, 16-, 32- and 64-bit) processing to set a new standard of performance for digital signal processors. The TigerSHARC static superscalar architecture lets the DSP execute up to four instructions each cycle, performing 24 fixed-point (16-bit) operations. Four independent 128-bit-wide internal data buses, each connecting to the six 2M-bit memory banks, enable quad-word data, instruction and I/O accesses and provide 28 Gbytes per second of internal memory bandwidth.

Like its competitor, Texas Instruments' TMS320C64x, TigerSHARC uses a very long instruction word (VLIW) load/store architecture. TigerSHARC executes as many as four instructions per cycle with its interlocking ten-stage pipeline and dual computation blocks. Each block contains a multiplier, an ALU and a 64-bit shifter, and can perform one 32×32-bit or four 16×16-bit multiply-accumulates (MACs) per cycle.

TigerSHARC is aimed at telecommunications infrastructure applications, such as cellular telephone base stations. As illustrated in the figure, the TigerSHARC architecture contains a program control unit, two computation units, two address generators, memory, various peripherals and a DMA controller. With its VLIW architecture, TigerSHARC is capable of executing up to four instructions in a single cycle, and its SIMD features enable it to perform arithmetic operations on multiple 32-bit floating-point values or multiple 32-, 16- or 8-bit fixed-point values.

Each of TigerSHARC's computation units can perform two 32×32→64-bit fixed-point multiply-accumulates in a single cycle, using two operands each made up of two concatenated 32-bit registers. Thus, using both computation units, TigerSHARC can perform four 32×32→64-bit fixed-point multiply-accumulate operations in a single cycle. Alternatively, TigerSHARC can perform two 32-bit floating-point MAC operations per cycle.
In fixed-point DSP applications, the most common word width is 16 bits. With four 16-bit fixed-point elements concatenated in two 32-bit registers, one computation unit can in a single cycle perform four 16×16→32-bit multiply-accumulate operations (each with 8 guard bits to avoid overflow), twice as many as any currently available fixed- or floating-point DSP can perform.
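A software model of the packed 16-bit MACs just described: the function name is invented, and the overflow assertion stands in for the headroom the 8 guard bits provide in the real 40-bit accumulators:

```python
# Sketch: four 16x16-bit multiply-accumulates per computation unit,
# modeled as four independent accumulators with a signed 40-bit range
# check (the 8 guard bits above the 32-bit product delay overflow).

def packed_mac4(acc, a_vals, b_vals):
    """Accumulate four 16-bit products into four 40-bit accumulators."""
    limit = 1 << 39  # signed 40-bit range
    out = []
    for acc_i, a, b in zip(acc, a_vals, b_vals):
        s = acc_i + a * b
        assert -limit <= s < limit, "40-bit accumulator overflow"
        out.append(s)
    return out

print(packed_mac4([0, 0, 0, 0], [3, -2, 7, 100], [4, 5, -6, 100]))
# [12, -10, -42, 10000]
```

In hardware all four lanes update in the same cycle; with both computation units engaged, eight such lanes advance at once.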
           
TigerSHARC uses SIMD features at two levels: two separate computation units, each of which itself operates on SIMD operands. The figure illustrates how the two SIMD computation units divide the registers into different data sizes.
TigerSHARC is the first of the new wave of VLIW-based DSPs to provide extensive SIMD capabilities. This approach provides greater parallelism than that of its Texas Instruments competitors.

On-chip memory is divided into three banks: one for software and two for data. ADI will not disclose the amount of on-chip memory in the first TigerSHARC devices, but we expect that the vendor will continue to be generous with on-chip memory; the predecessor SHARC and Hammerhead devices include 68K to 512K of on-chip memory. When moving 64-bit or 128-bit data, TigerSHARC transfers data from consecutive memory locations to consecutive data registers, or vice versa. The smallest amount of data that can be transferred is 32 bits. If TigerSHARC programs use word sizes of 8 or 16 bits in a DSP algorithm, they cannot access individual words; any load or store will transfer at least four 8-bit or two 16-bit words. The chip includes a data alignment buffer and a short data alignment buffer that allow 64 or 128 bits of data to be transferred from (but not to) any memory location aligned on a 16-bit word boundary. TigerSHARC provides more flexibility than most processors with SIMD features, which often require that data be aligned at memory locations divisible by the size of the data transfer.
           
Data is transferred between the computation units and on-chip memory in blocks of 32, 64 or 128 bits.
         

FUNCTIONAL BLOCK DIAGRAM
Architectural Features
The TigerSHARC® Processor is an ultra-high performance static superscalar DSP optimized for multi-processing applications requiring computationally demanding large signal processing tasks. This document describes the key features of the TigerSHARC Processor architecture that combine to offer the highest performance, flexibility, efficiency and scalability available to equipment manufacturers in the marketplace today.

Adapts to evolving signal processing demands
The TigerSHARC's unique ability to process 1-, 8-, 16- and 32-bit fixed-point as well as floating-point data types on a single chip allows original equipment manufacturers to adapt to evolving telecommunications standards without encountering the limitations of traditional hardware approaches. As the highest-performance DSP available for communications infrastructure and multiprocessing applications, TigerSHARC allows wireless infrastructure manufacturers to continue evolving their designs to meet the needs of their target system, while deploying a highly optimized and effective Node B solution that realizes significant overall cost savings.

Multiprocessor, general-purpose processing
           The TigerSHARC Processor's balanced architecture optimizes system, cost, power, and density. A single TigerSHARC Processor, with its large on-chip memory, zero overhead DMA engine, large I/O throughput, and integrated multiprocessing support, has the necessary integration to be a complete node of a multiprocessing system.
            This enables a multiprocessor network exclusively made up of TigerSHARCs without any expensive and power consuming external memories or logic.

Instruction Parallelism and SIMD Operation
As a static superscalar DSP, the TigerSHARC Processor core can execute simultaneously from one to four 32-bit instructions encoded in a single instruction line. With a few exceptions, an instruction line, whether it contains one, two, three or four 32-bit instructions, executes with a throughput of one cycle in an eight-deep processor pipeline. The TigerSHARC Processor has a set of instruction parallelism rules that programmers must follow when encoding an instruction line. In general, the selection of instructions the DSP can execute in parallel each cycle depends on the resources each instruction in the line requires and on the source and destination registers used. The programmer has direct control of the three core components: the IALUs, the Computation Blocks, and the Program Sequencer.
In most cases the TigerSHARC Processor has a two-cycle execution pipeline that is fully interlocked, so whenever a computation result is unavailable for another operation dependent on it, stall cycles are automatically inserted. Efficient programming with dependency-free instructions can eliminate most computational and memory transfer dependencies. All of the instruction parallelism rules and data dependencies are documented in the TigerSHARC Processor User's Guide.
             The TigerSHARC Processor also has the capability of supporting single-instruction, multiple-data SIMD operations through the use of both Computational Blocks in parallel as well as the use of SIMD specific computations. The programmer has the option of directing both Computation Blocks to operate on the same data (broadcast distribution) or different data (merged distribution). In addition, each Computation Block can execute four 16-bit or eight 8-bit SIMD computations in parallel.

Independent, Parallel Computation Blocks
As mentioned above, the TigerSHARC Processor has two Computation Blocks that can operate independently, in parallel, or as a SIMD engine. The DSP can issue up to two compute instructions per Computation Block per cycle, instructing the ALU, multiplier or shifter to perform independent, simultaneous operations. Each Computation Block contains four computational units (an ALU, a multiplier, a 64-bit shifter and, on the ADSP-TS201S only, a CLU) plus a 32-bit register file.
The 32-bit-word, multi-ported register files are used for transferring data between the computational units and data buses, and for storing intermediate results. Instructions can access the registers in the register file individually (word-aligned) or in sets of two (dual-aligned) or four (quad-aligned). The ALU performs a standard set of arithmetic operations in both fixed-point and floating-point formats, while also performing logic operations. The multiplier performs both fixed-point and floating-point multiplication as well as fixed-point multiply-accumulates. The 64-bit shifter performs logical and arithmetic shifts, bit and bit-stream manipulation, and field deposit and extraction.

CLU (Communications Logic Unit)
               The CLU on the ADSP-TS201S is a 128-bit unit which houses enhanced acceleration instructions specifically targeted at increasing the amount of Complex Multiplies per cycle and improving the Decoding efficiency of the TigerSHARC device. The CLU is not available on the ADSP-TS202S and ADSP-TS203S.

Integer ALUs
The TigerSHARC Processor has two integer ALUs (IALUs) that provide powerful address generation capabilities and perform many general-purpose integer operations. Each IALU has a multi-ported 31-word register file. As address generators, the IALUs perform immediate or indirect (pre- and post-modify) addressing. They perform modulus and bit-reverse operations with no constraints placed on memory addresses for data buffer placement. Each IALU can specify a single-, dual- or quad-word access from memory.
             The TigerSHARC Processor IALUs enable implementation of circular buffers in hardware. Circular buffers facilitate efficient programming of delay lines and other data structures required in digital signal processing, and they are commonly used in digital filters and Fourier transforms. Each IALU provides registers for four circular buffers, so applications can set up a total of eight circular buffers. The IALUs handle address pointer wraparound automatically, reducing overhead, increasing performance, and simplifying implementation.

            Circular buffers can start and end at any memory location. Because the IALU's computational pipeline is one cycle deep, in most cases integer results are available in the next cycle. Hardware (register dependency check) causes a stall if a result is unavailable in a given cycle.
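The wraparound behavior that the IALUs implement in dedicated address-generation hardware can be modeled in software. This is an illustrative sketch only, not TigerSHARC code; the buffer base, length, and stride names are assumptions:

```python
class CircularBuffer:
    """Software model of an IALU circular buffer (illustrative only).

    The hardware equivalent holds the base, length, and index in
    dedicated registers and performs the wraparound inside the address
    generator with zero overhead.
    """

    def __init__(self, base, length):
        self.base = base        # start address of the buffer
        self.length = length    # buffer length in words
        self.index = 0          # current offset within the buffer

    def post_modify(self, stride):
        """Return the current address, then advance with modulo wraparound."""
        addr = self.base + self.index
        self.index = (self.index + stride) % self.length  # automatic wraparound
        return addr

# An 8-word delay line stepped with a stride of 3:
buf = CircularBuffer(base=0x100, length=8)
addrs = [buf.post_modify(3) for _ in range(4)]
# addresses wrap: 0x100, 0x103, 0x106, 0x101
```

Doing this pointer arithmetic in software would cost several instructions per sample; the IALUs fold it into the address generation itself, which is why delay lines and FFT butterflies map so efficiently onto the architecture.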

TigerSHARC Memory Integration
The large on-chip memory is divided into three separate blocks of equal size. Each block is 128 bits wide, giving a quad-word structure with four word addresses per row. For data accesses, the processor can address one 32-bit word, two 32-bit words (long word), or four 32-bit words (quad word) and transfer them to or from a single computational unit, or to both, in a single processor cycle. The user only has to ensure that the start addresses are modulo-two or modulo-four addresses when fetching long words and quad words, respectively. For applications that compute on a delay line whose start address does not meet the modulo requirement, or that otherwise require unaligned data fetches, a data alignment buffer (DAB) is provided. Once the DAB is filled, quad-word fetches can be made from it. Besides the internal memory, the TigerSHARC can access up to four gigawords of external memory.
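The alignment rule above can be made concrete with a short check. This is a minimal sketch under the assumption of word (not byte) addresses, as used in the text:

```python
def check_alignment(addr, words):
    """Check the modulo constraint for long (2-word) and quad (4-word) fetches.

    Single-word accesses have no constraint; long words must start on a
    modulo-2 address and quad words on a modulo-4 address. A fetch that
    violates the rule would go through the data alignment buffer (DAB)
    instead of a direct aligned access.
    """
    if words == 1:
        return True
    if words in (2, 4):
        return addr % words == 0
    raise ValueError("fetch size must be 1, 2, or 4 words")

# A quad fetch starting at word address 0x104 is aligned; one at 0x106 is not:
print(check_alignment(0x104, 4))  # -> True
print(check_alignment(0x106, 4))  # -> False
```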

Program Sequencer
                The TigerSHARC Processor Program Sequencer manages program structure and program flow by supplying addresses to memory for instruction fetches. Contained within the Program Sequencer, the Instruction Alignment Buffer (IAB) caches up to five fetched instruction lines waiting to execute. The Program Sequencer extracts an instruction line from the IAB and distributes it to the appropriate core component for execution. Other Program Sequencer functions include determining program flow according to instructions such as JUMP, CALL, RTI, and RTS; decrementing the loop counters; handling hardware interrupts; and using branch prediction and a 128-entry Branch Target Buffer (BTB) to reduce branch delays for efficient execution of conditional and unconditional branch instructions.

Flexible Integrated Memory
                 The ADSP-TS20xS family has three memory variants. The ADSP-TS201S has 24 Mbits of on-chip embedded DRAM, divided into six blocks of 4 Mbits (128 K words × 32 bits); the ADSP-TS202S has 12 Mbits of on-chip embedded DRAM, divided into six blocks of 2 Mbits (64 K words × 32 bits); the ADSP-TS203S has 4 Mbits of on-chip embedded DRAM, divided into four blocks of 1 Mbit (32 K words × 32 bits). On all variants, each block can store program memory, data memory, or both, so programmers can configure the memory to suit their specific needs. The memory blocks connect to the four 128-bit-wide internal buses through a crossbar connection, enabling four memory transfers in the same cycle. The internal bus architecture of the ADSP-TS20xS family provides a total memory bandwidth of 32 Gbytes/second, enabling the core and I/O to access twelve 32-bit data words and four 32-bit instructions per cycle.
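The quoted bandwidth figure follows directly from the bus structure. A quick sanity check, assuming the 500 MHz core clock of the ADSP-TS201S:

```python
# Each of the four internal buses is 128 bits (16 bytes) wide and can
# move one transfer per core-clock cycle.
buses = 4
bus_width_bytes = 128 // 8      # 16 bytes per bus per cycle
core_clock_hz = 500e6           # ADSP-TS201S core clock (assumed here)

total_bandwidth = buses * bus_width_bytes * core_clock_hz
print(total_bandwidth / 1e9)    # -> 32.0 (Gbytes/second)

# Per cycle that is 16 32-bit words in total: twelve data words plus
# four instruction words, matching the figures in the text.
words_per_cycle = buses * 128 // 32
print(words_per_cycle)          # -> 16
```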

 DMA Controller
            The TigerSHARC Processor's on-chip DMA controller, with fourteen DMA channels, provides zero-overhead data transfers without processor intervention. The DMA controller operates independently of, and transparently to, the DSP core, enabling DMA operations to occur while the core continues to execute program instructions.
             The DMA controller performs routine functions such as external port block transfers, link port transfers, and AutoDMA transfers, and provides additional features such as flyby transfers, DMA chaining, and two-dimensional transfers.

 Link Ports
            The ADSP-TS201S and ADSP-TS202S have four full-duplex link ports, each providing four-bit receive and four-bit transmit I/O capability using low-voltage differential signaling (LVDS). Operating at double data rate with a 500 MHz clock, each link can support up to 500 Mbytes per second per direction, for a combined maximum throughput of 4 Gbytes per second.
                The ADSP-TS203S has two full-duplex link ports with the same four-bit LVDS receive and transmit capability. Operating at double data rate with a 250 MHz clock, each link can support up to 250 Mbytes per second per direction, for a combined maximum throughput of 1 Gbyte per second.
          Each Link Port has its own triple-buffered quad-word input and double-buffered quad-word output registers. The DSP's core can write directly to a Link Port's transmit register and read from a receive register, or the DMA controller can perform DMA transfers through eight dedicated Link Port DMA channels.
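The per-link figure can be derived from the lane width and the clocking. A quick check for the 500 MHz double-data-rate case of the ADSP-TS201S:

```python
# One link port direction: 4 LVDS data bits, sampled on both clock
# edges (double data rate).
lane_bits = 4
ddr_factor = 2
link_clock_hz = 500e6

bits_per_second = lane_bits * ddr_factor * link_clock_hz   # 4 Gbit/s raw
bytes_per_second = bits_per_second / 8
print(bytes_per_second / 1e6)   # -> 500.0 (Mbytes/second per direction)

# Four full-duplex links: 4 links x 2 directions x 500 MB/s = 4 GB/s.
combined_throughput = 4 * 2 * bytes_per_second
print(combined_throughput / 1e9)  # -> 4.0 (Gbytes/second)
```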

External Port
           The external port on the TigerSHARC Processor is 64 bits wide and runs at up to 125 MHz. Using the external port, up to eight TigerSHARC Processors, a host, and global memory can be connected without any external logic. This is the second way, in addition to the link ports, that the TigerSHARC DSP supports multiprocessor systems. SDRAM and SBSRAM controllers allow a glueless interface to these types of memories. The external port also supports a flyby mode, which allows a host to access a global shared memory.

Applications
                   At a 250 MHz clock rate, the ADSP-TS101S [TigerSHARC] offers a DSP-industry-best 1500 MFLOPS peak performance and has native support for 8-, 16-, 32-, and 40-bit data types. With a 1.5 W typical power dissipation, 6 Mbits of on-chip memory, a 14-channel zero-overhead DMA engine, an integrated SDRAM controller, a parallel host interface, cluster multiprocessing support, and link port multiprocessing support, the TigerSHARC is ideal for heat-sensitive multiprocessing applications.
Here are some of the target applications for floating-point DSPs:
"TigerSHARC's exceptional speed and functionality are suited for applications in:
Defense - sonar, radar, digital maps, munitions guidance
Medical - ultrasound, CT scanners, MRI, digital X-ray
Industrial systems - data acquisition, control, test, and inspection systems
Video processing - editing, printers, copiers
Wireless Infrastructure - GSM, EDGE, and 3G cellular base stations."

Advantages of the TigerSHARC Processor
               The Analog Devices TigerSHARC® Processor architecture provides an exceptional combination of performance and flexibility, enabling the most cost-effective solution for baseband processing and other applications in today's wireless infrastructure market. Wireless infrastructure manufacturers can consider many approaches when developing baseband modem solutions for third-generation (3G) wireless communications networks; however, the TigerSHARC Processor architecture provides the balance of attributes required to satisfy the entire range of challenges facing their 3G deployments.
                     The TigerSHARC Processor is the heart of a software-defined solution for baseband modems, in which all of the implementation occurs in software rather than in hardware, the approach taken by ASICs and other competing DSP solutions. The TigerSHARC Processor allows an infrastructure vendor to establish a single baseband processing platform for all of the 3G standards, with easily implemented software changes to update functionality and speed time to market.
             The very powerful architecture of the TigerSHARC, combining the best elements of RISC and DSP cores, is highly suited to delivering the performance required for upcoming applications in 3G mobile communications, xDSL technologies, and imaging systems. The Static Superscalar architecture maintains determinism for security-sensitive applications, and the large number of internal registers allows efficient use of a high-level language, speeding up designers' development process.

Conclusion
                        As a result of its "Load Balancing" capabilities, high internal and external bandwidth, large integrated memory and unmatched level of flexibility, the TigerSHARC Processor proves to be an unconventional but extremely effective solution for baseband signal processing. In future generations of the TigerSHARC Processor we intend to continue the trend towards reduced systems cost and component count while increasing the functionality of the solution through clock speed enhancements and an expanded instruction set.

10 Gigabit Ethernet Technology - Seminar paper


10 Gigabit Ethernet Technology
INTRODUCTION
From its origin more than 25 years ago, Ethernet has evolved to meet the increasing demands of packet-switched networks. Due to its proven low implementation cost, its known reliability, and the relative simplicity of its installation and maintenance, its popularity has grown to the point that today nearly all traffic on the Internet originates or ends with an Ethernet connection. Further, as the demand for ever-faster network speeds has grown, Ethernet has been adapted to handle these higher speeds and the surges in volume demand that accompany them.

The One Gigabit Ethernet standard is already being deployed in large numbers in both corporate and public data networks, and has begun to move Ethernet from the realm of the local area network out to encompass the metro area network. Meanwhile, an even faster 10 Gigabit Ethernet standard is nearing completion. This latest standard is being driven not only by the increase in normal data traffic but also by the proliferation of new, bandwidth-intensive applications.

The draft standard for 10 Gigabit Ethernet is significantly different in some respects from earlier Ethernet standards, primarily in that it will only function over optical fiber and only operate in full-duplex mode, meaning that collision detection protocols are unnecessary. Ethernet can now step up to 10 gigabits per second; however, it remains Ethernet, including the packet format, and current capabilities are easily transferable to the new draft standard.

In addition, 10 Gigabit Ethernet does not render current investments in network infrastructure obsolete. The task force heading the standards effort has taken steps to ensure that 10 Gigabit Ethernet is interoperable with other networking technologies such as SONET. The standard enables Ethernet packets to travel across SONET links with very little inefficiency.

Ethernet’s expansion for use in metro area networks can now be expanded yet again onto wide area networks, both in concert with SONET and also end-to-end Ethernet. With the current balance of network traffic today heavily favoring packet-switched data over voice, it is expected that the new 10 Gigabit Ethernet standard will help to create a convergence between networks designed primarily for voice, and the new data centric networks.


10 GIGABIT ETHERNET TECHNOLOGY OVERVIEW
The 10 Gigabit Ethernet Alliance (10GEA) was established in order to promote standards-based 10 Gigabit Ethernet technology and to encourage the use and implementation of 10 Gigabit Ethernet as a key networking technology for connecting various computing, data and telecommunications devices. The charter of the 10 Gigabit Ethernet Alliance includes:

  • Supporting the 10 Gigabit Ethernet standards effort conducted in the IEEE 802.3 working group
  • Contributing resources to facilitate convergence and consensus on technical specifications
  • Promoting industry awareness, acceptance, and advancement of the 10 Gigabit Ethernet standard
  • Accelerating the adoption and usage of 10 Gigabit Ethernet products and services
  • Providing resources to establish and demonstrate multi-vendor interoperability, and generally encouraging and promoting interoperability events
  • Fostering communications between suppliers and users of 10 Gigabit Ethernet technology and products


The 10 Gigabit Ethernet Alliance

The purpose of the 10 Gigabit Ethernet proposed standard is to extend the 802.3 protocols to an operating speed of 10 Gbps and to expand the Ethernet application space to include WAN links. This will provide for a significant increase in bandwidth while maintaining maximum compatibility with the installed base of 802.3 interfaces, previous investment in research and development, and principles of network operation and management.

In order for it to be adopted as a standard, the IEEE's 802.3ae Task Force has established five criteria that the proposed 10 Gigabit Ethernet standard must meet:


  • It must have broad market potential, supporting a broad set of applications, with multiple vendors supporting it, and multiple classes of customers.
  • It must be compatible with other existing 802.3 protocol standards, as well as with both Open Systems Interconnection (OSI) and Simple Network Management Protocol (SNMP) management specifications.
  • It must be substantially different from other 802.3 standards, making it a unique solution for a problem rather than an alternative solution.
  • It must have demonstrated technical feasibility prior to final ratification.
  • It must be economically feasible for customers to deploy, providing reasonable cost, including all installation and management costs, for the expected performance increase.



The 10 Gigabit Ethernet Standard
Under the International Organization for Standardization's Open Systems Interconnection (OSI) model, Ethernet is fundamentally a Layer 2 protocol. 10 Gigabit Ethernet uses the IEEE 802.3 Ethernet Media Access Control (MAC) protocol, the IEEE 802.3 Ethernet frame format, and the minimum and maximum IEEE 802.3 frame sizes.

Just as 1000BASE-X and 1000BASE-T (Gigabit Ethernet) remained true to the Ethernet model, 10 Gigabit Ethernet continues the natural evolution of Ethernet in speed and distance. Because it is a full-duplex-only and fiber-only technology, it does not need the carrier sense multiple access with collision detection (CSMA/CD) protocol that defines slower, half-duplex Ethernet technologies. In every other respect, 10 Gigabit Ethernet remains true to the original Ethernet model.

An Ethernet PHYsical layer device (PHY), which corresponds to Layer 1 of the OSI model, connects the media (optical or copper) to the MAC layer, which corresponds to OSI Layer 2. Ethernet architecture further divides the PHY (Layer 1) into a Physical Media Dependent (PMD) sublayer and a Physical Coding Sublayer (PCS). Optical transceivers, for example, are PMDs. The PCS is made up of coding (e.g., 64b/66b) and serializer or multiplexing functions.
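To make the PCS coding step concrete: the core of 64b/66b coding is a two-bit sync header prepended to each 64-bit block. The following is a simplified sketch; real 10GBASE-R control-block formats also carry a type field and are more involved:

```python
def encode_64b66b_block(payload: bytes, is_control: bool):
    """Prepend the 2-bit sync header of 64b/66b coding to a 64-bit block.

    Sync header '01' marks a pure data block, '10' a block carrying
    control information. Both headers contain a transition, which keeps
    the receiver's clock recovery locked regardless of payload content.
    (Simplified: real control blocks also include a block type field.)
    """
    if len(payload) != 8:
        raise ValueError("64b/66b operates on 64-bit (8-byte) blocks")
    sync = (1, 0) if is_control else (0, 1)
    return sync, payload

# A data block gets the '01' header; a control block gets '10':
sync, block = encode_64b66b_block(b"\x00\x01\x02\x03\x04\x05\x06\x07", False)
print(sync)  # -> (0, 1)
```

The design choice matters for line rate: unlike Gigabit Ethernet's 8b/10b coding (25% overhead), 64b/66b adds only 2 bits per 64, about 3% overhead, so a 10 Gbps MAC rate needs only a 10.3125 Gbaud serial line.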

The 802.3ae specification defines two PHY types: the LAN PHY and the WAN PHY (discussed below). The WAN PHY has an extended feature set added onto the functions of a LAN PHY. These PHYs are solely distinguished by the PCS. There will also be a number of PMD types.


10 Gigabit Ethernet in the Marketplace
The accelerating growth of worldwide network traffic is forcing service providers, enterprise network managers and architects to look to ever higher-speed network technologies in order to solve the bandwidth demand crunch. Today, these administrators typically use Ethernet as their backbone technology. Although networks face many different issues, 10 Gigabit Ethernet meets several key criteria for efficient and effective high-speed networks: 

  • Easy, straightforward migration to higher performance levels without disruption
  • Lower cost of ownership vs. current alternative technologies – including both acquisition and support costs
  • Familiar management tools and common skills base
  • Ability to support new applications and data types
  • Flexibility in network design
  • Multiple vendor sourcing and proven interoperability

Managers of enterprise and service provider networks have to make many choices when they design networks. They have multiple media, technologies, and interfaces to choose from to build campus and metro connections: Ethernet (100, 1000, and 10,000 Mbps), OC-12 (622 Mbps) and OC-48 (2.488 Gbps) SONET or equivalent SDH networks, packet over SONET/SDH (POS), and the technology of the newly authorized IEEE 802 Task Force (802.17), titled Resilient Packet Ring.

Network topological design and operation has been transformed by the advent of intelligent Gigabit Ethernet multi-layer switches. In LANs, core network technology is rapidly shifting to Gigabit Ethernet and there is a growing trend towards Gigabit Ethernet networks that can operate over metropolitan area distances.

The next step for enterprise and service provider networks is the combination of multi-gigabit bandwidth with intelligent services, leading to scaled, intelligent, multi-gigabit networks with backbone and server connections ranging up to 10 Gbps.

In response to market trends, Gigabit Ethernet is currently being deployed over tens of kilometers in private networks. With 10 Gigabit Ethernet, the industry has developed a way to not only increase the speed of Ethernet to 10 Gbps but also to extend its operating distance and interconnectivity. In the future, network managers will be able to use 10 Gigabit Ethernet as a cornerstone for network architectures that encompass LANs, MANs and WANs using Ethernet as the end-to-end, Layer 2 transport method.

Ethernet bandwidth can then be scaled from 10 Mbps to 10 Gbps (a ratio of 1 to 1,000) without compromising intelligent network services such as Layer 3 routing and Layer 4 through Layer 7 intelligence, including quality of service (QoS), class of service (CoS), caching, server load balancing, security, and policy-based networking capabilities. Because of the uniform nature of Ethernet across all environments when IEEE 802.3ae is deployed, these services can be delivered at line rate over the network and supported over all network physical infrastructures in the LAN, MAN, and WAN. At that point, convergence of voice and data networks, both running over Ethernet, becomes a very real option. And, as TCP/IP incorporates enhanced services and features, such as packetized voice and video, the underlying Ethernet can also carry these services without modification.

As we have seen with previous versions of Ethernet, the cost for 10 Gbps communications has the potential to drop significantly with the development of new technologies. In contrast to 10 Gbps telecommunications lasers, the 10 Gigabit Ethernet short links — less than 40km over single-mode (SM) fiber — will be capable of using lower cost, uncooled optics and, in some cases, vertical cavity surface emitting lasers (VCSEL), which have the potential to lower PMD costs. In addition, the industry is supported by an aggressive merchant chip market that provides highly integrated silicon solutions. Finally, the Ethernet market tends to spawn highly competitive start-ups with each new generation of technology to compete with established Ethernet vendors.


Applications
10 Gigabit Ethernet in the Metro
Vendors and users generally agree that Ethernet is inexpensive, well understood, widely deployed and backwards compatible from Gigabit switched down to 10 Megabit shared. Today a packet can leave a server on a short-haul optic Gigabit Ethernet port, move cross-country via a DWDM (dense wave division multiplexing) network, and find its way down to a PC attached to a “thin coax” BNC (Bayonet Neill Concelman) connector, all without any re-framing or protocol conversion. Ethernet is literally everywhere, and 10 Gigabit Ethernet maintains this seamless migration in functionality.

Gigabit Ethernet is already being deployed as a backbone technology for dark fiber metropolitan networks. With appropriate 10 Gigabit Ethernet interfaces, optical transceivers and single mode fiber, service providers will be able to build links reaching 40km or more. (See Figure 4.)


10 Gigabit Ethernet in Local Area Networks
Ethernet technology is already the most widely deployed technology for high-performance LAN environments. With the extension of 10 Gigabit Ethernet into the family of Ethernet technologies, the LAN can now reach farther and support upcoming bandwidth-hungry applications. Like Gigabit Ethernet, the proposed 10 Gigabit standard supports both single-mode and multimode fiber media. However, in 10 Gigabit Ethernet the supported distance for single-mode fiber has expanded from the 5 km of Gigabit Ethernet to 40 km.

The advantage of supporting longer distances is that it gives companies that manage their own LAN environments the option of extending their data centers to more cost-effective locations up to 40 km away from their campuses. It also allows them to support multiple campus locations within that 40 km range. Within data centers, switch-to-switch applications, as well as switch-to-server applications, can be deployed over more cost-effective multimode fiber to create 10 Gigabit Ethernet backbones that support the continuous growth of bandwidth-hungry applications. (See Figure 5.)

With 10 Gigabit backbones installed, companies will have the capability to begin providing Gigabit Ethernet service to workstations and, eventually, to the desktop in order to support applications such as streaming video, medical imaging, centralized applications, and high-end graphics. 10 Gigabit Ethernet will also provide lower network latency, due to the speed of the link and the ability to over-provision bandwidth to compensate for the bursty nature of data in enterprise applications.

10 Gigabit Ethernet in the Storage Area Network
Additionally, 10 Gigabit Ethernet will provide infrastructure for both network-attached storage (NAS) and storage area networks (SAN). Prior to the introduction of 10 Gigabit Ethernet, some industry observers maintained that Ethernet lacked sufficient horsepower to get the job done. Ethernet, they said, just doesn’t have what it takes to move “dump truck loads worth of data.” 10 Gigabit Ethernet can now offer equivalent or superior data-carrying capacity at similar latencies to many other storage networking technologies, including 1 or 2 Gigabit Fibre Channel, Ultra160 or Ultra320 SCSI, ATM OC-3, OC-12, and OC-192, and HIPPI (High Performance Parallel Interface). While Gigabit Ethernet storage servers, tape libraries, and compute servers are already available, users should look for early availability of 10 Gigabit Ethernet end-point devices in the second half of 2001.

There are numerous applications for Gigabit Ethernet in storage networks today, which will seamlessly extend to 10 Gigabit Ethernet as it becomes available. (See Figure 6.) These include:


  • Business continuance/disaster recovery
  • Remote backup
  • Storage on demand
  • Streaming media


10 Gigabit Ethernet in Wide Area Networks
10 Gigabit Ethernet will enable Internet service providers (ISPs) and network service providers (NSPs) to create very high-speed links at very low cost between co-located, carrier-class switches and routers and optical equipment that is directly attached to the SONET/SDH cloud. 10 Gigabit Ethernet with the WAN PHY will also allow the construction of WANs that connect geographically dispersed LANs between campuses or POPs (points of presence) over existing SONET/SDH/TDM networks. 10 Gigabit Ethernet links between a service provider’s switch and a DWDM (dense wave division multiplexing) device or LTE (line termination equipment) might in fact be very short — less than 300 meters. (See Figure 7.)


The 10 Gigabit Ethernet Technology 10GbE Chip Interfaces
Among the many technical innovations of the 10 Gigabit Ethernet Task Force is an interface called the XAUI (10 Gigabit Attachment Unit Interface). It is a MAC-PHY interface, serving as an alternative to the XGMII (10 Gigabit Media Independent Interface). XAUI is a low pin-count differential interface that enables lower design costs for system vendors.

  The XAUI is designed as an interface extender for the XGMII, the 10 Gigabit Media Independent Interface. The XGMII is a 74-signal-wide interface (32-bit data paths for each of transmit and receive) that may be used to attach the Ethernet MAC to its PHY. The XAUI may be used in place of, or to extend, the XGMII in the chip-to-chip applications typical of most Ethernet MAC-to-PHY interconnects. (See Figure 8.)

The XAUI is a low pin-count, self-clocked serial bus directly evolved from the Gigabit Ethernet 1000BASE-X PHY. Each XAUI lane runs at 2.5 times the 1000BASE-X rate; by arranging four such serial lanes, the XAUI interface supports the ten-fold data throughput required by 10 Gigabit Ethernet.
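The lane arithmetic works out as follows. A short check, using the 8b/10b line coding that XAUI inherits from 1000BASE-X:

```python
# 1000BASE-X: 1.25 Gbaud line rate, 8b/10b coded -> 1.0 Gbit/s of data.
gbe_line_rate = 1.25e9
coding_efficiency = 8 / 10            # 8b/10b: 8 data bits per 10 line bits

# Each XAUI lane runs at 2.5x the 1000BASE-X line rate.
xaui_lane_rate = 2.5 * gbe_line_rate              # 3.125 Gbaud per lane
lane_data_rate = xaui_lane_rate * coding_efficiency   # 2.5 Gbit/s per lane

# Four lanes together carry the full 10 Gigabit Ethernet data rate.
total_data_rate = 4 * lane_data_rate
print(total_data_rate / 1e9)  # -> 10.0 (Gbit/s)
```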


Physical Media Dependent (PMDs)
The IEEE 802.3ae Task Force has developed a draft standard that provides a physical layer that supports link distances for fiber optic media as shown in Table A.

To meet these distance objectives, four PMDs were selected. The task force selected a 1310 nanometer serial PMD to meet its 2 km and 10 km single-mode fiber (SMF) objectives. It also selected a 1550 nm serial solution to meet (or exceed) its 40 km SMF objective. Support of the 40 km PMD is an acknowledgement that Gigabit Ethernet is already being successfully deployed in metropolitan and private, long-distance applications. An 850 nanometer PMD was specified to achieve a 65-meter objective over multimode fiber using serial 850 nm transceivers.

Additionally, the task force selected two versions of the wide wave division multiplexing (WWDM) PMD: a 1310 nanometer version over single-mode fiber to travel a distance of 10 km, and a 1310 nanometer PMD to meet its 300-meter objective over installed multimode fiber.

Physical Layer (PHYs)
The LAN PHY and the WAN PHY will operate over common PMDs and, therefore, will support the same distances. These PHYs are distinguished solely by the Physical Coding Sublayer (PCS). (See Figure 7.) The 10 Gigabit LAN PHY is intended to support existing Gigabit Ethernet applications at ten times the bandwidth with the most cost-effective solution. Over time, it is expected that the LAN PHY will be used in pure optical switching environments extending over all WAN distances. However, for compatibility with the existing WAN network, the 10 Gigabit Ethernet WAN PHY supports connections to existing and future installations of SONET/SDH (Synchronous Optical Network/Synchronous Digital Hierarchy) circuit-switched telephony access equipment.

The WAN PHY differs from the LAN PHY by including a simplified SONET/SDH framer in the WAN Interface Sublayer (WIS). Because the line rate of SONET OC-192/ SDH STM-64 is within a few percent of 10 Gbps, it is relatively simple to implement a MAC that can operate with a LAN PHY at 10 Gbps or with a WAN PHY payload rate of approximately 9.29 Gbps. (See Figure 9.). Appendix III provides a more in depth look at the WAN PHY.
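The approximate payload rate quoted above can be reproduced from the SONET numbers. A back-of-the-envelope check, where the OC-192c payload capacity and the 64b/66b coding overhead are the assumed inputs:

```python
# SONET OC-192 / SDH STM-64 line rate and its payload capacity after
# SONET section/line overhead.
oc192_line_rate = 9.95328e9    # bit/s, within a few percent of 10 Gbps
oc192_payload = 9.58464e9      # bit/s available to the payload envelope

# The WAN PHY carries 64b/66b-coded Ethernet inside that payload, so
# the usable MAC data rate is reduced by the 64/66 coding ratio.
wan_phy_data_rate = oc192_payload * 64 / 66
print(round(wan_phy_data_rate / 1e9, 3))  # -> 9.294 (Gbit/s), i.e. ~9.29 Gbps
```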

Conclusion
As the Internet transforms longstanding business models and global economies, Ethernet has withstood the test of time to become the most widely adopted networking technology in the world. Much of the world’s data transfer begins and ends with an Ethernet connection. Today, we are in the midst of an Ethernet renaissance spurred on by surging E-Business and the demand for low cost IP services that have opened the door to questioning traditional networking dogma. Service providers are looking for higher capacity solutions that simplify and reduce the total cost of network connectivity, thus permitting profitable service differentiation, while maintaining very high levels of reliability.