ISO Reference Model Used in Industrial Automation

Most readers are familiar with the seven-layer ISO Open Systems Interconnection (OSI) Reference Model used to define communication tasks. The Internet model collapses these seven layers into five, and the five-layer version best describes industrial automation communication. On an Ethernet network we would typically have 100BASE-TX at the physical layer, IEEE 802.3 Ethernet at the data link layer, the Internet Protocol (IP) at the network layer, either the Transmission Control Protocol (TCP) or the User Datagram Protocol (UDP) at the transport layer and, finally, one of several industrial automation protocols at the application layer. Application layer examples include Modbus/TCP, EtherNet/IP and BACnet/IP. They differ in how they use the services of the lower layers: Modbus/TCP uses TCP, EtherNet/IP uses UDP for implicit messaging, and BACnet/IP uses UDP. The type of transport service used could affect recovery time.



TCP and UDP are both transport layer protocols, but they operate quite differently. TCP is connection-oriented and provides reliable, acknowledged delivery of a message, while UDP is connectionless and provides only best-effort delivery. With TCP, corrupted or lost segments are automatically resent. With UDP, there is no automatic acknowledgement that a datagram was received; that task is left to the application layer. Industrial automation protocols typically take the latter approach since it improves the real-time performance of the network. This was a consideration as we tested the various redundancy schemes for recovery responsiveness.
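To make the division of labor concrete, here is a minimal Python sketch of what "left to the application layer" can look like over UDP: the sender waits for a reply and retransmits on timeout. It is an illustration only, not the test software described in this article. The function names, the 0.2 second timeout, the three-attempt limit and the "ACK:" reply format are all assumptions chosen for the example.

```python
import socket
import threading


def send_with_retry(sock, message, dest, timeout=0.2, attempts=3):
    """Send a datagram and wait for a reply, retransmitting on timeout.

    Returns (reply, attempts_used). Raises TimeoutError if every attempt fails.
    The timeout and attempt count are illustrative assumptions.
    """
    sock.settimeout(timeout)
    for attempt in range(1, attempts + 1):
        sock.sendto(message, dest)
        try:
            reply, _ = sock.recvfrom(2048)
            return reply, attempt
        except socket.timeout:
            continue  # UDP will not resend for us; the application must
    raise TimeoutError(f"no reply after {attempts} attempts")


def lossy_responder(sock, drop_first):
    """Hypothetical peer: ignores the first `drop_first` datagrams, answers the next."""
    seen = 0
    while True:
        data, addr = sock.recvfrom(2048)
        seen += 1
        if seen > drop_first:
            sock.sendto(b"ACK:" + data, addr)
            return


def demo(drop_first=1):
    """Run sender and lossy peer over the loopback interface."""
    peer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    peer.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    worker = threading.Thread(
        target=lossy_responder, args=(peer, drop_first), daemon=True
    )
    worker.start()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        result = send_with_retry(client, b"read-inputs", peer.getsockname())
    worker.join(timeout=1.0)
    peer.close()
    return result
```

Calling demo(1) simulates one lost datagram: the first send goes unanswered, the sender times out, and the second send is acknowledged. With TCP none of this logic would appear in the application, because the transport layer retransmits on its own schedule.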


Obtaining Empirical Recovery Time Data

To test recovery times, we needed a representative network. Instead of commercial programmable logic controllers (PLCs) and Ethernet-based input/output (I/O) devices, we used two PCs running custom programs. With PCs, we could construct Ethernet frames of whatever length and type we wanted and send them out as either TCP segments or UDP datagrams. We could also vary the transmission rate to simulate the performance of a PLC.

The first PC, called the master, functioned as the host PLC by initiating a very simple master/slave protocol with the second PC, which functioned as the slave. The slave responds only to commands from the master. Master/slave is the most common protocol arrangement in industrial networks. Instead of having the master command the slave to report its input status or set its outputs, we instructed the slave to immediately repeat the command it received. The slave therefore functioned as an echo server, repeating what it heard as fast as it could. The master waited for the response and matched it against the command to verify message integrity. Once the response was confirmed, the master immediately sent another command. Once the average response time is well understood, an abnormal response is easy to identify and time: it is the response that follows the removal of one of the forwarding links.
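The master/slave echo exchange described above can be sketched in Python as follows. This is a minimal illustration over the loopback interface, not the custom programs used in the tests. The 64-byte payload, the sequence-number layout, the 2 second receive timeout and the rule "abnormal means more than ten times the median" are all assumptions made for the example; the median is used here rather than the average so that the outlier itself does not inflate the baseline.

```python
import socket
import statistics
import threading
import time


def run_echo_slave(sock, stop):
    """Slave: repeat every datagram back to its sender as fast as possible."""
    sock.settimeout(0.1)  # wake up periodically to check the stop flag
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(2048)
        except socket.timeout:
            continue
        sock.sendto(data, addr)


def measure_round_trips(slave_addr, count=200, payload_len=64, timeout=2.0):
    """Master: send a command, wait for the echo, verify it, then send the next."""
    times = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as master:
        master.settimeout(timeout)
        for seq in range(count):
            # Hypothetical command format: 4-byte sequence number plus padding.
            command = seq.to_bytes(4, "big").ljust(payload_len, b"\x00")
            start = time.perf_counter()
            master.sendto(command, slave_addr)
            reply, _ = master.recvfrom(2048)
            elapsed = time.perf_counter() - start
            if reply != command:
                raise ValueError(f"echo mismatch at sequence {seq}")
            times.append(elapsed)
    return times


def find_abnormal(times, factor=10.0):
    """Return (index, seconds) for responses far above the median response time."""
    baseline = statistics.median(times)
    return [(i, t) for i, t in enumerate(times) if t > factor * baseline]


def demo(count=50):
    """Run master and slave on one machine and return the response times."""
    slave = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    slave.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    stop = threading.Event()
    worker = threading.Thread(target=run_echo_slave, args=(slave, stop), daemon=True)
    worker.start()
    try:
        return measure_round_trips(slave.getsockname(), count=count)
    finally:
        stop.set()
        worker.join()
        slave.close()
```

On a real test network the slave would run on the second PC and the master would be pointed at its address. Note that this sketch simply raises a timeout error if a datagram is lost and no echo arrives within the timeout; how a lost command should be retried during a link failure is a design decision the sketch does not attempt to settle.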

Although the above plan worked well for UDP, we needed to change the application program to handle TCP. For the TCP portion of our test, we relied entirely upon TCP to acknowledge the successful transmission of messages. The echo was sent only after the command message had been successfully received. We observed later that this change affected the recovery times.
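A TCP version of the echo exchange might look like the sketch below. It is not the program used in the tests. Because TCP delivers a byte stream rather than discrete messages, the sketch assumes fixed-length commands (64 bytes) so the slave knows when a complete command has arrived before echoing it. Setting TCP_NODELAY to avoid batching of small writes is likewise an assumption, as are all function names.

```python
import socket
import threading
import time


def recv_exact(conn, length):
    """Read exactly `length` bytes; TCP is a byte stream, not a message stream."""
    chunks = []
    remaining = length
    while remaining:
        chunk = conn.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def run_tcp_echo_slave(listener, msg_len):
    """Slave: echo each complete command; stop when the master disconnects."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        try:
            while True:
                # Echo only after the whole command has been received.
                conn.sendall(recv_exact(conn, msg_len))
        except ConnectionError:
            pass


def measure_tcp_round_trips(addr, count=50, msg_len=64):
    """Master: send a command over TCP, read the full echo, verify, repeat."""
    times = []
    with socket.create_connection(addr, timeout=5.0) as master:
        master.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for seq in range(count):
            command = seq.to_bytes(4, "big").ljust(msg_len, b"\x00")
            start = time.perf_counter()
            master.sendall(command)
            reply = recv_exact(master, msg_len)
            times.append(time.perf_counter() - start)
            if reply != command:
                raise ValueError(f"echo mismatch at sequence {seq}")
    return times


def demo(count=50):
    """Run master and slave over the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    worker = threading.Thread(
        target=run_tcp_echo_slave, args=(listener, 64), daemon=True
    )
    worker.start()
    try:
        return measure_tcp_round_trips(listener.getsockname(), count)
    finally:
        worker.join(timeout=2.0)
        listener.close()
```

Here the application never retransmits anything. If a segment is lost while a link is down, the sending TCP stack resends it when its own retransmission timer expires, so the response time seen by the master depends on that timer as well as on how quickly the network recovers.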


Ring Topology Instead of Mesh

The next decision was to select the network topology and the number of switches in the network. We elected to use only managed switches that could be configured for STP, RSTP, trunking or a proprietary ring protocol. We decided to use the same ring topology shown in the IEEE 802.1D-2004 standard for RSTP. Although STP and RSTP can function in either a mesh or a ring topology, we did not want to change the topology when we tested the proprietary ring, so that the data would be consistent across schemes. Besides, the ring tends to be the more popular topology when redundancy is required.

The ring presented a slight problem for the trunking test since trunking does not allow loops. For trunking, we broke the ring by removing the same segment that served as the alternate segment in the other tests. In this way the same number of switch hops (in our case four) would occur during communication as with STP, RSTP and proprietary ring. It must be remembered, however, that with trunking each segment in the diagram really represents two separate paths. So when we refer to testing performance after a break in the trunking scheme, we mean that we removed one of these two paths.

As in the diagram, we used six managed switches. We could have used more, but we felt this was a representatively sized network. We set all ports to 100 Mbps full-duplex with PAUSE enabled. We chose copper cabling for convenience, although fiber optics would have served as well.
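The effect of the alternate segment on the path between master and slave can be modeled with a few lines of Python. This is a toy graph model, not a simulation of STP or RSTP. The switch numbering (0 to 5), the placement of the master on switch 0 and the slave on switch 3, the choice of the 3-4 link as the alternate segment and the choice of the 1-2 link as the one that fails are all hypothetical, picked so that four switches lie on the path.

```python
from collections import deque


def ring_links(n):
    """Links of an n-switch ring, each stored as frozenset({i, (i + 1) % n})."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


def switch_path(links, src, dst):
    """Breadth-first search: list of switches from src to dst, or None."""
    neighbors = {}
    for link in links:
        a, b = tuple(link)
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(neighbors.get(path[-1], [])):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def simulate_break(master=0, slave=3, alternate=(3, 4), broken=(1, 2)):
    """Return the path before the break, during the outage, and after recovery."""
    alternate = frozenset(alternate)
    broken = frozenset(broken)
    active = ring_links(6) - {alternate}      # alternate segment is not forwarding
    before = switch_path(active, master, slave)
    outage = switch_path(active - {broken}, master, slave)
    recovered = switch_path((active - {broken}) | {alternate}, master, slave)
    return before, outage, recovered
```

With these assumed placements, the path before the break runs through switches 0, 1, 2 and 3. While the 1-2 link is down and the alternate segment is still blocked there is no path at all, which is the interval the recovery time measures. Once the alternate segment starts forwarding, the path runs through switches 0, 5, 4 and 3, again four switches.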