At the beginning of the nineties a small group of people working at Westronic Inc. in Calgary, Canada started work on the Distributed Network Protocol. The goal was to offer an alternative to the thousands of proprietary - and mostly faulty - solutions offered by hundreds of SCADA companies. It had to be efficient enough to compete with the existing solutions, but it also had to be flexible enough to survive into the future.
I had the privilege of participating in the initial definition of the protocol. DNP, the brainchild of Mike Copps, is healthy and very influential today - not bad for a biochemist turned software designer. Working with Mike was one of the very few cases in my career when things were done the right way.
The definition went through many iterations. There were many ideas in countless different forms, and almost all of them were found to be faulty under some conditions, so they needed constant refinement. Characters could be lost, distorted - or even pop into existence out of nowhere. These cases had to be recognized and appropriate actions taken. The most difficult problems were caused by combinations of events happening in a very rare sequence, at the worst possible time. We had to find all the possible combinations of events that could cause duplication or loss of messages. Or at least we had to try.
The cost of a mistake is very high in the SCADA industry - people could be killed, transformer stations could explode, or other huge damage could be done, costing millions of dollars. This explains why the company was willing to invest a lot of time in proper protocol design.
The sheer number of possible problems found during the definition phase made me skeptical that humans are capable of finding ALL the problems. At about the same time I found a review of Design and Validation of Computer Protocols, the book describing SPIN. It promised a possible solution to our problem.
It took several years for me to get a copy of the book. In deeply embedded programming there are no complicated data types to make programs difficult to debug. Optimizing code to run faster is easier than many say it is, but coding for every malicious combination of events, arriving with the worst possible timing, is the really difficult part. SPIN promises to help find those bad combinations. Once you find them, you can modify your code until not even SPIN can find anything wrong with it.
One could, in theory, do it by testing the program for a long time, providing every imaginable combination of input. The longer you test the program that way, the more problems you find. In the beginning problems show up very frequently, later less frequently, and in the end they become sporadic. After a while you might decide that there are no more problems. Eventually the product gets to the customers, and thousands of installations can produce customer complaints very rapidly. Our programs were monitored by other computers, which did not fail to notice if a single message was lost in a month!
What SPIN can do for you is equivalent to testing your program for an infinitely long time and finding all the problems caused by unfortunate timing of events. It comes at a cost though. You will have to learn to think in a fundamentally different way to use SPIN effectively. In my experience the culture shock was about the same as when I first learned the Prolog programming language.
One possible way to use SPIN is the following. When considering different design alternatives, write a Promela model, add specifications, and then verify them. When the verifier finds a specification violation, it also produces an execution trace from the initial state up to the violation, so you can observe exactly what went wrong. With this information you modify the model, and you iterate until done. A small sketch of such a model follows below.
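To make that workflow concrete, here is a minimal Promela sketch - not the actual DNP model, just an illustration under assumed names - of a sender retransmitting over a medium that may silently lose messages or acknowledgments, together with a liveness specification the verifier should refute, because nothing in the model stops the medium from dropping everything.

    /* Minimal illustrative sketch, not the real DNP model.
     * A sender retransmits over a lossy medium; a separate daemon
     * process models the loss. Names and property are assumptions. */
    mtype = { msg, ack };

    chan to_rcvr = [1] of { mtype, bit };   /* data channel */
    chan to_sndr = [1] of { mtype, bit };   /* ack channel  */

    bit last_delivered;   /* sequence bit of the last message the receiver saw */

    active proctype Sender()
    {
        bit seq = 0;
        bit b;
        do
        :: to_rcvr ! msg(seq) ->            /* (re)transmit current message */
           if
           :: to_sndr ? ack(b) ->
              if
              :: b == seq -> seq = 1 - seq  /* acknowledged: move on        */
              :: else     -> skip           /* stale ack: retransmit        */
              fi
           :: timeout -> skip               /* nothing arrived: retransmit  */
           fi
        od
    }

    active proctype Receiver()
    {
        bit b;
        do
        :: to_rcvr ? msg(b) ->
           last_delivered = b;
           to_sndr ! ack(b)
        od
    }

    active proctype Daemon()                /* the lossy medium */
    {
        bit b;
        do
        :: to_rcvr ? msg(b) -> skip         /* message lost in transit */
        :: to_sndr ? ack(b) -> skip         /* ack lost in transit     */
        od
    }

    /* Specification: the sequence bit keeps making progress forever. */
    ltl live { []<> (last_delivered == 1) }

Running spin -a on this file, compiling the generated pan.c and executing ./pan -a makes the verifier search for violations; it finds an acceptance cycle in which the daemon steals every transmission, and spin -t -p replays the recorded trail step by step. That trail is the execution trace mentioned above: you study it, strengthen the model (for example by bounding how often the medium may lose), and verify again.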
...