Expanding on Unfettered
Recently, Joe Weiss posted on his ‘Unfettered’ blog about an experience he had with IT scans in a utility environment. To summarize, the IT department of a utility port scanned a series of NERC CIP related substations and ended up crashing a service responsible for protection mechanisms. There is some confusion about whether the OT group had knowledge of the scans (Joe says “no notification was given”, but the next sentence says “The OT Team was notified”). Either way, the scan clearly went ahead, and the result was that the relays began misbehaving. The operations team had to reboot and clear the relays in order to restore normal operation.
From a technical perspective, Joe highlights that the problem is the GOOSE (IEC 61850) protocol on the relays. This protocol is used to communicate between relays at a substation, exchanging the data needed for protection and control. That traffic is mostly local to the substation, but GOOSE can also run between substations. Protection and Control (P&C) is a distinct responsibility from EMS/SCADA, which in this case was using DNP3. Basically, P&C is responsible for the substation itself, and EMS/SCADA is responsible for monitoring the grid via data from multiple substations.
A good analogy is that you, in your car, are the EMS/SCADA. You are responsible for direction, acceleration, deceleration, turn signals, and a few other variables that are exposed to you. But the car itself is made up of subsystems that require automation, such as airbags, engine control, transmission, and brakes. Those subsystems are not directly viewable by you as the driver, but you depend on them working in order to drive safely and efficiently. Had Joe’s scenario happened to a car, it would mean that subsystems were failing or not operating correctly, and you weren’t getting a check engine light to notify you. Not a good place to be.
I’m going to agree with Joe… with some clarifications. Joe makes a point that “IT Security should NEVER be left alone in industrial operations”. I agree, but this is a simplistic argument that doesn’t get to the core issue: personnel who aren’t trained for industrial work, and who aren’t under the command of an Operations structure, require a formal process for performing work in industrial environments, and a clear consequence for violating that process. And I’m not talking just about safety requirements. This includes all the various things that linemen, foremen, mechanics, electricians, technicians, and others go through prior to performing work. They have a work order, they follow an energy control process, the work is planned with the operators and engineers, and approval is given at the top. All of this is necessary because what they do affects lives and livelihoods, and it’s magnified in electric power because we ALL rely on it.
Without this formal process in place, it’s left to each individual’s own experience and capability to judge how their work may affect the system. Some of us are good at it: we take our time, we talk with our peers in engineering and operations, and we get ad-hoc (or even formal!) approvals because we are concerned about the effects. Some of us are not good: some of us fire and forget, some of us assume the responsibility lies elsewhere, and some of us are basically being reckless, maybe without realizing that it’s reckless. This lack of responsibility and accountability isn’t wholly an individual problem. It’s an organizational problem, and it needs to be addressed by the organization’s management. This is risk, pure and simple, and it sounds like it’s not being managed adequately in this instance.
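To make the idea of a formal process a little more concrete, here is a minimal sketch of a pre-scan gate, written in Python. It is not what the utility in Joe’s story used, and it is not a NERC CIP requirement. Everything in it is an assumption for illustration: the `OT_NETWORKS` range, the `WorkOrder` fields, and the `scan_allowed` function are hypothetical names. The only point is that a scan aimed at an OT address gets refused unless Operations has approved a work order that covers that address and that time window.

```python
from dataclasses import dataclass
from datetime import datetime
from ipaddress import ip_address, ip_network

# Hypothetical OT address range. A real list would come from the asset inventory.
OT_NETWORKS = [ip_network("10.20.0.0/16")]


@dataclass
class WorkOrder:
    """Hypothetical record of work that Operations has reviewed."""
    order_id: str
    approved_by_operations: bool
    networks: list      # ip_network objects the order covers
    start: datetime     # start of the approved window
    end: datetime       # end of the approved window


def is_ot_target(target: str) -> bool:
    """Return True if the address falls inside any listed OT network."""
    addr = ip_address(target)
    return any(addr in net for net in OT_NETWORKS)


def scan_allowed(target: str, work_orders: list, now: datetime) -> bool:
    """Allow non-OT targets; require an approved, in-window order for OT targets."""
    if not is_ot_target(target):
        return True
    addr = ip_address(target)
    for order in work_orders:
        in_window = order.start <= now <= order.end
        in_scope = any(addr in net for net in order.networks)
        if order.approved_by_operations and in_window and in_scope:
            return True
    return False
```

A scanner wrapper could call `scan_allowed` for each target and skip (and log) anything that comes back `False`. A check like this doesn’t replace the planning conversation with operators and engineers, it just makes skipping that conversation harder to do by accident.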
And lastly, I have to question the engineering rigor and CIP compliance analysis that went into the design of this system. Clearly, the scanning tool (which is not generally a standard part of an EMS/SCADA installation) has considerable access to the unauthenticated protocols (DNP3 and GOOSE) that run this system. With access via those protocols, crashing the relays was an accident, but a savvy attacker could simply tell the relays to open, and they would open. This is equivalent to the level of access that the licensed operators and the protection engineers have, but without the tools, processes, and training those professionals are required to use. I hope this prompted some uncomfortable conversations between the SCADA team, their networking team, and upper operations management. Back in the CIPv3 days, a design like this would prompt lots of questions from NERC CIP auditors. I hope we haven’t moved backward by going to CIPv5.
photo by tonyglen14