Table of contents
- What SOLID stands for
- 1. Single responsibility principle (SRP)
- 2. Open/closed principle (OCP)
- 3. Liskov substitution principle (LSP)
- 4. Interface segregation principle (ISP)
- 5. Dependency inversion principle (DIP)
- Pros (Why would we use them in data engineering)?
- Cons (Why wouldn’t we use them in data engineering)?
- Conclusion
The SOLID principles are a set of design guidelines that aim to make code easier to read, test and maintain.
They belong to object-oriented programming (OOP) and were popularised by Robert C. Martin (commonly referred to as Uncle Bob by the software engineering community).
What SOLID stands for
The term SOLID is an acronym that stands for:
- Single responsibility principle (SRP)
- Open/closed principle (OCP)
- Liskov substitution principle (LSP)
- Interface segregation principle (ISP)
- Dependency inversion principle (DIP)
1. Single responsibility principle (SRP)
The single responsibility principle (SRP) states that a class should have only one reason to change. In practical terms, every module should have exactly one responsibility. Because each module has a single responsibility, the code becomes more readable and testable.
Examples
Let’s create a simple bank account class to demonstrate what violating and satisfying the single responsibility principle looks like:
A. Principle violation
class BankAccount:
    def __init__(self, account_number: int, balance: float):
        self.account_number = account_number
        self.balance = balance

    def deposit_money(self, amount: float):
        self.balance += amount

    def withdraw_money(self, amount: float):
        if amount > self.balance:
            raise ValueError("Unfortunately your balance is insufficient for any withdrawals right now ... ")
        self.balance -= amount

    def print_balance(self):
        print(f'Account no: {self.account_number}, Balance: {self.balance} ')

    def change_account_number(self, new_account_number: int):
        self.account_number = new_account_number
        print(f'Your account number has changed to "{self.account_number}" ')
This violates the SRP because the BankAccount class handles more than one duty: managing bank account profiles and managing money.
B. Principle satisfaction
Now here's an example of satisfying the SRP:
class DepositManager:
    def deposit_money(self, account, amount):
        account.balance += amount

class WithdrawalManager:
    def withdraw_money(self, account, amount):
        if amount > account.balance:
            raise ValueError("Unfortunately your balance is insufficient for any withdrawals right now ... ")
        account.balance -= amount

class BalancePrinter:
    def print_balance(self, account):
        print(f'Account no: {account.account_number}, Balance: {account.balance} ')

class AccountNumberManager:
    def change_account_number(self, account, new_account_number):
        account.account_number = new_account_number
        print(f'Your account number has changed to "{account.account_number}" ')

class BankAccount:
    def __init__(self, account_number: int, balance: float):
        self.account_number = account_number
        self.balance = balance
        # Each duty is delegated to a dedicated helper object
        self.deposit_manager = DepositManager()
        self.withdrawal_manager = WithdrawalManager()
        self.balance_printer = BalancePrinter()
        self.account_number_manager = AccountNumberManager()

    def deposit_money(self, amount: float):
        self.deposit_manager.deposit_money(self, amount)

    def withdraw_money(self, amount: float):
        self.withdrawal_manager.withdraw_money(self, amount)

    def print_balance(self):
        self.balance_printer.print_balance(self)

    def change_account_number(self, new_account_number: int):
        self.account_number_manager.change_account_number(self, new_account_number)
We’ve split the duties linked to managing the bank account into separate classes, so a change to one responsibility only touches the class that owns it.
C. Codebase extension example
For example, if the business requires us to start printing the balances with specific currency symbols, we don’t need to alter the entire code, just the BalancePrinter class:
class BalancePrinter:
    def print_balance(self, account):
        print(f'Account no: {account.account_number}, Balance: ${account.balance} ')

...

bank_account = BankAccount(12345678, 100.75)
bank_account.print_balance()
…which results in:
Account no: 12345678, Balance: $100.75
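A practical payoff of this split is that each helper can be tested in isolation. The sketch below is only an illustration: `FakeAccount` and `try_withdraw` are hypothetical names introduced here, and `WithdrawalManager` is repeated from the example above so the block runs by itself.

```python
from dataclasses import dataclass


@dataclass
class FakeAccount:
    """Hypothetical stand-in for BankAccount with only the fields the manager touches."""
    account_number: int
    balance: float


class WithdrawalManager:
    """Repeated from the example above so this block is self-contained."""

    def withdraw_money(self, account, amount):
        if amount > account.balance:
            raise ValueError("Unfortunately your balance is insufficient for any withdrawals right now ... ")
        account.balance -= amount


def try_withdraw(account, amount) -> bool:
    """Hypothetical helper: True if the withdrawal succeeded, False if it was rejected."""
    try:
        WithdrawalManager().withdraw_money(account, amount)
        return True
    except ValueError:
        return False


account = FakeAccount(account_number=12345678, balance=100.0)
print(try_withdraw(account, 40.0), account.balance)   # accepted, balance reduced
print(try_withdraw(account, 500.0), account.balance)  # rejected, balance untouched
```

Because the withdrawal rule lives in one small class, the test needs no printing logic, no account-number logic and no real BankAccount.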
2. Open/closed principle (OCP)
This principle states that a class should be open for extension but closed for modification. This simply means that you should be able to add new functionality to your code without changing the existing code.
It may sound counterintuitive so let’s explore examples to break this down a bit:
Examples
Let’s create a robot that detects different objects using a range of sensors.
A. Principle violation
This is what it looks like to violate the open/closed principle (OCP):
class Robot:
    def __init__(self, sensor_type):
        self.sensor_type = sensor_type

    def detect(self):
        # Every new sensor needs another branch here
        if self.sensor_type == "temperature":
            print("Detecting objects using temperature sensor ... ")
        elif self.sensor_type == "ultrasonic":
            print("Detecting objects using ultrasonic sensor ... ")
        elif self.sensor_type == "infrared":
            print("Detecting objects using infrared sensor ... ")
Violating the OCP like this makes the code difficult to manage, especially as it grows in different directions. Imagine we need to add more sensors to the robot to improve its object detection: this approach requires us to edit the Robot class every time, which becomes harder as the class grows. Each edit also means re-running the unit tests for the Robot class to confirm the robot still operates as expected, and it is easy to get this wrong or miss a test when the class contains thousands of lines of code that are constantly amended over time.
Extending the code without introducing bugs is therefore a real challenge under this approach.
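A further hazard of the if/elif chain can be sketched as follows. This variant of the Robot class returns the message instead of printing it (a change made here only so the result can be inspected); everything else mirrors the example above.

```python
class Robot:
    def __init__(self, sensor_type):
        self.sensor_type = sensor_type

    def detect(self):
        # Same branching as the example above, but returning the message
        if self.sensor_type == "temperature":
            return "Detecting objects using temperature sensor ... "
        elif self.sensor_type == "ultrasonic":
            return "Detecting objects using ultrasonic sensor ... "
        elif self.sensor_type == "infrared":
            return "Detecting objects using infrared sensor ... "
        # No branch matched: the method falls through and returns None


print(Robot("infrared").detect())
print(Robot("camera").detect())  # an unknown sensor type is silently ignored
```

An unrecognised sensor type raises no error at all, so a forgotten branch can go unnoticed until the robot fails to detect anything.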
B. Principle satisfaction
Let’s explore a possible solution for this:
from abc import ABC, abstractmethod

class Sensor(ABC):
    @abstractmethod
    def detect(self):
        pass

class TemperatureSensor(Sensor):
    def detect(self):
        print("Detecting objects using temperature sensor ... ")

class UltrasonicSensor(Sensor):
    def detect(self):
        print("Detecting objects using ultrasonic sensor ... ")

class InfraredSensor(Sensor):
    def detect(self):
        print("Detecting objects using infrared sensor ... ")
We’ve created an abstract base class named Sensor by inheriting from ABC and marking detect() with the @abstractmethod decorator. This lets us create derived classes (or subclasses) that represent the different types of sensors that could be added to the robot:
- TemperatureSensor: a temperature sensor
- UltrasonicSensor: an ultrasonic sensor
- InfraredSensor: an infrared sensor
These subclasses use inheritance and polymorphism to provide their own implementations of the detect() method declared by the Sensor class, based on their distinct behaviours.
Now we can create the Robot class:
class Robot:
    def __init__(self, *sensor_types):
        self.sensor_types = sensor_types

    def detect(self):
        # Works with any object that implements Sensor.detect()
        for sensor_type in self.sensor_types:
            sensor_type.detect()

temperature_sensor = TemperatureSensor()
ultrasonic_sensor = UltrasonicSensor()
infrared_sensor = InfraredSensor()
robot = Robot(temperature_sensor, ultrasonic_sensor, infrared_sensor)
robot.detect()
This approach uses composition: the Robot class holds whatever Sensor objects it is given, so its sensors can be chosen at runtime. An improved version of an existing sensor, or an entirely new sensor, can be plugged in without having to change the Robot class itself.
By contrast, having Robot inherit from specific sensor classes would tightly couple it to those sensor types, making it difficult to adapt the robot when a sensor is out of date or needs to be replaced.
C. Codebase extension example
Suppose the CTO now requires the current sensor technology to be replaced with the latest camera and proximity sensors trending across the industry. All we need to do is add the sensors as subclasses of the Sensor parent class like this:
class CameraSensor(Sensor):
    def detect(self):
        print("Detecting objects using new camera sensor ...")

class ProximitySensor(Sensor):
    def detect(self):
        print("Detecting objects using new proximity sensor ...")

...

camera_sensor = CameraSensor()
proximity_sensor = ProximitySensor()
robot = Robot(camera_sensor, proximity_sensor)
robot.detect()
…which results in:
Detecting objects using new camera sensor ...
Detecting objects using new proximity sensor ...
That's it! Again, we didn't need to remove or change any existing code. We simply added two new classes to meet these requirements, CameraSensor and ProximitySensor, then passed instances of them to a new Robot object to perform the same object detection task, this time with the new sensors!
3. Liskov substitution principle (LSP)
The Liskov substitution principle (LSP) states that objects of a subclass should be able to replace objects of their parent class without causing unexpected behaviour. In other words, any code written against the parent class should keep working when it is handed one of its subclasses.
Examples
We can use household items to demonstrate the violation and satisfaction of this principle:
A. Principle violation
class HouseholdItem:
    def __init__(self):
        pass

    def turn_on(self):
        pass

    def turn_off(self):
        pass

    def change_temperature(self):
        pass

class Oven(HouseholdItem):
    def __init__(self):
        pass

    def turn_on(self):
        print("Oven turned on. ")

    def turn_off(self):
        print("Oven turned off. ")

    def change_temperature(self):
        print("Oven temperature changed. ")

class Lamp(HouseholdItem):
    def __init__(self):
        pass

    def turn_on(self):
        print("Lamp turned on. ")

    def turn_off(self):
        print("Lamp turned off. ")
This looks harmless on the surface, but it commits the cardinal sin of the LSP. Each subclass must be usable wherever its parent class is expected without breaking behaviour. Here, the Lamp class inherits change_temperature() from the HouseholdItem class even though most household lamps do not have built-in temperature settings, so code that relies on every HouseholdItem being able to change temperature gets nothing back when it is handed a Lamp.
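To make the problem concrete, here is a minimal sketch of a caller that trusts the parent class's interface. The `preheat` function is hypothetical, and the classes are trimmed-down versions of the ones above that return a message instead of printing it so the result can be inspected.

```python
class HouseholdItem:
    def turn_on(self):
        pass

    def turn_off(self):
        pass

    def change_temperature(self):
        pass


class Oven(HouseholdItem):
    def change_temperature(self):
        return "Oven temperature changed. "


class Lamp(HouseholdItem):
    # No override: Lamp inherits the do-nothing change_temperature()
    pass


def preheat(item: HouseholdItem):
    """Hypothetical caller that assumes every HouseholdItem can change temperature."""
    return item.change_temperature()


print(preheat(Oven()))  # the oven responds as expected
print(preheat(Lamp()))  # the lamp silently does nothing and returns None
```

The caller cannot tell from the type alone that a Lamp will ignore the request, which is exactly the kind of surprise the LSP is meant to rule out.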
B. Principle satisfaction
Here’s an approach to fixing the previous code:
from abc import ABC, abstractmethod

class HouseholdItem(ABC):
    def __init__(self):
        pass

    @abstractmethod
    def turn_on(self):
        pass

    @abstractmethod
    def turn_off(self):
        pass

class TemperatureControlledHouseholdItem(HouseholdItem):
    @abstractmethod
    def change_temperature(self):
        pass
Here’s what we’ve created:
- HouseholdItem: an abstract class that defines two abstract methods, turn_on and turn_off, which all household appliances should have.
- TemperatureControlledHouseholdItem: a subclass for household items designed with temperature control settings. It inherits the abstract methods from the HouseholdItem class and adds its own abstract method named change_temperature to further satisfy the LSP.
We can now create the household items based on whether their temperatures are controllable or not:
class Oven(TemperatureControlledHouseholdItem):
    def __init__(self):
        pass

    def turn_on(self):
        print("Oven turned on. ")

    def turn_off(self):
        print("Oven turned off. ")

    def change_temperature(self):
        print("Oven temperature changed. ")

class Lamp(HouseholdItem):
    def __init__(self):
        pass

    def turn_on(self):
        print("Lamp turned on. ")

    def turn_off(self):
        print("Lamp turned off. ")

appliances = [Oven(), Lamp()]
for appliance in appliances:
    appliance.turn_on()
    # Only temperature-controlled items expose change_temperature()
    if isinstance(appliance, TemperatureControlledHouseholdItem):
        appliance.change_temperature()
    appliance.turn_off()
This approach lets each household item implement only the behaviours related to its intended design, without any class being forced to adopt methods it cannot support.
Separating the concerns into the abstract HouseholdItem class and the TemperatureControlledHouseholdItem subclass makes it easier to link household items to the methods that suit them, without unexpected behaviour popping up in the program.
C. Codebase extension example
Let's add more household appliances:
class Refrigerator(TemperatureControlledHouseholdItem):
    def __init__(self):
        pass

    def turn_on(self):
        print("Refrigerator turned on. ")

    def turn_off(self):
        print("Refrigerator turned off. ")

    def change_temperature(self):
        print("Refrigerator temperature changed. ")

class Laptop(HouseholdItem):
    def __init__(self):
        pass

    def turn_on(self):
        print("Laptop turned on. ")

    def turn_off(self):
        print("Laptop turned off. ")

...

appliances = [Oven(), Lamp(), Refrigerator(), Laptop()]

...
We've added a refrigerator and a laptop to the mix to represent two more items you can find in a household. Because you can configure the refrigerator's temperature, it inherits from the TemperatureControlledHouseholdItem subclass. Setting a laptop's temperature is not something users normally do, so we'll let it inherit from the simpler HouseholdItem class for now.
Here's the output:
Oven turned on.
Oven temperature changed.
Oven turned off.
Lamp turned on.
Lamp turned off.
Refrigerator turned on.
Refrigerator temperature changed.
Refrigerator turned off.
Laptop turned on.
Laptop turned off.
Once again, we've managed to extend the codebase's interfaces without touching the existing classes, therefore remaining in control of the program's expected behaviour.
4. Interface segregation principle (ISP)
The interface segregation principle (ISP) states that a class shouldn't be forced to depend on methods it isn’t designed or expected to use.
This principle is violated if a parent class contains methods that a subclass doesn't need, or that make no real-world sense for it to use.
Examples
A. Principle violation
Here's an example of code violating this principle:
class Animal:
    def swim(self):
        pass

    def fly(self):
        pass

    def make_sound(self):
        pass

class Duck(Animal):
    def swim(self):
        print("Duck is now swimming in the water...")

    def fly(self):
        print("Duck is now flying in the air...")

    def make_sound(self):
        print("Quack! Quack!")

class Dog(Animal):
    def swim(self):
        raise NotImplementedError("Dogs can't swim ... ")

    def fly(self):
        raise NotImplementedError("Dogs can't fly ....")

    def make_sound(self):
        print("Woof! Woof!")
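Ducks and dogs have distinct behavioural differences, yet the single Animal interface forces the Dog class to carry swim() and fly() methods it cannot support, so all it can do is raise NotImplementedError. That is what makes this approach violate the ISP.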
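The cost of the oversized interface shows up in the calling code. In the sketch below, `take_for_a_swim` is a hypothetical caller, and the classes are trimmed copies of the ones above that keep only the swim() method.

```python
class Animal:
    def swim(self):
        pass


class Duck(Animal):
    def swim(self):
        print("Duck is now swimming in the water...")


class Dog(Animal):
    def swim(self):
        raise NotImplementedError("Dogs can't swim ... ")


def take_for_a_swim(animals):
    """Hypothetical caller: returns the class names of animals that refused to swim."""
    refused = []
    for animal in animals:
        try:
            animal.swim()
        except NotImplementedError:
            # The caller has to defend against methods that exist but don't work
            refused.append(type(animal).__name__)
    return refused


print(take_for_a_swim([Duck(), Dog()]))
```

Every caller of Animal now needs this kind of defensive handling, because the interface promises behaviour that some subclasses cannot deliver.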
B. Principle satisfaction
Let’s explore a better way to handle the different types of animals available:
from abc import ABC, abstractmethod

# Each interface inherits from ABC so that @abstractmethod is actually enforced
class SwimmingAnimal(ABC):
    @abstractmethod
    def swim(self):
        pass

class FlyingAnimal(ABC):
    @abstractmethod
    def fly(self):
        pass

class VocalAnimal(ABC):
    @abstractmethod
    def make_sound(self):
        pass
We’ve split the animal types into three abstract classes (or interfaces) based on animals that can swim, fly or vocalise a sound: SwimmingAnimal, FlyingAnimal and VocalAnimal respectively, with the @abstractmethod decorator marking each method as an abstract one.
class Duck(SwimmingAnimal, FlyingAnimal, VocalAnimal):
    def swim(self):
        print("Duck is now swimming in the water...")

    def fly(self):
        print("Duck is now flying in the air...")

    def make_sound(self):
        print("Quack! Quack!")

class Dog(VocalAnimal):
    def make_sound(self):
        print("Woof! Woof!")
The Duck subclass inherits the abstract methods from the SwimmingAnimal, FlyingAnimal and VocalAnimal parent classes, allowing us to explicitly define the behaviours associated with ducks by implementing each inherited method.
The same logic is followed for the Dog class, except that it only inherits from the VocalAnimal class and implements its make_sound method.
This allows us to separate the unique behaviours of the animals using smaller interfaces, where each class only depends on the interfaces that describe its own behaviour. So the duck inherits all three interfaces, while the dog only inherits the VocalAnimal interface.
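One detail worth checking in your own code: Python only enforces @abstractmethod when the class derives from ABC (or otherwise uses the ABCMeta metaclass). The sketch below contrasts the two cases; `SilentFox`, `SilentOwl`, `PlainVocalAnimal` and `can_instantiate` are hypothetical names used purely for illustration.

```python
from abc import ABC, abstractmethod


class VocalAnimal(ABC):
    @abstractmethod
    def make_sound(self):
        pass


class SilentFox(VocalAnimal):
    """Hypothetical subclass that forgets to implement make_sound()."""


class PlainVocalAnimal:
    # No ABC base class, so the decorator below is not enforced
    @abstractmethod
    def make_sound(self):
        pass


class SilentOwl(PlainVocalAnimal):
    """Hypothetical subclass that also forgets to implement make_sound()."""


def can_instantiate(cls) -> bool:
    """Return True if cls() succeeds, False if Python rejects it with TypeError."""
    try:
        cls()
        return True
    except TypeError:
        return False


print(can_instantiate(SilentFox))  # False: ABC blocks the incomplete subclass
print(can_instantiate(SilentOwl))  # True: without ABC nothing is checked
```

With ABC in place, a subclass that skips part of an interface fails at construction time rather than later, when the missing method is first called.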
C. Codebase extension example
So if we needed to include more animals like cats, dolphins and swans, we should be able to do this with no issues:
class Cat(VocalAnimal):
    def make_sound(self):
        print("Meow! Meow!")

class Dolphin(SwimmingAnimal, VocalAnimal):
    def swim(self):
        print("Dolphin is now swimming in the water...")

    def make_sound(self):
        print("Whistle! Squeak!")

class Swan(SwimmingAnimal, FlyingAnimal, VocalAnimal):
    def swim(self):
        print("Swan is now swimming in the water...")

    def fly(self):
        print("Swan is now flying in the air...")

    def make_sound(self):
        print("Honk? Hiss?")

...

cat = Cat()
dolphin = Dolphin()
swan = Swan()
cat.make_sound()
dolphin.swim()
swan.fly()
swan.make_sound()
...and this should result in:
Meow! Meow!
Dolphin is now swimming in the water...
Swan is now flying in the air...
Honk? Hiss?
No existing code was changed in the process of including these new animals in the codebase.
5. Dependency inversion principle (DIP)
The dependency inversion principle (DIP) states that high-level modules (classes) should not depend on low-level modules, and both should depend on abstractions only. By making the modules depend on abstract implementations instead of concrete ones, this principle increases the level of loose coupling in the program’s code, making it easier to extend the program’s functionality without modifying the existing code.
Examples
We’ll create an instance of an electric car and its engine for demonstrative purposes:
A. Principle violation
Here's an example of code violating this principle:
class ElectricCar:
    def switch_on(self):
        print("ON: Car switched on.")

    def switch_off(self):
        print("OFF: Car switched off.")

class ElectricVehicleEngine:
    # Depends directly on the concrete ElectricCar class
    def __init__(self, vehicle: ElectricCar):
        self.vehicle = vehicle
        self.engine_active = False

    def press_engine_switch(self):
        if self.engine_active:
            self.vehicle.switch_off()
            self.engine_active = False
        else:
            self.vehicle.switch_on()
            self.engine_active = True
This violates the DIP because ElectricVehicleEngine is written directly against the concrete ElectricCar class and its switch_on and switch_off methods, so it cannot be reused with any other kind of device without being modified.
B. Principle satisfaction
Here’s how to satisfy the DIP in this case:
from abc import ABC, abstractmethod

class SwitchableObject(ABC):
    @abstractmethod
    def press_switch(self):
        pass

class ElectricCar(SwitchableObject):
    def __init__(self):
        self.switch_state = False

    def press_switch(self):
        if self.switch_state:
            self.switch_state = False
            print("OFF: Car switched off.")
        else:
            self.switch_state = True
            print("ON: Car switched on.")

class ElectricVehicleEngine(SwitchableObject):
    # Depends only on the SwitchableObject abstraction
    def __init__(self, switchable: SwitchableObject):
        self.switchable = switchable
        self.engine_active = False

    def press_switch(self):
        if self.engine_active:
            self.switchable.press_switch()
            self.engine_active = False
        else:
            self.switchable.press_switch()
            self.engine_active = True
- SwitchableObject: an abstract class that represents all objects that contain a switch (or button) that toggles between on and off. It declares a single abstract method, press_switch, ready to be implemented by the derived classes that follow.
- ElectricCar: a derived class that serves as a concrete implementation of the SwitchableObject class for electric cars.
- ElectricVehicleEngine: another derived class that serves as a concrete implementation of the SwitchableObject class for the engines of electric vehicles. It takes a SwitchableObject as a constructor argument, which means a switchable object must be passed in when creating an ElectricVehicleEngine object, like so:
electric_car = ElectricCar()
electric_car_engine = ElectricVehicleEngine(electric_car)
electric_car_engine.press_switch()
electric_car_engine.press_switch()
electric_car_engine.press_switch()
Depending on the abstractions instead of the concrete implementations makes the code more flexible to extend since it allows more switchable items to be added without having to modify the existing codebase.
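The same flexibility helps with testing. As a minimal sketch, the engine can be exercised against a test double instead of a real car; `RecordingSwitchable` is a hypothetical class introduced here, while `SwitchableObject` and `ElectricVehicleEngine` are repeated from above so the block runs by itself.

```python
from abc import ABC, abstractmethod


class SwitchableObject(ABC):
    @abstractmethod
    def press_switch(self):
        pass


class ElectricVehicleEngine(SwitchableObject):
    """Repeated from the example above."""

    def __init__(self, switchable: SwitchableObject):
        self.switchable = switchable
        self.engine_active = False

    def press_switch(self):
        if self.engine_active:
            self.switchable.press_switch()
            self.engine_active = False
        else:
            self.switchable.press_switch()
            self.engine_active = True


class RecordingSwitchable(SwitchableObject):
    """Hypothetical test double that just counts how often it was pressed."""

    def __init__(self):
        self.presses = 0

    def press_switch(self):
        self.presses += 1


fake = RecordingSwitchable()
engine = ElectricVehicleEngine(fake)
engine.press_switch()
engine.press_switch()
print(fake.presses, engine.engine_active)  # two presses forwarded, engine off again
```

Because the engine only knows about the SwitchableObject abstraction, no change to ElectricVehicleEngine was needed to swap the car for the test double.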
C. Codebase extension example
If we wanted to include new electronic devices, we could simply create a new class that implements the SwitchableObject interface and combine it with ElectricVehicleEngine without changing the existing classes, making the code more modular and easier to maintain over time!
Let's repeat the logic but add a music player to the vehicle this time:
class MusicPlayer(SwitchableObject):
    def __init__(self):
        self.switch_state = False

    def press_switch(self):
        if self.switch_state:
            self.switch_state = False
            print("OFF: Music player switched off.")
        else:
            self.switch_state = True
            print("ON: Music player switched on.")

class MusicPlayerSwitch(SwitchableObject):
    def __init__(self, switchable: SwitchableObject):
        self.switchable = switchable
        self.music_player_active = False

    def press_switch(self):
        if self.music_player_active:
            self.switchable.press_switch()
            self.music_player_active = False
        else:
            self.switchable.press_switch()
            self.music_player_active = True

...

music_player = MusicPlayer()
music_player_switch = MusicPlayerSwitch(music_player)

...

music_player_switch.press_switch()
music_player_switch.press_switch()
music_player_switch.press_switch()
Here we've added a music player to the program, expressed as MusicPlayer and MusicPlayerSwitch, mirroring the logic of the ElectricCar and ElectricVehicleEngine classes mentioned earlier. Run together with the earlier car example, this returns:
ON: Car switched on.
OFF: Car switched off.
ON: Car switched on.
ON: Music player switched on.
OFF: Music player switched off.
ON: Music player switched on.
All that we've done is add new classes, which keeps the code readable and easy to maintain as the program gets bigger.
Pros (Why would we use them in data engineering)?
You should consider using them if you’re building data platforms or data pipelines with:
- complex state management
- frequent update requirements
This becomes more apparent when you have data solutions with several underlying processes expected to scale over time. This is because:
- your code becomes more modular, which makes it easier to extend its behaviours and make changes over time
- SOLID principles reduce the risk of introducing unexpected bugs into your data pipelines, increasing data quality and reliability
- they can reduce development time by forcing you to think about the specific design of your code from the start of the project
Once a data platform's requirements grow over time, other factors such as memory management, performance monitoring, and latency optimization will quickly become part of the prioritization list, and guiding the platform's design using these principles may save you plenty of headaches down the line.
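As a rough illustration of how these ideas might carry over to a data pipeline, here is a minimal sketch. Every name in it (`Source`, `Transform`, `InMemorySource`, `DropNulls`, `Uppercase`, `Pipeline`) is hypothetical and not taken from any particular framework.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


class Source(ABC):
    """Abstraction the pipeline depends on instead of a concrete reader (DIP)."""

    @abstractmethod
    def read(self) -> Iterable[Row]:
        pass


class Transform(ABC):
    """One single-purpose step (SRP); new steps are added by subclassing (OCP)."""

    @abstractmethod
    def apply(self, rows: List[Row]) -> List[Row]:
        pass


class InMemorySource(Source):
    def __init__(self, rows: List[Row]):
        self.rows = rows

    def read(self) -> Iterable[Row]:
        return list(self.rows)


class DropNulls(Transform):
    def __init__(self, column: str):
        self.column = column

    def apply(self, rows: List[Row]) -> List[Row]:
        # Keep only the rows that have a value in the chosen column
        return [row for row in rows if row.get(self.column) is not None]


class Uppercase(Transform):
    def __init__(self, column: str):
        self.column = column

    def apply(self, rows: List[Row]) -> List[Row]:
        # Build new rows rather than mutating the input
        return [{**row, self.column: row[self.column].upper()} for row in rows]


class Pipeline:
    def __init__(self, source: Source, transforms: List[Transform]):
        self.source = source
        self.transforms = transforms

    def run(self) -> List[Row]:
        rows = list(self.source.read())
        for transform in self.transforms:
            rows = transform.apply(rows)
        return rows


raw = [{"name": "ada", "city": "london"}, {"name": None, "city": "paris"}]
pipeline = Pipeline(InMemorySource(raw), [DropNulls("name"), Uppercase("name")])
print(pipeline.run())
```

Adding a new cleaning step or a new data source in this sketch means adding a class, not editing Pipeline, which is the modularity benefit described above.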
Cons (Why wouldn’t we use them in data engineering)?
SOLID principles may not be a good idea if you only need to create standalone data pipelines, especially small ones or quick prototypes.
There has to be a larger vision in place for the data workflows to make implementing these design principles worthwhile. Here are some of the reasons why:
- SOLID principles can introduce unnecessary complexity that could easily be replaced with simpler and equally effective implementations
- some designs prioritize performance over long-term maintainability, making SOLID concepts less applicable
- experimental environments may prefer the freedom of rapid iteration, which OOP may not offer in the short term
These design principles require you to treat data workflows as software applications in their own right, and standalone data pipelines may not always qualify for SOLID principle use cases.
Conclusion
In summary, understanding the pros and cons of the SOLID principles can help OOP and functional programming co-exist in the same projects, each serving different use cases. Functional programming may be preferred for experimental areas, while OOP may be more suitable for the final production development, depending on the development team’s use cases.