SOLID principles in data engineering - Part 1

The SOLID principles are a set of five design guidelines that aim to make code easier to read, test and maintain.

They come from object-oriented programming (OOP) and were popularized by Robert C. Martin (commonly referred to as Uncle Bob by the software engineering community).

What SOLID stands for

The term SOLID is an acronym that stands for:

  • Single responsibility principle (SRP)

  • Open/closed principle (OCP)

  • Liskov substitution principle (LSP)

  • Interface segregation principle (ISP)

  • Dependency inversion principle (DIP)

1. Single responsibility principle (SRP)

The single responsibility principle (SRP) states that a class should have only one reason to change. In practical terms, every class or module should have exactly one responsibility. Because each module does one job, the code becomes easier to read and to test.

Examples

Let’s create a simple bank account class to demonstrate what violating and satisfying the single responsibility principle looks like:

A. Principle violation

class BankAccount:
    def __init__(self, account_number: int, balance: float):
        self.account_number = account_number
        self.balance = balance


    def deposit_money(self, amount: float):
        self.balance += amount


    def withdraw_money(self, amount: float):
        if amount > self.balance:
            raise ValueError("Unfortunately your balance is insufficient for any withdrawals right now ...  ")
        self.balance -= amount


    def print_balance(self):
        print(f'Account no: {self.account_number}, Balance: {self.balance}  ')


    def change_account_number(self, new_account_number: int):
        self.account_number = new_account_number
        print(f'Your account number has changed to "{self.account_number}" ')

This violates the SRP because the BankAccount class handles more than one duty: it manages the money (deposits and withdrawals), the account profile (changing the account number) and how the balance is displayed.

B. Principle satisfaction

Now here's an example of satisfying the SRP:

class DepositManager:
    def deposit_money(self, account, amount):
        account.balance += amount

class WithdrawalManager:
    def withdraw_money(self, account, amount):
        if amount > account.balance:
            raise ValueError("Unfortunately your balance is insufficient for any withdrawals right now ...  ")
        account.balance -= amount

class BalancePrinter:
    def print_balance(self, account):
        print(f'Account no: {account.account_number}, Balance: {account.balance}  ')

class AccountNumberManager:
    def change_account_number(self, account, new_account_number):
        account.account_number = new_account_number
        print(f'Your account number has changed to "{account.account_number}" ')

class BankAccount:
    def __init__(self, account_number: int, balance: float):
        self.account_number = account_number
        self.balance = balance
        self.deposit_manager = DepositManager()
        self.withdrawal_manager = WithdrawalManager()
        self.balance_printer = BalancePrinter()
        self.account_number_manager = AccountNumberManager()

    def deposit_money(self, amount: float):
        self.deposit_manager.deposit_money(self, amount)

    def withdraw_money(self, amount: float):
        self.withdrawal_manager.withdraw_money(self, amount)

    def print_balance(self):
        self.balance_printer.print_balance(self)

    def change_account_number(self, new_account_number: int):
        self.account_number_manager.change_account_number(self, new_account_number)

We’ve split the duties linked to managing the bank account into separate classes, so a change to one responsibility only touches the class that owns it.
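
One practical payoff of this split is that each responsibility can be checked on its own. The sketch below is illustrative only: it copies WithdrawalManager from the example above (with a shortened error message) and exercises it against a throwaway stand-in object built with SimpleNamespace. The helper name check_withdrawal_rules is hypothetical and not part of the original design.

```python
from types import SimpleNamespace


class WithdrawalManager:
    """Copied from the example above so the sketch runs on its own."""

    def withdraw_money(self, account, amount):
        if amount > account.balance:
            raise ValueError("Insufficient balance")
        account.balance -= amount


def check_withdrawal_rules():
    # Stand-in account: the manager only needs a `balance` attribute,
    # so no full BankAccount object is required for this check.
    account = SimpleNamespace(balance=50.0)
    manager = WithdrawalManager()

    # A withdrawal within the balance should reduce it.
    manager.withdraw_money(account, 20.0)
    remaining = account.balance

    # A withdrawal above the balance should be rejected.
    try:
        manager.withdraw_money(account, 100.0)
        rejected = False
    except ValueError:
        rejected = True

    return remaining, rejected


remaining, rejected = check_withdrawal_rules()
print(remaining, rejected)
```

Because the withdrawal rule lives in one small class, this check does not need to know anything about printing balances or changing account numbers.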

C. Codebase extension example

For example, if the business requires us to start printing the balances with specific currency symbols, we don’t need to alter the entire code - just the BalancePrinter class:

class BalancePrinter:
    def print_balance(self, account):
        print(f'Account no: {account.account_number}, Balance: ${account.balance}  ')

....

bank_account = BankAccount(12345678, 100.75)
bank_account.print_balance()

…which results in:

Account no: 12345678, Balance: $100.75

2. Open/closed principle (OCP)

This principle states that a class should be open for extension but closed for modification. This simply means that you should be able to add new functionality to your code without changing the existing code.

It may sound counterintuitive so let’s explore examples to break this down a bit:

Examples

Let’s create a robot that detects different objects using a range of sensors:

A. Principle violation

This is what it looks like to violate the open/closed principle (OCP):

class Robot:
    def __init__(self, sensor_type):
        self.sensor_type = sensor_type

    def detect(self):
        if self.sensor_type == "temperature":
            print("Detecting objects using temperature sensor ... ")

        elif self.sensor_type == "ultrasonic":
            print("Detecting objects using ultrasonic sensor ... ")

        elif self.sensor_type == "infrared":
            print("Detecting objects using infrared sensor ... ")

Violating the OCP like this makes the code difficult to manage, especially as it grows in different directions. Imagine we need to add more sensors to the robot to improve its object detection: this approach requires us to edit the Robot class every time, and each edit means re-running the unit tests for the whole class to confirm the robot still operates as expected. That is easy to get wrong, or to miss a test for, when the class contains thousands of lines of code that are constantly amended over time.

So extending the code without introducing bugs is a challenge under this approach.

B. Principle satisfaction

Let’s explore a possible solution for this:

from abc import ABC, abstractmethod

class Sensor(ABC):
    @abstractmethod
    def detect(self):
        pass

class TemperatureSensor(Sensor):
    def detect(self):
        print("Detecting objects using temperature sensor ... ")

class UltrasonicSensor(Sensor):
    def detect(self):
        print("Detecting objects using ultrasonic sensor ... ")


class InfraredSensor(Sensor):
    def detect(self):
        print("Detecting objects using infrared sensor ... ")

We’ve created an abstract base class named Sensor by inheriting from ABC and marking detect() with the @abstractmethod decorator. This allows us to create derived classes (or subclasses) that represent the different types of sensors that could be added to the robot, such as:

  1. TemperatureSensor - a temperature sensor

  2. UltrasonicSensor - an ultrasonic sensor

  3. InfraredSensor - an infrared sensor

These subclasses use inheritance and polymorphism to adopt the features of the main Sensor class to express their unique implementations of the same detect() method based on their distinct behaviours.

Now we can create the Robot class:

class Robot:
    def __init__(self, *sensor_types):
        self.sensor_types = sensor_types


    def detect(self):
        for sensor_type in self.sensor_types:
            sensor_type.detect()

temperature_sensor  =   TemperatureSensor()
ultrasonic_sensor   =   UltrasonicSensor()
infrared_sensor     =   InfraredSensor()

robot = Robot(temperature_sensor, ultrasonic_sensor, infrared_sensor)
robot.detect()

This approach uses composition: the robot is handed its sensors when it is created, so we can swap in an upgraded version of an existing sensor, or an entirely new kind of sensor, without having to change the Robot class itself.

By contrast, having the Robot class inherit from specific sensor classes would tightly couple it to those sensor types, making it difficult to adapt the robot when a sensor type goes out of date or new sensors need to replace existing ones.

C. Codebase extension example

Suppose the CTO now requires the current sensory technology to be replaced with the latest camera and proximity sensors trending across the industry. All we need to do is add the sensors as subclasses to the Sensor parent class like this:

class CameraSensor(Sensor):
    def detect(self):
        print("Detecting objects using new camera sensor ...")


class ProximitySensor(Sensor):
    def detect(self):
        print("Detecting objects using new proximity sensor ...")

...

camera_sensor = CameraSensor()
proximity_sensor = ProximitySensor()

robot = Robot(camera_sensor, proximity_sensor)
robot.detect()

…which results in:

Detecting objects using new camera sensor ...
Detecting objects using new proximity sensor ...

That's it! Again, we didn't need to remove or change any existing code. We simply added two new classes, CameraSensor and ProximitySensor, and passed instances of them to a new Robot object to perform the same object detection task, this time with the new sensors!
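
The same idea carries over to data engineering work. As a rough sketch (the names RecordParser, CsvParser, JsonLinesParser and ingest are hypothetical and chosen only for illustration), a pipeline that ingests files can depend on a parser abstraction, so supporting a new file format means adding a class rather than editing an if/elif chain:

```python
import csv
import io
import json
from abc import ABC, abstractmethod


class RecordParser(ABC):
    """Abstraction the ingestion code depends on (hypothetical name)."""

    @abstractmethod
    def parse(self, text: str) -> list:
        ...


class CsvParser(RecordParser):
    def parse(self, text: str) -> list:
        # DictReader yields one dict per row, keyed by the header line.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]


class JsonLinesParser(RecordParser):
    def parse(self, text: str) -> list:
        # One JSON document per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def ingest(parser: RecordParser, text: str) -> list:
    # Closed for modification: this function never changes
    # when a new parser class is added.
    return parser.parse(text)


csv_rows = ingest(CsvParser(), "id,name\n1,Ada\n2,Grace\n")
json_rows = ingest(JsonLinesParser(), '{"id": 3, "name": "Edsger"}\n')
print(csv_rows)
print(json_rows)
```

Note that the CSV parser returns every value as a string, while the JSON parser keeps the types found in the file; a real pipeline would likely need a separate step to align them.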

3. Liskov substitution principle (LSP)

The Liskov substitution principle (LSP) states that an object of a subclass should be able to stand in for an object of its parent class without causing unexpected behaviour. This means that any code written against the parent class should keep working when it is handed one of the subclasses instead.

Examples

We can use household items to demonstrate the violation and satisfaction of this principle:

A. Principle violation

class HouseholdItem:
    def __init__(self):
        pass

    def turn_on(self):
        pass

    def turn_off(self):
        pass

    def change_temperature(self):
        pass

class Oven(HouseholdItem):
    def __init__(self):
        pass

    def turn_on(self):
        print("Oven turned on. ")

    def turn_off(self):
        print("Oven turned off. ")

    def change_temperature(self):
        print("Oven temperature changed. ")

class Lamp(HouseholdItem):
    def __init__(self):
        pass

    def turn_on(self):
        print("Lamp turned on. ")

    def turn_off(self):
        print("Lamp turned off. ")

This looks harmless on the surface. However, it breaks the central rule of the LSP: each subclass must be usable in place of its parent class without breaking behaviour. Because HouseholdItem declares change_temperature(), any code written against HouseholdItem may call it on a Lamp, which inherits a version that silently does nothing, since most household lamps do not have built-in temperature settings.
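
To make the problem concrete, here is a trimmed-down sketch of the classes above (only change_temperature is kept, and the helper supports_temperature is a hypothetical name added for illustration). From the caller's point of view, a lamp looks exactly like an item with temperature control:

```python
class HouseholdItem:
    def change_temperature(self):
        pass  # the parent promises every item has this method


class Oven(HouseholdItem):
    def change_temperature(self):
        print("Oven temperature changed. ")


class Lamp(HouseholdItem):
    pass  # inherits change_temperature(), which does nothing


def supports_temperature(item: HouseholdItem) -> bool:
    # Checks only whether the method exists, which is all a caller can see.
    return callable(getattr(item, "change_temperature", None))


for item in (Oven(), Lamp()):
    print(type(item).__name__, supports_temperature(item))
    item.change_temperature()  # the Lamp call succeeds but has no effect
```

The call on the lamp neither fails nor does anything, which is the kind of silent, unexpected behaviour the LSP is meant to rule out.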

B. Principle satisfaction

Here’s an approach to fixing the previous code:

from abc import ABC, abstractmethod

class HouseholdItem(ABC):
    def __init__(self):
        pass

    @abstractmethod
    def turn_on(self):
        pass

    @abstractmethod
    def turn_off(self):
        pass

class TemperatureControlledHouseholdItem(HouseholdItem):
    @abstractmethod
    def change_temperature(self):
        pass

Here’s what we’ve created:

  • HouseholdItem - an abstract class that defines two abstract methods, turn_on and turn_off, which all household appliances should have.

  • TemperatureControlledHouseholdItem - a subclass for household items with temperature controls. It inherits the abstract methods from the HouseholdItem class and adds its own abstract method, change_temperature, so that only items with a temperature setting promise that behaviour.

Now we can create the household items based on whether their temperatures are controllable or not:

class Oven(TemperatureControlledHouseholdItem):
    def __init__(self):
        pass

    def turn_on(self):
        print("Oven turned on. ")

    def turn_off(self):
        print("Oven turned off. ")

    def change_temperature(self):
        print("Oven temperature changed. ")

class Lamp(HouseholdItem):
    def __init__(self):
        pass

    def turn_on(self):
        print("Lamp turned on. ")

    def turn_off(self):
        print("Lamp turned off. ")

appliances = [Oven(), Lamp()]

for appliance in appliances:
    appliance.turn_on()
    if isinstance(appliance, TemperatureControlledHouseholdItem):
        appliance.change_temperature()
    appliance.turn_off()

This approach lets each household item take on only the behaviour that matches its design, without any class being forced to adopt methods it cannot support. Separating the concerns into the abstract HouseholdItem class and the TemperatureControlledHouseholdItem subclass makes it easier to link household items to the methods that suit them, without unexpected behaviour popping up in the program.

C. Codebase extension example

Let's add more household appliances:

class Refrigerator(TemperatureControlledHouseholdItem):
    def __init__(self):
        pass

    def turn_on(self):
        print("Refrigerator turned on. ")

    def turn_off(self):
        print("Refrigerator turned off. ")

    def change_temperature(self):
        print("Refrigerator temperature changed. ")


class Laptop(HouseholdItem):
    def __init__(self):
        pass

    def turn_on(self):
        print("Laptop turned on. ")

    def turn_off(self):
        print("Laptop turned off. ")

...

appliances = [Oven(), Lamp(), Refrigerator(), Laptop()]

...

We've added a refrigerator and a laptop to the mix to represent two more items you can find in a household. Because you can configure a refrigerator's temperature, the Refrigerator class inherits from the TemperatureControlledHouseholdItem subclass. Setting a laptop's temperature, however, is not something users typically do, so we'll let Laptop inherit from the simpler HouseholdItem class for now.

Here's the output:

Oven turned on. 
Oven temperature changed. 
Oven turned off. 
Lamp turned on. 
Lamp turned off. 
Refrigerator turned on. 
Refrigerator temperature changed. 
Refrigerator turned off. 
Laptop turned on. 
Laptop turned off.

Once again, we've managed to extend the codebase's interfaces without touching the existing classes, therefore remaining in control of the program's expected behaviour.

4. Interface segregation principle (ISP)

The interface segregation principle (ISP) states that a class shouldn't be forced to depend on, or implement, methods it isn’t designed or expected to use.

This principle is violated when a parent class (or interface) declares methods that some of its subclasses don't need, or that make no real-world sense for them.

Examples

A. Principle violation

Here's an example of code violating this principle:

class Animal:
    def swim(self):
        pass

    def fly(self):
        pass

    def make_sound(self):
        pass

class Duck(Animal):
    def swim(self):
        print("Duck is now swimming in the water...")

    def fly(self):
        print("Duck is now flying in the air...")


    def make_sound(self):
        print("Quack! Quack!")

class Dog(Animal):
    def swim(self):
        raise NotImplementedError("Dogs can't swim ... ")

    def fly(self):
        raise NotImplementedError("Dogs can't fly ....")

    def make_sound(self):
        print("Woof! Woof!")

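Here the Dog class is forced to implement swim() and fly() only to raise NotImplementedError, because the Animal class bundles together behaviours that not every animal has. That is what makes this approach violate the ISP.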
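
A small sketch of what this means for calling code (trimmed to the swim method; the helper swim_report is a hypothetical name used only for illustration). Any function that trusts the Animal interface has to defend itself against subclasses that cannot honour it:

```python
class Animal:
    def swim(self):
        pass


class Duck(Animal):
    def swim(self):
        print("Duck is now swimming in the water...")


class Dog(Animal):
    def swim(self):
        raise NotImplementedError("Dogs can't swim ... ")


def swim_report(animals):
    """Try to make every animal swim and record what happened."""
    report = {}
    for animal in animals:
        name = type(animal).__name__
        try:
            animal.swim()
            report[name] = "ok"
        except NotImplementedError as error:
            # The interface promised swim(), but this subclass cannot do it.
            report[name] = f"failed: {error}"
    return report


print(swim_report([Duck(), Dog()]))
```

Without the try/except, the loop would stop at the first animal that cannot swim, even though every object passed in is a valid Animal.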

B. Principle satisfaction

Let’s explore a better way to handle the different types of animals available:

from abc import ABC, abstractmethod

# Each interface inherits from ABC so that @abstractmethod is actually enforced
class SwimmingAnimal(ABC):
    @abstractmethod
    def swim(self):
        pass

class FlyingAnimal(ABC):
    @abstractmethod
    def fly(self):
        pass

class VocalAnimal(ABC):
    @abstractmethod
    def make_sound(self):
        pass

We’ve split the animal behaviours into three abstract classes (or interfaces) based on whether an animal can swim, fly or make a sound: SwimmingAnimal, FlyingAnimal and VocalAnimal respectively, with the @abstractmethod decorator marking each method as abstract.

class Duck(SwimmingAnimal, FlyingAnimal, VocalAnimal):
    def swim(self):
        print("Duck is now swimming in the water...")

    def fly(self):
        print("Duck is now flying in the air...")

    def make_sound(self):
        print("Quack! Quack!")


class Dog(VocalAnimal):
    def make_sound(self):
        print("Woof! Woof!")

The Duck subclass inherits the abstract methods from the SwimmingAnimal, FlyingAnimal and VocalAnimal parent classes and provides a concrete implementation for each, so the behaviours associated with ducks are defined explicitly inside the Duck class.

The same logic is followed for the Dog class, except that it only inherits from VocalAnimal and so only has to implement make_sound.

This allows us to separate the behaviours of the animals into smaller interfaces, where each class only depends on the interfaces that describe what it can actually do. So the duck inherits all three interfaces, while the dog only inherits the VocalAnimal interface.
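
One detail worth checking in Python is that @abstractmethod is only enforced when the class inherits from ABC (or otherwise uses ABCMeta as its metaclass). The sketch below assumes the interfaces are declared that way; the Otter class and the can_instantiate helper are hypothetical and exist only to show what happens when a subclass forgets part of an interface:

```python
from abc import ABC, abstractmethod


class SwimmingAnimal(ABC):
    @abstractmethod
    def swim(self):
        pass


class VocalAnimal(ABC):
    @abstractmethod
    def make_sound(self):
        pass


class Dog(VocalAnimal):
    def make_sound(self):
        print("Woof! Woof!")


class Otter(SwimmingAnimal, VocalAnimal):  # hypothetical animal
    def swim(self):
        print("Otter is now swimming in the water...")
    # make_sound() is deliberately left out


def can_instantiate(cls) -> bool:
    # Abstract classes with unimplemented methods raise TypeError on creation.
    try:
        cls()
        return True
    except TypeError:
        return False


print(can_instantiate(Dog))    # Dog implements its only interface
print(can_instantiate(Otter))  # Otter is missing make_sound()
```

With small interfaces, a mistake like the one in Otter is caught as soon as the object is created, rather than later when some caller finally asks it to make a sound.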

C. Codebase extension example

So if we needed to include more animals like cats, dolphins and swans, we should be able to do this with no issues:

class Cat(VocalAnimal):
    def make_sound(self):
        print("Meow! Meow!")


class Dolphin(SwimmingAnimal, VocalAnimal):
    def swim(self):
        print("Dolphin is now swimming in the water...")

    def make_sound(self):
        print("Whistle! Squeak!")


class Swan(SwimmingAnimal, FlyingAnimal, VocalAnimal):
    def swim(self):
        print("Swan is now swimming in the water...")

    def fly(self):
        print("Swan is now flying in the air...")

    def make_sound(self):
        print("Honk? Hiss?")

...

cat         =   Cat()
dolphin     =   Dolphin()
swan        =   Swan()

cat.make_sound()
dolphin.swim()
swan.fly()
swan.make_sound()

...and this should result in:

Meow! Meow!
Dolphin is now swimming in the water...
Swan is now flying in the air...
Honk? Hiss?

No existing code was changed in the process of including these new animals in the codebase.

5. Dependency inversion principle (DIP)

The dependency inversion principle (DIP) states that high-level modules (classes) should not depend on low-level modules; both should depend on abstractions. By making modules depend on abstractions instead of concrete implementations, this principle loosens the coupling in the program’s code, making it easier to extend the program’s functionality without modifying the existing code.

Examples

We’ll model an electric car and its engine for demonstration purposes:

A. Principle violation

Here's an example of code violating this principle:

class ElectricCar:
    def switch_on(self):
        print("ON: Car switched on.")

    def switch_off(self):
        print("OFF: Car switched off.")

class ElectricVehicleEngine:
    def __init__(self, vehicle: ElectricCar):
        self.vehicle = vehicle
        self.engine_active = False

    def press_engine_switch(self):
        if self.engine_active:
            self.vehicle.switch_off()
            self.engine_active = False
        else:
            self.vehicle.switch_on()
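            self.engine_active = True

This violates the DIP because ElectricVehicleEngine is written directly against the concrete ElectricCar class and its switch_on() and switch_off() methods. Pairing the engine switch with any other kind of device would mean editing ElectricVehicleEngine itself.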

B. Principle satisfaction

Here’s how to satisfy the DIP in this case:


from abc import ABC, abstractmethod

class SwitchableObject(ABC):
    @abstractmethod
    def press_switch(self):
        pass

class ElectricCar(SwitchableObject):
    def __init__(self):
        self.switch_state = False

    def press_switch(self):
        if self.switch_state:
            self.switch_state = False
            print("OFF: Car switched off.")
        else:
            self.switch_state = True
            print("ON: Car switched on.")

class ElectricVehicleEngine(SwitchableObject):
    def __init__(self, switchable: SwitchableObject):
        self.switchable = switchable
        self.engine_active = False

    def press_switch(self):
        if self.engine_active:
            self.switchable.press_switch()
            self.engine_active = False
        else:
            self.switchable.press_switch()
            self.engine_active = True

Here’s what we’ve created:

  • SwitchableObject - an abstract class that represents all objects that contain a switch (or button) that toggles between on and off. It defines a single abstract method, press_switch, for the derived classes to implement.

  • ElectricCar - a derived class that serves as a concrete implementation of the SwitchableObject class for electric cars.

  • ElectricVehicleEngine - another derived class that serves as a concrete implementation of the SwitchableObject class, this time for the engines of electric vehicles. It takes a SwitchableObject as a constructor argument, which means a switchable object must be passed in when creating an ElectricVehicleEngine object, like so:

electric_car = ElectricCar()
electric_car_engine = ElectricVehicleEngine(electric_car)

electric_car_engine.press_switch()
electric_car_engine.press_switch()
electric_car_engine.press_switch()

Depending on the abstractions instead of the concrete implementations makes the code more flexible to extend since it allows more switchable items to be added without having to modify the existing codebase.
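
A side benefit is that anything implementing the abstraction can stand in for the real device, including a test double. The sketch below is an illustration under assumptions: ElectricVehicleEngine is a shortened version of the class above with the same behaviour, and RecordingSwitch is a hypothetical class that counts presses instead of printing.

```python
from abc import ABC, abstractmethod


class SwitchableObject(ABC):
    @abstractmethod
    def press_switch(self):
        pass


class ElectricVehicleEngine(SwitchableObject):
    """Same behaviour as the class above, written more compactly."""

    def __init__(self, switchable: SwitchableObject):
        self.switchable = switchable
        self.engine_active = False

    def press_switch(self):
        # Forward the press, then flip the engine's own state.
        self.switchable.press_switch()
        self.engine_active = not self.engine_active


class RecordingSwitch(SwitchableObject):
    """Hypothetical test double that counts presses instead of printing."""

    def __init__(self):
        self.presses = 0

    def press_switch(self):
        self.presses += 1


fake = RecordingSwitch()
engine = ElectricVehicleEngine(fake)

engine.press_switch()
engine.press_switch()
engine.press_switch()

print(fake.presses, engine.engine_active)
```

The engine class is unchanged in spirit: it never learns whether it is driving a car, a music player or a fake used in a test.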

C. Codebase extension example

If we wanted to include new electronic devices, we could simply create a new class that implements the SwitchableObject interface and combine it with ElectricVehicleEngine without changing the existing classes, making the code more modular and easier to maintain over time!

Let's repeat the logic but add a music player to the vehicle this time:

class MusicPlayer(SwitchableObject):
    def __init__(self):
        self.switch_state = False

    def press_switch(self):
        if self.switch_state:
            self.switch_state = False
            print("OFF: Music player switched off.")
        else:
            self.switch_state = True
            print("ON: Music player switched on.")



class MusicPlayerSwitch(SwitchableObject):
    def __init__(self, switchable: SwitchableObject):
        self.switchable = switchable
        self.music_player_active = False

    def press_switch(self):
        if self.music_player_active:
            self.switchable.press_switch()
            self.music_player_active = False
        else:
            self.switchable.press_switch()
            self.music_player_active = True

...

music_player = MusicPlayer()
music_player_switch = MusicPlayerSwitch(music_player)

...

music_player_switch.press_switch()
music_player_switch.press_switch()
music_player_switch.press_switch()

Here we've added a music player to the program, expressed as the MusicPlayer and MusicPlayerSwitch classes, which mirror the logic of the ElectricCar and ElectricVehicleEngine classes shown earlier. Run together with the earlier electric car example, this prints:

ON: Car switched on.
OFF: Car switched off.
ON: Car switched on.
ON: Music player switched on.
OFF: Music player switched off.
ON: Music player switched on.

All we've done is add new classes that implement the existing interface, which keeps the code readable and easy to maintain as the program gets bigger.

Pros (why would we use them in data engineering?)

You should consider using them if you’re building data platforms or data pipelines with:

  1. complex state management

  2. frequent update requirements

This becomes more apparent when you have data solutions with several underlying processes expected to scale over time. This is because:

  • your code becomes more modular, which makes it easier to extend its behaviour and make changes over time

  • SOLID principles reduce the risk of introducing unexpected bugs into your data pipelines, which helps data quality and reliability

  • they can reduce development time in the long run by forcing you to think about the design of your code from the start of the project

Once a data platform's requirements grow over time, other factors such as memory management, performance monitoring, and latency optimization will quickly become part of the prioritization list, and guiding the platform's design using these principles may save you plenty of headaches down the line.
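
As a rough illustration of what this can look like in a pipeline, the sketch below applies the ideas from the earlier sections to a tiny extract-transform-load flow. Every name in it (Source, Sink, InMemorySource, InMemorySink, drop_null_amounts, Pipeline) is hypothetical, and the in-memory classes stand in for real systems such as files, queues or databases:

```python
from abc import ABC, abstractmethod


class Source(ABC):
    @abstractmethod
    def read(self) -> list:
        ...


class Sink(ABC):
    @abstractmethod
    def write(self, rows: list) -> None:
        ...


class InMemorySource(Source):
    def __init__(self, rows):
        self.rows = rows

    def read(self) -> list:
        return list(self.rows)


class InMemorySink(Sink):
    def __init__(self):
        self.rows = []

    def write(self, rows: list) -> None:
        self.rows.extend(rows)


def drop_null_amounts(rows):
    # Single responsibility: this step only filters rows.
    return [row for row in rows if row.get("amount") is not None]


class Pipeline:
    """Depends on the Source and Sink abstractions, not on concrete systems."""

    def __init__(self, source: Source, sink: Sink, transform=drop_null_amounts):
        self.source = source
        self.sink = sink
        self.transform = transform

    def run(self) -> int:
        rows = self.transform(self.source.read())
        self.sink.write(rows)
        return len(rows)


source = InMemorySource([{"id": 1, "amount": 9.5}, {"id": 2, "amount": None}])
sink = InMemorySink()
written = Pipeline(source, sink).run()
print(written, sink.rows)
```

Swapping the in-memory classes for ones backed by real storage would not require any change to Pipeline, which is the kind of modularity the list above describes.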

Cons (why wouldn’t we use them in data engineering?)

SOLID principles may not be a good idea if you only need to create standalone data pipelines, especially small ones or quick prototypes.

There has to be a larger vision in place for the data workflows to make implementing these design principles worthwhile. Here are some of the reasons why:

  • SOLID principles can introduce complexity that is unnecessary when a simpler implementation would be equally effective

  • some workloads prioritize raw performance over long-term maintainability, which makes SOLID concepts less applicable

  • experimental environments may prefer the freedom of rapid iteration, which OOP may not offer in the short term

These design principles require you to treat data workflows as software applications in their own right, and standalone data pipelines may not always qualify for SOLID principle use cases.
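
For comparison, here is a minimal sketch of the kind of small, standalone job where a few plain functions may be all that is needed. The function names and the sample data are made up for illustration:

```python
import csv
import io


def parse_csv(text):
    # One dict per row, keyed by the header line; all values are strings.
    return list(csv.DictReader(io.StringIO(text)))


def keep_paid(rows):
    return [row for row in rows if row["status"] == "paid"]


def total_amount(rows):
    return sum(float(row["amount"]) for row in rows)


raw = "id,status,amount\n1,paid,10.5\n2,open,4.0\n3,paid,2.5\n"
total = total_amount(keep_paid(parse_csv(raw)))
print(total)
```

There are no classes or abstractions to maintain here, which suits a quick prototype; the trade-off only starts to matter once the job grows more sources, sinks and rules.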

Conclusion

In summary, understanding the pros and cons of each approach lets OOP and functional programming co-exist in the same project for different use cases. Functional programming may be preferred for experimental work, while OOP guided by SOLID principles may be more suitable for production development, depending on the development team’s needs.
