SOLID principles in data engineering - Part 3

Photo by Ian Noble on Unsplash

The Functional Programming (FP) version

Preface🚀

This blog aims to explore the question “Can functional programming comply with SOLID principles (using Python)?”.

So…can functional programming comply with SOLID principles?🤔

To answer this question, we first need to outline the primary aim of SOLID principles. Why do we actually need them?

When you plan on creating a solution expected to grow over time, it's important to get the design right from the start. Failure to do so can lead to unexpected behaviours and bugs that become increasingly difficult and expensive to fix as the solution grows in size. This is where selecting the right design patterns can save a lot of time and money.

SOLID principles guide our design decisions so that the resulting code is easier to test, manage and reuse over time. Although these principles were initially intended for object-oriented (OO) applications, the core benefits and rules of each principle apply to functional programming, especially for data engineering projects.

This blog post aims to prove that through the use of:

  • pure functions - functions that always return the same output when given the same input, with no side effects (i.e. they are deterministic and do not mutate their inputs)

  • higher-order functions - a function that either takes a function as an input or returns a function as an output

  • function composition - the process of chaining multiple functions together to create a new function

  • dependency injection - the process of passing a resource or behaviour that a function needs into that function as an input parameter, instead of creating or referencing it inside the function

There are other functional programming concepts we could use for this blog, but we’ll keep the focus on these for now.
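To make the four concepts above concrete, here is a minimal sketch that uses each of them once. The names add_vat, apply_to_all, compose and total are made up for this illustration and do not appear in the rest of the post:

```python
from functools import reduce
from typing import Callable, List

# Pure function: the same input always gives the same output, no side effects
def add_vat(price: float) -> float:
    return round(price * 1.2, 2)

# Higher-order function: takes a function as input and returns a new function
def apply_to_all(func: Callable[[float], float]) -> Callable[[List[float]], List[float]]:
    def wrapper(values: List[float]) -> List[float]:
        return [func(value) for value in values]
    return wrapper

# Function composition: chain functions left to right into one new function
def compose(*funcs: Callable) -> Callable:
    return lambda value: reduce(lambda acc, func: func(acc), funcs, value)

# Dependency injection: the rounding behaviour is passed in, not hard-coded
def total(prices: List[float], rounder: Callable[[float], float]) -> float:
    return rounder(sum(prices))

add_vat_to_all = apply_to_all(add_vat)
price_pipeline = compose(add_vat_to_all, lambda prices: total(prices, round))

print(price_pipeline([10.0, 20.0]))
```

Each piece can be tested and replaced on its own, which is the property the SOLID principles are trying to protect.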

How are SOLID principles translated into functional programming?😵‍💫

  • Single responsibility principle (SRP) - each function must have one responsibility i.e. it may do more than one thing but have a single purpose it’s focused on achieving

  • Open/closed principle (OCP) - The source code for each function should be open for extension but closed for modification

  • Liskov substitution principle (LSP) - Each function should be able to be swapped for another function sharing the same signature without causing unexpected behaviour in the program

  • Interface segregation principle (ISP) - Each function should not depend on functions it does not need

  • Dependency inversion principle (DIP) - All functions should depend on input arguments instead of behaviour hard-coded into the function

OOP vs FP’s interpretation of SOLID principles💼

| SOLID principle | OOP | Functional programming |
| --- | --- | --- |
| Single responsibility principle (SRP) | Every class and method must have only one reason to change | Every function must have only one reason to change |
| Open/closed principle (OCP) | Every class and method must be open for extension (using techniques like inheritance, composition and polymorphism), but closed for modification | Every function must be open for extension (using techniques like functional composition, higher-order functions and currying), but closed for modification |
| Liskov substitution principle (LSP) | Every child class must be able to be substituted for its parent class without unexpected behaviour occurring in the program | Every input argument should be able to be substituted for another argument that shares the same subtype without unexpected behaviour occurring in the program |
| Interface segregation principle (ISP) | No child class should depend on any methods from its parent class it does not use | No function should depend on functions or external operations it does not need (from input arguments or global variables) |
| Dependency inversion principle (DIP) | Every class and method should depend on abstractions, not concrete implementations | Every function should depend on input arguments only, not concrete operations |

Demo🎮

I will be using the same code examples from the 1st blog post on SOLID principles in data engineering but in a functional programming format to demonstrate how each example violates and satisfies each of the SOLID principles from an FP perspective.

Let’s begin the exploration!

1. Single responsibility principle (SRP)🎯

The single responsibility principle (SRP) declares that a function must only change for a single reason, which means even though a function may possess multiple activities, it must have only one objective in a large unit of work. This is where separation of concerns occurs, where you ensure each part of a program is responsible for doing one thing only and doing it well.

For example, if the business requires a certain data pipeline serving a team to be processed faster, this could be considered a single reason for change. So, the code responsible for improving the performance should be separated from other parts of the program with different responsibilities.

Examples

We’ll be creating a simple bank account that we will perform simple activities on (click here for the object-oriented programming version of this example):

A. Principle violation

from typing import Tuple

def process_customer_money(account_number:  int, 
                            balance:        int,
                            operation:      str,
                            amount:         int=0) -> Tuple[int, int]:

    if operation == "deposit":
        balance += amount
        print(f'New balance: {balance} ')

    elif operation == "withdraw":
        if amount > balance:
            raise ValueError("Unfortunately your balance is insufficient for any withdrawals right now...")
        balance -= amount
        print(f'New balance: {balance} ')

    elif operation == "print":
        print(f'Account no:{account_number}, Balance: {balance}')

    elif operation == "change_account_number":
        account_number = amount
        print(f'Your account number has changed to "{account_number}" ')

    return account_number, balance


process_customer_money(account_number=123, balance=510, operation="withdraw", amount=100)

Unfortunately, this example does not satisfy SRP because the process_customer_money function is responsible for several operations, like deposits, withdrawals, printing balances etc.

B. Principle satisfaction

Let’s try and get the code in harmony with SRP:

from typing import Tuple

def deposit_money(account_number: int, balance: float, amount: float) -> Tuple[int, float]:
    return account_number, balance + amount

def withdraw_money(account_number: int, balance: float, amount: float) -> Tuple[int, float]:
    if amount > balance:
        raise ValueError("Unfortunately your balance is insufficient for any withdrawals right now...")
    return account_number, balance - amount


def print_balance(account_number: int, balance: float) -> str:
    return f"Account no: {account_number}, New balance: {balance}"

def change_account_number(current_account_number: int, new_account_number: int) -> str:
    return f'Your account number has changed to "{new_account_number}" '



# Display results
my_account_details = print_balance(account_number=12345678, balance=540.00)
print(my_account_details)

By splitting the large process_customer_money function into smaller independent functions, we increase the modularity in the code. This makes it easy to create tests and manage the general code’s behaviour over time.
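As a rough illustration of the testing benefit, each small function can be checked in isolation with a plain assertion. This is only a sketch: the two functions are copied here (with a shortened error message) so that the snippet runs on its own, and the test function names are made up for this example:

```python
from typing import Tuple

# Copies of the two functions above so the sketch runs on its own
def deposit_money(account_number: int, balance: float, amount: float) -> Tuple[int, float]:
    return account_number, balance + amount

def withdraw_money(account_number: int, balance: float, amount: float) -> Tuple[int, float]:
    if amount > balance:
        raise ValueError("Insufficient balance")
    return account_number, balance - amount

# Each test targets exactly one responsibility
def test_deposit_adds_amount() -> None:
    assert deposit_money(123, 500.0, 40.0) == (123, 540.0)

def test_withdraw_rejects_overdraft() -> None:
    try:
        withdraw_money(123, 50.0, 100.0)
    except ValueError:
        return
    raise AssertionError("expected a ValueError")

test_deposit_adds_amount()
test_withdraw_rejects_overdraft()
print("All checks passed")
```

With the original process_customer_money function, the same checks would also have to account for the print calls and the operation string.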

C. Codebase extension example

Imagine we get a new request from management expecting us to perform transfers between accounts without making changes to the existing codebase.

All we need to do is add a new function like so:


def transfer_money(account_no1:     int, 
                   balance1:        float, 
                   account_no2:     int, 
                   balance2:        float, 
                   amount:          float) -> Tuple[Tuple[int, float], Tuple[int, float]]:

    account_no1, balance1 = withdraw_money(account_no1, balance1, amount)
    account_no2, balance2 = deposit_money(account_no2, balance2, amount)

    return (account_no1, balance1), (account_no2, balance2)

… and here’s what an actual transfer looks like:

# Set up accounts
account_1    =    (12345678,   850.00)
account_2    =    (87654321,   400.00)

# Transfer 100.00 from account_1 to account_2
account_1, account_2 = transfer_money(account_1[0],   account_1[1],
                                      account_2[0],   account_2[1],
                                      100.00
                                        )

# Display transfer details
print(print_balance(account_1[0], account_1[1]))
print(print_balance(account_2[0], account_2[1]))

…and this results in:

Account no: 12345678, New balance: 750.0
Account no: 87654321, New balance: 500.0

2. Open/closed principle (OCP)🔐

The open/closed principle (OCP) declares that a function should be open for extending behaviours, but closed for any modification. This is achieved using higher-order functions and composition.

Examples

Here we create a robot that detects objects using different sensors - (click here for the object-oriented programming version of this example):

A. Principle violation

def detect_object(sensor_type: str) -> None:

    if sensor_type == "temperature":
        print("Detecting objects using temperature sensor ...")

    elif sensor_type == "ultrasonic":
        print("Detecting objects using ultrasonic sensor ...")

    elif sensor_type == "infrared":
        print("Detecting objects using infrared sensor ...")

detect_object("infrared")

If we need to add another sensor to the robot’s detect_object operation, we would need to amend the existing code by adding another elif branch, meaning this current approach doesn’t satisfy the open/closed principle.

B. Principle satisfaction

from typing import Callable

# Create higher-order function that receives different sensors
def detect_with_sensor(*sensors: Callable[[], None]) -> None:
    for i, sensor in enumerate(sensors):
        print(f'Sensor {i + 1}:')
        sensor()

# Express sensors as functions
def use_temperature_sensor() -> None:
    print("Detecting objects using temperature sensor ...")


def use_ultrasonic_sensor() -> None:
    print("Detecting objects using ultrasonic sensor ...")


def use_infrared_sensor() -> None:
    print("Detecting objects using infrared sensor ...")


# Detect the objects using different sensors 
detect_with_sensor(use_ultrasonic_sensor, use_temperature_sensor)

In this example, we use a higher-order function, detect_with_sensor, and pass different sensor functions into it as input arguments. Callable is the type hint for functions; written in full as Callable[[], None], it indicates that each sensor takes no input arguments and returns nothing.

This approach leaves us with enough flexibility to simply append new sensors as functions to the robot without changing its existing codebase.
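The same idea extends to sensors that need a setting before they can run. The sketch below assumes a hypothetical lidar sensor (not part of the original robot) and uses functools.partial to turn it into a zero-argument function, so detect_with_sensor stays untouched:

```python
from functools import partial
from typing import Callable

# Same higher-order function as above, repeated so the sketch runs on its own
def detect_with_sensor(*sensors: Callable[[], None]) -> None:
    for i, sensor in enumerate(sensors):
        print(f"Sensor {i + 1}:")
        sensor()

# Hypothetical sensor that needs a setting before it can run
def use_lidar_sensor(range_in_metres: int) -> None:
    print(f"Detecting objects using lidar sensor (range: {range_in_metres}m) ...")

# partial() fixes the argument, leaving a function that takes no arguments
use_short_range_lidar = partial(use_lidar_sensor, 5)

detect_with_sensor(use_short_range_lidar)
```

Partial application is one of the extension techniques listed for FP in the comparison table earlier (alongside currying and composition).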

C. Codebase extension example

Let’s assume we’ve purchased two new sensors for the robot - a camera sensor and a proximity sensor. Let’s now attach them to the robot:

def use_camera_sensor() -> None:
    print("Detecting objects using camera sensor ...")


def use_proximity_sensor() -> None:
    print("Detecting objects using proximity sensor ...")

… and adding this to the higher-order function …

# Detect the objects using different sensors 
detect_with_sensor(use_ultrasonic_sensor, 
                   use_temperature_sensor, 
                   use_camera_sensor,        # new camera sensor
                   use_proximity_sensor      # new proximity sensor
)

…which results in…

Sensor 1:
Detecting objects using ultrasonic sensor ...
Sensor 2:
Detecting objects using temperature sensor ...
Sensor 3:
Detecting objects using camera sensor ...
Sensor 4:
Detecting objects using proximity sensor ...

3. Liskov substitution principle (LSP)🔄

From a functional programming perspective, the Liskov substitution principle (LSP) declares that a function must be able to be swapped for another function that shares the same function signature, without any unexpected behaviour.

A function signature includes both the inputs (like the types and number of arguments used) and outputs (results and their types) that make up the function.

This principle emphasises that a function should only be interchangeable with another function that holds the same parameters and return type while behaving as expected.

This doesn’t necessarily mean that both functions must return identical outputs. Instead, it implies that the substituted function should not cause any bugs, and it should behave in harmony with the logical expectations of the code (i.e. it should make real-world sense for the replacement function to be there).
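A small data-flavoured sketch may help here. The function names and the Cleaner alias are made up for this illustration. All three cleaners share one signature, but only two of them respect what the caller expects (one cleaned row per input row):

```python
from typing import Callable, List

# Type alias for any function that cleans a list of rows (hypothetical name)
Cleaner = Callable[[List[str]], List[str]]

def strip_whitespace(rows: List[str]) -> List[str]:
    return [row.strip() for row in rows]

def lowercase(rows: List[str]) -> List[str]:
    return [row.lower() for row in rows]

# Same signature, but it breaks the caller's expectation of
# getting one cleaned row back for every input row
def drop_everything(rows: List[str]) -> List[str]:
    return []

def clean_and_count(rows: List[str], cleaner: Cleaner) -> int:
    cleaned = cleaner(rows)
    return len(cleaned)

rows = ["  Alice ", "BOB"]
print(clean_and_count(rows, strip_whitespace))
print(clean_and_count(rows, lowercase))
print(clean_and_count(rows, drop_everything))
```

strip_whitespace and lowercase return different outputs yet are safely interchangeable, while drop_everything passes the type check and still violates LSP.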

Examples

Click here to view the object-oriented programming version of this example:

A. Principle violation

from typing import Callable

def use_household_item(turn_on:                Callable[ [], None ], 
                        turn_off:              Callable[ [], None ], 
                        change_temperature:    Callable[ [], None ]) -> None:
    turn_on()
    change_temperature()
    turn_off()

def turn_on_fridge() -> None:
    print("Refrigerator turned on.")

def turn_off_fridge() -> None:
    print("Refrigerator turned off.")

def change_temperature_fridge() -> None:
    print("Refrigerator temperature changed.")

def turn_on_laptop() -> None:
    print("Laptop turned on.")

def turn_off_laptop() -> None:
    print("Laptop turned off.")

use_household_item(turn_on_fridge, turn_off_fridge, change_temperature_fridge)

# This is where the violation occurs because it's not possible to change the temperature of a laptop
use_household_item(turn_on_laptop, turn_off_laptop, change_temperature_fridge)

The Liskov substitution principle is violated because the change_temperature_fridge function was passed into the use_household_item function as the third argument to change the temperature of the laptop, even though we can’t change the temperature of laptops (like we do for fridges). Python will not raise an error here because the signatures match, but the program prints “Refrigerator temperature changed.” while operating a laptop, which is exactly the kind of unexpected behaviour LSP warns against.

B. Principle satisfaction

from typing import Callable

def use_temperature_controlled_item(turn_on:                Callable[ [], None ], 
                                    turn_off:               Callable[ [], None ], 
                                    change_temperature:     Callable[ [], None ]) -> None:
    turn_on()
    change_temperature()
    turn_off()

def turn_on_fridge() -> None:
    print("Refrigerator turned on.")

def turn_off_fridge() -> None:
    print("Refrigerator turned off.")

def change_temperature_fridge() -> None:
    print("Refrigerator temperature changed.")

use_temperature_controlled_item(turn_on_fridge, turn_off_fridge, change_temperature_fridge)

To comply with LSP, we created a higher-order function, use_temperature_controlled_item, whose name and signature make it clear that it only accepts household appliances that support temperature control.

Let’s break down this function’s signature:

The use_temperature_controlled_item function takes in 3 other functions as arguments (here turn_on_fridge, turn_off_fridge, and change_temperature_fridge), where each function is a Callable[ [], None ] type, meaning

  1. they do not take in input arguments of their own (i.e. []), and

  2. they do not return any outputs either (i.e. None).

This structure complies with LSP because we can pass in any function that can be exchanged for turn_on_fridge, turn_off_fridge, and change_temperature_fridge into the use_temperature_controlled_item function without causing it to behave incorrectly.

In other words, a function passed into the use_temperature_controlled_item function matches its signature if it takes no arguments and returns no values. To fully comply with LSP, it must also behave the way the caller expects (e.g. a turn_on function should actually turn the item on).

So to satisfy LSP in functional programming, you must be able to swap a function with another function that shares the same signature without any changes to the program’s behaviour.

C. Codebase extension example

We can also add another temperature-controlled item without touching the use_temperature_controlled_item function, like so:

def turn_on_oven() -> None:
    print("Oven turned on.")

def turn_off_oven() -> None:
    print("Oven turned off.")

def change_temperature_oven() -> None:
    print("Oven temperature changed.")

use_temperature_controlled_item(turn_on_oven, turn_off_oven, change_temperature_oven)

… this results in:

Oven turned on.
Oven temperature changed.
Oven turned off.

4. Interface segregation principle (ISP)🛠️

The interface segregation principle (ISP) declares that a function should not be obligated to depend on any operation it doesn’t use. These operations include:

  • other functions

  • variables

  • input parameters
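For the input-parameter case, a minimal sketch could look like the following. The customer record and both function names are made up for this illustration:

```python
from typing import Dict

# Depends on a whole record even though it only reads one field
def format_greeting_from_record(customer: Dict[str, str]) -> str:
    return f"Hello, {customer['name']}!"

# Depends only on the value it actually uses
def format_greeting(name: str) -> str:
    return f"Hello, {name}!"

customer = {"name": "Ada", "email": "ada@example.com", "postcode": "AB1 2CD"}

print(format_greeting_from_record(customer))
print(format_greeting(customer["name"]))
```

The second version can be reused and tested without building a full customer record, and it is unaffected if the record's other fields change.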

Examples

Let’s demonstrate what this means using animals that exhibit different behaviours (click here to view the object-oriented programming version of this example):

A. Principle violation

from typing import Callable 

# Create higher-order function for interacting with animals
def interact_with_animal(make_sound: Callable[ [], None ], 
                        swim:        Callable[ [], None ], 
                        fly:         Callable[ [], None ]) -> None:
    make_sound()
    swim()
    fly()

# 1. Create functions for interacting with ducks
def make_duck_sound() -> None:
    print("Quack! Quack!")

def make_duck_swim() -> None:
    print("Duck is now swimming in the water...")


def make_duck_fly() -> None:
    print("Duck is now flying in the air...")

# 2. Create functions for interacting with cat
def make_cat_sound() -> None:
    print("Meow! Meow!")

def make_cat_swim() -> None:
    raise NotImplementedError("Cats do not swim!")


def make_cat_fly() -> None:
    raise NotImplementedError("Cats do not fly!")


# Interact with animals
interact_with_animal(make_duck_sound, make_duck_swim, make_duck_fly)

##### This is where we force the cat to swim and fly 
interact_with_animal(make_cat_sound, make_cat_swim, make_cat_fly)

…this returns:

ERROR!
Quack! Quack!
Duck is now swimming in the water...
Duck is now flying in the air...
Meow! Meow!
Traceback (most recent call last):
  File "<string>", line 39, in <module>
  File "<string>", line 5, in interact_with_animal
  File "<string>", line 30, in make_cat_swim
NotImplementedError: Cats do not swim!

Our example shows the duck was able to fly, swim and make its distinct sounds without issues. However, the cat is forced to perform actions it’s not naturally accustomed to, like swimming or flying, so the program crashes with a NotImplementedError as soon as it tries to make the cat swim. Forcing behaviours the cat doesn’t exhibit violates the interface segregation principle.

B. Principle satisfaction

from typing import Callable 

# Create higher-order function for each behaviour 
def make_animal_sound(make_sound: Callable[ [], None]) -> None:
    make_sound()

def make_animal_swim(swim: Callable[ [], None]) -> None:
    swim()


def make_animal_fly(fly: Callable[ [], None]) -> None:
    fly()



# 1. Create functions for interacting with ducks
def make_duck_sound() -> None:
    print("Quack! Quack!")

def make_duck_swim() -> None:
    print("Duck is now swimming in the water...")


def make_duck_fly() -> None:
    print("Duck is now flying in the air...")

# 2. Create functions for interacting with cat
def make_cat_sound() -> None:
    print("Meow! Meow!")

# Interact with animals

# 1. Duck
make_animal_sound(make_duck_sound)
make_animal_swim(make_duck_swim)
make_animal_fly(make_duck_fly)

# 2. Cat
make_animal_sound(make_cat_sound)

We’ve now divided the interact_with_animal function into 3 separate behaviours:

  1. make_animal_sound

  2. make_animal_swim

  3. make_animal_fly

This way, when we are interacting with the cat, we are no longer forcing it to fly or swim - we invoke the make_animal_sound function, which will prompt it to simply “Meow” when expected. This ensures we do not force the cat to depend on interfaces it doesn’t necessarily use, therefore satisfying ISP in the process.

C. Codebase extension example

Let’s add a dog to our code base:

def make_dog_sound() -> None:
    print("Woof! Woof!")

make_animal_sound(make_dog_sound)

…our code results in:

Woof! Woof!

5. Dependency inversion principle (DIP)🧩

The dependency inversion principle (DIP) declares that all modules, irrespective of their levels, should depend on abstractions, and not on concretions. Any dependency on concretions is a direct violation of DIP.

In the context of functional programming,

  • modules are the same as functions

  • abstractions are the same as input parameters

  • concretions are the same as global variables and hard-coded values or operations in any function

This principle emphasises the use of dependency injection to avoid tight coupling with any modules or variables, which makes it easy to manage, extend and test the code’s behaviour over time.
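In a data pipeline, this often means passing the data source into a function instead of hard-coding it. Here is a minimal sketch, where the reader functions and the order values are made up for this illustration:

```python
from typing import Callable, List

# Hypothetical production reader; stands in for a file or database read
def read_orders_from_source() -> List[float]:
    return [120.0, 80.0, 50.0]

# Depends on an injected reader instead of a hard-coded source
def total_order_value(read_orders: Callable[[], List[float]]) -> float:
    return sum(read_orders())

# In a test, a small in-memory reader can be injected instead
def read_fake_orders() -> List[float]:
    return [1.0, 2.0]

print(total_order_value(read_orders_from_source))
print(total_order_value(read_fake_orders))
```

Because total_order_value only knows about its parameter, swapping the real source for a fake one requires no change to the function itself.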

Examples

Click here to view the object-oriented programming version of this example:

A. Principle violation

# Set the state of the music player
music_player_state = False

# Print the state of the music player
def display_music_player_state() -> None:
    if music_player_state:
        print("ON: Music player switched on.")
    else:
        print("OFF: Music player switched off.")


# Create the music player switch
def press_music_player_switch() -> None:
    global music_player_state
    music_player_state = not music_player_state
    display_music_player_state()

# Press the music player switch
press_music_player_switch()
press_music_player_switch()
press_music_player_switch()

…this results in:

ON: Music player switched on.
OFF: Music player switched off.
ON: Music player switched on.

In this code, display_music_player_state is a concrete implementation that shows whether the music player is on or off. Another concrete implementation is music_player_state, a global variable that holds the boolean value of the current state of the music player. The press_music_player_switch function depends directly on both of these objects, which is a violation of the dependency inversion principle.

B. Principle satisfaction

from typing import Callable

# Toggle the state
def toggle_state(state: bool) -> bool:
    return not state

# Display the state
def display_state(state: bool) -> None:
    if state:
        print("ON: Music player switched on.")
    else:
        print("OFF: Music player switched off.")

# Handle the switch pressing 
def press_switch(state:  bool, 
                 toggle:  Callable[ [bool], bool], 
                 display: Callable[ [bool], None] ) -> bool:
    new_state = toggle(state)
    display(new_state)

    return new_state

# Set the initial state
music_player_state = False

# Press the switch

### This is where the DIP is satisfied by only depending on injected dependencies 
music_player_state = press_switch(music_player_state, toggle_state, display_state)
music_player_state = press_switch(music_player_state, toggle_state, display_state)
music_player_state = press_switch(music_player_state, toggle_state, display_state)

Here is what this code does:

  • toggle_state - this toggles the state of the music player between on and off. It takes in a boolean value and returns the opposite value e.g. if it’s given False, it returns True instead (and vice versa)

  • display_state - this prints the current state of the music player. It also takes in a boolean value and prints a message indicating whether the music player is on or off.

  • press_switch - this is the button that turns the music player on or off. It takes in 3 arguments - the current state of the music player, the toggle function and the display function. The current state is collected (i.e. 1st argument) to feed into the toggle function (i.e. 2nd argument) to create a new state, and then the new state is displayed using the display function (i.e. 3rd argument).

Once the code is run, it returns this output:

ON: Music player switched on.
OFF: Music player switched off.
ON: Music player switched on.

This complies with the dependency inversion principle because the press_switch function depends on its input arguments instead of any concrete implementations in the code. In other words, it depends on toggle_state and display_state being passed in as arguments to press_switch, instead of being hard-coded into the function itself or referenced via global variables.
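One practical payoff is testability. The sketch below injects a recording function in place of display_state, so the behaviour of press_switch can be checked without printing anything. The make_recorder helper is made up for this illustration, and toggle_state and press_switch are repeated so the snippet runs on its own:

```python
from typing import Callable, List, Tuple

def toggle_state(state: bool) -> bool:
    return not state

def press_switch(state:   bool,
                 toggle:  Callable[[bool], bool],
                 display: Callable[[bool], None]) -> bool:
    new_state = toggle(state)
    display(new_state)
    return new_state

# Hypothetical test helper: returns a display function plus the list it records into
def make_recorder() -> Tuple[Callable[[bool], None], List[bool]]:
    seen: List[bool] = []

    def record(state: bool) -> None:
        seen.append(state)

    return record, seen

record, seen = make_recorder()
new_state = press_switch(False, toggle_state, record)

print(new_state, seen)
```

The violating version earlier could not be tested this way, because its display and state were fixed inside the function.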

C. Codebase extension example

Now let’s imagine we manufacture music players. We have a goal of enhancing the user experience by adding a new feature for adjusting the volume of the music player. This means we need to make the current code more robust. How do we do that without making any changes to what’s already there?

Here’s the code added to the music player’s current script:

from typing import Callable, Tuple

# Add a feature for changing volume
def change_volume(volume: int, increment_counter: int) -> int:
    new_volume = volume + increment_counter

    if new_volume < 0:
        new_volume = 0
    elif new_volume > 100:
        new_volume = 100
    print(f'New volume: {new_volume}  ')

    return new_volume


# Add a feature for using music player 
def use_music_player(state: bool, 
                    volume: int, 
                    operations: Callable[ [bool, int], Tuple[bool, int] ]) -> Tuple[bool, int]:
    new_state, new_volume = operations(state, volume)

    return new_state, new_volume

# Set up initial constants
current_state    =  False
current_volume   =  0

# A. Increase the operations of the music player 
def increase_operations(state: bool, volume: int) -> Tuple[bool, int]:
    new_state = toggle_state(state)
    display_state(new_state)
    new_volume = change_volume(volume, 35)

    return new_state, new_volume

current_state, current_volume = use_music_player(current_state, current_volume, increase_operations)

# B. Reduce the operations of the music player 
def decrease_operations(state: bool, volume: int) -> Tuple[bool, int]:
    new_state = toggle_state(state)
    display_state(new_state)
    new_volume = change_volume(volume, -20)

    return new_state, new_volume

# Use the music player
current_state, current_volume = use_music_player(current_state, current_volume, decrease_operations)

This may look like a lot of code but don’t worry, it’s actually simple to understand:

  • change_volume - this changes the volume of the music player based on the increment given to it, and then prints out the new value of the volume. The result is kept between 0 and 100: it is set to 0 if the new volume would be negative, and to 100 if it would exceed 100.

  • use_music_player - this is the higher-order function for using the music player. Similar to the press_switch function from the previous example, this also takes 3 arguments, where the current state and volume values occupy the first 2 parameters, and the operations are passed into the 3rd one to use the current state and volume values to create the new state and volume values.

  • increase_operations - this is used to increase the current volume and toggle the state of the music player

  • decrease_operations - this is used to decrease the current volume and toggle the state of the music player

Implementing the use_music_player function with the relevant parameters gives us this output:

ON: Music player switched on.
New volume: 35  
OFF: Music player switched off.
New volume: 15

Pros (Why would we use them in data engineering)?👍

You should consider combining SOLID principles with functional programming if you’re building data platforms or data pipelines that:

  • perform parallel and distributed computing operations

  • use pure functions as data transformation and aggregation jobs

The benefits of combining SOLID principles with FP for data engineering include:

  • modularity - this approach allows developers to extend a function’s behaviour with less risk of introducing bugs into existing code

  • concurrency and parallelism - immutable data structures suit concurrent and parallel computing because data can be safely replicated and processed across multiple computers without one task modifying another’s data, which reduces the chances of data loss and corruption

  • reusability - you can introduce new functions based on existing ones using partial application, currying and higher-order functions with ease
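To illustrate the concurrency point, here is a minimal sketch that maps a pure function over immutable partitions in parallel. The partition data and the sum_partition name are made up, and a thread pool is used only to keep the example short (a process pool or a distributed engine would be the more likely choice for CPU-heavy work):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Pure function: reads its input, returns a new value, touches nothing else
def sum_partition(partition: Tuple[int, ...]) -> int:
    return sum(partition)

# Tuples are immutable, so no worker can modify another worker's data
partitions: List[Tuple[int, ...]] = [(1, 2, 3), (4, 5), (6,)]

with ThreadPoolExecutor(max_workers=3) as executor:
    # map() returns results in the same order as the inputs
    partial_sums = list(executor.map(sum_partition, partitions))

print(partial_sums, sum(partial_sums))
```

Since sum_partition shares no state, the order in which the workers finish does not affect the result.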

Cons (Why should we not use them in data engineering)?👎

SOLID principles with functional programming may not be suitable for data solutions that strongly require:

  • Complex state management

  • Frequent updates to the data

Some of the drawbacks of this approach include:

  • performance & memory dip - immutable data structures can slow data workflows down and use more memory because of the overhead of maintaining multiple copies of the data

  • limited resources - FP is still a growing programming paradigm, and its application with SOLID principles is less popular. As a result, there may be a deficit in tooling and resources compared to OOP, especially in data engineering use cases in Python
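The copying overhead mentioned above can be seen in a small sketch. It assumes an immutable Account record built with a frozen dataclass, which is not how accounts were modelled earlier in this post. Every "update" creates a new object instead of changing the existing one:

```python
from dataclasses import dataclass, replace

# frozen=True makes instances immutable: attributes cannot be reassigned
@dataclass(frozen=True)
class Account:
    account_number: int
    balance: float

def deposit(account: Account, amount: float) -> Account:
    # replace() builds a new Account; the original is left untouched
    return replace(account, balance=account.balance + amount)

original = Account(12345678, 540.0)
updated = deposit(original, 60.0)

print(original.balance, updated.balance)
print(original is updated)
```

For a single record this cost is negligible, but the same pattern applied to large datasets with frequent updates is where the extra memory and time can start to matter.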

Conclusion🎬

Based on the functional programming techniques used in the examples in this blog, we can conclude that functional programming can be applied to all 5 SOLID principles from a data engineering standpoint, but admittedly, it has its own interpretation of these principles, which is considerably different from OOP’s. The difference may present a steeper learning curve for those more familiar with OOP.

While functional programming can be applied to the SOLID principles, it’s important to point out not every project would require strict compliance with these principles. Only the specific requirements of the project should guide the relevance of each principle, which means engineers should use their professional discretion to determine the appropriate level of adherence to ensure all code used is easy to maintain, read, and still performs as expected.

Feel free to reach out via my handles: LinkedIn | Email | Twitter