SOLID principles in data engineering - Part 3
The Functional Programming (FP) version
Table of contents
- Preface
- So… can functional programming comply with SOLID principles?
- How are SOLID principles translated into functional programming?
- OOP vs FP's interpretation of SOLID principles
- Demo
- 1. Single responsibility principle (SRP)
- 2. Open/closed principle (OCP)
- 3. Liskov substitution principle (LSP)
- 4. Interface segregation principle (ISP)
- 5. Dependency inversion principle (DIP)
- Pros (Why would we use them in data engineering)?
- Cons (Why should we not use them in data engineering)?
- Conclusion
Preface
This blog aims to explore the question "Can functional programming comply with SOLID principles (using Python)?".
So… can functional programming comply with SOLID principles?
To answer this question, we first need to outline the primary aim of SOLID principles in the first place. Why do we actually need them?
When you plan on creating a solution expected to grow over time, it's important to get the design right from the start. Failure to do so can lead to unexpected behaviours and bugs that become increasingly difficult and expensive to fix as the solution grows in size. This is where selecting the right design patterns can save a lot of time and money.
SOLID principles guide our design decisions so that the resulting code is easier to test, manage and reuse over time. Although these principles were initially intended for object-oriented (OO) applications, the core benefits and rules of each principle also apply to functional programming, especially in data engineering projects.
This blog post aims to prove that through the use of:
- pure functions - a function that always returns the same output (immutable values) when passing the same input multiple times (immutable inputs) with no side effects i.e. deterministic
- higher-order functions - a function that either takes a function as an input or returns a function as an output
- function composition - the process of chaining multiple functions together to create a new function
- dependency injection - the process of passing a resource or behaviour required by a function in as an input parameter instead of creating it inside the function
There are other functional programming concepts we could use for this blog, but we'll keep the focus on these for now.
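As a quick orientation, here is a minimal sketch of the four concepts side by side. The names (to_pounds, apply_to_all, compose, describe_amount) are hypothetical and exist only for illustration; they are not part of the demo code later in this post.

```python
from typing import Callable, Tuple

# Pure function: the same input always gives the same output, with no side effects
def to_pounds(pence: int) -> float:
    return pence / 100

# Higher-order function: takes another function as an input
def apply_to_all(func: Callable[[int], float], values: Tuple[int, ...]) -> Tuple[float, ...]:
    return tuple(func(value) for value in values)

# Function composition: chain two functions into a new one (first, then second)
def compose(first: Callable, second: Callable) -> Callable:
    return lambda value: second(first(value))

# Dependency injection: the conversion behaviour is passed in, not hard-coded
def describe_amount(pence: int, convert: Callable[[int], float]) -> str:
    return f"{convert(pence):.2f} GBP"

pence_to_text = compose(to_pounds, str)

print(apply_to_all(to_pounds, (150, 275)))  # (1.5, 2.75)
print(pence_to_text(275))                   # 2.75
print(describe_amount(150, to_pounds))      # 1.50 GBP
```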
How are SOLID principles translated into functional programming?
- Single responsibility principle (SRP) - Each function must have one responsibility i.e. it may do more than one thing but has a single purpose it's focused on achieving
- Open/closed principle (OCP) - The source code for each function should be open for extension but closed for modification
- Liskov substitution principle (LSP) - Each function should be able to be swapped for another function sharing the same signature without altering the program's behaviour
- Interface segregation principle (ISP) - Each function should not depend on functions it does not need
- Dependency inversion principle (DIP) - All functions should depend on input arguments instead of behaviour hard-coded into the function
OOP vs FP's interpretation of SOLID principles
SOLID principle | OOP | Functional programming |
Single responsibility principle (SRP) | Every class and method must have only one reason to change | Every function must have only one reason to change |
Open/closed principle (OCP) | Every class and method must be open for extension (using techniques like inheritance, composition and polymorphism), but closed for modification. | Every function must be open for extension (using techniques like functional composition, higher-order functions and currying), but closed for modification. |
Liskov substitution principle (LSP) | Every child class must be able to be substituted for its parent class without unexpected behaviour occurring in the program | Every input argument should be able to be substituted for another argument that shares the same subtype without unexpected behaviour occurring in the program |
Interface segregation principle (ISP) | No child class should depend on any methods from its parent class it does not use | No functions should depend on functions or external operations it does not need (from input arguments or global variables) |
Dependency inversion principle (DIP) | Every class and method should depend on abstractions, not concrete implementations | Every function should depend on input arguments only, not concrete operations |
Demo
I will be using the same code examples from the 1st blog post on SOLID principles in data engineering but in a functional programming format to demonstrate how each example violates and satisfies each of the SOLID principles from an FP perspective.
Let's begin the exploration!
1. Single responsibility principle (SRP)
The single responsibility principle (SRP) declares that a function must only change for a single reason, which means even though a function may perform multiple activities, it must have only one objective within a larger unit of work. This is where separation of concerns comes in: you ensure each part of a program is responsible for doing one thing only and doing it well.
For example, if the business requires a certain data pipeline serving a team to be processed faster, this could be considered a single reason for change. So, the code responsible for improving the performance should be separated from other parts of the program with different responsibilities.
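To make this concrete for a pipeline, here is a minimal sketch in which parsing, filtering and aggregating each live in their own function, so a change to one concern (for example, a faster parser) touches only one function. The record layout and function names are assumptions made for illustration.

```python
from typing import List, Tuple

Record = Tuple[str, float]  # hypothetical (customer name, amount) layout

# Only responsible for turning "name, amount" text lines into records
def parse_records(lines: List[str]) -> List[Record]:
    records = []
    for line in lines:
        name, amount = line.split(",")
        records.append((name.strip(), float(amount)))
    return records

# Only responsible for the filtering rule
def keep_positive(records: List[Record]) -> List[Record]:
    return [record for record in records if record[1] > 0]

# Only responsible for the aggregation
def total_amount(records: List[Record]) -> float:
    return sum(amount for _, amount in records)

raw_lines = ["alice, 10.5", "bob, -3.0", "carol, 4.5"]
print(total_amount(keep_positive(parse_records(raw_lines))))  # 15.0
```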
Examples
We'll create a simple bank account and perform simple activities on it (click here for the object-oriented programming version of this example):
A. Principle violation
from typing import Tuple

def process_customer_money(account_number: int,
                           balance: int,
                           operation: str,
                           amount: int = 0) -> Tuple[int, int]:
    if operation == "deposit":
        balance += amount
        print(f'New balance: {balance} ')
    elif operation == "withdraw":
        if amount > balance:
            raise ValueError("Unfortunately your balance is insufficient for any withdrawals right now...")
        balance -= amount
        print(f'New balance: {balance} ')
    elif operation == "print":
        print(f'Account no:{account_number}, Balance: {balance}')
    elif operation == "change_account_number":
        account_number = amount
        print(f'Your account number has changed to "{account_number}" ')
    return account_number, balance

process_customer_money(account_number=123, balance=510, operation="withdraw", amount=100)
Unfortunately, this example does not satisfy SRP because the process_customer_money function is responsible for several operations, like deposits, withdrawals, printing balances etc.
B. Principle satisfaction
Let's try and get the code in harmony with SRP:
from typing import Tuple

# Balances and amounts are floats, so the return hints use float as well
def deposit_money(account_number: int, balance: float, amount: float) -> Tuple[int, float]:
    return account_number, balance + amount

def withdraw_money(account_number: int, balance: float, amount: float) -> Tuple[int, float]:
    if amount > balance:
        raise ValueError("Unfortunately your balance is insufficient for any withdrawals right now...")
    return account_number, balance - amount

def print_balance(account_number: int, balance: float) -> str:
    return f"Account no: {account_number}, New balance: {balance}"

def change_account_number(current_account_number: int, new_account_number: int) -> str:
    return f'Your account number has changed to "{new_account_number}" '

# Display results
my_account_details = print_balance(account_number=12345678, balance=540.00)
print(my_account_details)
By splitting the large process_customer_money function into smaller independent functions, we increase the modularity of the code. This makes it easier to create tests and manage the code's general behaviour over time.
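To illustrate the testing benefit, here is a minimal sketch of two checks written with plain assert statements. The two functions are repeated from the example so the sketch runs on its own, and the test function names are hypothetical.

```python
from typing import Tuple

# Repeated from the SRP example so this sketch is self-contained
def deposit_money(account_number: int, balance: float, amount: float) -> Tuple[int, float]:
    return account_number, balance + amount

def withdraw_money(account_number: int, balance: float, amount: float) -> Tuple[int, float]:
    if amount > balance:
        raise ValueError("Unfortunately your balance is insufficient for any withdrawals right now...")
    return account_number, balance - amount

# Each function has one responsibility, so each test needs only one scenario
def test_deposit_adds_amount() -> None:
    assert deposit_money(123, 100.0, 50.0) == (123, 150.0)

def test_withdraw_rejects_overdraft() -> None:
    try:
        withdraw_money(123, 100.0, 500.0)
    except ValueError:
        return  # the overdraft was rejected as expected
    raise AssertionError("expected a ValueError for an overdraft")

test_deposit_adds_amount()
test_withdraw_rejects_overdraft()
print("All checks passed")
```

Because the functions are pure, no set-up or mocking is needed: each check is a single call compared against an expected value.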
C. Codebase extension example
Imagine we get a new request from management expecting us to perform transfers between accounts without making changes to the existing codebase.
All we need to do is add a new function like so:
def transfer_money(account_no1: int,
                   balance1: float,
                   account_no2: int,
                   balance2: float,
                   amount: float) -> Tuple[Tuple[int, float], Tuple[int, float]]:
    account_no1, balance1 = withdraw_money(account_no1, balance1, amount)
    account_no2, balance2 = deposit_money(account_no2, balance2, amount)
    return (account_no1, balance1), (account_no2, balance2)
… and here's what an actual transfer looks like:
# Set up accounts
account_1 = (12345678, 850.00)
account_2 = (87654321, 400.00)

# Transfer 100.00 from account_1 to account_2
account_1, account_2 = transfer_money(account_1[0], account_1[1],
                                      account_2[0], account_2[1],
                                      100.00)
# Display transfer details
print(print_balance(account_1[0], account_1[1]))
print(print_balance(account_2[0], account_2[1]))
…and this results in:
Account no: 12345678, New balance: 750.0
Account no: 87654321, New balance: 500.0
2. Open/closed principle (OCP)
The open/closed principle (OCP) declares that a function should be open for extending behaviours, but closed for any modification. This is achieved using higher-order functions and composition.
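The robot example that follows relies on a higher-order function, so here is a minimal sketch of the composition route as well. The compose helper and the text-cleaning functions are hypothetical names used only for illustration.

```python
from functools import reduce
from typing import Callable

Transform = Callable[[str], str]

# Existing behaviour: this function is never edited again
def strip_whitespace(text: str) -> str:
    return text.strip()

# Chain any number of steps, applied left to right, into one new function
def compose(*steps: Transform) -> Transform:
    return lambda text: reduce(lambda value, step: step(value), steps, text)

# Extension: the new behaviour is a new function, composed with the old one
def to_lowercase(text: str) -> str:
    return text.lower()

clean_text = compose(strip_whitespace, to_lowercase)
print(clean_text("  Hello World  "))  # hello world
```

A further cleaning step can be added by writing one more function and appending it to the compose call, leaving strip_whitespace and to_lowercase unchanged.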
Examples
Here we create a robot that detects objects using different sensors - (click here for the object-oriented programming version of this example):
A. Principle violation
def detect_object(sensor_type: str) -> None:
    if sensor_type == "temperature":
        print("Detecting objects using temperature sensor ...")
    elif sensor_type == "ultrasonic":
        print("Detecting objects using ultrasonic sensor ...")
    elif sensor_type == "infrared":
        print("Detecting objects using infrared sensor ...")

detect_object("infrared")
If we need to add another sensor to the robot's detect_object operation, we would need to amend the existing code, meaning this current approach doesn't satisfy the open/closed principle.
B. Principle satisfaction
from typing import Callable

# Create higher-order function that receives different sensors
def detect_with_sensor(*sensors: Callable) -> None:
    for i, sensor in enumerate(sensors):
        print(f'Sensor {i + 1}:')
        sensor()

# Express sensors as functions
def use_temperature_sensor() -> None:
    print("Detecting objects using temperature sensor ...")

def use_ultrasonic_sensor() -> None:
    print("Detecting objects using ultrasonic sensor ...")

def use_infrared_sensor() -> None:
    print("Detecting objects using infrared sensor ...")

# Detect the objects using different sensors
detect_with_sensor(use_ultrasonic_sensor, use_temperature_sensor)
In this example, we use a higher-order function, detect_with_sensor, to pass in different sensor functions as input arguments. The Callable type hint indicates that each sensor passed to detect_with_sensor is a function; since the sensors here take no input arguments and return nothing, the more precise hint would be Callable[[], None].
This approach leaves us with enough flexibility to simply append new sensors as functions to the robot without changing its existing codebase.
C. Codebase extension example
Let's assume we've purchased two new sensors for the robot - a camera sensor and a proximity sensor. Let's now attach them to the robot:
def use_camera_sensor() -> None:
    print("Detecting objects using camera sensor ...")

def use_proximity_sensor() -> None:
    print("Detecting objects using proximity sensor ...")
… and adding this to the higher-order function …
# Detect the objects using different sensors
detect_with_sensor(use_ultrasonic_sensor,
                   use_temperature_sensor,
                   use_camera_sensor,     # new camera sensor
                   use_proximity_sensor   # new proximity sensor
                   )
…which results in…
Sensor 1:
Detecting objects using ultrasonic sensor ...
Sensor 2:
Detecting objects using temperature sensor ...
Sensor 3:
Detecting objects using camera sensor ...
Sensor 4:
Detecting objects using proximity sensor ...
3. Liskov substitution principle (LSP)
From a functional programming perspective, the Liskov substitution principle (LSP) declares that a function must be able to be swapped for another function that shares the same function signature, without any unexpected behaviour.
A function signature includes both the inputs (like the types and number of arguments used) and outputs (results and their types) that make up the function.
This principle emphasises that a function should only be interchangeable with another function that holds the same parameters and return type while behaving as expected.
This doesn't necessarily mean that both functions must return identical outputs. Instead, it implies that the substituted function should not cause any bugs, and it should behave in harmony with the logical expectations of the code (i.e. it should make real-world sense for the replacement function to be there).
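Here is a minimal sketch of that idea, assuming a hypothetical report function that accepts any scorer with the signature Callable[[List[float]], float]. Two scorers honour the signature and can be swapped freely; a third returns a string and breaks the caller.

```python
from typing import Callable, List

Scorer = Callable[[List[float]], float]

def mean_score(values: List[float]) -> float:
    return sum(values) / len(values)

def max_score(values: List[float]) -> float:
    return max(values)

# The caller relies only on the Scorer signature
def report(values: List[float], scorer: Scorer) -> str:
    return f"score={scorer(values):.1f}"

# Breaks the contract: returns a str where a float is expected
def broken_score(values: List[float]) -> str:
    return "n/a"

print(report([1.0, 2.0, 6.0], mean_score))  # score=3.0
print(report([1.0, 2.0, 6.0], max_score))   # score=6.0

try:
    report([1.0, 2.0, 6.0], broken_score)
except ValueError as error:
    # The float format spec cannot be applied to a string
    print(f"Substitution failed: {error}")
```

mean_score and max_score return different numbers, yet both are valid substitutes because the caller keeps working as expected. A static type checker would also be expected to flag the broken_score call before it runs.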
Examples
Click here to view the object-oriented programming version of this example:
A. Principle violation
from typing import Callable

def use_household_item(turn_on: Callable[[], None],
                       turn_off: Callable[[], None],
                       change_temperature: Callable[[], None]) -> None:
    turn_on()
    change_temperature()
    turn_off()

def turn_on_fridge() -> None:
    print("Refrigerator turned on.")

def turn_off_fridge() -> None:
    print("Refrigerator turned off.")

def change_temperature_fridge() -> None:
    print("Refrigerator temperature changed.")

def turn_on_laptop() -> None:
    print("Laptop turned on.")

def turn_off_laptop() -> None:
    print("Laptop turned off.")

use_household_item(turn_on_fridge, turn_off_fridge, change_temperature_fridge)

# This is where the violation occurs because it's not possible to change the temperature of a laptop
use_household_item(turn_on_laptop, turn_off_laptop, change_temperature_fridge)
The Liskov substitution principle is violated because the change_temperature_fridge function was passed into the use_household_item function as a third argument to change the temperature of the laptop, even though we can't change the temperature of laptops (like we do for fridges). The call does not raise an error, but it prints "Refrigerator temperature changed." in the middle of operating a laptop: the change_temperature_fridge function is not programmed to configure any laptop's temperature, so the program behaves in a way the caller does not expect.
B. Principle satisfaction
from typing import Callable

def use_temperature_controlled_item(turn_on: Callable[[], None],
                                    turn_off: Callable[[], None],
                                    change_temperature: Callable[[], None]) -> None:
    turn_on()
    change_temperature()
    turn_off()

def turn_on_fridge() -> None:
    print("Refrigerator turned on.")

def turn_off_fridge() -> None:
    print("Refrigerator turned off.")

def change_temperature_fridge() -> None:
    print("Refrigerator temperature changed.")

use_temperature_controlled_item(turn_on_fridge, turn_off_fridge, change_temperature_fridge)
To comply with LSP, we created a higher-order function, use_temperature_controlled_item (a function designed to accept only household appliances that support temperature control), whose name and signature make that expectation explicit.
Let's break down this function's signature:
The use_temperature_controlled_item function takes in 3 other functions as arguments (here turn_on_fridge, turn_off_fridge, and change_temperature_fridge), where each function is a Callable[ [], None ] type, meaning they do not take in input arguments of their own (i.e. []), and they do not return any outputs either (i.e. None).
This structure complies with LSP because we can pass in any function that can be exchanged for turn_on_fridge, turn_off_fridge, and change_temperature_fridge into the use_temperature_controlled_item function without causing it to behave incorrectly.
In other words, a function passed into the use_temperature_controlled_item function complies with LSP if it takes no arguments and returns no values, thereby matching the signature the function expects, and if it makes sense for a temperature-controlled item.
So to satisfy LSP in functional programming, you must be able to swap a function with another function that shares the same signature without any unexpected changes to the program's behaviour.
C. Codebase extension example
We can also add another temperature-controlled item without touching the use_temperature_controlled_item function, like so:
def turn_on_oven() -> None:
    print("Oven turned on.")

def turn_off_oven() -> None:
    print("Oven turned off.")

def change_temperature_oven() -> None:
    print("Oven temperature changed.")

use_temperature_controlled_item(turn_on_oven, turn_off_oven, change_temperature_oven)
… this results in:
Oven turned on.
Oven temperature changed.
Oven turned off.
4. Interface segregation principle (ISP)
The interface segregation principle (ISP) declares that a function should not be obligated to depend on any operation it doesn't use. These operations include:
- other functions
- variables
- input parameters
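The animal example below covers unused functions, so here is a minimal sketch for the input-parameter case. The settings keys and function names are hypothetical and chosen only to show the contrast.

```python
from typing import Dict

# Violation sketch: demands the whole settings dict but reads a single key,
# so every caller must build (and every test must fake) values it never uses
def build_output_path_from_settings(settings: Dict[str, str]) -> str:
    return f"{settings['output_dir']}/report.csv"

# Closer to ISP: ask only for the value that is actually needed
def build_output_path(output_dir: str) -> str:
    return f"{output_dir}/report.csv"

settings = {
    "output_dir": "/tmp/reports",
    "region": "eu-west-2",   # unused by the path logic
    "batch_size": "500",     # unused by the path logic
}

print(build_output_path(settings["output_dir"]))  # /tmp/reports/report.csv
```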
Examples
Let's demonstrate what this means using animals that exhibit different behaviours (click here to view the object-oriented programming version of this example):
A. Principle violation
from typing import Callable

# Create higher-order function for interacting with animals
def interact_with_animal(make_sound: Callable[[], None],
                         swim: Callable[[], None],
                         fly: Callable[[], None]) -> None:
    make_sound()
    swim()
    fly()

# 1. Create functions for interacting with ducks
def make_duck_sound() -> None:
    print("Quack! Quack!")

def make_duck_swim() -> None:
    print("Duck is now swimming in the water...")

def make_duck_fly() -> None:
    print("Duck is now flying in the air...")

# 2. Create functions for interacting with cat
def make_cat_sound() -> None:
    print("Meow! Meow!")

def make_cat_swim() -> None:
    raise NotImplementedError("Cats do not swim!")

def make_cat_fly() -> None:
    raise NotImplementedError("Cats do not fly!")

# Interact with animals
interact_with_animal(make_duck_sound, make_duck_swim, make_duck_fly)

# This is where we force the cat to swim and fly
interact_with_animal(make_cat_sound, make_cat_swim, make_cat_fly)
…this returns:
ERROR!
Quack! Quack!
Duck is now swimming in the water...
Duck is now flying in the air...
Meow! Meow!
Traceback (most recent call last):
File "<string>", line 39, in <module>
File "<string>", line 5, in interact_with_animal
File "<string>", line 30, in make_cat_swim
NotImplementedError: Cats do not swim!
Our example shows the duck was able to fly, swim and make its distinct sounds without issues. However, the cat is forced to perform actions it's not naturally accustomed to, like swimming or flying, which means we've forced behaviours it doesn't typically exhibit, thereby violating the interface segregation principle.
B. Principle satisfaction
from typing import Callable

# Create higher-order function for each behaviour
def make_animal_sound(make_sound: Callable[[], None]) -> None:
    make_sound()

def make_animal_swim(swim: Callable[[], None]) -> None:
    swim()

def make_animal_fly(fly: Callable[[], None]) -> None:
    fly()

# 1. Create functions for interacting with ducks
def make_duck_sound() -> None:
    print("Quack! Quack!")

def make_duck_swim() -> None:
    print("Duck is now swimming in the water...")

def make_duck_fly() -> None:
    print("Duck is now flying in the air...")

# 2. Create functions for interacting with cat
def make_cat_sound() -> None:
    print("Meow! Meow!")

# Interact with animals
# 1. Duck
make_animal_sound(make_duck_sound)
make_animal_swim(make_duck_swim)
make_animal_fly(make_duck_fly)

# 2. Cat
make_animal_sound(make_cat_sound)
We've now divided the interact_with_animal function into 3 separate behaviours:
- make_animal_sound
- make_animal_swim
- make_animal_fly
This way, when we are interacting with the cat, we are no longer forcing it to fly or swim - we invoke the make_animal_sound function, which will prompt it to simply "Meow" when expected. This ensures we do not force the cat to depend on interfaces it doesn't necessarily use, therefore satisfying ISP in the process.
C. Codebase extension example
Let's add a dog to our code base:
def make_dog_sound() -> None:
    print("Woof! Woof!")

make_animal_sound(make_dog_sound)
…our code results in:
Woof! Woof!
5. Dependency inversion principle (DIP)
The dependency inversion principle (DIP) declares that all modules, irrespective of their levels, should depend on abstractions, and not on concretions. Any dependency on concretions is a direct violation of DIP.
In the context of functional programming:
- modules are the same as functions
- abstractions are the same as input parameters
- concretions are the same as global variables and hard-coded values or operations in any function
This principle emphasises the use of dependency injection to avoid tight coupling with any modules or variables, which makes it easy to manage, extend and test the code's behaviour over time.
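In a data engineering setting, this might look like the minimal sketch below, where a hypothetical run_pipeline function receives its reader, transformation and writer as input parameters instead of calling a specific file or database directly. All names here are assumptions made for illustration.

```python
from typing import Callable, List

# The pipeline depends only on its parameters (the abstractions)
def run_pipeline(read: Callable[[], List[int]],
                 transform: Callable[[List[int]], List[int]],
                 write: Callable[[List[int]], None]) -> None:
    write(transform(read()))

# Concrete behaviours live outside the pipeline and are injected at call time
def read_sample() -> List[int]:
    return [1, 2, 3]

def double_values(values: List[int]) -> List[int]:
    return [value * 2 for value in values]

# An in-memory list stands in for a real destination such as a file or table
sink: List[int] = []
run_pipeline(read_sample, double_values, sink.extend)
print(sink)  # [2, 4, 6]
```

Swapping the source or destination then means passing a different function, with no edits to run_pipeline itself.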
Examples
Click here to view the object-oriented programming version of this example:
A. Principle violation
# Set the state of the music player
music_player_state = False

# Print the state of the music player
def display_music_player_state() -> None:
    if music_player_state:
        print("ON: Music player switched on.")
    else:
        print("OFF: Music player switched off.")

# Create the music player switch
def press_music_player_switch() -> None:
    global music_player_state
    music_player_state = not music_player_state
    display_music_player_state()

# Press the music player switch
press_music_player_switch()
press_music_player_switch()
press_music_player_switch()
…this results in:
ON: Music player switched on.
OFF: Music player switched off.
ON: Music player switched on.
In this code, the display_music_player_state function is a concrete implementation that shows whether the music player is on or off. Another concrete implementation is music_player_state, a global variable that holds the boolean value of the current state of the music player. The press_music_player_switch function depends on both of these objects, so it is in direct violation of the dependency inversion principle.
B. Principle satisfaction
from typing import Callable

# Toggle the state
def toggle_state(state: bool) -> bool:
    return not state

# Display the state
def display_state(state: bool) -> None:
    if state:
        print("ON: Music player switched on.")
    else:
        print("OFF: Music player switched off.")

# Handle the switch pressing
def press_switch(state: bool,
                 toggle: Callable[[bool], bool],
                 display: Callable[[bool], None]) -> bool:
    new_state = toggle(state)
    display(new_state)
    return new_state

# Set the initial state
music_player_state = False

# Press the switch
# This is where the DIP is satisfied by only depending on injected dependencies
music_player_state = press_switch(music_player_state, toggle_state, display_state)
music_player_state = press_switch(music_player_state, toggle_state, display_state)
music_player_state = press_switch(music_player_state, toggle_state, display_state)
Here is what this code does:
- toggle_state - this toggles the state of the music player between on and off. It takes in a boolean value and returns the opposite value e.g. if it's given False, it returns True instead (and vice versa)
- display_state - this prints the current state of the music player. It also takes in a boolean value and prints a message indicating whether the music player is on or off.
- press_switch - this is the button that turns the music player on or off. It takes in 3 arguments - the current state of the music player, the toggle function and the display function. The current state is collected (i.e. 1st argument) to feed into the toggle function (i.e. 2nd argument) to create a new state, and then the new state is displayed using the display function (i.e. 3rd argument).
When the code runs, it prints this output:
ON: Music player switched on.
OFF: Music player switched off.
ON: Music player switched on.
This complies with the dependency inversion principle because the press_switch function depends on the input arguments instead of any concrete implementations in the code. In other words, it depends on toggle_state and display_state being passed into the press_switch function's parameters instead of being hard-coded into the function itself or referenced via global variables.
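One practical payoff is testability. In the minimal sketch below, the display dependency is replaced with a list's append method so the displayed states can be inspected instead of printed. toggle_state and press_switch are repeated so the sketch runs on its own, and the displayed list is a hypothetical test double.

```python
from typing import Callable, List

# Repeated from the DIP example so this sketch is self-contained
def toggle_state(state: bool) -> bool:
    return not state

def press_switch(state: bool,
                 toggle: Callable[[bool], bool],
                 display: Callable[[bool], None]) -> bool:
    new_state = toggle(state)
    display(new_state)
    return new_state

# Test double: records each displayed state instead of printing it
displayed: List[bool] = []

state = press_switch(False, toggle_state, displayed.append)
state = press_switch(state, toggle_state, displayed.append)

print(displayed)  # [True, False]
print(state)      # False
```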
C. Codebase extension example
Now let's imagine we manufacture music players. We have a goal of enhancing the user experience by adding a new feature for adjusting the volume of the music player. This means we need to extend the current code. How do we do that without making any changes to what's already there?
Here's the code added to the music player's current script:
from typing import Callable, Tuple

# Add a feature for changing volume
def change_volume(volume: int, increment_counter: int) -> int:
    new_volume = volume + increment_counter
    if new_volume < 0:
        new_volume = 0
    elif new_volume > 100:
        new_volume = 100
    print(f'New volume: {new_volume} ')
    return new_volume

# Add a feature for using music player
def use_music_player(state: bool,
                     volume: int,
                     operations: Callable[[bool, int], Tuple[bool, int]]) -> Tuple[bool, int]:
    new_state, new_volume = operations(state, volume)
    return new_state, new_volume

# Set up initial constants
current_state = False
current_volume = 0

# A. Increase the operations of the music player
def increase_operations(state: bool, volume: int) -> Tuple[bool, int]:
    new_state = toggle_state(state)
    display_state(new_state)
    new_volume = change_volume(volume, 35)
    return new_state, new_volume

current_state, current_volume = use_music_player(current_state, current_volume, increase_operations)

# B. Reduce the operations of the music player
def decrease_operations(state: bool, volume: int) -> Tuple[bool, int]:
    new_state = toggle_state(state)
    display_state(new_state)
    new_volume = change_volume(volume, -20)
    return new_state, new_volume

# Use the music player
current_state, current_volume = use_music_player(current_state, current_volume, decrease_operations)
This may look like a lot of code but don't worry, it's actually simple to understand:
- change_volume - this changes the volume of the music player based on the increment given to it, and then prints out the new value of the volume. The volume is clamped to 0 if the new value is negative, and to 100 if the new value is higher than 100.
- use_music_player - this is the higher-order function for using the music player. Similar to the press_switch function from the previous example, this also takes 3 arguments, where the current state and volume values occupy the first 2 parameters, and the operations function is passed into the 3rd one to turn the current state and volume values into the new state and volume values.
- increase_operations - this is used to increase the current volume and toggle the state of the music player
- decrease_operations - this is used to decrease the current volume and toggle the state of the music player
Calling the use_music_player function with the relevant parameters gives us this output:
ON: Music player switched on.
New volume: 35
OFF: Music player switched off.
New volume: 15
Pros (Why would we use them in data engineering)?
You should consider combining SOLID principles with functional programming if you're building data platforms or data pipelines that:
- perform parallel and distributed computing operations
- use pure functions as data transformation and aggregation jobs
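As a minimal sketch of why pure functions suit parallel work, the example below maps a pure transformation over a list using a thread pool from the standard library. The normalise function and worker count are assumptions for illustration; threads are used only to keep the sketch short, and a CPU-heavy or distributed workload would more likely use processes or a dedicated engine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Pure: the result depends only on the input, so calls can run in any order
def normalise(value: float) -> float:
    return round(value / 100, 2)

def normalise_all(values: List[float], workers: int = 2) -> List[float]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the inputs
        return list(pool.map(normalise, values))

print(normalise_all([150, 275, 1000]))  # [1.5, 2.75, 10.0]
```

Because normalise shares no state between calls, no locks or coordination are needed between the workers.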
The benefits of combining SOLID principles with FP for data engineering include:
- modularity - this approach allows developers to extend a function's behaviour with less risk of introducing bugs into existing code
- concurrency and parallelism - immutable data structures are suited for concurrent and parallel computing because data can be safely replicated and processed across multiple computers, which enables developers to create workflows that reduce the chances of data loss and corruption
- reusability - you can introduce new functions based on existing ones using partial application, currying and higher-order functions with ease
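For the reusability point, here is a minimal sketch of partial application using functools.partial from the standard library. The convert function and the exchange rates are hypothetical values chosen for illustration.

```python
from functools import partial

# General-purpose function: converts an amount using a given rate
def convert(amount: float, rate: float) -> float:
    return round(amount * rate, 2)

# New, more specific functions are derived without editing convert
to_usd = partial(convert, rate=1.25)  # hypothetical rate
to_eur = partial(convert, rate=1.5)   # hypothetical rate

print(to_usd(100.0))  # 125.0
print(to_eur(10.0))   # 15.0
```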
Cons (Why should we not use them in data engineering)?
SOLID principles with functional programming may not be suitable for data solutions that strongly require:
- Complex state management
- Frequent updates to the data
Some of the drawbacks of this approach include:
- performance & memory dip - data workflows can run slower and use more memory with immutable data structures, because every change produces a new copy of the data instead of updating it in place
- limited resources - functional programming is less widely used than OOP in Python, and its application with SOLID principles is less popular still. As a result, there may be a deficit in tooling and resources compared to OOP, especially in data engineering use cases in Python
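The copying overhead can be seen in a minimal sketch like the one below, where a hypothetical append_reading function performs an immutable-style update on a tuple. Each call builds a brand-new tuple containing every existing element plus the new one, so the cost grows with the size of the data.

```python
from typing import Tuple

# Immutable-style update: the input is left untouched and a new tuple is built,
# which copies every existing element on each call
def append_reading(readings: Tuple[float, ...], value: float) -> Tuple[float, ...]:
    return readings + (value,)

original = (1.0, 2.0)
updated = append_reading(original, 3.0)

print(original)  # (1.0, 2.0)
print(updated)   # (1.0, 2.0, 3.0)
```

An in-place list.append would avoid the copy, at the price of mutating shared state.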
Conclusion
Based on the functional programming techniques used in the examples in this blog, we can conclude that functional programming can be applied to all 5 SOLID principles from a data engineering standpoint, but it admittedly has its own interpretation of these principles, which differs considerably from OOP's. The difference may present a steeper learning curve for those more familiar with OOP.
While functional programming can be applied to the SOLID principles, it's important to point out that not every project requires strict compliance with these principles. The specific requirements of the project should guide the relevance of each principle, which means engineers should use their professional discretion to determine the appropriate level of adherence, so that the code stays easy to maintain and read and still performs as expected.
Feel free to reach out via my handles: LinkedIn | Email | Twitter