Functional programming in data engineering with Python - Part 2

Preface🚀

💡In part 1 of our "Functional programming in data engineering with Python" series, we explored some of the principles and techniques that govern functional programming using Python code examples💥🐍.

💡In this post, we dive a little deeper into some of the key concepts mentioned to shed more light on the areas that will support your data engineering projects 🔋🔊.

Key concepts of functional programming

  • First-class functions 🥇- every function is a first-class citizen

  • Type system 🏷️- every function is considered a specific type and not a class

  • Modularity 💎 - every function is focused on one responsibility or purpose

  • Reusability ♻️- every function can be integrated with (or into) another function

1. First-class functions🥇

This states that every function is a first-class citizen, meaning functions can be

  • passed as arguments to other functions (inputs)

  • returned as results/values from other functions (outputs)

  • assigned to variables (standalone objects)

So in short, as long as a function behaves like a variable or value would, it qualifies as a first-class function. Let’s explore this thought further:

Code examples

Violation❌

In Python every function is technically a first-class object, but code only benefits from this when it treats functions as values. If a function is hard-wired inside another function instead of being passed in as an argument or assigned to a variable, the code is not making use of first-class functions.

Let’s observe an example that does NOT make use of first-class functions:

# Specify function for creating laptops
def create_laptop(laptop_type):
    return f"Laptop for {laptop_type} created successfully"


# Specify function for creating work laptops
def produce_work_laptop():
    result = create_laptop("work")
    print(result)

# Create a laptop for work
produce_work_laptop()
### Output 

# Laptop for work created successfully

This example misses the opportunity to utilize the concept of first-class functions by hard-coding the create_laptop function into the produce_work_laptop function, instead of passing the create_laptop function as a parameter.

Satisfaction✅

So let’s rewrite it to make use of first-class functions:

# Specify function for creating laptops
def create_laptop(laptop_type):
    return f"Laptop for {laptop_type} created successfully"


# Specify function for creating laptop for work
def produce_work_laptop(create_something, laptop_type):
    # Call whichever creator function was passed in, rather than a hard-coded one
    return create_something(laptop_type)

# Create a laptop for work
print(produce_work_laptop(create_laptop, "work"))
### Output 

# Laptop for work created successfully

This time the create_laptop function is passed into the produce_work_laptop function as a parameter, which utilizes the concept of first-class functions properly.
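The example above covers only one of the three properties listed earlier. The short sketch below, which introduces a hypothetical create_tablet function and make_producer helper purely for illustration, shows all three: assigning a function to a variable, passing it as an argument, and returning a new function from another function.

```python
from typing import Callable


def create_laptop(laptop_type: str) -> str:
    return f"Laptop for {laptop_type} created successfully"


# Hypothetical second creator, used only to show that the caller can swap functions
def create_tablet(tablet_type: str) -> str:
    return f"Tablet for {tablet_type} created successfully"


# 1. Assigned to a variable: the function object itself, not its result
make_device = create_laptop


# 2. Passed as an argument: the caller decides which creator to use
def produce_device(creator: Callable[[str], str], device_type: str) -> str:
    return creator(device_type)


# 3. Returned as a result: build a new, pre-configured function
def make_producer(creator: Callable[[str], str], device_type: str) -> Callable[[], str]:
    def producer() -> str:
        return creator(device_type)
    return producer


produce_work_tablet = make_producer(create_tablet, "work")

print(make_device("gaming"))                    # Laptop for gaming created successfully
print(produce_device(create_tablet, "school"))  # Tablet for school created successfully
print(produce_work_tablet())                    # Tablet for work created successfully
```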

2. Type system🏷️

In functional programming, a type system is a set of rules for managing the types of the values and variables found in a program. Under static typing, every function (and value) has its types checked before the code runs (at compile time in compiled languages, or by a separate static type checker in Python's case) to reduce the chances of bugs and unexpected behaviour at runtime. Therefore, every function is considered a specific type, described by the types of its inputs and output, not a class.

There are two categories of type systems:

  1. Static - this requires you to declare the expected type of a variable before assigning values to it. An error is raised at compile time if you assign a value that does not match the declared type. Static typing is useful for automation activities like filling out online forms, where every field must hold the right kind of data.

  2. Dynamic - this allows you to assign values to a variable without declaring a type for it. No error is raised when assigning a value that does not match the expected data type; type errors only surface at runtime, when an operation cannot handle the value.
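Here is a small sketch of what dynamic typing means in practice, using an illustrative double function that is not part of the original examples: rebinding a name to a different type raises nothing, and a mismatched type can even produce a silently wrong result before any TypeError appears.

```python
# Dynamic typing: the same name can be rebound to values of different types
order_quantity = 10      # an int
order_quantity = "ten"   # now a str; Python raises no error here


def double(quantity):
    return quantity * 2


print(double(10))     # 20
print(double("ten"))  # tenten (str * int repeats the string, silently "wrong")

# A TypeError only appears at runtime, when an operation cannot handle the value
try:
    total = order_quantity + 5  # str + int is not supported
except TypeError as error:
    print(f"TypeError raised at runtime: {error}")
```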

Python is primarily a dynamically typed programming language, but it supports optional type annotations (type hints) that static type checkers like mypy can verify without running the code. Checking your code against these annotations enhances the data integrity and quality of the functions used in a program.

To combine static and dynamic typing in Python, you can use mypy to examine the type annotations assigned to the variables and functions in your code. It can be installed with the pip package manager.

You can install it into your virtual environment via the terminal, using:

pip install mypy

…then you check your Python (.py) file with mypy, like so:

mypy my_main_file.py

This performs a static check on your file, without executing it, to confirm that the values your code passes, returns and assigns are consistent with the type annotations on your functions and variables.

If any violations are detected, mypy lists each error and finishes with a summary like this:

Found 1 error in 1 file (checked 1 source file)

…but if there are no type annotation issues, this is what you get:

Success: no issues found in 1 source file

Code examples

Violation❌

An example of violating the type system concept is when the function parameters and/or return values are not annotated with types.

# Create function for ordering chicken wings
def order_chicken_wings(flavour, quantity):
    return f"You have ordered {quantity} {flavour} chicken wings "

# Place the order
print(order_chicken_wings(10, "BBQ"))
### Output

# You have ordered BBQ 10 chicken wings

This example doesn’t declare any types for its input parameters, so nothing flags the logical error when the order of the arguments is swapped, as is evident in the output: the order_chicken_wings function expects the flavour argument (str) first, followed by the quantity argument (int), but the opposite was passed in.

Satisfaction✅

To rectify this, we can add type annotations to our order_chicken_wings function:

# Create function for ordering chicken wings
def order_chicken_wings(flavour:      str, 
                        quantity:     int) -> str:
    return f"You have ordered {quantity} {flavour} chicken wings "

# Place the order
print(order_chicken_wings("BBQ", 10))
### Output

# You have ordered 10 BBQ chicken wings

Now the type annotations help any developer understand the expected argument types just by reading the order_chicken_wings function, and a checker like mypy would flag a swapped call such as order_chicken_wings(10, "BBQ"). This reduces the chance of logical errors occurring, like the blunder we made before.
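One caveat worth keeping in mind, sketched below with the same order_chicken_wings function: annotations are metadata that Python stores on the function object, and the interpreter does not enforce them when the code runs. The protection comes from running a checker such as mypy before execution.

```python
def order_chicken_wings(flavour: str, quantity: int) -> str:
    return f"You have ordered {quantity} {flavour} chicken wings"


# The annotations are stored on the function object...
print(order_chicken_wings.__annotations__)
# {'flavour': <class 'str'>, 'quantity': <class 'int'>, 'return': <class 'str'>}

# ...but they are not enforced at runtime: this swapped call still runs
print(order_chicken_wings(10, "BBQ"))
# You have ordered BBQ 10 chicken wings
```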

More details on the violation and satisfaction of type annotations can be found below under the product type code examples.

3. Modularity💎

Modularity involves splitting code into small, independent functions where each function is focused on a single responsibility in the codebase.

Good function composition is built on top of modularity, because modular functions are easy to chain together to form new ones. Modular code is also easier to test, maintain and debug, because each function is a self-contained unit that is independent of the others.

Code examples

Violation❌

# Specify function for operating the home appliances

def operate_home_appliances():
    print("Configuring the washing machine...")
    print("Charging the tablet...")
    print("Switching the oven on...")
    print("Switching channels on the TV...")


# Operate the home appliances
operate_home_appliances()
### Output

# Configuring the washing machine...
# Charging the tablet...
# Switching the oven on...
# Switching channels on the TV...

The operate_home_appliances function is handling multiple responsibilities by itself: operating the washing machine, tablet, oven and TV all at once. This conflicts with the concept of having modular functions in the program. When functions have too many responsibilities, the code becomes more difficult to read, understand and debug.

Satisfaction✅

Let’s fix the code violation by splitting the responsibilities into their own functions:

# Create the function for operating the washing machine
def operate_washing_machine():
    print("Configuring the washing machine...")



# Create the function for operating the electronic tablet
def operate_electronic_tablet():
    print("Charging the tablet...")

# Create the function for operating the oven
def operate_oven():
    print("Switching the oven on...")

# Create the function for operating the TV
def operate_tv():
    print("Switching channels on the TV...")


# Operate the home appliances
operate_washing_machine()
operate_electronic_tablet()
operate_oven()
operate_tv()
### Output

# Configuring the washing machine...
# Charging the tablet...
# Switching the oven on...
# Switching channels on the TV...

Now each function has a single duty: the operate_washing_machine, operate_electronic_tablet, operate_oven and operate_tv functions each handle one operation for a single home appliance, making them self-contained units that can be called independently whenever required.
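Because each function now has one responsibility, each can be tested and reused on its own. The sketch below assumes hypothetical variants of the appliance functions that return their message instead of printing it, plus an illustrative run_routine helper, so the results can be checked directly.

```python
from typing import Callable, List


# Hypothetical variants that return their message instead of printing it
def operate_washing_machine() -> str:
    return "Configuring the washing machine..."


def operate_oven() -> str:
    return "Switching the oven on..."


def operate_tv() -> str:
    return "Switching channels on the TV..."


# Run any selection of single-responsibility tasks, in the order given
def run_routine(tasks: List[Callable[[], str]]) -> List[str]:
    return [task() for task in tasks]


# Each unit can be checked in isolation...
assert operate_oven() == "Switching the oven on..."

# ...and recombined freely, e.g. a routine that skips the oven
morning_routine = run_routine([operate_washing_machine, operate_tv])
for message in morning_routine:
    print(message)
```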

4. Reusability♻️

The motivation behind reusability is to create functions in a way that can be reused across many areas of a program’s code base.

By doing this, we reduce redundant code and the chance of duplication, which increases efficiency and maintainability in our development workflow.

Code examples

Violation❌

Although the previous concept encourages modularity, a function that is too specific to a certain domain forces us to rewrite it for every new context we want to use it in, which violates the concept of reusability.

As data professionals, these are moments where we use our professional discretion to measure how strictly we adhere to each guiding principle laid out in the programming styles we adopt for our projects. And of course, this applies to the other concepts and principles mentioned in my blogs.

Nevertheless, here’s an example of a violation of this:

# Create the function for calculating profits for Q3
def calculate_profit_for_Q3(sales: int, costs: int) -> int:
    return sales - costs



# Calculate profit for Q3
Q3_profits = calculate_profit_for_Q3(1000, 300)
print(f"Q3 profits: {Q3_profits}")
### Output

# Q3 profits: 700

The calculate_profit_for_Q3 function is specifically designed to calculate the profits generated for Q3, which is fine…but only for Q3. If we need to calculate profit for the other quarters (i.e. Q1, Q2 & Q4), we would need to modify the calculate_profit_for_Q3 function or write near-identical copies of it. This raises the chances of unexpected bugs being introduced into our code, and is therefore not in harmony with the reusability principle.

Satisfaction✅

The best way to manage this is to create a more general function, like calculate_profit, which can handle the calculation of profits for any quarter:

# Create the function for calculating profits
def calculate_profit(sales: int, costs: int) -> int:
    return sales - costs



# Calculate profit for Q1 - Q4
Q1_profits = calculate_profit(500,  800)
Q2_profits = calculate_profit(750,  900)
Q3_profits = calculate_profit(1000, 300)
Q4_profits = calculate_profit(1500, 400)



# Display profits for Q1 - Q4
print(f"Q1 profits: {Q1_profits}")
print(f"Q2 profits: {Q2_profits}")
print(f"Q3 profits: {Q3_profits}")
print(f"Q4 profits: {Q4_profits}")
### Output

# Q1 profits: -300
# Q2 profits: -150
# Q3 profits: 700
# Q4 profits: 1100

By introducing the calculate_profit function, we can use it in any context where calculating profit is concerned. Even if we decide to go more granular and calculate profits by month instead of by quarter, we can use the same function, which makes it more flexible to use and leaves less chance of errors sneaking into our program.
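To make the point concrete, here is a sketch that reuses the unchanged calculate_profit function for both quarterly and monthly figures. The quarterly numbers are the ones used above; the monthly figures and the profits_by_period helper are hypothetical, added only for illustration.

```python
from typing import Dict, Tuple


def calculate_profit(sales: int, costs: int) -> int:
    return sales - costs


# (sales, costs) per period; quarterly values match the example above
quarterly_figures: Dict[str, Tuple[int, int]] = {
    "Q1": (500, 800),
    "Q2": (750, 900),
    "Q3": (1000, 300),
    "Q4": (1500, 400),
}

# Hypothetical monthly figures, to show a different granularity
monthly_figures: Dict[str, Tuple[int, int]] = {
    "Jan": (200, 250),
    "Feb": (150, 300),
}


def profits_by_period(figures: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    # Reuse calculate_profit for every period without modifying it
    return {period: calculate_profit(sales, costs) for period, (sales, costs) in figures.items()}


print(profits_by_period(quarterly_figures))  # {'Q1': -300, 'Q2': -150, 'Q3': 700, 'Q4': 1100}
print(profits_by_period(monthly_figures))    # {'Jan': -50, 'Feb': -150}
```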

Function composition🧬

Function composition occurs when two or more functions are chained together to create a new one. It is one of the principles of functional programming that helps developers build production-grade applications that are easy to manage over time.

In functional programming, the output of one function becomes the input of another function, enabling functions to work together seamlessly. A monoid is an object (a type together with a way of combining its values, such as integers with addition) that supports this kind of chaining while possessing these properties:

  • Closure🔐 - the results share the same type as the input parameters

  • Identity👤 - there is an identity value that, when combined with any other value, leaves that value unchanged (e.g. 0 for addition)

  • Associativity🔗 - values can be grouped in any way when combined and still return the same result each time, e.g. (a + b) + c == a + (b + c)
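These properties can be checked directly in Python. The sketch below uses integer addition and string concatenation as familiar monoids, then shows that composing single-argument functions behaves the same way, with an identity function as the identity element. The helper names (compose2, add_one, double, square) are illustrative assumptions.

```python
# Integer addition: closure, identity (0) and associativity all hold
a, b, c = 3, 5, 7
assert isinstance(a + b, int)        # closure: int + int gives an int
assert a + 0 == a == 0 + a           # identity: adding 0 changes nothing
assert (a + b) + c == a + (b + c)    # associativity: grouping does not matter

# String concatenation is a monoid too, with "" as its identity
x, y, z = "data", "-", "engineering"
assert x + "" == x == "" + x
assert (x + y) + z == x + (y + z)
assert x + y != y + x                # associative, but NOT commutative: order still matters


# Composition of single-argument functions, with the identity function as identity
def compose2(f, g):
    return lambda value: f(g(value))


def identity(value):
    return value


def add_one(n: int) -> int:
    return n + 1


def double(n: int) -> int:
    return n * 2


def square(n: int) -> int:
    return n * n


left = compose2(compose2(add_one, double), square)
right = compose2(add_one, compose2(double, square))
assert left(3) == right(3) == 19     # 3 -> 9 -> 18 -> 19 with either grouping
assert compose2(add_one, identity)(3) == add_one(3) == compose2(identity, add_one)(3)
print("All monoid checks passed")
```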

You can read more on monoids in the 1st part of this series here.

Code examples

Violation❌

# Create the function for checking cinema ticket availability
def check_ticket_availability(movie):
    if movie in ["Inception", "Fast and Furious 9", "Avengers: Infinity War"]:
        return True
    else:
        return False

# Create the function for booking the cinema ticket if it's available
def book_cinema_ticket(is_available):
    if is_available:
        return "Now booking your ticket..."
    else:
        return "Unfortunately there are no tickets for this movie available in your cinema..."


# Check the movie status
movie_status = check_ticket_availability("Spiderman: No Way Home")

# Check the ticket status
cinema_ticket_status = book_cinema_ticket(movie_status)
print(cinema_ticket_status)
### Output 

# Unfortunately there are no tickets for this movie available in your cinema...

Great…not only are there no tickets for Spiderman: No Way Home available in our cinema, we also fail to apply function composition, at least not correctly. The check_ticket_availability and book_cinema_ticket functions are called as separate steps, with the intermediate result parked in the movie_status variable, instead of being chained together, thereby violating the principle of function composition.

Satisfaction✅

Let’s chain the check_ticket_availability and book_cinema_ticket functions together to fix this:

# Create the function for checking cinema ticket availability
def check_ticket_availability(movie):
    if movie in ["Inception", "Fast and Furious 9", "Avengers: Infinity War"]:
        return True
    else:
        return False

# Create the function for booking the cinema ticket if it's available
def book_cinema_ticket(is_available):
    if is_available:
        return "Now booking your ticket..."
    else:
        return "Unfortunately there are no tickets for this movie available in your cinema..."


# Check if the cinema ticket is available
cinema_ticket_status = book_cinema_ticket(check_ticket_availability("Spiderman: No Way Home"))
print(cinema_ticket_status)
### Output 

# Unfortunately there are no tickets for this movie available in your cinema...

In this example, the check_ticket_availability and book_cinema_ticket functions are now composed: the output of check_ticket_availability is passed directly into book_cinema_ticket as its input argument, demonstrating function composition properly and satisfying the principle in the process…even if the tickets are still out of stock🙄.
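Nesting the calls works, but the result is still a one-off expression rather than a reusable function. A common way to get a genuinely new function is a small compose helper. The version below is a sketch (Python's standard library has no built-in compose, so the helper is an assumption of this example) that applies functions right to left using functools.reduce, producing a hypothetical check_and_book function.

```python
from functools import reduce
from typing import Any, Callable


def compose(*functions: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Return a new function that applies `functions` right to left: compose(f, g)(x) == f(g(x))."""
    def composed(value: Any) -> Any:
        return reduce(lambda acc, func: func(acc), reversed(functions), value)
    return composed


def check_ticket_availability(movie: str) -> bool:
    return movie in ["Inception", "Fast and Furious 9", "Avengers: Infinity War"]


def book_cinema_ticket(is_available: bool) -> str:
    if is_available:
        return "Now booking your ticket..."
    return "Unfortunately there are no tickets for this movie available in your cinema..."


# A brand-new function built only from the two existing ones
check_and_book = compose(book_cinema_ticket, check_ticket_availability)

print(check_and_book("Inception"))               # Now booking your ticket...
print(check_and_book("Spiderman: No Way Home"))  # Unfortunately there are no tickets ...
```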

Algebraic type system🧮

In functional programming, an algebraic type system is an approach used for composing types together to form more robust ones.

Let’s paint an analogy - imagine you have a bag of sweets. Each sweet in the bag can have a mixture of fillings like caramel and nuts together (product type). Now, each sweet can either be chocolate or not chocolate (sum type).

Now let's shed more light on this further💡:

1. Product types✖️📦

A product type combines existing types into a brand new one, where each value holds one value of every component type. Think of it like the “AND” operator for type compositions. In Python, tuples are the type most commonly used to express product types.

Code examples

Violation❌

Here's an example of what a product type violation looks like:

from typing import List, Tuple

# Set the function for making pizza
def make_pizza(name:        str, 
               toppings:    List[str], 
               size:        int) -> Tuple[str, int, List[str] ]:

    return (name, toppings, size) 


# Create constants for pizza
my_pizza_name              =   "Pepperoni-special"
my_pizza_toppings          =   ["pepperoni", "cheese", "olives"]
my_pizza_size              =   15

# Display pizza 
final_pizza     =   make_pizza(my_pizza_name, my_pizza_toppings,  my_pizza_size)
print(f"I am making a '{final_pizza[0]}' with the following toppings: {final_pizza[1]}. The pizza size is {final_pizza[2]} inches.")
### Output

# I am making a 'Pepperoni-special' with the following toppings: ['pepperoni', 'cheese', 'olives']. The pizza size is 15 inches.

At first glance you may be thinking, “I don’t see any obvious issues with this example…”, but look at the ordering of the expected return type annotation. Within the Tuple type, it is ordered as

  1. str - represented as the pizza’s name

  2. int - represented as the pizza’s size

  3. List[str] - represented as the toppings on the pizza

Here’s what mypy would return if validation checks are conducted:

Found 1 error in 1 file (checked 1 source file)

If you compare the return type hint to the tuple that is actually returned, you may notice the discrepancy this time: the toppings come before the size in the returned tuple, although the type annotation expects the size first.

Product types require the actual output order to match the ordering set by the expected return type’s annotation, otherwise, it will return an error when running through a type validation checker, like mypy.

Satisfaction ✅

Now here's what it should look like:

from typing import List, Tuple

# Set the function for making pizza
def make_pizza(name:        str, 
               toppings:    List[str], 
               size:        int) -> Tuple[str, List[str], int ]:

    return (name, toppings, size) 


# Create constants for pizza
my_pizza_name              =   "Pepperoni-special"
my_pizza_toppings          =   ["pepperoni", "cheese", "olives"]
my_pizza_size              =   15

# Display pizza 
final_pizza     =   make_pizza(my_pizza_name, my_pizza_toppings,  my_pizza_size)
print(f"I am making a '{final_pizza[0]}' with the following toppings: {final_pizza[1]}. The pizza size is {final_pizza[2]} inches.")
### Output 

# I am making a 'Pepperoni-special' with the following toppings: ['pepperoni', 'cheese', 'olives']. The pizza size is 15 inches.

We’ve changed the order of the outputs to match the return type annotation to look like this:

  1. str - represented as the pizza’s name

  2. List[str] - represented as the toppings on the pizza

  3. int - represented as the pizza’s size

…and the code results are rectified!

Although the amendment was minimal, it makes the validation checker happy:

Success: no issues found in 1 source file
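An alternative worth considering is giving each position of the product type a name, so callers read fields by name rather than by position. This sketch swaps in typing.NamedTuple, which the original examples do not use; a Pizza is still a product of str, List[str] and int, and still a tuple.

```python
from typing import List, NamedTuple


# Still a product type (str AND List[str] AND int), but every field is named
class Pizza(NamedTuple):
    name: str
    toppings: List[str]
    size: int


def make_pizza(name: str, toppings: List[str], size: int) -> Pizza:
    # Keyword arguments make the field mapping explicit
    return Pizza(name=name, toppings=toppings, size=size)


final_pizza = make_pizza("Pepperoni-special", ["pepperoni", "cheese", "olives"], 15)
print(f"I am making a '{final_pizza.name}' with the following toppings: "
      f"{final_pizza.toppings}. The pizza size is {final_pizza.size} inches.")
```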

Reconciling the expected type annotations with the structure of the actual result is the job of type checking. A closely related idea in functional programming is pattern matching: the process of checking a sequence or structure against a pattern to ensure code consistency is realised, often binding its parts to names along the way. It’s worth noting that pattern matching is not limited to sequences of typed values; it can also encompass matching characters, regular expressions, and the data structures of inputs and outputs.

2. Sum types➕🎲

The sum types approach allows us to create a new type whose values can be any one of a range of alternative types.

So in relation to a function, an input argument or return value is a sum type if it can be either type A or type B. This is why a sum type is considered to behave like the “OR” operator in an algebraic type system.

Code examples

Let’s say we want to sell the pizza now, and discounts only apply to pizzas that are at least 15 inches. What would this look like with sum types?

Violation❌

Here's an example of what this would NOT look like:

from typing import Callable 

# Create custom type alias for function parameter (takes an int, returns an int)
PriceCalculator = Callable[[int], int]

# Calculate the price of pizza based on pizza size
def sell_pizza(price_calculator:    PriceCalculator,
               size:                int) -> int:
    return price_calculator(size)


# Calculate regular pizza price
def calculate_regular_pizza_price(size: int) -> int:
    return size * 3


# Calculate discount pizza price
def calculate_discount_pizza_price(size: int) -> int:
    return size * 1.5 if size >= 15 else "Discounts do not apply for pizza smaller than 15 inches"



# Display pizza prices
print(sell_pizza(calculate_regular_pizza_price, 12))
print(sell_pizza(calculate_discount_pizza_price, 10))
### Output

# 36
# Discounts do not apply for pizza smaller than 15 inches

The type annotation for the sell_pizza function declares that only one type, int, is returned, yet calculate_discount_pizza_price can return a string. This does not satisfy the sum types concept, because a sum type states that there could be multiple potential return types for a function, not just one.

Satisfaction✅

Now let’s consider an example that showcases sum types correctly:

from typing import Union, Callable

# Create custom type for function parameter (takes an int, returns an int or a str)
PriceCalculator = Callable[[int], Union[int, str]]

# Calculate price of pizza based on pizza size
def sell_pizza(price_calculator:    PriceCalculator, 
               size:                int) -> Union[int, str]:
    return price_calculator(size)



# Calculate regular pizza price
def calculate_regular_pizza_price(size: int) -> int:
    return size * 3


# Calculate discount pizza price
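def calculate_discount_pizza_price(size: int) -> Union[int, str]:
    # Half the regular price (size * 3), rounded down so the numeric result stays an int
    return size * 3 // 2 if size >= 15 else "Discounts do not apply for pizza smaller than 15 inches"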



# Display pizza prices
print(sell_pizza(calculate_regular_pizza_price, 12))
print(sell_pizza(calculate_discount_pizza_price, 10))
### Output

# 36
# Discounts do not apply for pizza smaller than 15 inches

The sell_pizza function can return either an integer that represents the price of the pizza, or a string that indicates discounts do not apply to any pizza smaller than 15 inches.

It does this through the Union[int, str] return type annotation, which declares the sell_pizza function returns a result either as an int or str type.
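Returning a sum type shifts some responsibility to the caller, which now has to handle every alternative. A minimal sketch, assuming a hypothetical format_receipt function: an isinstance check narrows Union[int, str] to one branch, which both readers and mypy understand. It repeats the corrected definitions so it runs on its own.

```python
from typing import Callable, Union

PriceCalculator = Callable[[int], Union[int, str]]


def sell_pizza(price_calculator: PriceCalculator, size: int) -> Union[int, str]:
    return price_calculator(size)


def calculate_discount_pizza_price(size: int) -> Union[int, str]:
    # Half the regular price (size * 3), rounded down to a whole number
    return size * 3 // 2 if size >= 15 else "Discounts do not apply for pizza smaller than 15 inches"


def format_receipt(result: Union[int, str]) -> str:
    # Handle each alternative of the sum type explicitly
    if isinstance(result, int):
        return f"Please pay {result}"
    return f"Notice: {result}"


print(format_receipt(sell_pizza(calculate_discount_pizza_price, 16)))  # Please pay 24
print(format_receipt(sell_pizza(calculate_discount_pizza_price, 10)))
```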

Best practices

Keep in mind some of the techniques used in functional programming to enhance your code quality.

  • Immutable data - leverage the power of immutability by creating new, transformed copies of your data instead of changing its state in place

  • Type systems - adopt the use of static type systems to catch errors in your logic as early as you can

  • Pure functions - utilize functions that will always produce the same outputs, provided they are given the same inputs, and also have no side-effects
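Here is a brief sketch of the first and third practices together, using an illustrative Order dataclass that is not part of the earlier examples: frozen=True blocks in-place changes, and dataclasses.replace returns a modified copy, so with_extra_wings stays pure. The log_order function is included only as an impure contrast.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import List


@dataclass(frozen=True)
class Order:
    flavour: str
    quantity: int


def with_extra_wings(order: Order, extra: int) -> Order:
    # Pure: same inputs, same output, and nothing outside the function changes.
    # replace() builds a NEW Order; the original is left untouched
    return replace(order, quantity=order.quantity + extra)


original = Order("BBQ", 10)
bigger = with_extra_wings(original, 5)
print(original)  # Order(flavour='BBQ', quantity=10)
print(bigger)    # Order(flavour='BBQ', quantity=15)

try:
    original.quantity = 99  # frozen dataclasses reject in-place mutation
except FrozenInstanceError:
    print("Cannot modify a frozen Order")

# Impure contrast: appending to a list outside the function is a side effect
order_log: List[Order] = []


def log_order(order: Order) -> None:
    order_log.append(order)


log_order(original)
print(len(order_log))  # 1
```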

You can find the other techniques in the 1st post of this series here.

Feel free to reach out via my handles: LinkedIn | Email | Twitter