What is Mocking in Unit Testing: A Data Scientist’s Perspective Explained
As a data scientist, my recent journey into the world of mocking in unit testing has been a true eye-opener. Although I’m relatively new to the subject, I had the opportunity to apply mocking in an ideal use case, and it has inspired me to share what mocking means and why it matters in unit testing.
This article gives an overview of software testing, explains unit testing, and discusses the importance of mocking in unit testing. Afterward, we’ll walk through three practical use cases to strengthen your understanding of how mocking is applied in unit testing. I’ve uploaded the Python scripts used for illustration in this article to this dedicated repository.
Before we delve into the article, I’d like to mention that I’m a data scientist, not a software testing or quality assurance professional. Hence, the contents of this post are based on my perspective and interpretations, which have been shaped by research.
What is Software Testing?
Software testing involves verifying and evaluating whether an application or product functions as intended. It serves as a vital process to assess the quality of applications, identifying and resolving bugs, errors, or defects that could cause significant malfunctions or compromise interconnected systems.
The true significance of software testing lies in its ability to mitigate the risk of software failures, ensuring the overall quality and reliability of applications. Software testing is typically carried out by software testers or quality assurance professionals. However, it can involve collaboration with developers, and in some cases, developers may be involved in testing specific components or functions they’ve created.
Software testers are trained to identify previously unseen security vulnerabilities, inappropriate functionalities, poor design decisions, missing requirements, etc. Developers often use their valuable feedback to improve applications before deployment.
There are various types of software testing, each with its specific strategies and objectives. This article will focus on unit testing and its relevance, particularly in the context of mocking.
What Does Unit Testing Mean?
Unit testing is a software testing technique that focuses on testing the individual units or components of an application in isolation. The primary aim is to ensure that each component performs its designated function correctly before being integrated into the larger system.
From a data scientist’s perspective, unit testing involves checking the robustness and correctness of smaller, individual functions or methods that play crucial roles in data processing, feature engineering and selection, model training and evaluation, analysis pipelines, etc.
These tests check that each function operates as intended, handles edge cases appropriately, and returns the expected object type, shape, and output. Additionally, unit tests verify that functions appropriately handle exceptional cases, such as invalid parameters, outliers, missing values, or unexpected data formats. When functions encounter such issues, unit tests can validate that they return the expected error messages or raise the appropriate exceptions.
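To make this concrete, here is a minimal sketch of such tests using Python’s built-in unittest module. The function scale_to_unit_range and its behavior are hypothetical, invented only for illustration; they are not part of the repository.

```python
import unittest


def scale_to_unit_range(values):
    """Hypothetical helper: rescale numbers linearly to the range [0, 1]."""
    if not values:
        raise ValueError("values must not be empty")
    lowest, highest = min(values), max(values)
    if lowest == highest:
        # Edge case: all values are equal, so there is no range to scale by
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


class TestScaleToUnitRange(unittest.TestCase):
    def test_returns_expected_values(self):
        self.assertEqual(scale_to_unit_range([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_output_has_same_length_as_input(self):
        self.assertEqual(len(scale_to_unit_range([5, 1, 3, 9])), 4)

    def test_constant_input_is_handled(self):
        self.assertEqual(scale_to_unit_range([7, 7, 7]), [0.0, 0.0, 0.0])

    def test_empty_input_raises(self):
        # Invalid input should raise an exception rather than fail silently
        with self.assertRaises(ValueError):
            scale_to_unit_range([])
```

Saved in a file, these tests can be run with python -m unittest. Each test covers one behavior (expected values, output shape, an edge case, an invalid input), which makes a failure easy to trace.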
By conducting unit tests, data scientists can become more confident in the reliability of individual functions, leading to more accurate and reliable data analysis and machine learning workflows.
While some functions operate independently without external dependencies, others rely on external dependencies/components for proper functioning. In such cases, replicating the behaviors of these dependencies becomes essential. This is where mocking becomes useful.
The following sections discuss what mocking entails, why it’s crucial in unit testing, and some practical use cases for using mocking in unit testing. Code examples can be found in this repository.
What is Mocking in Unit Tests?
In simple terms, mocking refers to the process of replicating or imitating something. It’s often used in unit testing and involves substituting complex objects or external dependencies within a function with controlled replacement objects that simulate the behavior of the actual ones.
There are several benefits of using mocking in unit tests. Firstly, it lets developers separate issues in their own code from issues in its dependencies, so they can focus on testing their code logic. Secondly, mocking frees developers from the constraints imposed by slow or unpredictable external dependencies, such as network problems or variations in traffic.
To better understand mocking, consider an example where you need to test a function that retrieves records from a database table, performs data manipulation, and returns a pandas DataFrame. Instead of executing actual database calls during unit testing, you can simulate the behavior of these calls and solely focus on verifying the function’s data manipulation and DataFrame output.
By employing mocking, developers create predictable and controlled environments for unit testing, enabling them to isolate and validate specific functionalities within a function without relying on external services. This approach improves testability, streamlines testing, and promotes the development of more robust and reliable code.
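The database example above can be sketched with unittest.mock from the standard library. This is only an illustration under stated assumptions: summarize_orders and fetch_rows are hypothetical names, the query and rows are made up, and a plain list of dictionaries stands in for the pandas DataFrame to keep the sketch free of third-party dependencies.

```python
from unittest.mock import Mock


def summarize_orders(fetch_rows):
    """Hypothetical function: fetch rows, then total the amount per customer."""
    totals = {}
    for customer, amount in fetch_rows("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0) + amount
    return [{"customer": c, "total": t} for c, t in sorted(totals.items())]


# The mock stands in for the real database call and returns canned rows
fake_fetch_rows = Mock(return_value=[("ann", 10), ("bob", 5), ("ann", 7)])

summary = summarize_orders(fake_fetch_rows)

# Only the data manipulation is under test; no database was touched
assert summary == [
    {"customer": "ann", "total": 17},
    {"customer": "bob", "total": 5},
]

# The mock also records how it was called, so the query can be checked
fake_fetch_rows.assert_called_once_with("SELECT customer, amount FROM orders")
```

The two things a mock offers are visible here: return_value controls what the dependency gives back, and assertions such as assert_called_once_with check how the code under test used it.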
First Use Case — Mocking a Request to an External Website
# This function fetches data from a URL and returns it as a DataFrame, with an
# added check to handle the case where the request is not successful.
import pandas as pd
import requests


def get_data_and_return_dataframe(url):
    # Make the GET request
    response = requests.get(url)
    # Check the status code to ensure the request was successful
    if response.status_code == 200:
        # Convert the response JSON to a DataFrame
        data = response.json()
        df = pd.DataFrame(data)
        return df
    else:
        return "Request failed with a status code other than 200"
The get_data_and_return_dataframe function within “utility.py” makes a request to an external website. It sends a GET request to a URL using the “requests” library. If the response status code is 200 (indicating a successful request), the function converts the JSON content to a pandas DataFrame and returns it. Otherwise, the function returns the string “Request failed with a status code other than 200”.
In the TestGetDataAndReturnDataframe class within “test.py,” we have two test cases: test_get_data_and_return_dataframe_success and test_get_data_and_return_dataframe_failure.
test_get_data_and_return_dataframe_success:
This test focuses on verifying the behavior of the get_data_and_return_dataframe function when the request is successful. It configures a mock response object with a status code 200 and JSON content representing sample data. The main objective of the test is to ensure that the function appropriately converts the JSON content into a pandas DataFrame. Assertions are made to validate that the output returned is indeed a DataFrame and contains the expected columns.
test_get_data_and_return_dataframe_failure:
This test examines the behavior of the get_data_and_return_dataframe function when the request fails. It configures a mock response object with a status code other than 200 (e.g., 404) to simulate a failed request. The goal is to confirm that the function handles such scenarios appropriately. In this case, the function is expected to return a specific string (i.e., “Request failed with a status code other than 200”) indicating a failure. An assertion is made to verify that the actual output matches the expected result.
By including these two test cases, we cover both the successful and the failed request scenarios for the get_data_and_return_dataframe function. Mock responses allow us to simulate and control these scenarios without making any real request to the external website. This approach gives us a controlled test environment in which to validate the expected behavior of the function when it interacts with external dependencies.
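The test.py file is not reproduced in this article, so the following is a sketch of how the two tests described above might look, not necessarily the repository’s exact code. To keep it self-contained, the function is repeated in the same file and requests.get is patched directly; the URL and the sample records are placeholders made up for illustration.

```python
import unittest
from unittest.mock import Mock, patch

import pandas as pd
import requests


def get_data_and_return_dataframe(url):
    response = requests.get(url)
    if response.status_code == 200:
        return pd.DataFrame(response.json())
    return "Request failed with a status code other than 200"


class TestGetDataAndReturnDataframe(unittest.TestCase):
    @patch("requests.get")
    def test_get_data_and_return_dataframe_success(self, mock_get):
        # Mock response with status 200 and made-up sample data
        mock_response = Mock()
        mock_response.status_code = 200
        mock_response.json.return_value = [
            {"id": 1, "name": "Ada"},
            {"id": 2, "name": "Grace"},
        ]
        mock_get.return_value = mock_response

        result = get_data_and_return_dataframe("https://example.com/api")

        self.assertIsInstance(result, pd.DataFrame)
        self.assertEqual(list(result.columns), ["id", "name"])
        mock_get.assert_called_once_with("https://example.com/api")

    @patch("requests.get")
    def test_get_data_and_return_dataframe_failure(self, mock_get):
        # Mock response with a non-200 status to simulate a failed request
        mock_get.return_value = Mock(status_code=404)

        result = get_data_and_return_dataframe("https://example.com/api")

        self.assertEqual(
            result, "Request failed with a status code other than 200"
        )
```

Because requests.get is replaced for the duration of each test, neither test touches the network, and the final assertion in the success test confirms that the function requested the URL it was given.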
Second Use Case — Mocking SQLite Database Connection
# This function retrieves user data from a SQLite database based on the provided user ID.
import sqlite3


def fetch_user_data(user_id):
    # Open the database connection
    conn = sqlite3.connect('users.db')
    cursor = conn.cursor()
    # Fetch the user data from the database
    cursor.execute('SELECT * FROM users WHERE id = ?', (user_id,))
    user_data = cursor.fetchone()
    # Close the database connection
    cursor.close()
    conn.close()
    if user_data:
        user = {
            'id': user_data[0],
            'name': user_data[1],
            'email': user_data[2],
            'status': 'active' if user_data[3] else 'inactive'
        }
        return user
    else:
        return "User record not found"
In this example, the fetch_user_data function opens a database connection to retrieve user data. It establishes a connection to an SQLite database, fetches user data based on the user_id provided, and returns a dictionary with user information if found. Otherwise, the function returns the string “User record not found.”
The TestFetchUserData class within test.py contains two test cases: test_fetch_user_data_success and test_fetch_user_data_not_found.
test_fetch_user_data_success:
This test case verifies the behavior of fetch_user_data when the user record is successfully retrieved from the database. It configures mock objects for the database connection and cursor, providing a predefined user record that the mock cursor should return. This ensures the function correctly processes user data and returns the expected dictionary output. Assertions are made to validate the expected user data against the output. In addition, assertions validate that the database connection and cursor were called as expected.
test_fetch_user_data_not_found:
This test case examines how the fetch_user_data function behaves when there’s no record for the user_id provided. Like the previous test case, mock objects are used to imitate the database connection and cursor, this time returning None for the fetch operation. The test checks that the function returns the string “User record not found” when user data is not found in the database. Assertions are made to validate the output against the expected result and to check that the database connection and cursor were called as expected.
These two test cases enable us to test the fetch_user_data function for both the successful and the not-found scenarios. By mocking the database connection and cursor, we can carry out controlled tests without relying on an actual database, and confirm that the function handles both scenarios as expected.
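As a sketch of how these two tests might be written (the repository’s test.py may differ), the example below repeats the function in the same file and patches sqlite3.connect. The user record and the helper _mock_database are made up for illustration. The not-found test asserts the string that the function shown above actually returns.

```python
import sqlite3
import unittest
from unittest.mock import MagicMock, patch


def fetch_user_data(user_id):
    conn = sqlite3.connect("users.db")
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
    user_data = cursor.fetchone()
    cursor.close()
    conn.close()
    if user_data:
        return {
            "id": user_data[0],
            "name": user_data[1],
            "email": user_data[2],
            "status": "active" if user_data[3] else "inactive",
        }
    return "User record not found"


class TestFetchUserData(unittest.TestCase):
    def _mock_database(self, mock_connect, row):
        # Chain the mocks: connect() -> connection, cursor() -> cursor
        mock_cursor = MagicMock()
        mock_cursor.fetchone.return_value = row
        mock_conn = MagicMock()
        mock_conn.cursor.return_value = mock_cursor
        mock_connect.return_value = mock_conn
        return mock_conn, mock_cursor

    @patch("sqlite3.connect")
    def test_fetch_user_data_success(self, mock_connect):
        row = (1, "Ada", "ada@example.com", 1)  # made-up record
        mock_conn, mock_cursor = self._mock_database(mock_connect, row)

        result = fetch_user_data(1)

        expected = {
            "id": 1,
            "name": "Ada",
            "email": "ada@example.com",
            "status": "active",
        }
        self.assertEqual(result, expected)
        mock_connect.assert_called_once_with("users.db")
        mock_cursor.execute.assert_called_once_with(
            "SELECT * FROM users WHERE id = ?", (1,)
        )
        mock_conn.close.assert_called_once()

    @patch("sqlite3.connect")
    def test_fetch_user_data_not_found(self, mock_connect):
        # fetchone() returns None when no row matches
        mock_conn, mock_cursor = self._mock_database(mock_connect, None)

        result = fetch_user_data(99)

        self.assertEqual(result, "User record not found")
        mock_cursor.fetchone.assert_called_once()
        mock_conn.close.assert_called_once()
```

No users.db file is created or read: sqlite3.connect returns the mock connection, whose cursor returns whatever row the test supplies.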
Third Use Case — Mocking A Function Defined Externally
# This function invokes mocked_function and carries out additional operations on the resulting string.
from mocked_module import mocked_function


def convert_to_upper():
    # Call the external function
    result = mocked_function()
    # Perform other operations
    processed_result = result.upper()
    return processed_result
This basic use case demonstrates how to mock a function that is defined in one Python module and imported and used in another. The focus is on the “utility.py” module, which uses an external function called mocked_function, defined in “mocked_module.py.” Both convert_to_upper() and mocked_function() are kept simple in this example to illustrate how mocking works. However, in real-world scenarios, the functions could be much more complex, and mocked_function() may have complexities that we’re not interested in when testing convert_to_upper().
Within utility.py, the convert_to_upper() function invokes mocked_function() and performs additional operations on the resulting string. The goal is to test the behavior of convert_to_upper in isolation, disregarding the complexities of mocked_function.
To achieve this, we use the @patch decorator in the test.py module to mock the behavior of mocked_function(). By mocking the external function, which is defined in mocked_module.py and used in utility.py, we gain control over its behavior and can simulate different scenarios during testing. This allows us to create controlled test environments and validate the expected behavior of the convert_to_upper() function without relying on the actual implementation of mocked_function().
The ability to mock external dependencies enhances the flexibility and reliability of our unit tests. It enables us to isolate the code under test, handle various edge cases, and thoroughly validate the functionality of convert_to_upper(). This is particularly useful when dealing with complex dependencies or external systems that we want to decouple from the unit test scenarios.
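The sketch below illustrates the idea in a single file, so it is not the repository’s test.py. Because both functions live in one module here, the test swaps the name in the module’s globals with patch.dict instead of using the decorator. In the two-file layout, and assuming utility.py imports the function with from mocked_module import mocked_function, the usual form would be @patch("utility.mocked_function"), which targets the name where it is looked up. The body of mocked_function and the return values are made up.

```python
import unittest
from unittest.mock import Mock, patch


def mocked_function():
    # Stand-in for the real function in mocked_module.py (body is made up)
    return "original value"


def convert_to_upper():
    result = mocked_function()
    return result.upper()


class TestConvertToUpper(unittest.TestCase):
    def test_convert_to_upper_with_mock(self):
        fake = Mock(return_value="hello from the mock")
        # In the two-file layout this would be:
        #     @patch("utility.mocked_function")
        # Here the name convert_to_upper() looks up is replaced temporarily.
        with patch.dict(globals(), {"mocked_function": fake}):
            self.assertEqual(convert_to_upper(), "HELLO FROM THE MOCK")
        fake.assert_called_once_with()

    def test_original_is_restored(self):
        # Outside the patch, the real function is used again
        self.assertEqual(convert_to_upper(), "ORIGINAL VALUE")
```

The first test never runs the real mocked_function, so whatever complexity it hides cannot affect the result. The second shows that the patch is undone once the with block exits.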
Conclusion
This article started by delving into the fundamental concepts of software testing, what unit testing means, and the meaning and significance of mocking. Afterward, we examined three compelling use cases of how mocking works in unit testing.
I hope the practical examples have helped you understand how mocking allows developers to write more robust and comprehensive tests. As you advance in your data career, the concepts explained in this article can serve as a base for improving your testing practices, helping you ensure code reliability and correctness even in the presence of complex dependencies, and for raising the quality of your data projects.
If you have any questions, experiences, or insights to share, I’d love to hear them. Don’t hesitate to contact me via email (enyoone3@gmail.com) or LinkedIn. Once again, you can view the code for the use cases explained in the article by clicking this link.
Happy coding and testing.