Test Interpretation

This test coverage report covers our Python code, which processes the research and practicum project datasets into the format required for visualization in Tableau. We use Python and various packages to clean the data, validate its geographic accuracy, and merge the datasets.


Our tests assume the datasets are in a specific format. For example, the practicum dataset should contain a total of five columns in the following order: ‘city’, ‘state’, ‘country’, ‘organization’, ‘domestic or global’. Functions such as process_practicum are written with this assumption. Similarly, the research dataset should also follow a specific format (shown in the mockResearch.csv and mockResearch2.xlsx files), and functions such as process_research and location are only appropriate if this assumption is met.
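The column-order assumption can be sketched as a small check. This is illustrative only: `EXPECTED_PRACTICUM_COLUMNS` and `validate_columns` are hypothetical names, not the project's actual identifiers.

```python
# Hypothetical sketch of the practicum-dataset column assumption.
EXPECTED_PRACTICUM_COLUMNS = [
    "city", "state", "country", "organization", "domestic or global",
]

def validate_columns(header, expected=EXPECTED_PRACTICUM_COLUMNS):
    """Return True only if the header has exactly the expected columns, in order."""
    return list(header) == expected

# A file with the five documented columns in order passes:
validate_columns(["city", "state", "country", "organization", "domestic or global"])  # True
```

A check like this would let process_practicum fail fast on a malformed upload instead of producing a silently wrong merge.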


Our test results represent a healthy test suite with 100% test coverage. We check for different kinds of possible errors within the data and test whether the dataset generated by the program meets the visualization requirements and corrects those errors. We test each function's adjustment separately. For example, for the function called location, we test whether it correctly splits the location variable in the research dataset into four variables (‘county’, ‘city’, ‘state’, and ‘country’) and whether it validates ‘latitude’ and ‘longitude’. If the ‘latitude’ and ‘longitude’ point to a different country than the ‘location’ variable indicates, approximate ‘latitude’ and ‘longitude’ derived from the ‘location’ variable replace the incorrect values. If the ‘country’ matches the ‘latitude’ and ‘longitude’, we use the given coordinates to fill in missing location information such as ‘city’.
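The splitting and coordinate-validation logic described above can be sketched roughly as follows. This is not the project's actual location function: the country centroids and the distance tolerance are toy stand-ins for a real geocoding check.

```python
# Illustrative sketch of splitting a location string and sanity-checking
# coordinates against the stated country. All lookup values are hypothetical.
APPROX_COUNTRY_COORDS = {        # assumed approximate country centers
    "United States": (39.8, -98.6),
    "Brazil": (-14.2, -51.9),
}

def split_location(location):
    """Split 'county, city, state, country' into a dict of four fields."""
    parts = [p.strip() for p in location.split(",")]
    keys = ["county", "city", "state", "country"]
    return dict(zip(keys, parts))

def validate_coords(country, lat, lon, tolerance=25.0):
    """Keep (lat, lon) if roughly near the country's center; otherwise
    fall back to the approximate coordinates for that country."""
    approx = APPROX_COUNTRY_COORDS.get(country)
    if approx is None:
        return (lat, lon)  # unknown country: leave coordinates as given
    if abs(lat - approx[0]) <= tolerance and abs(lon - approx[1]) <= tolerance:
        return (lat, lon)
    return approx
```

Under this sketch, Brazil-like coordinates on a United States row would be replaced by the United States approximation, while plausible coordinates pass through unchanged.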


Data on practicum projects already contain city, state, and country columns but have no values for latitude and longitude. Through the process_practicum function, we validate this location information; for example, ‘Orange County, North Carolina, United States’ is valid, while ‘Orange County, North Carolina, Japan’ is invalid and will be taken out. For the validated rows, the function fills in approximate latitude and longitude so that the project appears on the map.
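A minimal sketch of that per-row idea, assuming rows are dicts: drop rows whose state/country pair is impossible, and attach approximate coordinates otherwise. The lookup tables and `process_practicum_row` are illustrative, not the project's actual implementation.

```python
# Hypothetical per-row version of the process_practicum logic.
US_STATES = {"North Carolina", "California"}       # toy subset for illustration
STATE_COORDS = {"North Carolina": (35.6, -79.4)}   # assumed approximate centers

def process_practicum_row(row):
    """Return the row with approximate lat/lon filled in, or None if invalid."""
    state, country = row["state"], row["country"]
    if state in US_STATES and country != "United States":
        return None  # e.g. 'North Carolina, Japan' is taken out
    lat, lon = STATE_COORDS.get(state, (None, None))
    return {**row, "latitude": lat, "longitude": lon}
```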


Furthermore, the functions impute all missing values with ‘Others’, so that ‘null’ does not appear as an option on the dashboard. We also test the merge of the research datasets with the practicum dataset, which generates the final dataset used for visualization.
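The imputation-and-merge step can be sketched like this, again assuming list-of-dict rows; `fill_others` and `merge_datasets` are illustrative names rather than the project's actual functions.

```python
# Minimal sketch of imputing missing values and stacking the two datasets.
def fill_others(rows, columns):
    """Replace missing or empty values with 'Others' so 'null' never
    surfaces as a filter option on the dashboard."""
    return [
        {col: (row.get(col) or "Others") for col in columns}
        for row in rows
    ]

def merge_datasets(research_rows, practicum_rows):
    """Combine the cleaned datasets into the final visualization dataset."""
    return research_rows + practicum_rows
```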


These are the two main aspects we test, focusing on the processing of the two datasets: research and practicum. The unhealthy part is that we did not consider every possible kind of bad input, only the errors that actually appeared in the datasets provided by our client. There may be additional aspects that need cleaning, but the current processing is sufficient for that data.


We believe all the Python functions we wrote are worth testing, because each function is one step of the cleaning pipeline, and we must make sure every step is correct and that the final rows and columns match our expectations.


The top testing priority of our code is the consistency between the location information and the geographic coordinates, since many rows have wrong coordinates. In our data visualization, we use a map to show all the projects and their distribution around the world; when a North Carolina project is placed in Brazil, the map becomes misleading. We therefore spend considerable effort checking the coordinates and modifying the latitude and longitude to match the country. Because this directly affects how the map appears in Tableau, the function includes several steps, and we test each of them.
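That priority can be expressed as a small unit-test sketch: a row labeled North Carolina must never keep Brazil-like coordinates. `corrected_coords`, its toy hemisphere rule, and the constants are all hypothetical stand-ins for the project's actual coordinate-fixing step.

```python
# Hypothetical test sketch for the 'North Carolina project shown in Brazil' case.
NC_APPROX = (35.6, -79.4)  # assumed approximate center of North Carolina

def corrected_coords(country, state, lat, lon):
    """Toy rule: a US row with a southern-hemisphere latitude is wrong,
    so replace its coordinates with the state's approximate center."""
    if country == "United States" and lat < 0:
        return NC_APPROX if state == "North Carolina" else (None, None)
    return (lat, lon)

def test_nc_project_not_in_brazil():
    # Brazil-like coordinates attached to a North Carolina row get replaced
    assert corrected_coords("United States", "North Carolina", -14.2, -51.9) == NC_APPROX

def test_valid_coords_kept():
    assert corrected_coords("United States", "North Carolina", 35.9, -79.0) == (35.9, -79.0)

test_nc_project_not_in_brazil()
test_valid_coords_kept()
```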