In my first blog post about data, I discussed several ways to ensure the collection of “good” data and how to avoid “bad” data. In this post, I will dive further into some of the mechanics of data storage. The goal of data storage is to allow for further growth and analysis of data. As new information becomes available or new uses for the data arise, more data must be collected. Following a few basic rules for good data storage will allow you to answer questions about the data, confirm that it is complete and check its accuracy, all of which lead to better-informed decisions.
When collecting and organizing data, give each data field a specific purpose and avoid putting too much information in one place. The latter is the “death by text field” scenario, in which several types of information are captured in a single free-text field. This may give a reader context, but it ultimately prevents the data from being used for anything beyond a simple report of that text. It is one of the most common issues we see when we receive data for analysis or reporting: once multiple pieces of information are combined into one field, the data cannot be searched, sorted or used in calculations in any practical way. A few simple tips will help you avoid this pitfall. The first is to separate key pieces of data into their own fields rather than combining them with text; if a description is needed, create a separate text field for it. The second is to store the two ends of a range in separate fields rather than one. For example, if you are tracking the date range of a person’s employment, store the start date and end date in separate fields, and keep both apart from any contextual explanation.
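To make the difference concrete, here is a minimal Python sketch. The record, its field names (`employer`, `start_date`, `end_date`, `notes`) and its values are hypothetical, chosen only for illustration; the same layout applies equally to spreadsheet columns or a database table.

```python
from dataclasses import dataclass
from datetime import date

# "Death by text field": employer, dates and context packed into one string.
# Sorting or calculating on this would require fragile text parsing.
blob = "Worked at Plant A from 3/1/1985 to 6/30/1992, left after closure"

# Structured alternative (hypothetical field names): one field per piece of information.
@dataclass
class Employment:
    employer: str
    start_date: date   # start of the range, stored in its own field
    end_date: date     # end of the range, stored in its own field
    notes: str = ""    # free-text context kept apart from the key fields

record = Employment(
    employer="Plant A",
    start_date=date(1985, 3, 1),
    end_date=date(1992, 6, 30),
    notes="Left after closure",
)

# Because the dates are real date values, arithmetic works directly.
tenure_days = (record.end_date - record.start_date).days
print(tenure_days)
```

The text in `notes` is still available for context, but nothing in the analysis depends on parsing it.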
By following these tips when tracking information, you can easily search or sort by these fields. You can also use them in calculations, such as finding an average length of time or the earliest date. When the data is stored this way, reporting a person’s earliest date of employment is no longer an arduous task.
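A short sketch of those searches and calculations in Python, assuming hypothetical records whose start and end dates are already stored as separate date fields. Each operation reduces to one line because no text has to be parsed first.

```python
from datetime import date
from statistics import mean

# Hypothetical employment records with the date range split into two fields.
records = [
    {"person": "J. Doe", "start_date": date(1978, 5, 1), "end_date": date(1983, 4, 30)},
    {"person": "J. Doe", "start_date": date(1985, 3, 1), "end_date": date(1992, 6, 30)},
    {"person": "R. Roe", "start_date": date(1990, 1, 15), "end_date": date(2001, 8, 31)},
]

# Search: select one person's records by an exact field match.
doe = [r for r in records if r["person"] == "J. Doe"]

# Sort: order all records chronologically by start date.
by_start = sorted(records, key=lambda r: r["start_date"])

# Earliest date of employment for one person: a single min() over a date field.
earliest = min(r["start_date"] for r in doe)

# Average length of time, in days, across all records.
avg_days = mean((r["end_date"] - r["start_date"]).days for r in records)

print(earliest)
print(round(avg_days))
```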
Always create a data storage infrastructure that allows for one-to-many relationships, in which multiple records can be associated with a single record. This structure allows you to search, extract and analyze information about any of those records. Tracking a person’s employment history, mentioned above, is a good case for a one-to-many relationship, because most people have worked at multiple locations over their lifetimes. Instead of capturing only the earliest and latest dates of employment across all jobs, a one-to-many structure lets you capture the years of employment at each location. That extra detail about each employment record can ultimately support a more thorough analysis.
When the details are broken out, you have more flexibility in your analysis. For example, if the person’s first job was at a location you wish to exclude from your analysis, the one-to-many structure lets you apply that restriction; if the employment dates at each location were not entered separately, you would not have that flexibility. Capturing the dates separately also lets you perform more robust calculations, such as finding the earliest date of employment or the average amount of time spent in various types of jobs. In short, breaking the data out into multiple records makes your reporting far more flexible than capturing a single date range.
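One way to model this is with two related tables, sketched here with Python's built-in `sqlite3` module. The table and column names (`person`, `employment`, `location`, `job_type`) and the sample rows are hypothetical. Each person row can be linked to many employment rows, so both the location exclusion and the per-job-type average become a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- One person, many employment rows: the one-to-many relationship.
CREATE TABLE employment (
    employment_id INTEGER PRIMARY KEY,
    person_id     INTEGER NOT NULL REFERENCES person(person_id),
    location      TEXT NOT NULL,
    job_type      TEXT,
    start_date    TEXT NOT NULL,  -- ISO 8601 'YYYY-MM-DD', so text order is date order
    end_date      TEXT NOT NULL
);
""")
conn.execute("INSERT INTO person VALUES (1, 'J. Doe')")
conn.executemany(
    "INSERT INTO employment (person_id, location, job_type, start_date, end_date) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Plant A", "manufacturing", "1978-05-01", "1983-04-30"),
        (1, "Shipyard B", "construction", "1985-03-01", "1992-06-30"),
        (1, "Office C", "clerical", "1993-01-04", "2005-12-31"),
    ],
)

# Earliest start date for the person, excluding one location from the analysis.
earliest = conn.execute(
    "SELECT MIN(start_date) FROM employment WHERE person_id = ? AND location <> ?",
    (1, "Plant A"),
).fetchone()[0]

# Average years spent per job type, using SQLite's julianday() for date arithmetic.
avg_by_type = conn.execute(
    "SELECT job_type, AVG(julianday(end_date) - julianday(start_date)) / 365.25 "
    "FROM employment GROUP BY job_type ORDER BY job_type"
).fetchall()

print(earliest)
for job_type, years in avg_by_type:
    print(f"{job_type}: {years:.1f} years")
```

Had only one overall date range been stored for the person, neither query would be possible.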
Data is rarely captured at the same time it is analyzed; most of the time, collection happens well in advance. By the time you perform your calculations, you may not remember where a specific piece of data originated. How many times have you had additional questions about your data or wanted to consult the original source again? My final recommendation is to track the source of your data, or link the original document to the data, at the time the data is collected.
Tracking the source serves several purposes. First, a link to the source lets you easily go back to the document for context or to answer additional questions. If you find data that conflicts with an existing data point, linked documents let you evaluate the sources of the two data points to determine whether one is more credible than the other.
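A minimal sketch of this idea, assuming each captured value carries a hypothetical `source_doc` reference (a file name or document ID) and a page number. Grouping values by person and field then surfaces conflicts together with the documents each value came from, so the sources can be compared side by side.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataPoint:
    person: str
    field: str
    value: str
    source_doc: str              # hypothetical ID or path of the original document
    page: Optional[int] = None   # where in that document the value was found

points = [
    DataPoint("J. Doe", "start_date", "1985-03-01", "DOC-0001.pdf", 4),
    DataPoint("J. Doe", "start_date", "1985-04-15", "DOC-0042.pdf", 2),
    DataPoint("J. Doe", "location", "Shipyard B", "DOC-0001.pdf", 4),
]

def find_conflicts(points):
    """Group points by (person, field); keep groups with more than one distinct value."""
    grouped = defaultdict(list)
    for p in points:
        grouped[(p.person, p.field)].append(p)
    return {
        key: group
        for key, group in grouped.items()
        if len({p.value for p in group}) > 1
    }

for (person, field), group in find_conflicts(points).items():
    print(f"Conflict for {person} / {field}:")
    for p in group:
        print(f"  {p.value} from {p.source_doc}, page {p.page}")
```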
Second, it makes it easy to add more information if that becomes necessary. Perhaps your project has evolved and new data fields are now required. If you tracked the source document used for each data entry, you can simply return to that documentation and collect the newly required information for the data set.
Finally, it keeps your data sources and documents readily accessible, so you can quickly reference or produce them when needed. Sometimes data must be shared with others (perhaps an outside auditing firm, or an opposing party with whom you have shared an analysis). In these situations, they may want to review the documents you used and validate that your data is correct. If you tracked and linked your sources, you can do this easily. A little forward thinking and effort up front can save significant time on the back end.
As I’ve said before, data is very powerful when used correctly. The hard part is organizing it in a manner that allows you to use it. Once the data is well organized, you are ready for the fun part: analyzing it! There are numerous impressive tools to help you analyze data, whether you are a novice or a seasoned professional. Stay tuned for part 3 of this blog series for a discussion of data analysis and reporting.
Carrie Scott is KCIC’s technology lead, both in operations/infrastructure and for development. “I work with a talented group of people to make sure our technology stays innovative and top of the line to support our client’s needs,” she says. “I also focus on the Consulting side of our practice, leading many clients through their day-to-day and long-term strategic goals.”