I work with a lot of data. Financial data like revenue and costs, business data such as customer retention and churn, and product usage analytics. One of the most difficult aspects of my job is to convince stakeholders that the data is legitimate and trustworthy.
Although we use modern tools like Segment, Mixpanel, and Mode Analytics, those only help us produce, access, and visualize data; they do not make it trustworthy on their own.
A typical data analysis process looks something like this:
- Step 1: Define Your Questions
- How many customers created a Dashboard this month and then canceled?
- Step 2: Set Clear Measurement Goals
- How has that number changed month-to-month, by cohort, or by subscription plan? Is it a positive change?
- Step 3: Collect Data
- Time to write some queries (if you have the data)
- Additional tracking may be required…
- Step 4: Analyze Data
- Apply descriptive statistics and data cleaning methods to summarize the results.
- Create some great-looking charts
- Step 5: Interpret Results
- Apply domain knowledge
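As a concrete sketch of Step 3, the dashboard question above might be answered with a query like the one below. The schema, table names, and dates are assumptions for illustration, not the actual production schema; sqlite3 stands in for a real warehouse.

```python
import sqlite3

# Hypothetical schema: an `events` table tracking product actions and
# a `customers` table tracking subscription status.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (customer_id INTEGER, event TEXT, occurred_at TEXT);
CREATE TABLE customers (id INTEGER PRIMARY KEY, status TEXT, canceled_at TEXT);
""")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, "dashboard_created", "2024-05-03"),
    (2, "dashboard_created", "2024-05-10"),
    (3, "report_viewed",     "2024-05-11"),
])
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "canceled", "2024-05-20"),
    (2, "active",   None),
    (3, "canceled", "2024-05-15"),
])

# Step 1's question as a query: customers who created a dashboard
# this month and later canceled.
rows = conn.execute("""
    SELECT DISTINCT e.customer_id
    FROM events e
    JOIN customers c ON c.id = e.customer_id
    WHERE e.event = 'dashboard_created'
      AND e.occurred_at >= '2024-05-01'
      AND c.status = 'canceled'
      AND c.canceled_at > e.occurred_at
""").fetchall()
print(rows)  # [(1,)] - customer 1 created a dashboard, then canceled
```

With the question and measurement goals fixed up front, the query itself stays short; cohort or plan breakdowns from Step 2 would just add a GROUP BY.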
This process works well once it has been established, leaving little room for interpretation. Where I often found the process breaking down was during the crucial Collect and Analyze steps.
Although I personally have a solid understanding of the process and methods I use to deliver data, that process was opaque to the end user. Without understanding how I arrived at the results, stakeholders have little reason to trust the data.
I created the RADS framework to address these concerns about data accuracy and scalability. As with any agile process, my plan is to implement and test RADS while it is still in development.
RADS is an acronym for Repeatable, Accurate, Dependable, and Scalable.
Data that exhibit these qualities have a better chance of being a trustworthy and reliable source of insights for your business.
Let’s dive into each of the qualities.
Repeatable
The process to retrieve data should be repeatable and provide consistent results.
Each task required to fetch this data should have these characteristics:
- A well-defined order of operations.
- Can be visualized in a flow chart or a numbered list.
- Simple to understand by a person with limited domain expertise.
- A SQL query should be well documented and easy to execute, even if the user is not a SQL expert.
- Be efficient and not use significant computing resources.
- Is your code well written? Are you running locally or on a cluster?
- Prevent the introduction of errors or bugs into the system.
- Read-only is your friend
- Do the heavy lifting up front, then write clean data to new tables for use with simple queries
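The last point can be sketched as follows: aggregate the raw events once into a clean summary table, so downstream users only ever run a trivial, read-only lookup. Table and column names here are assumptions, and sqlite3 stands in for a real warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (customer_id INTEGER, event TEXT, month TEXT)")
conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", [
    (1, "dashboard_created", "2024-05"),
    (1, "dashboard_created", "2024-05"),
    (2, "dashboard_created", "2024-05"),
])

# Heavy lifting up front: aggregate the raw events once into a clean table...
conn.execute("""
    CREATE TABLE monthly_dashboards AS
    SELECT month, COUNT(DISTINCT customer_id) AS customers
    FROM raw_events
    WHERE event = 'dashboard_created'
    GROUP BY month
""")

# ...so end users only need a simple, read-only query they can understand
# without being SQL experts.
result = conn.execute(
    "SELECT customers FROM monthly_dashboards WHERE month = '2024-05'"
).fetchone()
print(result[0])  # 2
```

Because the summary table is written once and queried read-only, the expensive aggregation never has to be re-run by each consumer, and nobody can accidentally mutate the source data.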
Accurate
The data queried should be an accurate representation of the source data.
Qualities of accurate data include:
- The sample data is normally distributed.
- If we query events for only 50 customers, do those customers represent our total customer base?
- The data does not contain unaccounted-for null values.
- Data such as plan price (for paying customers), account id, or customer status should never be null.
- Null values should indicate that no activity ever occurred.
- Results can be reverse engineered.
- If we have a result, we can view the data that was used to get to that result.
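A minimal null audit along these lines might look like the sketch below. The field names are hypothetical examples drawn from the bullets above; the point is that unexpected nulls are surfaced with their row positions, so a result can be traced back to the records behind it.

```python
# Fields that should never be null for a paying customer
# (hypothetical field names for illustration).
REQUIRED_FIELDS = ["account_id", "plan_price", "customer_status"]

def audit_nulls(records):
    """Return (row index, missing fields) pairs for unexpected nulls,
    so a suspect result can be reverse engineered to its source rows."""
    bad = []
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) is None]
        if missing:
            bad.append((i, missing))
    return bad

records = [
    {"account_id": 1, "plan_price": 49, "customer_status": "active"},
    {"account_id": 2, "plan_price": None, "customer_status": "active"},
]
print(audit_nulls(records))  # [(1, ['plan_price'])]
```

Running a check like this before summarizing means a null that slips into plan price shows up as an audit failure, not as a silently wrong revenue number.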
Dependable
Systems break. Code has bugs. Data processes should be dependable and have built-in redundancies if possible.
Qualities to achieve for dependable data include:
- The data is always available
- The source of data is stored in perpetuity
- Contingency plans exist for data loss & restoration
- Data can pass rigorous audits
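One cheap way to make audits rigorous is to store a deterministic checksum alongside each data snapshot; any later audit can recompute it from source and detect silent loss or mutation. This is a sketch of the idea, not a prescribed implementation.

```python
import hashlib
import json

def snapshot_checksum(rows):
    """Deterministic checksum of a result set. Stored with the snapshot,
    it lets a later audit detect silent changes or data loss."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

rows = [{"month": "2024-05", "customers": 2}]
stored = snapshot_checksum(rows)

# Later, an audit recomputes the checksum from the source data:
# a match means the snapshot is intact; a mismatch triggers restoration
# from the contingency copy.
assert snapshot_checksum(rows) == stored
```

Because the serialization is sorted and deterministic, two honest recomputations always agree, and even a one-field change in one row produces a different digest.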
Scalable
When the process to collect data is Repeatable, Accurate, and Dependable, it has a better chance of being scalable.
After initial data scoping and implementation, ongoing data usage and maintenance should use little to no resources.
The methods used to collect and analyze data should have these qualities:
- Be automated if possible
- Require little user interaction to view data results
- Require little user input to change variables
- Be easily accessible
- Run quickly and produce readable results
- Be simple and well documented
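The qualities above mostly amount to parameterization: changing the question should mean changing a variable, not rewriting the analysis. A minimal sketch, with hypothetical event data and a single month parameter:

```python
# Hypothetical event records; in practice these would come from the
# clean tables produced up front, not be hard-coded.
events = [
    {"customer_id": 1, "month": "2024-05"},
    {"customer_id": 2, "month": "2024-05"},
    {"customer_id": 1, "month": "2024-06"},
]

def monthly_active_customers(events, month):
    """Count distinct customers with any activity in the given month.
    Re-running for a new month requires no code changes, only a new input."""
    return len({e["customer_id"] for e in events if e["month"] == month})

print(monthly_active_customers(events, "2024-05"))  # 2
print(monthly_active_customers(events, "2024-06"))  # 1
```

Once the logic is captured in a documented function like this, scheduling it (with cron, Airflow, or similar) makes the ongoing maintenance cost close to zero.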
Time to Implement
That’s a quick summary of the RADS framework. From here, the process will be developed and tested to ensure the data produced not only provides great insights but also instills trust.