tracetest: [EPIC][Error Handling] Test Run Page Error Handling Improvements
While reviewing different user sessions, the team has identified multiple areas of opportunity in the error handling messaging shown when executing a test run.
Currently, a test run goes through three significant steps:
- Trigger execution
- Trace fetching
- Test spec execution
Each step has its own set of success and failure scenarios that need to be appropriately displayed to the user. Today, Tracetest uses only two fields from the test run to validate possible errors.
- `lastErrorState`, which contains the string info for the last known error.
- `state`, which controls the status of the test.
This was a good starting point, but it is no longer sufficient for the clients (CLI/UI) to display enough information for the user to understand how to fix potential problems, nor to provide good feedback on what the server side is executing at any given time.
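For illustration, here is a minimal sketch of what a client can rely on today, assuming TypeScript-style typings on the client side. Only the `state` and `lastErrorState` fields come from this issue; the shape and helper below are illustrative, not the actual Tracetest types.

```typescript
// Minimal sketch of what a client can read from a test run today.
// Only `state` and `lastErrorState` come from this issue; the helper is illustrative.
interface TestRun {
  state: string;            // controls the status of the test
  lastErrorState?: string;  // string info for the last known error, if any
}

// With only these two fields the page can say *that* a run failed,
// but not *where* in the trigger -> trace fetch -> test spec pipeline it failed,
// or what the server is executing right now.
function describeRun(run: TestRun): string {
  return run.lastErrorState
    ? `Run failed: ${run.lastErrorState}`
    : `Run is ${run.state}`;
}
```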
Given this, we have identified a matrix of possible scenarios based on the test run state and its results, together with what we should display to the user in each case.
Test Run Flow Chart
flowchart TD
A[Run] --> B[Created]
B --> C[Resolve Trigger Vars]
C --> D[Execute Trigger]
D --> ET{Is Successful Trigger}
ET -->|Yes| E[Queue Polling]
ET -->|No| ES[Set State to Failed]
ES --> Q
E --> F[Execute Polling Job]
F --> G[Fetch Trace from Data Store]
G --> H{Trace Exists}
H -->|No| I{timed out config reached}
H -->|Yes| J{Has the span # changed}
J -->|Yes| G
J -->|No| K[Trace is ready]
I -->|No| G
I -->|Yes| L[Trace fetch failed]
K --> O[Generating Outputs]
O --> P[Running Test Specs]
P --> Q[Finish]
L --> ES
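To make the polling branch of the chart concrete, here is a hedged TypeScript sketch of that loop. The trace shape, config names, and function are assumptions for illustration, not the actual Tracetest implementation.

```typescript
// Illustrative sketch of the polling loop from the flow chart above.
// The trace shape and config names are assumptions, not actual Tracetest types.
interface Trace {
  spans: unknown[];
}

interface PollingConfig {
  maxIterations: number; // the "timed out config reached" bound
  periodMs: number;      // wait between polling iterations
}

async function pollTrace(
  fetchTrace: () => Promise<Trace | null>,
  config: PollingConfig,
): Promise<Trace> {
  let previousSpanCount = -1;

  for (let iteration = 1; iteration <= config.maxIterations; iteration++) {
    const trace = await fetchTrace();

    if (trace !== null) {
      // "Has the span # changed?" - stop once the trace stops growing.
      if (trace.spans.length === previousSpanCount) {
        return trace; // trace is ready
      }
      previousSpanCount = trace.spans.length;
    }

    await new Promise(resolve => setTimeout(resolve, config.periodMs));
  }

  // Timeout config reached without a stable trace: the fetch failed.
  throw new Error(`Trace fetch failed after ${config.maxIterations} polling iterations`);
}
```

Each iteration, span count, and the reason for continuing are exactly the details the matrix below suggests surfacing to the user while the run is in progress.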
State Matrix for Test Runs
| | CREATED | TRIGGERING | CONNECTING_TO_DATA_STORE | POLLING_TRACE | GENERATING_OUTPUTS | RUNNING_TEST_SPECS | FINISHED |
|---|---|---|---|---|---|---|---|
| Successful | Run page | Trigger response data (body, timing, headers) | Signal of a successful connection to the data store | Trace | Outputs | Test spec results | Trigger/Trace/Test |
| Failed | Failed page | Breakdown of the trigger problem (DNS connection, queue connection, auth problems) | Breakdown of issues, similar to the test connection endpoint | Breakdown of the trace fetching, with the reason for the error | Warning that output generation failed, and the reason why | Failed test specs | Global failed state |
| In Progress | Loading state | Loading state with trigger steps | Loading state | Similar to the server output (polling iteration #, # of spans, reason for the next iteration) | Loading state | Loading state | Loading state |
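As a sketch of how a client could key its display off this matrix: the state names below come from the table header, while the outcome type and messages are illustrative only.

```typescript
// States taken from the matrix header; the outcome type and messages are illustrative.
type RunStage =
  | 'CREATED'
  | 'TRIGGERING'
  | 'CONNECTING_TO_DATA_STORE'
  | 'POLLING_TRACE'
  | 'GENERATING_OUTPUTS'
  | 'RUNNING_TEST_SPECS'
  | 'FINISHED';

type RunOutcome = 'SUCCESSFUL' | 'FAILED' | 'IN_PROGRESS';

// Turns a SNAKE_CASE stage into readable text, e.g. "polling trace".
const label = (stage: RunStage) => stage.toLowerCase().replace(/_/g, ' ');

// What the test run page could show for a given cell of the matrix.
function bannerFor(stage: RunStage, outcome: RunOutcome, reason?: string): string {
  if (outcome === 'IN_PROGRESS') {
    return `Running: ${label(stage)}...`;
  }
  if (outcome === 'FAILED') {
    // e.g. DNS/auth problems while triggering, or a polling timeout while fetching the trace
    return `Run failed during ${label(stage)}: ${reason ?? 'unknown error'}`;
  }
  return stage === 'FINISHED'
    ? 'Run finished: trigger, trace and test spec results are available'
    : `${label(stage)} completed`;
}
```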
Tickets and Tasks
Follow-up release
Nice to have
- [Error Handling] Event Log text version
- [Error Handling] Mode bar live status
Mockups
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 15
Yes yes. This is our second-highest priority, behind only knocking out the configuration work, which I want the team to swarm on as it is blocking other activity. If we get to a spot where @jorgeepc or @xoscar do not have an area where they can contribute to the config changes, we will want to focus on this.
Hello everyone, here's my take on what should be added to the test run page to improve the user experience.
Acceptance Criteria:

AC1
As a user looking at the test run page,
And I just ran the test,
And the test failed in the initial trigger request (HTTP, gRPC, etc.),
I should be able to see a breakdown of the error and the steps that occurred prior to the error.

AC2
As a user looking at the test run page,
And I just ran a test and the initial request worked as expected,
And the app is trying to fetch the trace,
I should be able to see a description of what the app is doing in the background, things like:
- the polling iteration #
- the # of spans found so far
- the reason for the next iteration

AC3
As a user looking at the test run page,
And I just ran a test and the initial request worked as expected,
And the app failed to fetch the trace,
I should be able to see a proper error description of what happened and what was done to try to fetch the trace,
And I should be able to see the initial request/response details.
The idea is to give users easier ways to debug what is happening within the system, whether we found a problem or something else is going on. This can also help them tweak their polling settings to get the best results.
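One way to satisfy AC2/AC3 could be a small progress-event payload the page renders while the trace is being fetched. This is only a sketch under the assumption that the server emits such events; the event and field names are illustrative, not an existing Tracetest API.

```typescript
// Illustrative progress event the page could render while the trace is fetched (AC2)
// and reuse to explain a failure afterwards (AC3). Field names are assumptions.
interface TracePollingEvent {
  iteration: number;               // polling iteration #
  spanCount: number;               // # of spans seen so far
  reasonForNextIteration?: string; // e.g. "span count still changing"
  error?: string;                  // set when the trace fetch ultimately fails
}

function renderPollingStatus(event: TracePollingEvent): string {
  if (event.error) {
    return `Trace fetch failed after ${event.iteration} iterations: ${event.error}`;
  }
  const base = `Iteration ${event.iteration}: found ${event.spanCount} spans`;
  return event.reasonForNextIteration
    ? `${base}; polling again because ${event.reasonForNextIteration}`
    : base;
}
```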
CC: @olha23
Added some comments; if we are in the clear about the config stuff, I will start working on this Monday morning!