tracetest: [EPIC][Error Handling] Test Run Page Error Handling Improvements

After reviewing different user sessions, the team has identified multiple opportunities to improve the error handling messaging shown when executing a test run.

Currently, a test run goes through three significant steps:

  1. Trigger execution
  2. Trace fetching
  3. Test spec execution

Each step has its own set of success and failure scenarios that need to be appropriately displayed to the user. Today, Tracetest uses only two fields from the test run to validate possible errors.

  • lastErrorState, which contains the string info for the last known error.
  • state, which controls the status of the test.

This was a good starting point, but it is no longer enough for the clients (CLI/UI) to display the information a user needs to understand how to fix potential problems, nor to give good feedback on what the server is executing at any given time.
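
To make the limitation concrete, here is a minimal Go sketch of what a client can derive from those two fields today. Everything except the `state` and `lastErrorState` fields is an assumption for illustration, not the actual Tracetest model:

```go
// Sketch only: a simplified view of what clients can derive today.
// Field names and state values beyond state/lastErrorState are illustrative.
package main

import "fmt"

type TestRun struct {
	State          string // e.g. "EXECUTING", "AWAITING_TRACE", "FAILED", "FINISHED"
	LastErrorState string // free-form string for the last known error
}

// With only these two fields, a client can show a generic message but cannot
// tell whether the failure happened while triggering, fetching the trace, or
// running the test specs.
func describe(run TestRun) string {
	if run.State == "FAILED" {
		return fmt.Sprintf("Run failed: %s", run.LastErrorState)
	}
	return fmt.Sprintf("Run is %s", run.State)
}

func main() {
	fmt.Println(describe(TestRun{State: "FAILED", LastErrorState: "could not fetch trace"}))
}
```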

With this in mind, we have identified a matrix of possible scenarios based on the test run state and results, along with what we should display to the user in each case.

Test Run Flow Chart

flowchart TD
    A[Run] --> B[Created]
    B --> C[Resolve Trigger Vars]
    C --> D[Execute Trigger]
    D --> ET{Is the Trigger Successful?}
    ET -->|Yes| E[Queue Polling]
    ET -->|No| ES[Set State to Failed]
    ES --> Q
    E --> F[Execute Polling Job]
    F --> G[Fetch Trace from Data Store]
    G --> H{Trace Exists}
    H -->|No| I{Timeout config reached?}
    H -->|Yes| J{Has the span # changed}
    J -->|Yes| G
    J -->|No| K[Trace is ready]
    I -->|No| G
    I -->|Yes| L[Trace fetch failed]
    K --> O[Generating Outputs]
    O --> P[Running Test Specs]
    P --> Q[Finish]
    L --> ES
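
To make the polling branch of the chart concrete, here is a rough Go sketch of the decision loop. fetchTrace, Trace, and PollingConfig are hypothetical stand-ins, not the actual Tracetest implementation:

```go
// Sketch of the polling decision loop from the flowchart above.
// All names here are illustrative stand-ins.
package polling

import (
	"errors"
	"time"
)

type Trace struct {
	SpanCount int
}

type PollingConfig struct {
	Timeout    time.Duration
	RetryDelay time.Duration
}

// fetchTrace would query the configured data store; here it is a stub.
func fetchTrace() (*Trace, error) { return nil, errors.New("not implemented") }

func pollTrace(cfg PollingConfig) (*Trace, error) {
	deadline := time.Now().Add(cfg.Timeout)
	lastSpanCount := -1

	for {
		trace, err := fetchTrace()
		switch {
		case err != nil || trace == nil: // "Trace Exists?" -> No
			if time.Now().After(deadline) { // "Timeout config reached?" -> Yes
				return nil, errors.New("trace fetch failed: timeout reached") // -> Set State to Failed
			}
		case trace.SpanCount != lastSpanCount: // "Has the span # changed?" -> Yes
			lastSpanCount = trace.SpanCount // keep polling until the span count is stable
		default: // span count unchanged -> "Trace is ready"
			return trace, nil
		}
		time.Sleep(cfg.RetryDelay)
	}
}
```

The key point the chart encodes is that polling only stops when the span count is stable or the timeout is reached, so the UI needs to distinguish "still waiting for spans" from "failed".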

State Matrix for Test Runs

|             | CREATED      | TRIGGERING                                                                   | CONNECTING_TO_DATA_STORE                                                 | POLLING_TRACE                                                                        | GENERATING_OUTPUTS                                            | RUNNING_TEST_SPECS | FINISHED            |
|-------------|--------------|------------------------------------------------------------------------------|--------------------------------------------------------------------------|--------------------------------------------------------------------------------------|----------------------------------------------------------------|--------------------|---------------------|
| Successful  | Run page     | Trigger response data (body, timing, headers)                                  | Signal of successful connection to the data store                         | Trace                                                                                  | Outputs                                                          | Test spec results  | Trigger/Trace/Test  |
| Failed      | Failed page  | Breakdown of the trigger problem (DNS resolution, queue connection, auth problems) | Similar to the test connection endpoint: show a breakdown of the issues   | Breakdown of the trace fetching, with the reason for the error                         | Warning that the output generation failed, and the reason why   | Failed test specs  | Global failed state |
| In Progress | Loading state | Loading state with trigger steps                                               | Loading state                                                              | Similar to the server output (polling iteration #, # of spans, reason for next iteration) | Loading state                                                    | Loading state      | Loading state       |
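
As a rough illustration of how a client (CLI/UI) might consume this matrix, the Go sketch below maps a (state, outcome) pair to the messaging in the table. The state names come from the matrix; the Outcome type, the function, and the message strings are assumptions for illustration:

```go
// Sketch: mapping (run state, outcome) to the UI messaging in the matrix above.
// State names come from the matrix; everything else is illustrative.
package statematrix

type Outcome int

const (
	InProgress Outcome = iota
	Successful
	Failed
)

// messageFor returns what the test run page could surface for a given
// state/outcome pair, following the matrix.
func messageFor(state string, outcome Outcome) string {
	type key struct {
		state   string
		outcome Outcome
	}
	messages := map[key]string{
		{"TRIGGERING", Successful}:           "Trigger response data: body, timing, headers",
		{"TRIGGERING", Failed}:               "Breakdown of the trigger problem (DNS, queue connection, auth)",
		{"CONNECTING_TO_DATA_STORE", Failed}: "Breakdown of data store connection issues (as in the test connection endpoint)",
		{"POLLING_TRACE", InProgress}:        "Polling iteration #, # of spans, reason for the next iteration",
		{"POLLING_TRACE", Failed}:            "Breakdown of the trace fetching with the reason for the error",
		{"GENERATING_OUTPUTS", Failed}:       "Warning that output generation failed, and why",
		{"RUNNING_TEST_SPECS", Failed}:       "Failed test specs",
		{"FINISHED", Successful}:             "Trigger / Trace / Test results",
		{"FINISHED", Failed}:                 "Global failed state",
	}
	if msg, ok := messages[key{state, outcome}]; ok {
		return msg
	}
	return "Loading state"
}
```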

Tickets and Tasks

Follow-up release

Nice to have

  • [Error Handling] Event Log text version
  • [Error Handling] Mode bar live status

Mockups

https://www.figma.com/file/LBN4SKVPq3ykegrPKbHT2Y/0.8-0.9-Release-Tracetest?node-id=1994-32394&t=5M47CI4J8VFbgit2-0

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 15

Most upvoted comments

Yes yes. This is our second-highest priority, only behind knocking out the configuration work, which I want the team to swarm on since it is blocking other activity. If we get to a spot where @jorgeepc or @xoscar do not have an area where they can contribute to the config changes, we will want to focus on this.

Hello everyone, here's my take on what should be added to the test run page to improve the user experience.

Acceptance Criteria:

AC1: As a user looking at the test run page, and I just ran the test, and the test failed in the initial trigger request (HTTP, gRPC, etc.), I should be able to see a breakdown of the error and the steps that occurred before the error appeared.

AC2: As a user looking at the test run page, and I just ran a test and the initial request worked as expected, and the app is trying to fetch the trace, I should be able to see a description of what the app is doing in the background, such as (see the sketch after this list):

  1. What # of polling retry is it
  2. In what state is the polling (waiting, polling, failed)
  3. Recent errors or reasons why a new poll was triggered (even if the trace was already found)
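
A minimal sketch of the kind of polling progress payload the server could expose for this, assuming a hypothetical event shape (none of these names are an existing Tracetest API):

```go
// Sketch: a polling progress event the server could emit and the UI could
// render live. All names are hypothetical.
package progress

import "time"

type PollingStatus string

const (
	Waiting    PollingStatus = "waiting"
	Polling    PollingStatus = "polling"
	PollFailed PollingStatus = "failed"
)

type PollingProgressEvent struct {
	Iteration int           // which polling retry this is
	Status    PollingStatus // waiting, polling, or failed
	SpanCount int           // spans found so far
	Reason    string        // why another poll was triggered (e.g. span count still changing)
	LastError string        // most recent error, if any
	At        time.Time
}
```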

AC3: As a user looking at the test run page, and I just ran a test and the initial request worked as expected, and the app failed to fetch the trace, I should be able to see a proper error description of what happened and what was done to try to fetch the trace, and I should still be able to see the initial request/response details.

The idea is to give users easier ways to debug what is happening within the system, whether we found a problem or something else is going on. This can also help them tweak their polling settings to get the best results for their setup.

CC: @olha23

Added some comments; if we are in the clear about the config work, I will start working on this Monday morning!