srt: [core] High CPU load on Windows

OS Host: Windows 10, 64-bit
SRT version: latest git (f5f52c02c27d7a8d7050a746eebee6e3d1b33aaa)
Program: srt-live-transmit.exe (32-bit build)

Command line used: srt-live-transmit "srt://wowza_server....." udp://127.0.0.1:1234 -s:10

It starts out using about 0.2% CPU. After some hours (8 or even more), it jumps to about 70% CPU load, while memory usage stays stable at 2 MB.

How can I capture the moment when this issue happens? I have my toolchain ready for anything you want me to compile.

Regards

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 48 (13 by maintainers)

Most upvoted comments

So, refining my suggested fix further: my local tests suggest that CTimer::getTime() doesn’t need to be epoch-based (this is clearly permissible, as the Apple code path to the rdtsc call at the top of the function shows), so I think it could just be like this:

uint64_t CTimer::getTime()
{
    // XXX Do further study on that. Currently Cygwin is also using gettimeofday,
    // however Cygwin platform is supported only for testing purposes.

    //For other systems without microsecond level resolution, add to this conditional compile
#if defined(OSX) || (TARGET_OS_IOS == 1) || (TARGET_OS_TV == 1)
    // Otherwise we will have an infinite recursive functions calls
    if (m_bUseMicroSecond == false)
    {
        uint64_t x;
        rdtsc(x);
        return x / s_ullCPUFrequency;
    }
    // Specific fix may be necessary if rdtsc is not available either.
    // Going further on Apple platforms might cause issue, fixed with PR #301.
    // But it is very unlikely for the latest platforms.
#endif
#ifdef _WIN32
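    // Windows: derive microseconds from the monotonic QueryPerformanceCounter clock.
    // The value is not epoch-based, which is fine given the relaxed contract noted above.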
    static LARGE_INTEGER tickFrequency;
    static int isFirstCall = 1;
    if (isFirstCall)
    {
        QueryPerformanceFrequency(&tickFrequency);
        isFirstCall = 0;
    }
    LARGE_INTEGER tickNow;
    QueryPerformanceCounter(&tickNow);
    return tickNow.QuadPart * 1000000 / tickFrequency.QuadPart;
#else
    timeval t;
    gettimeofday(&t, 0);
    return t.tv_sec * uint64_t(1000000) + t.tv_usec;
#endif
}

This, together with the simplification of gettimeofday described two posts up, resolves the high CPU load caused by the pthread_cond_timedwait problem while preserving the high-resolution timer code needed elsewhere in SRT for Windows.

Thank you all for the excellent descriptions. I believe I have isolated this defect and can explain why it is happening. I will share a workaround that may resolve the issue until an official fix is made.

System: Windows 10 Enterprise version 1703, 64-bit, running on an Intel Core i7-6820HQ CPU @ 2.70 GHz
SRT version: 2e6037e
pthread-win32 version: 19fd505
Built with Visual Studio 15.9.7 (vc++14.14)

Steps to reproduce the problem:

  1. Start srt-live-transmit.exe as a listener: srt-live-transmit.exe srt://:4000 file://con > null
  2. Observe virtually zero CPU usage
  3. Wait 20-60 minutes (system dependent)
  4. Observe CPU utilization jump to 20%, consistent with maximum cycles on 1-2 threads on my 4-core (8-thread) system

Isolation and explanation:

  • srt-live-transmit.cpp calls srt_epoll_wait() with a 100 ms timeout. There is no activity on listening port 4000, so this call is expected to return after 100 ms with error status SRT_ETIMEOUT.
  • the function call reaches CEPoll::wait() in epoll.cpp in the SRT protocol stack. This function contains a while(true) loop that polls SRTSOCKETs and SYSSOCKETs every 10 ms, so it should run about 10 iterations before throwing CUDTException(MJ_AGAIN, MN_XMTIMEOUT, 0), which is the same as SRT_ETIMEOUT.
  • the 10 ms delay per iteration is implemented with a call to CTimer::waitForEvent(). This function uses pthread_cond_timedwait(), which unfortunately takes an absolute time as its timeout argument. CTimer::waitForEvent() calls SRTCompat_gettimeofday() in win_time.cpp to obtain the current time and adds 10 ms to it to produce the target time passed to pthread_cond_timedwait().
  • the problem is in SRTCompat_gettimeofday(). On the first call, it gets the time from the Windows CRT function ftime() in timeb.h, which has millisecond accuracy. This first call establishes a baseline “epoch”. Subsequent calls to SRTCompat_gettimeofday() use QueryPerformanceCounter() and QueryPerformanceFrequency() to compute a higher-precision absolute time by adding the counter increment to the baseline “epoch” time.
  • see the second paragraph of “Acquiring high-resolution time stamps”: “QPC is independent of and isn’t synchronized to any external time reference.” Over time, the target time computed in SRTCompat_gettimeofday(), supposed to be 10 ms in the future, falls behind. Eventually pthread_cond_timedwait() is called with an absolute timeout that is already in the past relative to whatever clock pthread-win32 uses internally, probably ftime(). The number of iterations each call to CEPoll::wait() needs to reach the 100 ms delay gradually increases from about 10 to 20,000 on my system, because the delay in CTimer::waitForEvent() shrinks from 10 ms down to 0. The resulting spin loop in CEPoll::wait() accounts for the high CPU utilization (see the sketch after this list).
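
To make the drift concrete, below is a minimal, self-contained sketch (hypothetical code, not taken from SRT or pthread-win32) that emulates the two clocks involved: a QPC-anchored time source that extrapolates from a one-time ftime() baseline, and the plain system clock that the condition variable implementation presumably compares the absolute timeout against. Once the QPC-derived "now" lags the system clock by more than 10 ms, the computed remaining wait becomes non-positive and the timed wait degenerates into a spin:

#include <windows.h>
#include <sys/timeb.h>
#include <cstdint>
#include <cstdio>

// Emulates the problematic scheme: capture the system time once via ftime(),
// then extend it with QueryPerformanceCounter deltas on every later call.
static uint64_t qpcAnchoredNowUs()
{
    static LARGE_INTEGER freq, baseTicks;
    static uint64_t baseUs;
    static bool first = true;
    if (first)
    {
        struct _timeb tb;
        _ftime_s(&tb);
        baseUs = uint64_t(tb.time) * 1000000 + uint64_t(tb.millitm) * 1000;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&baseTicks);
        first = false;
    }
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);
    uint64_t deltaUs = uint64_t(now.QuadPart - baseTicks.QuadPart) * 1000000 / uint64_t(freq.QuadPart);
    return baseUs + deltaUs;
}

// The clock a pthread implementation would typically compare an absolute
// timeout against: the real system clock, read fresh on every call.
static uint64_t systemNowUs()
{
    struct _timeb tb;
    _ftime_s(&tb);
    return uint64_t(tb.time) * 1000000 + uint64_t(tb.millitm) * 1000;
}

int main()
{
    for (;;)
    {
        // Deadline "10 ms from now", computed from the QPC-anchored clock,
        // mirroring what CTimer::waitForEvent() is described to do.
        uint64_t deadline = qpcAnchoredNowUs() + 10000;
        long long remaining = (long long)deadline - (long long)systemNowUs();
        // Once QPC has drifted behind the system clock by more than 10 ms,
        // 'remaining' goes to zero or below and the timed wait returns
        // immediately, turning the 10 ms sleep into a spin.
        printf("remaining wait: %lld us\n", remaining);
        Sleep(60000); // sample once a minute; the drift accumulates over hours
    }
}

On an affected system, the printed "remaining wait" decays from roughly 10,000 us toward (and past) zero over the hours, matching the gradual rise in CEPoll::wait() iterations described above.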

Possible Workaround: In SRTCompat_gettimeofday() in win_time.cpp change if (isFirstCall) to if (true). This causes ftime() to be used every time to get the current time from the system. This workaround may not work well on older systems if the resolution of ftime() is coarser than 10 ms.

Fix: Need to change SRTCompat_gettimeofday() so that it doesn’t try to compute absolute time using QueryPerformanceCounter().
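
For illustration, here is a minimal sketch of the direction such a fix could take (hypothetical code, not the actual SRT patch; the function name and the timeval stand-in are invented for the example): derive the absolute time from the system clock on every call, for instance via GetSystemTimeAsFileTime(), instead of extrapolating from a QueryPerformanceCounter baseline:

#include <windows.h>
#include <cstdint>

// Stand-in for struct timeval, to keep the sketch self-contained.
struct srt_timeval { long tv_sec; long tv_usec; };

// Hypothetical replacement: always derive the time-of-day from the system clock,
// so the returned value can never drift away from the clock that
// pthread_cond_timedwait() compares its absolute timeout against.
int SRTCompat_gettimeofday_sketch(srt_timeval* tv)
{
    FILETIME ft;
    GetSystemTimeAsFileTime(&ft);                        // 100 ns units since 1601-01-01
    uint64_t t = (uint64_t(ft.dwHighDateTime) << 32) | ft.dwLowDateTime;
    t -= 116444736000000000ULL;                          // shift epoch to 1970-01-01
    t /= 10;                                             // 100 ns -> microseconds
    tv->tv_sec  = long(t / 1000000);
    tv->tv_usec = long(t % 1000000);
    return 0;
}

GetSystemTimeAsFileTime() typically updates only at the system timer-tick granularity (around 1-16 ms); on Windows 8 and later, GetSystemTimePreciseAsFileTime() provides the same epoch with much finer resolution, should sub-millisecond precision be needed here.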

@vidanio Why did you close the issue? Still looks like something to investigate.

I have also been running it here for the past 2 hours.

I just started the test with version 1.3.2, as you indicated.