zephyr: Zephyr is missing documentation and validation of device drivers
Introduction
Zephyr is missing documentation and build time validation of device drivers. This results in avoidable, obscure errors during development, and having to look through source code, Kconfig files and devicetree bindings files to determine the existence and capabilities of device drivers. This proposal adds a build step which uses predefined Kconfig symbols to determine device driver capabilities, which are used to validate device drivers early, providing helpful, descriptive errors, and for generating device driver documentation.
Problem description
The following devicetree fragment declares a bg95 modem node, which needs a device driver compatible with "quectel,bg95":
&uart0 {
    bg95 {
        compatible = "quectel,bg95";
    };
};
Obscure errors during development
In this example, we have two drivers which are compatible with "quectel,bg95", MODEM_GSM_PPP and MODEM_QUECTEL_BG9X.
If we build an application without selecting either compatible driver, we get the obscure
undefined reference to __device_dts_ord_ ...
error. If we select both, we get
__device_dts_ord_ ... defined multiple times
These errors are obscure and they occur at the last stage of the build, costing time both in diagnosing the issue and processing the build itself.
Missing device driver documentation
If we search for bg95 on the Zephyr documentation site, only the bindings file is returned, because it contains “quectel,bg95”. Neither of the compatible device drivers' Kconfig symbols, MODEM_GSM_PPP and MODEM_QUECTEL_BG9X, is returned.
To reliably discover all compatible device drivers knowing only the device's compatible string, we must go to the source code, search for all files which contain the sanitized compatible “quectel_bg95”, and then inspect the CMakeLists.txt files which add those source files to the build, to determine which Kconfig symbol selects each device driver.
To then determine which API each device driver supports, we must find the source file which contains the relevant DEVICE_DT_DEFINE() and look at the type of the API struct passed to its API parameter. Only then can we search for that API on the Zephyr documentation site.
Proposed change
Use Kconfig's select feature to select predefined symbols, which are mapped to device driver properties by symbol name.
An example symbol for the MODEM_QUECTEL_BG9X driver follows:
config MODEM_QUECTEL_BG9X
    bool "Enable quectel modem device driver"
    select DEVICE_DRIVER             # Declares that this symbol enables a device driver
    select DT_COMPAT_QUECTEL_BG95    # Declares that this device driver supports quectel,bg95
    select DT_COMPAT_QUECTEL_BG96    # Declares that this device driver supports quectel,bg96
    select DEVICE_DRIVER_API_NET_IF  # Declares that this device driver exposes the net_if device driver API
With the information declared through these predefined symbols, the obscure linker errors
undefined reference to __device_dts_ord_ ...
and
__device_dts_ord_ ... defined multiple times
will be replaced with the following errors which will show up early in the build, before compilation,
Devicetree node bg95 has status "okay" but no compatible device driver has been included in the build.
The following compatible device drivers were found: MODEM_GSM_PPP, MODEM_QUECTEL_BG9X
and
Conflicting device drivers compatible with devicetree node bg95 have been included in the build.
Only one of the following device drivers may be included in the build: MODEM_GSM_PPP, MODEM_QUECTEL_BG9X
The solution saves time by catching device driver errors at an early stage of the build and by reporting helpful, concrete errors. It also enables generating documentation for all existing device drivers, so that searching for bg95 on the Zephyr documentation site returns links to the documentation pages of all supporting device drivers, which in turn link to the relevant devicetree bindings, Kconfig symbols and API references.
Detailed RFC
The following steps are required to implement this proposed solution:
- Determine naming conventions which identify device driver properties from symbol names
- Add device driver validation module
- Start adding new symbols to device drivers to have them included by device driver validation module
- Add device driver documentation generation module.
- Deprecate using depends on DT_HAS_<compat>_ENABLED to raise dependency errors, since this is now handled by device driver validation module.
All predefined symbols connected with the devicetree, which have the prefix DT_, are generated by dts, which runs before Kconfig. Generic symbols like DEVICE_DRIVER will be declared in zephyr/drivers/Kconfig. Pattern-matching symbols, like DEVICE_DRIVER_API_<api>, may be declared where appropriate. This allows device driver symbols to be defined out of tree.
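As a rough illustration of the naming convention, the mapping from a compatible string to its DT_COMPAT_ symbol could look like the sketch below. The function name `compat_to_symbol` is hypothetical, and the sanitizing rule (uppercase, every other character replaced by an underscore) is an assumption modelled on the sanitized compatible “quectel_bg95” mentioned earlier; the RFC does not fix the exact rule.

```python
import re

COMPAT_PREFIX = "DT_COMPAT_"  # prefix proposed in this RFC


def compat_to_symbol(compatible: str) -> str:
    """Map a compatible such as "quectel,bg95" to a DT_COMPAT_ symbol name."""
    # Assumption: uppercase the string and turn every character outside
    # [A-Z0-9] into an underscore.
    return COMPAT_PREFIX + re.sub(r"[^A-Z0-9]", "_", compatible.upper())


print(compat_to_symbol("quectel,bg95"))
```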
Proposed change (Detailed)
The proposal updates kconfig.py to additionally export every symbol which selects other symbols, as a dict stored in selects.pickle. It will convert the following symbol
config MODEM_QUECTEL_BG9X
    bool "Enable quectel modem device driver"
    select DEVICE_DRIVER
    select DT_COMPAT_QUECTEL_BG95
    select DT_COMPAT_QUECTEL_BG96
    select DEVICE_DRIVER_API_NET_IF
to
selects = {
    "MODEM_QUECTEL_BG9X": ["DEVICE_DRIVER", "DT_COMPAT_QUECTEL_BG95", "DT_COMPAT_QUECTEL_BG96", "DEVICE_DRIVER_API_NET_IF"]
}
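A minimal sketch of how such an export might work is shown below. It assumes symbol objects that expose a `name` and a `selects` list of (selected symbol, condition) pairs, which is the shape Kconfiglib uses for its symbols; the helper names `collect_selects` and `dump_selects` are hypothetical, and stand-in objects are used here so the example runs without Kconfiglib.

```python
import pickle
from types import SimpleNamespace


def collect_selects(symbols):
    """Return {symbol name: [names of selected symbols]}, skipping symbols without selects."""
    selects = {}
    for sym in symbols:
        # Each entry is a (selected_symbol, condition) pair; the condition is ignored here.
        selected = [target.name for target, _cond in sym.selects]
        if selected:
            selects[sym.name] = selected
    return selects


def dump_selects(selects, path):
    """Write the dict to a pickle file, e.g. selects.pickle."""
    with open(path, "wb") as f:
        pickle.dump(selects, f)


def make_symbol(name, *selected):
    """Stand-in for a Kconfig symbol, for illustration only."""
    targets = [(SimpleNamespace(name=s, selects=[]), None) for s in selected]
    return SimpleNamespace(name=name, selects=targets)


example_symbols = [
    make_symbol("MODEM_QUECTEL_BG9X", "DEVICE_DRIVER", "DT_COMPAT_QUECTEL_BG95",
                "DT_COMPAT_QUECTEL_BG96", "DEVICE_DRIVER_API_NET_IF"),
    make_symbol("SOME_UNRELATED_OPTION"),  # selects nothing, so it is omitted
]
print(collect_selects(example_symbols))
```

In kconfig.py itself the input would presumably be the parsed Kconfig tree's symbols rather than these stand-ins.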
selects.pickle is then combined with the dts output kconfig.dts and the predefined symbol naming convention to identify device drivers, which are exported as a dict stored in device_drivers.pickle.
device_drivers = {
    "MODEM_QUECTEL_BG9X": {
        "compatible": ["quectel,bg95", "quectel,bg96"],
        "api": ["net_if"],
    },
}
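The derivation of this dict from the selects dict can be sketched as follows. The function `build_device_drivers` and the `compat_by_symbol` lookup (DT_COMPAT_ symbol name to compatible string, which would presumably be recovered from kconfig.dts) are assumptions for illustration; the RFC does not specify the format of kconfig.dts.

```python
DRIVER_MARKER = "DEVICE_DRIVER"
COMPAT_PREFIX = "DT_COMPAT_"
API_PREFIX = "DEVICE_DRIVER_API_"


def build_device_drivers(selects, compat_by_symbol):
    """Turn {symbol: [selected symbols]} into {driver: {"compatible": [...], "api": [...]}}."""
    drivers = {}
    for name, selected in selects.items():
        if DRIVER_MARKER not in selected:
            continue  # this symbol does not declare a device driver
        compatibles = [compat_by_symbol[s] for s in selected
                       if s.startswith(COMPAT_PREFIX) and s in compat_by_symbol]
        # Assumption: the API name is the lowercased remainder of the symbol name.
        apis = [s[len(API_PREFIX):].lower() for s in selected
                if s.startswith(API_PREFIX)]
        drivers[name] = {"compatible": compatibles, "api": apis}
    return drivers


selects = {
    "MODEM_QUECTEL_BG9X": ["DEVICE_DRIVER", "DT_COMPAT_QUECTEL_BG95",
                           "DT_COMPAT_QUECTEL_BG96", "DEVICE_DRIVER_API_NET_IF"],
}
compat_by_symbol = {  # hypothetical lookup derived from the dts output
    "DT_COMPAT_QUECTEL_BG95": "quectel,bg95",
    "DT_COMPAT_QUECTEL_BG96": "quectel,bg96",
}
print(build_device_drivers(selects, compat_by_symbol))
```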
device_drivers.pickle is then used together with edt.pickle (output from dts) and .config (output from Kconfig) to validate device drivers as described in this proposal.
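A minimal sketch of that validation step, under simplifying assumptions: enabled devicetree nodes are passed as (node name, compatibles) pairs rather than read from edt.pickle, enabled Kconfig symbols are passed as a set rather than parsed from .config, and the function name `validate_device_drivers` is hypothetical. The MODEM_GSM_PPP entry is illustrative only.

```python
def validate_device_drivers(nodes, enabled_symbols, device_drivers):
    """Return a list of error messages for enabled nodes with zero or several enabled drivers."""
    errors = []
    for node_name, compatibles in nodes:
        candidates = sorted(
            driver for driver, props in device_drivers.items()
            if set(props["compatible"]) & set(compatibles)
        )
        if not candidates:
            continue  # no driver declared through predefined symbols: nothing to check yet
        enabled = [d for d in candidates if d in enabled_symbols]
        if not enabled:
            errors.append(
                f'Devicetree node {node_name} has status "okay" but no compatible '
                "device driver has been included in the build.\n"
                "The following compatible device drivers were found: "
                + ", ".join(candidates)
            )
        elif len(enabled) > 1:
            errors.append(
                "Conflicting device drivers compatible with devicetree node "
                f"{node_name} have been included in the build.\n"
                "Only one of the following device drivers may be included in the build: "
                + ", ".join(enabled)
            )
    return errors


device_drivers = {
    "MODEM_GSM_PPP": {"compatible": ["quectel,bg95"], "api": ["net_if"]},
    "MODEM_QUECTEL_BG9X": {"compatible": ["quectel,bg95", "quectel,bg96"], "api": ["net_if"]},
}
nodes = [("bg95", ["quectel,bg95"])]
for message in validate_device_drivers(nodes, set(), device_drivers):
    print(message)
```

Skipping nodes without any declared driver mirrors the incremental rollout in this proposal, where a validation error only occurs once a compatible driver has been declared using predefined symbols.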
device_drivers.pickle is also used to generate device driver documentation.
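For documentation, the useful view is the reverse one: from a compatible string to every driver that supports it. The sketch below shows one way a documentation generator might build that index; `index_by_compatible` is a hypothetical helper, and the MODEM_GSM_PPP entry is illustrative only.

```python
from collections import defaultdict


def index_by_compatible(device_drivers):
    """Return {compatible string: [driver symbols]} for cross-linking in generated docs."""
    index = defaultdict(list)
    for driver, props in sorted(device_drivers.items()):
        for compatible in props["compatible"]:
            index[compatible].append(driver)
    return dict(index)


device_drivers = {
    "MODEM_QUECTEL_BG9X": {"compatible": ["quectel,bg95", "quectel,bg96"], "api": ["net_if"]},
    "MODEM_GSM_PPP": {"compatible": ["quectel,bg95"], "api": ["net_if"]},
}
print(index_by_compatible(device_drivers))
```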
From the build system's perspective, it will go from this:
- Generate edt.pickle and kconfig.dts
- Generate .config
- Compile
to this:
- Generate edt.pickle and kconfig.dts
- Generate .config and selects.pickle
- Generate device_drivers.pickle using kconfig.dts and selects.pickle
- Validate device drivers using edt.pickle, .config and device_drivers.pickle
- Compile
How to implement change step by step:
- Update Kconfig to generate selects.pickle
- Add a cmake module which runs after Kconfig and dts
- Create script which imports selects.pickle and kconfig.dts and generates device_drivers.pickle
- Create script which imports device_drivers.pickle, edt.pickle and .config, and uses them to perform the validation described in this proposal.
- Add execution of newly created scripts to newly created cmake module
- Merge cmake module with upstream
At this point, validation of device drivers is fully supported. Adding predefined symbols to existing device driver Kconfigs should begin here. A device driver validation error will only occur if a compatible device driver, required by an enabled devicetree node, has been declared using predefined symbols. The more device driver Kconfigs are updated, the better the validation support will be.
- Add support for generating device driver documentation from device_drivers.pickle.
- Wait until all device drivers have been declared using predefined symbols.
- Update the device driver validation module to raise a validation error even if no compatible device driver exists, enforcing device drivers being declared by predefined symbols.
Dependencies
The proposal requires updating Kconfig files for device drivers to use the new predefined symbols. It also strongly encourages not using depends on DT_HAS_<compat>_ENABLED, which will be replaced with select DT_COMPAT_<compat>.
Concerns and Unresolved Questions
Overlap between current Kconfig dependency error and device driver validation error
The devicetree currently generates symbols, DT_HAS_<compat>_ENABLED, which are used in Kconfig to determine whether a compatible exists. Some Kconfig symbols depend on these to determine if they can be selected. This causes a Kconfig dependency error if a driver is selected with no supported devices in the devicetree.
There is overlap between that feature and the validation offered by this solution. Kconfig runs before this solution, resulting in the error which would have been provided by this solution being silenced, as the build fails when Kconfig raises the dependency error.
Since the error raised by this solution is more descriptive and helpful, using Kconfig to invoke a dependency error in this case should be discouraged.
Alternatives
Using YAML files to declare device driver properties, instead of predefined Kconfig symbols.
Pros of using YAML files:
- Can contain other metadata like maintainers etc.
Cons of using YAML files:
- Properties declared in YAML files cannot be used by Kconfig for configuration
- Requires adding new files alongside Kconfig, source etc.
- May result in duplicate information, in case some device driver property must be used by Kconfig for configuration
About this issue
- State: open
- Created 2 years ago
Consider that the link needs to still be meaningful when the driver implementation and/or filename changes. (E.g., when a file is refactored into separate files, when a driver supports some of the versions of a HW block.)
For end users that are limited to reuse an upstream repository (no additional commits) fixing any mapping errors must be done through configuration settings and mappings in a different repository. If both repositories have parallel directory structures, is it the same source tree? 🤔
In addition to those above, consider:
This is expanded upon in the issue, under Concerns and Unresolved Questions. The proposed solution also performs the reverse check: if a node is enabled and compatible drivers exist but none of them has been enabled, the user is notified that these drivers exist. The error reporting proposed in this solution is also more helpful than a Kconfig dependency error.
Having multiple drivers is beneficial for out-of-tree drivers, where you want to override an existing driver with your own implementation. It is also useful for devices which can be interacted with in different ways, for example through a generic, cover-all driver like MODEM_GSM_PPP or a tailored driver like MODEM_QUECTEL_BG9X. Conversely, we should not disallow or prevent having multiple drivers which support the same device; it is a valid use case.