On 2020-08-07 1:06 PM, Cezary Rojewski wrote:
Implement support for Lynxpoint and Wildcat Point AudioDSP. Catpt solution deprecates existing sound/soc/intel/haswell which is removed in the following series. This cover-letter is followed by 'Developer's deep dive' message shedding light on catpt's key concepts and areas addressed.
Developer's deep dive
=====================
Purpose of this message is to explain catpt's key concepts and the problems addressed when compared to /soc/intel/haswell/ (which from now on will be referred to as /haswell/), as well as to answer the major why(s) surrounding the subject. Message does not explain every detail of every process as its length is already high without doing so. In case of any questions or areas which have not been mentioned here, please don't hesitate to send an email.
Following removal of /haswell/ and moving forward
-------------------------------------------------
Catpt is a direct replacement for existing /haswell/ solution. Because of userspace API being inherited as well as FW binary re-used, /haswell/ is going to be removed entirely in the follow-up series. In consequence of that action, majority of processing code found in /soc/intel/common becomes redundant. /common/ was supposed to host code common to Intel AudioDSP architectures. Unfortunately, it never lived up to the expectations. Most of code found there is LPT/WPT specific with some /soc/intel/baytrail/ deviations, while /soc/intel/skylake/ moved most of its stuff away from /common/ - even DSP initialization.
To my knowledge, no device makes use of legacy /baytrail/ solution any longer, with all the products using either /soc/intel/atom/ or /soc/sof/intel/. Following the sanitization of /common/ folder, the logical next step is removal of /baytrail/.
LPT and catpt
-------------
I'm not aware of any released Haswell-based product with AudioDSP capability, despite these being planned for some Ultrabooks initially. Broadwell ADSP, on the other hand, was released and is present on the market.
Decision not to cut LPT's ACPI ID off from /catpt/device.c is maintenance based. While a unit of most of the models found on the market is in the hands of the IGK or Bangalore validation teams, the quantity is very limited (~1 per model). On top of that, production stuff is a poor Continuous Integration medium. Our CI, once attached to the platform, takes basically entire control over it, in exchange allowing test cycles to be performed in rapid fashion. Test areas include but are not limited to D3/D0, S3, S4, S5, G3 (power off), individual streaming, concurrent streaming, cpu overload (so on and so forth..). That comes at the cost of preparation - GPIOs and other pins need to be exposed and available for our CI and external hw: power switches, hydras for dynamic external codec connections and disconnections and more. Production stuff, for obvious reasons, does not expose such capabilities. This is different on RVPs which are made for that very purpose and are very CI-friendly. There are more of them available too.
Age of the LPT/WPT architecture has taken its toll, though. Not many working RVP exemplars are available and even fewer CPUs - yes, you need a special one to match the RVP, production CPU won't cut it. TLDR: there are not enough working WPT RVPs when combined with pre-release CPUs to mark catpt CI healthy for long-term validation (2-3 years+). To compensate, holes have been filled with LPT equivalents.
Considering DSP hw capabilities are basically identical between LPT and WPT with FW code being shared for both of them (one branch), this allows for high test coverage of DSP functionality regardless of PCH present. Codec support is shared too - /soc/intel/boards/haswell.c aka hsw-rt5640.c and /soc/intel/boards/broadwell.c aka bdw-rt286.c are PCH agnostic. Kernel code differences are minimal between the two and will probably be reduced even more in the future. Given the reasons presented above, I believe gains from healthy CI and test coverage heavily outweigh the maintenance cost of a few lines of code appending LPT device support.
Device and its components
-------------------------
Device probing has been redesigned to accommodate for:
- single platform_device solution
- coupling dw_dma dev with actual ADSP device
Starting point was two platform_devices as in /soc/intel/common/sst-acpi.c. First gets created when specific acpi_id is found within DSDT ACPI table and then callback is performed on successful firmware file request which creates yet another device: this time for PCM operations - haswell-pcm-audio. Having in mind Greg KH's idea of reduction of number of platform_devices in Linux environment, decision was made to cut the latter off.
This raised a dependency issue as DW DMA Controller - one of LPT/WPT AudioDSP device components - requires the device to be up and running before being probed. Moreover, because of said controller making use of pm_runtime during release, it is paramount pm_runtime is disabled before invoking it. To address this, catpt's .probe() calls _dsp_power_up() - which takes device out of D3 and allows for I/O access before proceeding with DW DMA controller. .remove() starts with pm_runtime_disable() so postmortem suspend and resumes are prevented.
DMAC plays the role of the ADSP's personal memcopier: there are two engines available, 8 channels each. It's neither entirely owned by HOST nor DSP: ownership is instead shared. As per spec, to prevent de-synchronization, the following protocol is obeyed: HOST owns DW DMA controller as long as FW isn't alive - that is, FW_READY notification has not been received. Once DSP is unstalled and firmware boots, it is expected that HOST stops all DMA operations and entire ownership of all 16 channels is taken by FW. This effectively limits HOST's DMA usage to FW booting procedure.
Device begins its life in D3 state (11b for PCI PMCS::PS register), and needs to be taken out of it via _dsp_power_up(). That has to be done before FW image loading. Since LPT/WPT ADSP has no IMR - memory region for storing firmware image - capability, context is lost on each D0 exit. In consequence image has to be reloaded on each resume. Here, additional optimization has been added to prevent redundant image flashing from occurring when module is about to be unloaded: module_is_live check.
DMAC is not the only component tied to catpt. After allocating all the necessary resources, probing DMAC and flashing FW, ASoC platform component and card need to be created. The latter is triggered by creation of child platform_device - as is the case for all snd_soc_card(s). As a child device owned by catpt, it's catpt's responsibility to remove it before solution gets unloaded. This is done by the devm_add_action hook provided with platform_device_unregister function.
Compared to /haswell/, catpt allows for core device probing regardless of snd_soc_acpi_mach being present or not. This is similar to the behaviour of codec drivers, which probe right after matching device id on I2S bus, and is more logical than complete abort when no machines are present. It allows for core device debug or tests (e.g. power sequences) in the codec-less environment.
Resource management
-------------------
In contrast to its younger brothers and sisters from cAVS architecture (SKL+), it's HOST's responsibility to manage SRAM - memory allocation and power gating. SRAM is split into two banks: Data SRAM and Instruction SRAM, subsequently divided into several EBBs of 32 kB each. IRAM is targeted for fixed (static) data while DRAM holds both static and dynamic information.
Both FW_BASE module and feature modules require persistent memory allocated to them as well as some temporary one. The temporary block is called scratch and is shared by all modules and thus only one gets allocated, compared to persistent ones which are each module's individual area. As FW context is lost on each D0 exit, to speed up the boot process HOST is expected to store dynamic information regions from DRAM. Those regions contain module and stream states which subsequently allow for bringing base FW as well as streams right back to where they were before leaving D0.
Model seen in /soc/intel/common/sst-firmware.c has been redesigned. Two simple structs have been enlisted:
- catpt_mbank
- catpt_mregion

to provide resource-like memory management. 'struct resource' serves different purpose and its layout does not fit all needs of catpt and that's why new types have been provided instead. catpt_mbank represents SRAM bank of one type: IRAM -or- DRAM. It's made of _one_ or more catpt_mregions, never less.
catpt core device requests memory region for either fixed or dynamic allocation by calling catpt_mbank_request_region or catpt_mbank_reserve_region. Once allocated, catpt_mregion::busy field gets flagged to ensure said region is no longer available until freed by catpt_mbank_release_region. In the very beginning each mbank is made of singular list of regions - one element spanning entire SRAM with ::busy=0. On each allocation this situation changes and more and more blocks are being extracted from the free space. Banks maintain actual list of regions and perform a 'join' procedure when a region gets yielded back to pool of free regions. Said procedure attempts to join adjacent regions as long as they too are ::busy=0.
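The request/release/join cycle described above can be sketched in plain userspace C. This is only an illustration under assumptions: the struct layouts, the mbank_* helper names and the first-fit policy are hypothetical and are not taken from the catpt sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for catpt_mregion; field names are assumptions. */
struct mregion {
	uint32_t start;
	uint32_t size;
	int busy;
	struct mregion *next;
};

/* Hypothetical stand-in for catpt_mbank: an address-ordered region list. */
struct mbank {
	struct mregion *head;
};

/* A fresh bank is a single free region spanning the entire SRAM bank. */
static int mbank_init(struct mbank *bank, uint32_t start, uint32_t size)
{
	struct mregion *reg = calloc(1, sizeof(*reg));

	if (!reg)
		return -1;
	reg->start = start;
	reg->size = size;
	bank->head = reg;
	return 0;
}

/* First-fit: carve 'size' bytes off the front of the first free region that fits. */
static struct mregion *mbank_request(struct mbank *bank, uint32_t size)
{
	struct mregion *reg, *rest;

	if (!size)
		return NULL;
	for (reg = bank->head; reg; reg = reg->next) {
		if (reg->busy || reg->size < size)
			continue;
		if (reg->size > size) {
			/* Split: the remainder stays on the list as a free region. */
			rest = calloc(1, sizeof(*rest));
			if (!rest)
				return NULL;
			rest->start = reg->start + size;
			rest->size = reg->size - size;
			rest->next = reg->next;
			reg->size = size;
			reg->next = rest;
		}
		reg->busy = 1;
		return reg;
	}
	return NULL;
}

/*
 * Mark a region free, then join every pair of adjacent free regions.
 * The released pointer must not be used afterwards: it may get merged
 * into its predecessor and freed.
 */
static void mbank_release(struct mbank *bank, struct mregion *reg)
{
	struct mregion *cur = bank->head, *next;

	assert(reg && reg->busy);
	reg->busy = 0;
	while (cur && cur->next) {
		next = cur->next;
		if (!cur->busy && !next->busy) {
			cur->size += next->size;
			cur->next = next->next;
			free(next);
		} else {
			cur = next;
		}
	}
}

/* Number of regions currently on the list. */
static int mbank_count(const struct mbank *bank)
{
	const struct mregion *reg;
	int n = 0;

	for (reg = bank->head; reg; reg = reg->next)
		n++;
	return n;
}
```

Two requests on a fresh bank leave three regions on the list (two busy, one free); once both are released the join pass collapses the list back into one region spanning the whole bank.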
Presented mechanism allows for keeping lowest possible amount of EBBs alive while power gating the rest of them, saving maximum amount of power. There are a few exceptions, meaning regions which must always be power un-gated. That goes for 0x200 (FW dump) at the front of DRAM and everything past the highest module offset - that always goes for FW_BASE. Everything else is available for dynamic allocation and should be power gated when possible.
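To illustrate how busy regions translate into EBBs that must stay powered, here is a minimal sketch. The busy_range type and ebb_powered_mask() are hypothetical names, the 32-block limit is an assumption of the sketch, and the polarity of the actual power-gating register bits is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EBB_SIZE 0x8000u /* 32 kB per block, as stated in the text */

/* Hypothetical busy range inside one SRAM bank; offsets are bank-relative. */
struct busy_range {
	uint32_t start;
	uint32_t size;
};

/*
 * Return a bitmask with bit N set when block N overlaps a busy range and
 * therefore must stay powered. Clear bits are candidates for power gating.
 * Assumes at most 32 blocks per bank; the real count is hw specific.
 */
static uint32_t ebb_powered_mask(const struct busy_range *r, size_t n)
{
	uint32_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t first, last, b;

		if (!r[i].size)
			continue;
		first = r[i].start / EBB_SIZE;
		last = (r[i].start + r[i].size - 1) / EBB_SIZE;
		assert(last < 32);
		for (b = first; b <= last; b++)
			mask |= UINT32_C(1) << b;
	}
	return mask;
}
```

A 0x200-byte range at offset 0 keeps only block 0 alive, while a range that straddles a 32 kB boundary keeps both neighbouring blocks alive.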
Last but not least is the LPCS - low power clock selection. While clock selection is granular as per hw spec (6+ configurations), catpt deals with it in binary fashion only. Clock is either set to low-power when DSP is idle or high-power when streaming is done. This limitation is inherited from equivalent Windows solution and in order to eliminate it, much more testing has to be done. For now catpt sticks to what's stable. Clock selection itself is guarded by in-progress register and may not be performed until it's cleared. On top of that, as long as FW is alive, HOST should await WAITI state before attempting any selection. This is to ensure work on DSP side is not disrupted by unexpected clock change. While in D3, HOST bypasses that rule and is free to select clock forcibly.
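The ordering of those checks can be sketched as a small decision function. Everything below is hypothetical: struct dsp_state merely models the conditions named above as booleans, the error codes are arbitrary choices, and the sketch assumes that D3 bypasses only the WAITI rule while the in-progress guard still applies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum clk_sel { CLK_LOW_POWER, CLK_HIGH_POWER };

/* Hypothetical snapshot of what HOST consults before switching clocks. */
struct dsp_state {
	bool in_d3;
	bool fw_alive;
	bool in_waiti;
	bool change_in_progress;
	enum clk_sel clk;
};

/* Binary policy from the text: high-power while streaming, low-power when idle. */
static enum clk_sel clk_for_load(bool streaming)
{
	return streaming ? CLK_HIGH_POWER : CLK_LOW_POWER;
}

static int dsp_select_clock(struct dsp_state *dsp, enum clk_sel clk)
{
	assert(dsp);

	/* A previous selection has not completed yet: try again later. */
	if (dsp->change_in_progress)
		return -EBUSY;

	/* With live FW and outside of D3, only switch while DSP sits in WAITI. */
	if (!dsp->in_d3 && dsp->fw_alive && !dsp->in_waiti)
		return -EAGAIN;

	dsp->clk = clk;
	return 0;
}
```

A real implementation would poll the in-progress and WAITI conditions with a timeout rather than return immediately; the sketch only shows which condition gates which case.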
IPC protocol
------------
Catpt features simple, synchronized '1 message out - 1 message in' FW communication. This deviates from /soc/intel/common/sst-ipc.c as there are no lists involved and there is no sst_plat_ipc_ops::reply_msg_match. Vast majority of IPCs are one-shots meaning they flag DSP with busy status and until response is received, no further messaging is allowed.
There is only one communication channel for request-response called 'downlink' with secondary channel called 'uplink' available for FW alone to send notifications to HOST. While most IPCs are one-shots, FW may choose to delay the response. In such cases status PENDING is returned back and HOST is expected to await actual reply coming from the notification channel. Catpt verifies status of incoming response and yields on success or failure but re-awaits the completion on said PENDING status to ensure synchronization remains intact. Example of such delayed reply is RESET_STREAM for low power offload pipe. Until response is received, stream cannot progress in the state machine through PAUSE and, ultimately, RESUME.
Steps have been taken to reduce kmallocs/kzallocs in IPC messages. This is done by removal of temporary buffers in requests (/catpt/messages.c) and instead working with provided ret-pointers directly - that is, only when reply with SUCCESS status is returned back. Otherwise ret-pointer is untouched. Moreover, tx buffer has been removed (/catpt/ipc.c::catpt_ipc_arm) as once request is copied to hardware registers, it is no longer of use for IPC framework. Said framework gates the communication - field 'catpt_ipc::ready'. Once FW_READY notification is received, mailbox is initialized and further messaging is allowed. On critical failures, COREDUMP notification -or- IPC timeout, ready status is revoked. Things stay that way until DSP is recovered from failure.
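The handling of a delayed reply can be sketched as follows. This is a userspace illustration, not the driver's code: reply_status, wait_fn, ipc_wait_reply() and the scripted test double are all hypothetical names, and treating a second PENDING as a failure is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum reply_status { REPLY_SUCCESS, REPLY_ERROR, REPLY_PENDING, REPLY_TIMEOUT };

/* Hypothetical wait primitive: blocks for the next reply and returns its status. */
typedef enum reply_status (*wait_fn)(void *ctx);

/*
 * Wait for the outcome of one request. A PENDING status is not an outcome:
 * the real reply arrives later on the notification channel, so wait again.
 */
static int ipc_wait_reply(wait_fn wait, void *ctx)
{
	enum reply_status st;

	assert(wait);
	st = wait(ctx);
	if (st == REPLY_PENDING)
		st = wait(ctx);

	switch (st) {
	case REPLY_SUCCESS:
		return 0;
	case REPLY_TIMEOUT:
		return -ETIMEDOUT;
	default:
		/* ERROR, or a second PENDING, which this sketch treats as failure. */
		return -EIO;
	}
}

/* Test double standing in for the hardware: replays a fixed list of statuses. */
struct script {
	const enum reply_status *seq;
	size_t len;
	size_t pos;
};

static enum reply_status scripted_wait(void *ctx)
{
	struct script *s = ctx;

	/* An exhausted script models a reply that never arrives. */
	return s->pos < s->len ? s->seq[s->pos++] : REPLY_TIMEOUT;
}
```

With a script of PENDING followed by SUCCESS the caller sees a single successful completion, which is the point of re-awaiting: the request stays synchronous from the caller's perspective.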
ASoC Platform component
-----------------------
Solution maintains backwards compatibility with previous one, /haswell/. Kcontrols which were available there - master playback, capture and 2x offload volumes - make their return in catpt. Differences:
- new kcontrol 'Loopback Mute' has been added
- volume controls now support quad rather than just stereo

The former is self-explanatory and has been missing in /haswell/. Targets Loopback stream only. The latter is an adjustment to align with requirements and FW spec. Years ago, during LPT/WPT development, a change request was filed to increase the number of supported channels. Looks like that information didn't get back to Linux, but in catpt this has been addressed. As already noted, WAVES and module support in general is scheduled for later release.
Both volume and mute controls are stored within kcontrols::private_value and are applied during pcm prepare() operation. As soc_mixer_control is featured in every iteration of helpful macros present in /include/sound/soc.h - which is stereo-configuration biased - decision has been made to relocate control creation to component's .probe().
In regard to PCM operations alone, some invisible to userspace stuff has been re-used from /haswell/ e.g.: volume_map and page_table arranging. As internal FW pipes are governed by a state machine, changes have been made to ensure the following is obeyed:

ALLOC -> RESET -> PAUSE -> RESUME
 FREE <- RESET <- PAUSE <- RESUME

In the existing solution, RESET could have been bypassed and stream moved to PAUSE directly. In catpt, .trigger() and .prepare() functions do the majority of stream's preparation and state changing, ensuring these are changed properly.
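The ordering shown in the diagram can be expressed as a transition check. The sketch below is a literal reading of the six arrows only; the enum and function names are hypothetical and the real driver may track state differently.

```c
#include <assert.h>
#include <stdbool.h>

enum stream_state { ST_ALLOC, ST_RESET, ST_PAUSE, ST_RESUME, ST_FREE };

/* True only for the arrows drawn in the diagram. */
static bool stream_transition_ok(enum stream_state from, enum stream_state to)
{
	switch (from) {
	case ST_ALLOC:
		return to == ST_RESET;
	case ST_RESET:
		return to == ST_PAUSE || to == ST_FREE;
	case ST_PAUSE:
		return to == ST_RESUME || to == ST_RESET;
	case ST_RESUME:
		return to == ST_PAUSE;
	default:
		return false; /* FREE is terminal in this sketch */
	}
}

/* Move a stream to 'to', trapping any step that skips a state. */
static void stream_set_state(enum stream_state *cur, enum stream_state to)
{
	assert(stream_transition_ok(*cur, to));
	*cur = to;
}
```

Under this check the ALLOC -> PAUSE shortcut is rejected, and a running stream has to pass through PAUSE and RESET before it can be freed.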
Another area of interest is substream's private data handling. This has been modified from static block: struct hsw_priv_data::pcm and struct hsw_priv_data::dmab to dynamic - on .startup() _dma_data is allocated and assigned to given DAI. In catpt, said private data is the chest of solution's PCM: struct catpt_stream_runtime. It stores current state, template used and memory allocated. As LPT/WPT DSP does not offer flexible topology, static one is applied. This is manifested via catpt_topology global which describes the shape of every stream. There are seven of them, 2 Bluetooth streams for SSP1 and 5 (system playback & capture, two offloads and loopback) for SSP0.
This data is essential during stream allocation in .hw_params(). On D0 exit SSP device configuration is lost, just like other FW context. SSP device formats are expected to be resent once FW resumes operations. Catpt removes the need for /soc/intel/boards to play with IPCs (sst_hsw_device_set_config) and assigns formats automatically on .pcm_new().
Heavy lifting has also been done for stream-position-update handled by catpt_stream_update_position and POSITION_CHANGED notification. This time, payload dumped by the latter is always accounted for instead of being ignored and, combined with SET_WRITE_POS IPC, allows for stream progression for OFFLOAD pins. On Dell XPS 13, all exposed DAIs apart from the system one were not working correctly. For offload, HOST owns write-pointer and is expected to send SET_WRITE_POS IPC periodically - that can be done twice prior to stream's start aka RESUME and from there, once on every POSITION_CHANGED notification. For loopback stream, DAPM routes were missing, which too has been addressed to ensure stream is functional.
Thanks for bearing with me. In case of any questions, send me an email.
Kind Regards, Czarek