[RFC PATCH 00/16] soundwire/ASoC: speed-up downloads with BTP/BRA protocol
This RFC patchset suggests a new API for ASoC codec drivers to use for firmware/table downloads.
Problem statement:
All existing transfers initiated by codec drivers rely on SoundWire read/write commands, which can only support ONE byte per frame. With the typical 48kHz frame rate, this means 384 kbits/s.
In addition, the command/control is typically handled with a FIFO and interrupts which adds more software overhead. To give a practical reference, sending 32Kb takes 2.5s on Intel platforms, which means about 105kbit/s only. Additional non-audio activity is likely to adversely impact interrupt scheduling and further decrease the transfer speeds.
New SDCA-based codecs have a need to download tables and DSP firmware which are typically between 20 and 256 Kb. The slow bus operation has a direct impact on boot/resume times, and clearly having to wait more than 300ms is a showstopper in terms of latency requirements and user-experience.
Suggested solution:
The MIPI specification and most of the new codecs support the Bulk Transfer Protocol (BTP) and specifically the Bulk Register Access (BRA) configuration. This mode reclaims the 'audio' data space of the SoundWire frame to send firmware/coefficients over the DataPort 0 (DP0).
The API suggested is rather simple, with the following sequence expected: open(): reserve resources and prepare hardware send_async(): trigger DMAs and perform SoundWire bank switch wait(): wait for DMA completion and disable DMAs close(): release resources
Benefits:
Even after accounting for the protocol overhead, the data can be sent 8x or 16x faster on the link than with the regular commands.
With the use of DMAs, the software overhead becomes limited to the initialization. Measured results show that transferring the same 32Kb takes about 100ms, a 25x improvement on the baseline write() commands with an actual bitrate of 2.6 Mbits/s. These results are a measure of bus/hardware performance mainly, and will typically not be too modified by the CPU activity and scheduling.
The performance for reads is similar, with a 25x speedup measured.
Limitations:
Setting up the transfers over DP0 takes time, and the reliance on DMAs on the host side brings alignment restrictions. The BTP/BRA protocol is really only relevant for "large" transfers done during boot/resume *before* audio transfers take place. Mixing BTP/BRA and audio is a nightmare, this patchset suggests a mutual-exclusion between two usages.
Scope:
This patchset only exposes the API and a debugfs interface to initiate commands, validate results and measure performance. The actual use of the API is left as an exercise for codec driver developers.
This patchset depends on a number of pre-requisite patches and will not build on top of any for-next branches. The main intent of this RFC is to gather comments on the usage, API, benefits and restrictions.
The code and functionality was tested on an Intel LunarLake RVP platform connected to a Realtek RT711-SDCA device.
Acknowledgements:
Thanks to Zeek Tsai at Realtek for providing test sequences that helped reconcile the data formatted by the host driver with the expected results on the code side.
Pierre-Louis Bossart (16): Documentation: driver: add SoundWire BRA description soundwire: cadence: add BTP support for DP0 soundwire: stream: extend sdw_alloc_stream() to take 'type' parameter soundwire: extend sdw_stream_type to BPT soundwire: stream: special-case the bus compute_params() routine soundwire: stream: reuse existing code for BPT stream soundwire: bus: add API for BPT protocol soundwire: bus: add bpt_stream pointer soundwire: crc8: add constant table soundwire: cadence: add BTP/BRA helpers to format data soundwire: intel_auxdevice: add indirection for BPT open/close/send_async/wait ASoC: SOF: Intel: hda-sdw-bpt: add helpers for SoundWire BPT DMA soundwire: intel: add BPT context definition soundwire: intel_ace2x: add BPT open/close/send_async/wait soundwire: debugfs: add interface for BPT/BRA transfers ASoC: rt711-sdca: add DP0 support
Documentation/driver-api/soundwire/bra.rst | 478 +++++++++++++ Documentation/driver-api/soundwire/index.rst | 1 + Documentation/driver-api/soundwire/stream.rst | 2 +- .../driver-api/soundwire/summary.rst | 5 +- drivers/soundwire/Kconfig | 1 + drivers/soundwire/Makefile | 4 +- drivers/soundwire/amd_manager.c | 2 +- drivers/soundwire/bus.c | 77 +++ drivers/soundwire/bus.h | 18 + drivers/soundwire/cadence_master.c | 646 +++++++++++++++++- drivers/soundwire/cadence_master.h | 30 + drivers/soundwire/crc8.c | 277 ++++++++ drivers/soundwire/crc8.h | 11 + drivers/soundwire/debugfs.c | 122 +++- .../soundwire/generic_bandwidth_allocation.c | 84 ++- drivers/soundwire/intel.h | 12 + drivers/soundwire/intel_ace2x.c | 377 ++++++++++ drivers/soundwire/intel_auxdevice.c | 55 ++ drivers/soundwire/qcom.c | 2 +- drivers/soundwire/stream.c | 137 +++- include/linux/soundwire/sdw.h | 91 ++- include/linux/soundwire/sdw_intel.h | 16 + include/sound/hda-sdw-bpt.h | 76 +++ sound/soc/codecs/rt711-sdca-sdw.c | 8 + sound/soc/qcom/sdw.c | 2 +- sound/soc/sof/intel/Kconfig | 8 +- sound/soc/sof/intel/Makefile | 4 + sound/soc/sof/intel/hda-sdw-bpt.c | 328 +++++++++ 28 files changed, 2810 insertions(+), 64 deletions(-) create mode 100644 Documentation/driver-api/soundwire/bra.rst create mode 100644 drivers/soundwire/crc8.c create mode 100644 drivers/soundwire/crc8.h create mode 100644 include/sound/hda-sdw-bpt.h create mode 100644 sound/soc/sof/intel/hda-sdw-bpt.c
The Bulk Register Access protocol was left as a TODO topic since 2018. It's time to document this protocol and the design of its Linux support.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- Documentation/driver-api/soundwire/bra.rst | 478 ++++++++++++++++++ Documentation/driver-api/soundwire/index.rst | 1 + .../driver-api/soundwire/summary.rst | 5 +- 3 files changed, 480 insertions(+), 4 deletions(-) create mode 100644 Documentation/driver-api/soundwire/bra.rst
diff --git a/Documentation/driver-api/soundwire/bra.rst b/Documentation/driver-api/soundwire/bra.rst new file mode 100644 index 000000000000..4cc934bf614d --- /dev/null +++ b/Documentation/driver-api/soundwire/bra.rst @@ -0,0 +1,478 @@ +========================== +Bulk Register Access (BRA) +========================== + +Conventions +----------- + +Capitalized words used in this documentation are intentional and refer +to concepts of the SoundWire 1.x specification. + +Introduction +------------ + +The SoundWire 1.x specification provides a mechanism to speed-up +command/control transfers by reclaiming parts of the audio +bandwidth. The Bulk Register Access (BRA) protocol is a standard +solution based on the Bulk Payload Transport (BPT) definitions. + +The regular control channel uses Column 0 and can only send/retrieve +one byte per frame with write/read commands. With a typical 48kHz +frame rate, only 48kB/s can be transferred. + +The optional Bulk Register Access capability can transmit up to 12 +Mbits/s and reduce transfer times by several orders of magnitude, but +has multiple design constraints: + + (1) Each frame can only support a read or a write transfer, with a + 10-byte overhead per frame (header and footer response). + + (2) The read/writes SHALL be from/to contiguous register addresses + in the same frame. A fragmented register space decreases the + efficiency of the protocol by requiring multiple BRA transfers + scheduled in different frames. + + (3) The targeted Peripheral device SHALL support the optional Data + Port 0, and likewise the Manager SHALL expose audio-like Ports + to insert BRA packets in the audio payload using the concepts of + Sample Interval, HSTART, HSTOP, etc. + + (4) The BRA transport efficiency depends on the available + bandwidth. If there are no on-going audio transfers, the entire + frame minus Column 0 can be reclaimed for BRA. The frame shape + also impacts efficiency: since Column0 cannot be used for + BTP/BRA, the frame should rely on a large number of columns and + minimize the number of rows. The bus clock should be as high as + possible. + + (5) The number of bits transferred per frame SHALL be a multiple of + 8 bits. Padding bits SHALL be inserted if necessary at the end + of the data. + + (6) The regular read/write commands can be issued in parallel with + BRA transfers. This is convenient to e.g. deal with alerts, jack + detection or change the volume during firmware download, but + accessing the same address with two independent protocols has to + be avoided to avoid undefined behavior. + + (7) Some implementations may not be capable of handling the + bandwidth of the BRA protocol, e.g. in the case of a slow I2C + bus behind the SoundWire IP. In this case, the transfers may + need to be spaced in time or flow-controlled. + + (8) Each BRA packet SHALL be marked as 'Active' when valid data is + to be transmitted. This allows for software to allocate a BRA + stream but not transmit/discard data while processing the + results or preparing the next batch of data, or allowing the + peripheral to deal with the previous transfer. In addition BRA + transfer can be started early on without data being ready. + + (9) Up to 470 bytes may be transmitted per frame. + + (10) The address is represented with 32 bits and does not rely on + the paging registers used for the regular command/control + protocol in Column 0. + + +Error checking +-------------- + +Firmware download is one of the key usages of the Bulk Register Access +protocol. To make sure the binary data integrity is not compromised by +transmission or programming errors, each BRA packet provides: + + (1) A CRC on the 7-byte header. This CRC helps the Peripheral Device + check if it is addressed and set the start address and number of + bytes. The Peripheral Device provides a response in Byte 7. + + (2) A CRC on the data block (header excluded). This CRC is + transmitted as the last-but-one byte in the packet, prior to the + footer response. + +The header response can be one of + (a) Ack + (b) Nak + (c) Not Ready + +The footer response can be one of + (1) Ack + (2) Nak (CRC failure) + (3) Good (operation completed) + (4) Bad (operation failed) + +Example frame +------------- + +The example below is not to scale and makes simplifying assumptions +for clarity. The different chunks in the BRA packets are not required +to start on a new SoundWire Row, and the scale of data may vary. + + :: + + +---+--------------------------------------------+ + + | | + + | BRA HEADER | + + | | + + +--------------------------------------------+ + + C | HEADER CRC | + + O +--------------------------------------------+ + + M | HEADER RESPONSE | + + M +--------------------------------------------+ + + A | | + + N | | + + D | DATA | + + | | + + | | + + | | + + +--------------------------------------------+ + + | DATA CRC | + + +--------------------------------------------+ + + | FOOTER RESPONSE | + +---+--------------------------------------------+ + + +Assuming the frame uses N columns, the configuration shown above can +be programmed by setting the DP0 registers as: + + - HSTART = 1 + - HSTOP = N - 1 + - Sampling Interval = N + - WordLength = N - 1 + +Addressing restrictions +----------------------- + +The Device Number specified in the Header follows the SoundWire +definitions, and broadcast and group addressing are +permitted. However, in reality it is very unlikely that the exact same +binary data needs to be provided to the two different Peripheral +devices. The Linux implementation only allows for transfers to a +single device at a time. + +In the case of multiple Peripheral devices attached to different +Managers, the broadcast and group addressing is not supported by the +SoundWire specification. Each device must be handled with separate BRA +streams, possibly in parallel - the links are really independent. + +Unsupported features +-------------------- + +The Bulk Register Access specification provides a number of +capabilities that are not supported in known implementations, such as: + + (1) Transfers initiated by a Peripheral Device. The BRA Initiator is + always the Manager Device. + + (2) Flow-control capabilities and retransmission based on the + 'NotReady' header response require extra buffering in the + SoundWire IP and are not implemented. + +Bi-directional handling +----------------------- + +The BRA protocol can handle writes as well as reads, and in each +packet the header and footer response are provided by the Peripheral +Target device. On the Peripheral device, the BRA protocol is handled +by a single DP0 data port, and at the low-level the bus ownership can +will change for header/footer response as well as the data transmitted +during a read. + +On the host side, most implementations rely on a Port-like concept, +with two FIFOs consuming/generating data transfers in parallel +(Host->Peripheral and Peripheral->Host). The amount of data +consumed/produced by these FIFOs is not symmetrical, as a result +hardware typically inserts markers to help software and hardware +interpret raw data + +Each packet will typically have + + (1) a 'Start of Packet' indicator. + + (2) an 'End of Packet' indicator. + + (3) a packet identifier to correlate the data requested and + transmitted, and the error status for each frame + +Hardware implementations can check errors at the frame level, and +retry a transfer in case of errors. However, as for the flow-control +case, this requires extra buffering and intelligence in the +hardware. The Linux support assumes that the entire transfer is +cancelled if a single error is detected in one of the responses. + +Cadence IP BRA support +---------------------- + +Format requirements +~~~~~~~~~~~~~~~~~~~ + +The Cadence IP relies on PDI0 for TX and PDI1 for RX. The data needs +to be formatted with the following conventions: + + (1) all Data is stored in bits 15..0 of the 32-bit PDI FIFOs. + + (2) the start of packet is BIT(31). + + (3) the end of packet is BIT(30). + + (4) A packet ID is stored in bits 19..16. This packet ID is + determined by software and is typically a rolling counter. + + (5) Padding shall be inserted as needed so that the Header CRC, + Header response, Footer CRC, Footer response are always in + Byte0. Padding is inserted by software for writes, and on reads + software shall discard the padding added by the hardware. + +Example format +~~~~~~~~~~~~~~ + +The following table represents the sequence provided to PDI0 for a +write command followed by a read command. + +:: + + +---+---+--------+---------------+---------------+ + + 1 | 0 | ID = 0 | WR HDR[1] | WR HDR[0] | + + | | | WR HDR[3] | WR HDR[2] | + + | | | WR HDR[5] | WR HDR[4] | + + | | | pad | WR HDR CRC | + + | | | WR Data[1] | WR Data[0] | + + | | | WR Data[3] | WR Data[2] | + + | | | WR Data[n-2] | WR Data[n-3] | + + | | | pad | WR Data[n-1] | + + 0 | 1 | | pad | WR Data CRC | + +---+---+--------+---------------+---------------+ + + 1 | 0 | ID = 1 | RD HDR[1] | RD HDR[0] | + + | | | RD HDR[3] | RD HDR[2] | + + | | | RD HDR[5] | RD HDR[4] | + + 0 | 1 | | pad | RD HDR CRC | + +---+---+--------+---------------+---------------+ + + +The table below represents the data received on PDI1 for the same +write command followed by a read command. + +:: + + +---+---+--------+---------------+---------------+ + + 1 | 0 | ID = 0 | pad | WR Hdr Rsp | + + 0 | 1 | | pad | WR Ftr Rsp | + +---+---+--------+---------------+---------------+ + + 1 | 0 | ID = 0 | pad | Rd Hdr Rsp | + + | | | RD Data[1] | RD Data[0] | + + | | | RD Data[3] | RD Data[2] | + + | | | RD HDR[n-2] | RD Data[n-3] | + + | | | pad | RD Data[n-1] | + + | | | pad | RD Data CRC | + + 0 | 1 | | pad | RD Ftr Rsp | + +---+---+--------+---------------+---------------+ + + +Peripheral/bus interface +------------------------ + +Regmap use +~~~~~~~~~~ + +Existing codec drivers rely on regmap to download firmware to +Peripherals, so at a high-level it would seem natural to combine BRA +and regmap. The regmap layer could check if BRA is available or not, +and use a regular read-write command channel in the latter case. + +However, the regmap layer does not have information on latency or how +critical a BRA transfer is. It would make more sense to let the codec +driver make decisions (wait, timeout, fallback to regular +read/writes). + +In addition, the hardware may lose context and ask for the firmware to +be downloaded again. The firmware is not however a fixed definition, +the SDCA definition allows the hardware to request an updated firmware +or a different coefficient table to deal with specific environment +conditions. In other words, traditional regmap cache management is not +good enough, the Peripheral driver is required track hardware +notifications and react accordingly. + +Abstraction required +~~~~~~~~~~~~~~~~~~~~ + +There are no standard registers or mandatory implementation at the +Manager level, so the low-level BPT/BRA details must be hidden in +Manager-specific code. For example the Cadence IP format above is not +known to the codec drivers. + +Likewise, codec drivers should not have to know the frame size. The +computation of CRC and handling of responses is handled in helpers and +Manager-specific code. + +The host BRA driver may also have restrictions on pages allocated for +DMA, or other host-DSP communication protocols. The codec driver +should not be aware of any of these restrictions, since it might be +reused in combination with different implementations of Manager IPs. + +Concurrency between BRA and regular read/write +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The existing 'nread/nwrite' API already relies on a notion of start +address and number of bytes, so it would be possible to extend this +API with a 'hint' requesting BPT/BRA be used. + +However BRA transfers could be quite long, and the use of a single +mutex for regular read/write and BRA is a show-stopper. Independent +operation of the control/command and BRA transfers is a fundamental +requirement, e.g. to change the volume level with the existing regmap +interface while downloading firmware. + +This places the burden on the codec driver to verify that there is no +concurrent access to the same address with the command/control +protocol and the BRA protocol. + +In addition, the 'sdw_msg' structure hard-codes support for 16-bit +addresses and paging registers which are irrelevant for BPT/BRA +support based on native 32-bit addresses. A separate API with +'sdw_bpt_msg' makes more sense. + +One possible strategy to speed-up all initialization tasks would be to +start a BRA transfer for firmware download, then deal with all the +"regular" read/writes in parallel with the command channel, and last +to wait for the BRA transfers to complete. This would allow for a +degree of overlap instead of a purely sequential solution. As a +results, the BRA API must support async transfers and expose a +separate wait function. + +Error handling +~~~~~~~~~~~~~~ + +The expected response to a 'bra_message' and follow-up behavior may +vary: + + (1) A Peripheral driver may want to receive an immediate -EBUSY + response if the BRA protocol is not available at a given time. + + (2) A Peripheral driver may want to wait until a timeout for the + on-going transfer to be handled + + (3) A Peripheral driver may want to wait until existing BRA + transfers complete or deal with BRA as a background task when + audio transfers stop. In this case, there would be no timeout, + and the operation may not happen if the platform is suspended. + +BRA stream model +---------------- + +For regular audio transfers, the machine driver exposes a dailink +connecting CPU DAI(s) and Codec DAI(s). + +This model is not required BRA support: + + (1) The SoundWire DAIs are mainly wrappers for SoundWire Data + Ports, with possibly some analog or audio conversion + capabilities bolted behind the Data Port. In the context of + BRA, the DP0 is the destination. DP0 registers are standard and + can be programmed blindly without knowing what Peripheral is + connected to each link. In addition, if there are multiple + Peripherals on a link and some of them do not support DP0, the + write commands to program DP0 registers will generate harmless + COMMAND_IGNORED responses that will be wired-ORed with + responses from Peripherals which support DP0. In other words, + the DP0 programming can be done with broadcast commands, and + the information on the Target device can be added only in the + BRA Header. + + (2) At the CPU level, the DAI concept is not useful for BRA; the + machine driver will not create a dailink relying on DP0. The + only concept that is needed is the notion of port. + + (3) The stream concept relies on a set of master_rt and slave_rt + concepts. All of these entities represent ports and not DAIs. + + (4) With the assumption that a single BRA stream is used per link, + that stream can connect master ports as well as all peripheral + DP0 ports. + + (5) BRA transfers only make sense in the concept of one + Manager/Link, so the BRA stream handling does not rely on the + concept of multi-link aggregation allowed by regular DAI links. + +Audio DMA support +----------------- + +Some DMAs, such as HDaudio, require an audio format field to be +set. This format is in turn used to define acceptable bursts. BPT/BRA +support is not fully compatible with these definitions in that the +format may vary between read and write commands. + +In addition, on Intel HDaudio Intel platforms the DMAs need to be +programmed with a PCM format matching the bandwidth of the BPT/BRA +transfer. The format is based on 48kHz 32-bit samples, and the number +of channels varies to adjust the bandwidth. The notion of channel is +completely notional since the data is not typical audio +PCM. Programming channels helps reserve enough bandwidth and adjust +FIFO sizes to avoid xruns. Note that the quality of service comes as a +cost. Since all channels need to be present as a sample block, data +sizes not aligned to 128-bytes are not supported. + +BTP/BRA API available to peripheral drivers +------------------------------------------- + +ASoC Peripheral drivers may use + + - sdw_bpt_stream_open(mode) + + This function verifies that the BPT protocol with the + 'mode'. For now only BRA is accepted as a mode. This function + allocates a work buffer internally. This buffer is not exposed + to the caller. + + errors: + -ENODEV: BPT/BRA is not supported by the Manager. + + -EBUSY: another agent is already using the audio payload for + audio transfers. There is no way to predict when the audio + streams might stop, this will require the Peripheral driver + to fall back to the regular (slow) command channel. + + -EAGAIN: another agent is already transferring data using the + BPT/BRA protocol. Since the transfers will typically last + 10s or 100s of ms, the Peripheral driver may wait and retry + later. + + - sdw_bpt_message_send_async(bpt_message) + + This function sends the data using the Manager + implementation-defined capabilities (typically DMA or IPC + protocol). If the message exceeds the size of the buffer + allocated in the 'open' stage, the data will be copied and + transmitted in multiple chunks using the buffer. This function + cannot be called multiple times to queue transfers, the codec + driver needs to wait for completion of the requested transfer. + + errors: + + -ENODEV: BPT/BRA is not supported by the Manager. + + -EINVAL: no resources available. + + - sdw_bpt_message_wait(timeout) + + This function waits for the entire message provided by the codec + driver in the 'send_async' stage. Intermediate status for + smaller chunks will not be provided back to the codec driver, + only a return code will be provided. + + errors: + + -ENODEV: BPT/BRA is not supported by the Manager. + + -EINVAL: no transfer queued + + -EIO: some sort of transmisson error happened, typically a + bad CRC was detected. + + -ETIMEDOUT: transfer did not complete + + - sdw_bpt_stream_close() + + This functions releases the buffer allocated in the open stage + and decreases the refcounts. + + Note that it's possible to call send_async/message_wait multiple + times, it's not required to close/open. diff --git a/Documentation/driver-api/soundwire/index.rst b/Documentation/driver-api/soundwire/index.rst index 234911a0db99..8d826fd5781f 100644 --- a/Documentation/driver-api/soundwire/index.rst +++ b/Documentation/driver-api/soundwire/index.rst @@ -9,6 +9,7 @@ SoundWire Documentation stream error_handling locking + bra
.. only:: subproject and html
diff --git a/Documentation/driver-api/soundwire/summary.rst b/Documentation/driver-api/soundwire/summary.rst index 01dcb954f6d7..260a1c78545e 100644 --- a/Documentation/driver-api/soundwire/summary.rst +++ b/Documentation/driver-api/soundwire/summary.rst @@ -187,10 +187,7 @@ reconfigurations. Future enhancements to be done ==============================
- (1) Bulk Register Access (BRA) transfers. - - - (2) Multiple data lane support. + (1) Multiple data lane support.
Links =====
On Thu, Dec 07, 2023 at 04:29:29PM -0600, Pierre-Louis Bossart wrote:
+Addressing restrictions +-----------------------
+The Device Number specified in the Header follows the SoundWire +definitions, and broadcast and group addressing are +permitted. However, in reality it is very unlikely that the exact same +binary data needs to be provided to the two different Peripheral +devices. The Linux implementation only allows for transfers to a +single device at a time.
For the firmware download case it seems likely that this won't always be the case, but it's definitely a thing that could be done incrementally.
+Regmap use +~~~~~~~~~~
+Existing codec drivers rely on regmap to download firmware to +Peripherals, so at a high-level it would seem natural to combine BRA +and regmap. The regmap layer could check if BRA is available or not, +and use a regular read-write command channel in the latter case.
+However, the regmap layer does not have information on latency or how +critical a BRA transfer is. It would make more sense to let the codec +driver make decisions (wait, timeout, fallback to regular +read/writes).
These don't seem like insurmountable obstacles - they feel more like a copy break kind of situation where we can infer things from the pattern of transactions, and there's always the possibility of adding some calls to give regmap more high level information about the overall state of the device. One of the expected usage patterns with cache only mode is to build up a final register state then let the cache code work out the optimal pattern to actually write that out.
+In addition, the hardware may lose context and ask for the firmware to +be downloaded again. The firmware is not however a fixed definition, +the SDCA definition allows the hardware to request an updated firmware +or a different coefficient table to deal with specific environment +conditions. In other words, traditional regmap cache management is not +good enough, the Peripheral driver is required track hardware +notifications and react accordingly.
I might be missing something but those requests for redownload sound like things that would occur regardless of the mechanism used to perform the I/O?
+Concurrency between BRA and regular read/write +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The existing 'nread/nwrite' API already relies on a notion of start +address and number of bytes, so it would be possible to extend this +API with a 'hint' requesting BPT/BRA be used.
+However BRA transfers could be quite long, and the use of a single +mutex for regular read/write and BRA is a show-stopper. Independent +operation of the control/command and BRA transfers is a fundamental +requirement, e.g. to change the volume level with the existing regmap +interface while downloading firmware.
+This places the burden on the codec driver to verify that there is no +concurrent access to the same address with the command/control +protocol and the BRA protocol.
This makes me feel very nervous about robustness. I do note that regmap has an async interface which is currently only used for SPI and really only by the Cirrus DSPs (plus opportunisticly by rbtree cache sync), the original usage was to fill the pipleine of SPI controllers so we can ideally push data entirely from interrupt. TBH that was done before SMP became standard and it's a lot less clear in this day and age that this is actually helpful, the overhead of cross core synchronisation likely obliterates any benefit. There's sufficently few users that we could make API changes readily to fit better.
+One possible strategy to speed-up all initialization tasks would be to +start a BRA transfer for firmware download, then deal with all the +"regular" read/writes in parallel with the command channel, and last +to wait for the BRA transfers to complete. This would allow for a +degree of overlap instead of a purely sequential solution. As a +results, the BRA API must support async transfers and expose a +separate wait function.
This feels a lot like it could map onto the regmap async interface, it would need a bit of a rework (mainly that currently they provide ordering guarantees with respect to both each other and sync I/O) but those could be removed with some care) but it's got the "here's a list of I/O, here's another call to ensure it's all actually happened" thing. It sounds very much like how I was thinking of the async API when I was writing it and the initial users.
It's this bit that really got me thinking it could fit well into regmap.
+Error handling +~~~~~~~~~~~~~~
+The expected response to a 'bra_message' and follow-up behavior may +vary:
- (1) A Peripheral driver may want to receive an immediate -EBUSY
response if the BRA protocol is not available at a given time.
- (2) A Peripheral driver may want to wait until a timeout for the
on-going transfer to be handled
- (3) A Peripheral driver may want to wait until existing BRA
transfers complete or deal with BRA as a background task when
audio transfers stop. In this case, there would be no timeout,
and the operation may not happen if the platform is suspended.
Option 3 would be a jump for regmap.
Thanks for the comments Mark, much appreciated.
+Addressing restrictions +-----------------------
+The Device Number specified in the Header follows the SoundWire +definitions, and broadcast and group addressing are +permitted. However, in reality it is very unlikely that the exact same +binary data needs to be provided to the two different Peripheral +devices. The Linux implementation only allows for transfers to a +single device at a time.
For the firmware download case it seems likely that this won't always be the case, but it's definitely a thing that could be done incrementally.
One example discussed this week on the mailing list is the Cirrus Logic case, where the firmware contains information on which speakers a given amplifier should be doing, and each firmware is named as AMP-n.
I have really not found any practical case where the same data needs to be sent to N devices, and I don't have a burning desire to tie the codec initialization together with all the asynchronous nature of enumeration/probe.
+Regmap use +~~~~~~~~~~
+Existing codec drivers rely on regmap to download firmware to +Peripherals, so at a high-level it would seem natural to combine BRA +and regmap. The regmap layer could check if BRA is available or not, +and use a regular read-write command channel in the latter case.
+However, the regmap layer does not have information on latency or how +critical a BRA transfer is. It would make more sense to let the codec +driver make decisions (wait, timeout, fallback to regular +read/writes).
These don't seem like insurmountable obstacles - they feel more like a copy break kind of situation where we can infer things from the pattern of transactions, and there's always the possibility of adding some calls to give regmap more high level information about the overall state of the device. One of the expected usage patterns with cache only mode is to build up a final register state then let the cache code work out the optimal pattern to actually write that out.
I did expect some pushback on regmap :-)
The point is really that the main use for this download is a set of write-once memory areas which happen to be mapped as registers. There is no real need to declare or access each memory address individually.
In addition in terms of error handling, the expectation is that the set of writes are handled in a pass/fail manner. There's no real way to know which of the individual writes failed.
+In addition, the hardware may lose context and ask for the firmware to +be downloaded again. The firmware is not however a fixed definition, +the SDCA definition allows the hardware to request an updated firmware +or a different coefficient table to deal with specific environment +conditions. In other words, traditional regmap cache management is not +good enough, the Peripheral driver is required track hardware +notifications and react accordingly.
I might be missing something but those requests for redownload sound like things that would occur regardless of the mechanism used to perform the I/O?
What I meant is that the codec driver would e.g. need to fetch a different firmware table and download it. There's no real need to maintain a cache on the host side since the entire table will be downloaded.
+Concurrency between BRA and regular read/write +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The existing 'nread/nwrite' API already relies on a notion of start +address and number of bytes, so it would be possible to extend this +API with a 'hint' requesting BPT/BRA be used.
+However BRA transfers could be quite long, and the use of a single +mutex for regular read/write and BRA is a show-stopper. Independent +operation of the control/command and BRA transfers is a fundamental +requirement, e.g. to change the volume level with the existing regmap +interface while downloading firmware.
+This places the burden on the codec driver to verify that there is no +concurrent access to the same address with the command/control +protocol and the BRA protocol.
This makes me feel very nervous about robustness. I do note that regmap has an async interface which is currently only used for SPI and really only by the Cirrus DSPs (plus opportunisticly by rbtree cache sync), the original usage was to fill the pipleine of SPI controllers so we can ideally push data entirely from interrupt. TBH that was done before SMP became standard and it's a lot less clear in this day and age that this is actually helpful, the overhead of cross core synchronisation likely obliterates any benefit. There's sufficently few users that we could make API changes readily to fit better.
I am not that nervous, it's a known hardware issue and the software drivers SHALL make sure that the two methods of accessing a register are not used concurrently. We could add some sort of mutex on specific memory areas.
+One possible strategy to speed-up all initialization tasks would be to +start a BRA transfer for firmware download, then deal with all the +"regular" read/writes in parallel with the command channel, and last +to wait for the BRA transfers to complete. This would allow for a +degree of overlap instead of a purely sequential solution. As a +results, the BRA API must support async transfers and expose a +separate wait function.
This feels a lot like it could map onto the regmap async interface, it would need a bit of a rework (mainly that currently they provide ordering guarantees with respect to both each other and sync I/O) but those could be removed with some care) but it's got the "here's a list of I/O, here's another call to ensure it's all actually happened" thing. It sounds very much like how I was thinking of the async API when I was writing it and the initial users.
It's this bit that really got me thinking it could fit well into regmap.
I really don't know anything about this async interface, if you have pointers on existing examples I am all ears....I am not aware of any Intel platform or codec used on an Intel platform making use of this API.
At any rate the low-level behavior is to a) allocate and configure all the SoundWire resources b) allocate and configure all the DMA resources c) trigger DMA and enable SoundWire transfers d) wait for the DMA to complete
The codec API can be modified as needed, as long as there are provisions for these 4 steps.
+Error handling +~~~~~~~~~~~~~~
+The expected response to a 'bra_message' and follow-up behavior may +vary:
- (1) A Peripheral driver may want to receive an immediate -EBUSY
response if the BRA protocol is not available at a given time.
- (2) A Peripheral driver may want to wait until a timeout for the
on-going transfer to be handled
- (3) A Peripheral driver may want to wait until existing BRA
transfers complete or deal with BRA as a background task when
audio transfers stop. In this case, there would be no timeout,
and the operation may not happen if the platform is suspended.
Option 3 would be a jump for regmap.
Sorry, I don't get what a 'jump' means in this context.
On Thu, Dec 07, 2023 at 06:56:55PM -0600, Pierre-Louis Bossart wrote:
+The Device Number specified in the Header follows the SoundWire +definitions, and broadcast and group addressing are +permitted. However, in reality it is very unlikely that the exact same +binary data needs to be provided to the two different Peripheral +devices. The Linux implementation only allows for transfers to a +single device at a time.
For the firmware download case it seems likely that this won't always be the case, but it's definitely a thing that could be done incrementally.
One example discussed this week on the mailing list is the Cirrus Logic case, where the firmware contains information on which speakers a given amplifier should be doing, and each firmware is named as AMP-n.
I can imagine a vendor structuring things so they've got separate firmware and coefficent/configuration images, the firmware images could be shared.
I have really not found any practical case where the same data needs to be sent to N devices, and I don't have a burning desire to tie the codec initialization together with all the asynchronous nature of enumeration/probe.
Like I say it does seem like something that could be done incrementally.
These don't seem like insurmountable obstacles - they feel more like a copy break kind of situation where we can infer things from the pattern of transactions, and there's always the possibility of adding some calls to give regmap more high level information about the overall state of the device. One of the expected usage patterns with cache only mode is to build up a final register state then let the cache code work out the optimal pattern to actually write that out.
I did expect some pushback on regmap :-)
The point is really that the main use for this download is a set of write-once memory areas which happen to be mapped as registers. There is no real need to declare or access each memory address individually.
Yeah, normally it's just write only stuff but I've seen things like controls being added in the DSP memory before - the
In addition in terms of error handling, the expectation is that the set of writes are handled in a pass/fail manner. There's no real way to know which of the individual writes failed.
That's the case for any block writes.
I might be missing something but those requests for redownload sound like things that would occur regardless of the mechanism used to perform the I/O?
What I meant is that the codec driver would e.g. need to fetch a different firmware table and download it. There's no real need to maintain a cache on the host side since the entire table will be downloaded.
I mean, if that's the usage pattern surely things would just be marked volatile anyway? A cache is entirely optional.
This feels a lot like it could map onto the regmap async interface, it would need a bit of a rework (mainly that currently they provide ordering guarantees with respect to both each other and sync I/O) but those could be removed with some care) but it's got the "here's a list of I/O, here's another call to ensure it's all actually happened" thing. It sounds very much like how I was thinking of the async API when I was writing it and the initial users.
It's this bit that really got me thinking it could fit well into regmap.
I really don't know anything about this async interface, if you have pointers on existing examples I am all ears....I am not aware of any Intel platform or codec used on an Intel platform making use of this API.
grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, or there's the rbtree cache sync code which uses a back door to go into an async mode. Basically just variants of all the normal regmap I/O calls with a _complete() call you can use to wait for everything to happen. The implementation is a bit heavyweight since it was written to work with fairly slow buses.
At any rate the low-level behavior is to a) allocate and configure all the SoundWire resources b) allocate and configure all the DMA resources c) trigger DMA and enable SoundWire transfers d) wait for the DMA to complete
The codec API can be modified as needed, as long as there are provisions for these 4 steps.
That's not quite how the current API works, but it feels like it's close enough to the intent and there's literally one user of the actual API.
- (3) A Peripheral driver may want to wait until existing BRA
transfers complete or deal with BRA as a background task when
audio transfers stop. In this case, there would be no timeout,
and the operation may not happen if the platform is suspended.
Option 3 would be a jump for regmap.
Sorry, I don't get what a 'jump' means in this context.
Big change.
This feels a lot like it could map onto the regmap async interface, it would need a bit of a rework (mainly that currently they provide ordering guarantees with respect to both each other and sync I/O) but those could be removed with some care) but it's got the "here's a list of I/O, here's another call to ensure it's all actually happened" thing. It sounds very much like how I was thinking of the async API when I was writing it and the initial users.
It's this bit that really got me thinking it could fit well into regmap.
I really don't know anything about this async interface, if you have pointers on existing examples I am all ears....I am not aware of any Intel platform or codec used on an Intel platform making use of this API.
grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, or there's the rbtree cache sync code which uses a back door to go into an async mode. Basically just variants of all the normal regmap I/O calls with a _complete() call you can use to wait for everything to happen. The implementation is a bit heavyweight since it was written to work with fairly slow buses.
I spent a fair amount of time this afternoon trying to understand the regmap_async parts, and I am not following where in the code there is an ordering requirement/enforcement between async and sync usages.
I do see this comment in the code
* @async_write: Write operation which completes asynchronously, optional and must serialise with respect to non-async I/O.
But that 'must' is a requirement on the codec side, isn't it?
When using regmap_raw_write_async(), the lock is released immediately. I don't see anything at the regmap level that would prevent a codec driver from using regmap_raw_write() immediately.
That's probably a violation of the API to do so, but it's the same problem I was referring earlier in the conversation where 'regular' read/writes cannot happen in parallel with BTP/BRA transactions.
Also is this just me spacing out or there is no regmap_raw_read_async()?
On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote:
grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, or there's the rbtree cache sync code which uses a back door to go into an async mode. Basically just variants of all the normal regmap I/O calls with a _complete() call you can use to wait for everything to happen. The implementation is a bit heavyweight since it was written to work with fairly slow buses.
I spent a fair amount of time this afternoon trying to understand the regmap_async parts, and I am not following where in the code there is an ordering requirement/enforcement between async and sync usages.
The only actual async implementation is SPI which processes things in order of submission, the sync API wraps the async API.
Also is this just me spacing out or there is no regmap_raw_read_async()?
Right, there was never any need.
On 12/19/23 17:53, Mark Brown wrote:
On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote:
grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, or there's the rbtree cache sync code which uses a back door to go into an async mode. Basically just variants of all the normal regmap I/O calls with a _complete() call you can use to wait for everything to happen. The implementation is a bit heavyweight since it was written to work with fairly slow buses.
I spent a fair amount of time this afternoon trying to understand the regmap_async parts, and I am not following where in the code there is an ordering requirement/enforcement between async and sync usages.
The only actual async implementation is SPI which processes things in order of submission, the sync API wraps the async API.
Also is this just me spacing out or there is no regmap_raw_read_async()?
Right, there was never any need.
ok. I am starting to think that we could have a new type of regmap, say "regmap-sdw-bra", where the use of write_raw_async() would rely on the send/wait bus primitives, and write_raw() would fallback to the regular read/write commands. We'd need a mutual exclusion to prevent parallel async/sync access to the same regmap.
In other words, "memory" areas that are used for firmware downloads would be moved to a different regmap with async capabilities and no caching support.
On Tue, Dec 19, 2023 at 06:08:15PM +0100, Pierre-Louis Bossart wrote:
On 12/19/23 17:53, Mark Brown wrote:
On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote:
grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, or there's the rbtree cache sync code which uses a back door to go into an async mode. Basically just variants of all the normal regmap I/O calls with a _complete() call you can use to wait for everything to happen. The implementation is a bit heavyweight since it was written to work with fairly slow buses.
I spent a fair amount of time this afternoon trying to understand the regmap_async parts, and I am not following where in the code there is an ordering requirement/enforcement between async and sync usages.
The only actual async implementation is SPI which processes things in order of submission, the sync API wraps the async API.
Also is this just me spacing out or there is no regmap_raw_read_async()?
Right, there was never any need.
ok. I am starting to think that we could have a new type of regmap, say "regmap-sdw-bra", where the use of write_raw_async() would rely on the send/wait bus primitives, and write_raw() would fallback to the regular read/write commands. We'd need a mutual exclusion to prevent parallel async/sync access to the same regmap.
In other words, "memory" areas that are used for firmware downloads would be moved to a different regmap with async capabilities and no caching support.
I would be a little inclined to say leave adding a regmap for a follow up series, whether we add it to the existing regmap or add a new one, or whatever, it should all sit happily on top of the API being added in this series. Makes it a little more contained to focus on one area at a time, and leave this series as adding core support for BRA. But that said, if we really want to I don't feel mega strongly on this one.
Thanks, Charles
On 12/20/23 16:16, Charles Keepax wrote:
On Tue, Dec 19, 2023 at 06:08:15PM +0100, Pierre-Louis Bossart wrote:
On 12/19/23 17:53, Mark Brown wrote:
On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote:
grep for regmap_.*async - cs_dsp.c is the upstream example in a driver, or there's the rbtree cache sync code which uses a back door to go into an async mode. Basically just variants of all the normal regmap I/O calls with a _complete() call you can use to wait for everything to happen. The implementation is a bit heavyweight since it was written to work with fairly slow buses.
I spent a fair amount of time this afternoon trying to understand the regmap_async parts, and I am not following where in the code there is an ordering requirement/enforcement between async and sync usages.
The only actual async implementation is SPI which processes things in order of submission, the sync API wraps the async API.
Also is this just me spacing out or there is no regmap_raw_read_async()?
Right, there was never any need.
ok. I am starting to think that we could have a new type of regmap, say "regmap-sdw-bra", where the use of write_raw_async() would rely on the send/wait bus primitives, and write_raw() would fallback to the regular read/write commands. We'd need a mutual exclusion to prevent parallel async/sync access to the same regmap.
In other words, "memory" areas that are used for firmware downloads would be moved to a different regmap with async capabilities and no caching support.
I would be a little inclined to say leave adding a regmap for a follow up series, whether we add it to the existing regmap or add a new one, or whatever, it should all sit happily on top of the API being added in this series. Makes it a little more contained to focus on one area at a time, and leave this series as adding core support for BRA. But that said, if we really want to I don't feel mega strongly on this one.
Right, I was probably going too far down in the details for a December 20 post.
The point I was trying to make it seems there's consensus that regmap with the async parts would be the API used by SoundWire/ASoC codecs, and the regmap implementation would rely on the bus/host send/wait routines.
The regmap stuff will need joint work with codec driver folks so it should indeed be done in a second step when the SoundWire bus/host parts are available.
Put differently: is there any sustained objection to the proposal of extending regmap with async BRA transfers?
On Wed, Dec 20, 2023 at 07:26:24PM +0100, Pierre-Louis Bossart wrote:
Put differently: is there any sustained objection to the proposal of extending regmap with async BRA transfers?
Not from me, and I agree that it makes sense to do it once the underlying SoundWire bits are in place.
On Wed, Dec 20, 2023 at 07:26:24PM +0100, Pierre-Louis Bossart wrote:
On 12/20/23 16:16, Charles Keepax wrote:
On Tue, Dec 19, 2023 at 06:08:15PM +0100, Pierre-Louis Bossart wrote:
On 12/19/23 17:53, Mark Brown wrote:
On Tue, Dec 19, 2023 at 05:50:30PM +0100, Pierre-Louis Bossart wrote:
Put differently: is there any sustained objection to the proposal of extending regmap with async BRA transfers?
Seems good to me, was not my intention to object to the idea itself.
Thanks, Charles
Hi Mark,
+One possible strategy to speed-up all initialization tasks would be to +start a BRA transfer for firmware download, then deal with all the +"regular" read/writes in parallel with the command channel, and last +to wait for the BRA transfers to complete. This would allow for a +degree of overlap instead of a purely sequential solution. As a +results, the BRA API must support async transfers and expose a +separate wait function.
This feels a lot like it could map onto the regmap async interface, it would need a bit of a rework (mainly that currently they provide ordering guarantees with respect to both each other and sync I/O) but those could be removed with some care) but it's got the "here's a list of I/O, here's another call to ensure it's all actually happened" thing. It sounds very much like how I was thinking of the async API when I was writing it and the initial users.
It's this bit that really got me thinking it could fit well into regmap.
I am now revisiting this entire patchset to try to enable firmware downloads with this SoundWire BPT/BRA protocol. I re-looked at regmap_raw_write_async()/regmap_async_complete() and they seem to map well with my initial write/wait_async proposal.
Do I get this right that these async capabilities could be used to enable this faster SoundWire protocol, but the regular regmap_write() could still happen in parallel with these async transfers? This could be useful if e.g. there's a jack detection while downloading firmware for another function.
I don't see any locking that would prevent such a dual operation - with the caveat that it would be up to the codec driver to make sure they don't try to access the same register with the two modes of operation.
The only thing that would need to be extended is that we'd need additional callbacks to check if the protocol is supported on a given hardware/firmware platform, and if there's no audio stream. In these cases the async writes would be demoted to regular writes. Otherwise the bus would be locked to make sure no reconfiguration takes place while the firmware download happens, and unlocked when the transfer ends.
Thanks for your help on all this, -Pierre
On Tue, Aug 20, 2024 at 09:48:05AM +0200, Pierre-Louis Bossart wrote:
This feels a lot like it could map onto the regmap async interface, it would need a bit of a rework (mainly that currently they provide ordering guarantees with respect to both each other and sync I/O) but those could be removed with some care) but it's got the "here's a list of I/O, here's another call to ensure it's all actually happened" thing. It sounds very much like how I was thinking of the async API when I was writing it and the initial users.
It's this bit that really got me thinking it could fit well into regmap.
Do I get this right that these async capabilities could be used to enable this faster SoundWire protocol, but the regular regmap_write() could still happen in parallel with these async transfers? This could be useful if e.g. there's a jack detection while downloading firmware for another function.
As far as the regmap core is concerned, yes - with SPI we do wind up with ordering but that's because the SPI sync API is a thin wrapper around the SPI async API rather than anything regmap does.
The only thing that would need to be extended is that we'd need additional callbacks to check if the protocol is supported on a given hardware/firmware platform, and if there's no audio stream. In these cases the async writes would be demoted to regular writes. Otherwise the bus would be locked to make sure no reconfiguration takes place while the firmware download happens, and unlocked when the transfer ends.
Those callbacks seem reasonable. We already do the demotion to sync for buses that don't have async, it'd just need to be made dynamic.
On 8/20/24 13:53, Mark Brown wrote:
On Tue, Aug 20, 2024 at 09:48:05AM +0200, Pierre-Louis Bossart wrote:
This feels a lot like it could map onto the regmap async interface, it would need a bit of a rework (mainly that currently they provide ordering guarantees with respect to both each other and sync I/O) but those could be removed with some care) but it's got the "here's a list of I/O, here's another call to ensure it's all actually happened" thing. It sounds very much like how I was thinking of the async API when I was writing it and the initial users.
It's this bit that really got me thinking it could fit well into regmap.
Do I get this right that these async capabilities could be used to enable this faster SoundWire protocol, but the regular regmap_write() could still happen in parallel with these async transfers? This could be useful if e.g. there's a jack detection while downloading firmware for another function.
As far as the regmap core is concerned, yes - with SPI we do wind up with ordering but that's because the SPI sync API is a thin wrapper around the SPI async API rather than anything regmap does.
ok. I am struggling a bit to adjust the plumbing that was obviously defined for SPI.
The regmap async_write() doesn't have any wait, but there's a notifier that needs to be called by *something* that waits. I think the SPI layer has a set of kthreads but in our case we could just rely on a a workqueue after 1ms or something and do the wait then for the DMAs to complete and finally call the notifier to wake-up the regmap stuff.
The only thing that would need to be extended is that we'd need additional callbacks to check if the protocol is supported on a given hardware/firmware platform, and if there's no audio stream. In these cases the async writes would be demoted to regular writes. Otherwise the bus would be locked to make sure no reconfiguration takes place while the firmware download happens, and unlocked when the transfer ends.
Those callbacks seem reasonable. We already do the demotion to sync for buses that don't have async, it'd just need to be made dynamic.
sounds good.
On Tue, Aug 20, 2024 at 04:58:30PM +0200, Pierre-Louis Bossart wrote:
On 8/20/24 13:53, Mark Brown wrote:
As far as the regmap core is concerned, yes - with SPI we do wind up with ordering but that's because the SPI sync API is a thin wrapper around the SPI async API rather than anything regmap does.
ok. I am struggling a bit to adjust the plumbing that was obviously defined for SPI.
The regmap async_write() doesn't have any wait, but there's a notifier that needs to be called by *something* that waits. I think the SPI layer has a set of kthreads but in our case we could just rely on a a workqueue after 1ms or something and do the wait then for the DMAs to complete and finally call the notifier to wake-up the regmap stuff.
From the regmap point of view the expectation is that when the caller wants to know that all the I/Os have actually happened it should call regmap_async_complete() which will do the actual blocking. regmap itself won't autonomously wait for anything, it's up to the caller to do that (presumably it will at some point care).
On Thu, Dec 07, 2023 at 04:29:29PM -0600, Pierre-Louis Bossart wrote:
The Bulk Register Access protocol was left as a TODO topic since 2018. It's time to document this protocol and the design of its Linux support.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
+Concurrency between BRA and regular read/write +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The existing 'nread/nwrite' API already relies on a notion of start +address and number of bytes, so it would be possible to extend this +API with a 'hint' requesting BPT/BRA be used.
+However BRA transfers could be quite long, and the use of a single +mutex for regular read/write and BRA is a show-stopper. Independent +operation of the control/command and BRA transfers is a fundamental +requirement, e.g. to change the volume level with the existing regmap +interface while downloading firmware.
Is this definitely a show stopper? Not saying that it wouldn't be desirable to do both from a speed perspective, but current systems that download firmware (I2C/SPI) typically will block the bus for some amount of time. There are also some desirable properties to a single lock such as not needing to worry about accessing the same register in the bulk transfer and a normal command transfer.
+Audio DMA support +-----------------
+Some DMAs, such as HDaudio, require an audio format field to be +set. This format is in turn used to define acceptable bursts. BPT/BRA +support is not fully compatible with these definitions in that the +format may vary between read and write commands.
+In addition, on Intel HDaudio Intel platforms the DMAs need to be +programmed with a PCM format matching the bandwidth of the BPT/BRA +transfer. The format is based on 48kHz 32-bit samples, and the number +of channels varies to adjust the bandwidth. The notion of channel is +completely notional since the data is not typical audio +PCM. Programming channels helps reserve enough bandwidth and adjust +FIFO sizes to avoid xruns. Note that the quality of service comes as a +cost. Since all channels need to be present as a sample block, data +sizes not aligned to 128-bytes are not supported.
Apologies but could you elaborate a litte on this? I am not sure I follow the reasoning, how does the 48k 32bit DMA implementation result in 128-byte limitation? I would have thought 1 channel would be 4-bytes and you are varying the channels so I would have expected 4-byte aligned maybe 8-byte if the DMA expects stereo pairs.
And what exactly do we mean by aligned, are we saying the length all transfers needs to be a multiple of 128-bytes?
I think we might have some annoying restrictions on the block size on our hardware as well I will go dig into that and report back.
Thanks, Charles
On 12/8/23 10:27, Charles Keepax wrote:
On Thu, Dec 07, 2023 at 04:29:29PM -0600, Pierre-Louis Bossart wrote:
The Bulk Register Access protocol was left as a TODO topic since 2018. It's time to document this protocol and the design of its Linux support.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
+Concurrency between BRA and regular read/write +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The existing 'nread/nwrite' API already relies on a notion of start +address and number of bytes, so it would be possible to extend this +API with a 'hint' requesting BPT/BRA be used.
+However BRA transfers could be quite long, and the use of a single +mutex for regular read/write and BRA is a show-stopper. Independent +operation of the control/command and BRA transfers is a fundamental +requirement, e.g. to change the volume level with the existing regmap +interface while downloading firmware.
Is this definitely a show stopper? Not saying that it wouldn't be desirable to do both from a speed perspective, but current systems that download firmware (I2C/SPI) typically will block the bus for some amount of time. There are also some desirable properties to a single lock such as not needing to worry about accessing the same register in the bulk transfer and a normal command transfer.
There are other cases where we do want to interact with the device even if we are in the middle of downloading stuff. We currently have a msg_lock that's used to prevent multiple read/writes from being sent on the bus, I really don't think it would be wise to use this same "lightweight" lock for much longer transfers. This could impact response to alerts, changes in state reported by the hardware, etc.
+Audio DMA support +-----------------
+Some DMAs, such as HDaudio, require an audio format field to be +set. This format is in turn used to define acceptable bursts. BPT/BRA +support is not fully compatible with these definitions in that the +format may vary between read and write commands.
+In addition, on Intel HDaudio Intel platforms the DMAs need to be +programmed with a PCM format matching the bandwidth of the BPT/BRA +transfer. The format is based on 48kHz 32-bit samples, and the number +of channels varies to adjust the bandwidth. The notion of channel is +completely notional since the data is not typical audio +PCM. Programming channels helps reserve enough bandwidth and adjust +FIFO sizes to avoid xruns. Note that the quality of service comes as a +cost. Since all channels need to be present as a sample block, data +sizes not aligned to 128-bytes are not supported.
Apologies but could you elaborate a litte on this? I am not sure I follow the reasoning, how does the 48k 32bit DMA implementation result in 128-byte limitation? I would have thought 1 channel would be 4-bytes and you are varying the channels so I would have expected 4-byte aligned maybe 8-byte if the DMA expects stereo pairs.
With the 50x4 default frame size that we're using, a BTP/BRA write requires 13.824 Mbits/s for TX and 3.072Mbits/s for RX. We have to treat these "non-audio' streams as respectively 10 and 2 channels of 48kHz 32bit data to reserve the equivalent PCM bandwidth, and that creates an alignment requirement since the DMA expects all 'channels' to be present in the same block.
We could try and play with parameters, e.g. I am not sure what happens if we used 192kHz, maybe that reduces the alignment back to 32 bytes. But the point is arbitrary sizes *cannot* be handled.
The HDaudio spec also mentions that for efficiency reason it's strongly recommended to use 128-byte multiples.
And what exactly do we mean by aligned, are we saying the length all transfers needs to be a multiple of 128-bytes?
Yes.
I think we might have some annoying restrictions on the block size on our hardware as well I will go dig into that and report back.
That would be very useful indeed, we have to make sure this alignment is reasonably supported across hosts and devices. If we can't agree we'll have to make the alignment host-dependent, but that will make the logic in codec drivers more complicated.
On Fri, Dec 08, 2023 at 12:45:21PM -0600, Pierre-Louis Bossart wrote:
On 12/8/23 10:27, Charles Keepax wrote:
I think we might have some annoying restrictions on the block size on our hardware as well I will go dig into that and report back.
That would be very useful indeed, we have to make sure this alignment is reasonably supported across hosts and devices. If we can't agree we'll have to make the alignment host-dependent, but that will make the logic in codec drivers more complicated.
Hopefully even if it's complicated it's something that can be factored out into library code rather than replicated.
Hi Pierre,
On 07-12-23, 16:29, Pierre-Louis Bossart wrote:
The Bulk Register Access protocol was left as a TODO topic since 2018. It's time to document this protocol and the design of its Linux support.
Thanks for letting me know about this thread. At least now with B4 we can grab threads we are not copied.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
Documentation/driver-api/soundwire/bra.rst | 478 ++++++++++++++++++
Can we split the cadence parts of this to bra-cadence.rst that way this file documents the core parts only
+Concurrency between BRA and regular read/write +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The existing 'nread/nwrite' API already relies on a notion of start +address and number of bytes, so it would be possible to extend this +API with a 'hint' requesting BPT/BRA be used.
+However BRA transfers could be quite long, and the use of a single +mutex for regular read/write and BRA is a show-stopper. Independent +operation of the control/command and BRA transfers is a fundamental +requirement, e.g. to change the volume level with the existing regmap +interface while downloading firmware.
+This places the burden on the codec driver to verify that there is no +concurrent access to the same address with the command/control +protocol and the BRA protocol.
+In addition, the 'sdw_msg' structure hard-codes support for 16-bit +addresses and paging registers which are irrelevant for BPT/BRA +support based on native 32-bit addresses. A separate API with +'sdw_bpt_msg' makes more sense.
+One possible strategy to speed-up all initialization tasks would be to +start a BRA transfer for firmware download, then deal with all the +"regular" read/writes in parallel with the command channel, and last +to wait for the BRA transfers to complete. This would allow for a +degree of overlap instead of a purely sequential solution. As a +results, the BRA API must support async transfers and expose a +separate wait function.
That sounds logical to me
+Error handling +~~~~~~~~~~~~~~
+The expected response to a 'bra_message' and follow-up behavior may +vary:
- (1) A Peripheral driver may want to receive an immediate -EBUSY
response if the BRA protocol is not available at a given time.
- (2) A Peripheral driver may want to wait until a timeout for the
on-going transfer to be handled
- (3) A Peripheral driver may want to wait until existing BRA
transfers complete or deal with BRA as a background task when
audio transfers stop. In this case, there would be no timeout,
and the operation may not happen if the platform is suspended.
Is this runtime suspend or S3/S4 case?
+BTP/BRA API available to peripheral drivers +-------------------------------------------
+ASoC Peripheral drivers may use
- sdw_bpt_stream_open(mode)
This function verifies that the BPT protocol with the
'mode'. For now only BRA is accepted as a mode. This function
allocates a work buffer internally. This buffer is not exposed
to the caller.
errors:
-ENODEV: BPT/BRA is not supported by the Manager.
-EBUSY: another agent is already using the audio payload for
audio transfers. There is no way to predict when the audio
streams might stop, this will require the Peripheral driver
to fall back to the regular (slow) command channel.
-EAGAIN: another agent is already transferring data using the
BPT/BRA protocol. Since the transfers will typically last
10s or 100s of ms, the Peripheral driver may wait and retry
later.
- sdw_bpt_message_send_async(bpt_message)
why not have a single API that does both? First check if it is supported and then allocate buffers and do the transfer.. What are the advantages of using this two step process
Documentation/driver-api/soundwire/bra.rst | 478 ++++++++++++++++++
Can we split the cadence parts of this to bra-cadence.rst that way this file documents the core parts only
Yes, we can split the Cadence parts out.
+Error handling +~~~~~~~~~~~~~~
+The expected response to a 'bra_message' and follow-up behavior may +vary:
- (1) A Peripheral driver may want to receive an immediate -EBUSY
response if the BRA protocol is not available at a given time.
- (2) A Peripheral driver may want to wait until a timeout for the
on-going transfer to be handled
- (3) A Peripheral driver may want to wait until existing BRA
transfers complete or deal with BRA as a background task when
audio transfers stop. In this case, there would be no timeout,
and the operation may not happen if the platform is suspended.
Is this runtime suspend or S3/S4 case?
System suspend (which can also mean S0i1).
I don't think we can have a case where a peripheral driver waits on something without having done a pm_runtime_get_sync() to prevent runtime_pm suspend.
+BTP/BRA API available to peripheral drivers +-------------------------------------------
+ASoC Peripheral drivers may use
- sdw_bpt_stream_open(mode)
This function verifies that the BPT protocol with the
'mode'. For now only BRA is accepted as a mode. This function
allocates a work buffer internally. This buffer is not exposed
to the caller.
errors:
-ENODEV: BPT/BRA is not supported by the Manager.
-EBUSY: another agent is already using the audio payload for
audio transfers. There is no way to predict when the audio
streams might stop, this will require the Peripheral driver
to fall back to the regular (slow) command channel.
-EAGAIN: another agent is already transferring data using the
BPT/BRA protocol. Since the transfers will typically last
10s or 100s of ms, the Peripheral driver may wait and retry
later.
- sdw_bpt_message_send_async(bpt_message)
why not have a single API that does both? First check if it is supported and then allocate buffers and do the transfer.. What are the advantages of using this two step process
Symmetry is the only thing that comes to my mind. Open - close and send - wait are natural matches, aren't they?
We do need a wait(), so bundling open() and send() would be odd.
But you have a point that the open() is not generic in that it also prepares the DMA buffers for transmission. Maybe it's more natural to follow the traditional open(), hw_params(), hw_free, close() from ALSA.
On Mon, Dec 18, 2023 at 01:58:47PM +0100, Pierre-Louis Bossart wrote:
why not have a single API that does both? First check if it is supported and then allocate buffers and do the transfer.. What are the advantages of using this two step process
Symmetry is the only thing that comes to my mind. Open - close and send
- wait are natural matches, aren't they?
We do need a wait(), so bundling open() and send() would be odd.
I agree send->wait->close would be odd, But you just bundle close into wait. So the API becomes just send->wait, which seems pretty logical.
But you have a point that the open() is not generic in that it also prepares the DMA buffers for transmission. Maybe it's more natural to follow the traditional open(), hw_params(), hw_free, close() from ALSA.
I think this just makes it worse, you are now adding even more calls. The problem I see here is that, open and close (at least to me) strongly implies that you can do multiple operations between them and unless I have misunderstood something here you can't.
Thanks, Charles
On 12/18/23 08:29, Charles Keepax wrote:
On Mon, Dec 18, 2023 at 01:58:47PM +0100, Pierre-Louis Bossart wrote:
why not have a single API that does both? First check if it is supported and then allocate buffers and do the transfer.. What are the advantages of using this two step process
Symmetry is the only thing that comes to my mind. Open - close and send
- wait are natural matches, aren't they?
We do need a wait(), so bundling open() and send() would be odd.
I agree send->wait->close would be odd, But you just bundle close into wait. So the API becomes just send->wait, which seems pretty logical.
Fair enough, send()/wait() would work indeed.
I guess I wanted to keep the callbacks reasonably small (already 200 lines for the open), but we can split the 'send' callback into smaller helpers to keep the code readable. There's no good reason to expose these smaller helpers to codec drivers.
But you have a point that the open() is not generic in that it also prepares the DMA buffers for transmission. Maybe it's more natural to follow the traditional open(), hw_params(), hw_free, close() from ALSA.
I think this just makes it worse, you are now adding even more calls. The problem I see here is that, open and close (at least to me) strongly implies that you can do multiple operations between them and unless I have misunderstood something here you can't.
That's right, the open was not compatible with multiple operations. Collapsing open/send and wait/close sounds more logical, thanks for the feedback.
On 18-12-23, 17:33, Pierre-Louis Bossart wrote:
On 12/18/23 08:29, Charles Keepax wrote:
On Mon, Dec 18, 2023 at 01:58:47PM +0100, Pierre-Louis Bossart wrote:
why not have a single API that does both? First check if it is supported and then allocate buffers and do the transfer.. What are the advantages of using this two step process
Symmetry is the only thing that comes to my mind. Open - close and send
- wait are natural matches, aren't they?
We do need a wait(), so bundling open() and send() would be odd.
I agree send->wait->close would be odd, But you just bundle close into wait. So the API becomes just send->wait, which seems pretty logical.
Fair enough, send()/wait() would work indeed.
I guess I wanted to keep the callbacks reasonably small (already 200 lines for the open), but we can split the 'send' callback into smaller helpers to keep the code readable. There's no good reason to expose these smaller helpers to codec drivers.
Yes! that would be a better design IMO
But you have a point that the open() is not generic in that it also prepares the DMA buffers for transmission. Maybe it's more natural to follow the traditional open(), hw_params(), hw_free, close() from ALSA.
I think this just makes it worse, you are now adding even more calls. The problem I see here is that, open and close (at least to me) strongly implies that you can do multiple operations between them and unless I have misunderstood something here you can't.
That's right, the open was not compatible with multiple operations. Collapsing open/send and wait/close sounds more logical, thanks for the feedback.
Sure
On 18-12-23, 14:29, Charles Keepax wrote:
On Mon, Dec 18, 2023 at 01:58:47PM +0100, Pierre-Louis Bossart wrote:
why not have a single API that does both? First check if it is supported and then allocate buffers and do the transfer.. What are the advantages of using this two step process
Symmetry is the only thing that comes to my mind. Open - close and send
- wait are natural matches, aren't they?
We do need a wait(), so bundling open() and send() would be odd.
I agree send->wait->close would be odd, But you just bundle close into wait. So the API becomes just send->wait, which seems pretty logical.
You would drop close as well, only send and wait...
But you have a point that the open() is not generic in that it also prepares the DMA buffers for transmission. Maybe it's more natural to follow the traditional open(), hw_params(), hw_free, close() from ALSA.
I think this just makes it worse, you are now adding even more calls. The problem I see here is that, open and close (at least to me) strongly implies that you can do multiple operations between them and unless I have misunderstood something here you can't.
Thanks, Charles
On 18-12-23, 13:58, Pierre-Louis Bossart wrote:
Documentation/driver-api/soundwire/bra.rst | 478 ++++++++++++++++++
Can we split the cadence parts of this to bra-cadence.rst that way this file documents the core parts only
Yes, we can split the Cadence parts out.
Great
+Error handling +~~~~~~~~~~~~~~
+The expected response to a 'bra_message' and follow-up behavior may +vary:
- (1) A Peripheral driver may want to receive an immediate -EBUSY
response if the BRA protocol is not available at a given time.
- (2) A Peripheral driver may want to wait until a timeout for the
on-going transfer to be handled
- (3) A Peripheral driver may want to wait until existing BRA
transfers complete or deal with BRA as a background task when
audio transfers stop. In this case, there would be no timeout,
and the operation may not happen if the platform is suspended.
Is this runtime suspend or S3/S4 case?
System suspend (which can also mean S0i1).
I don't think we can have a case where a peripheral driver waits on something without having done a pm_runtime_get_sync() to prevent runtime_pm suspend.
+BTP/BRA API available to peripheral drivers +-------------------------------------------
+ASoC Peripheral drivers may use
- sdw_bpt_stream_open(mode)
This function verifies that the BPT protocol with the
'mode'. For now only BRA is accepted as a mode. This function
allocates a work buffer internally. This buffer is not exposed
to the caller.
errors:
-ENODEV: BPT/BRA is not supported by the Manager.
-EBUSY: another agent is already using the audio payload for
audio transfers. There is no way to predict when the audio
streams might stop, this will require the Peripheral driver
to fall back to the regular (slow) command channel.
-EAGAIN: another agent is already transferring data using the
BPT/BRA protocol. Since the transfers will typically last
10s or 100s of ms, the Peripheral driver may wait and retry
later.
- sdw_bpt_message_send_async(bpt_message)
why not have a single API that does both? First check if it is supported and then allocate buffers and do the transfer.. What are the advantages of using this two step process
Symmetry is the only thing that comes to my mind. Open - close and send
- wait are natural matches, aren't they?
Why have symmetry to DAI apis, why not symmetry to regmap write APIs..? This is data transfer, so I am not sure why would we model it as a DAI. (Internal implementation may rely on that but from API design, i dont think that should be a concern)
We do need a wait(), so bundling open() and send() would be odd.
But you have a point that the open() is not generic in that it also prepares the DMA buffers for transmission. Maybe it's more natural to follow the traditional open(), hw_params(), hw_free, close() from ALSA.
The register definitions are missing a BULK_ENABLE bitfield which must be set for DP0.
In addition, the existing mapping from PDI to Data Port is 1:1. That's fine for PCM streams which are by construction in one direction only. The BTP/BRA protocol is bidirectional and relies on DP0 only, which breaks the 1:1 mapping. DP0 MUST be mapped to both PDI0 and PDI1, with PDI1 taking care of the TX direction and PDI1 of the RX direction.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- drivers/soundwire/cadence_master.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/soundwire/cadence_master.c b/drivers/soundwire/cadence_master.c index 725222036853..bb623f82826c 100644 --- a/drivers/soundwire/cadence_master.c +++ b/drivers/soundwire/cadence_master.c @@ -183,6 +183,7 @@ MODULE_PARM_DESC(cdns_mcp_int_mask, "Cadence MCP IntMask"); #define CDNS_PORTCTRL_TEST_FAILED BIT(1) #define CDNS_PORTCTRL_DIRN BIT(7) #define CDNS_PORTCTRL_BANK_INVERT BIT(8) +#define CDNS_PORTCTRL_BULK_ENABLE BIT(16)
#define CDNS_PORT_OFFSET 0x80
@@ -1831,13 +1832,20 @@ void sdw_cdns_config_stream(struct sdw_cdns *cdns,
if (cdns->bus.params.m_data_mode != SDW_PORT_DATA_MODE_NORMAL) val |= CDNS_PORTCTRL_TEST_FAILED; + } else if (pdi->num == 0) { + val |= CDNS_PORTCTRL_BULK_ENABLE; } offset = CDNS_PORTCTRL + pdi->num * CDNS_PORT_OFFSET; cdns_updatel(cdns, offset, - CDNS_PORTCTRL_DIRN | CDNS_PORTCTRL_TEST_FAILED, + CDNS_PORTCTRL_DIRN | CDNS_PORTCTRL_TEST_FAILED | + CDNS_PORTCTRL_BULK_ENABLE, val);
- val = pdi->num; + /* The DataPort0 needs to be mapped to both PDI and PDI1 ! */ + if (pdi->num == 1) + val = 0; + else + val = pdi->num; val |= CDNS_PDI_CONFIG_SOFT_RESET; val |= FIELD_PREP(CDNS_PDI_CONFIG_CHANNEL, (1 << ch) - 1); cdns_writel(cdns, CDNS_PDI_CONFIG(pdi->num), val);
In the existing definition of sdw_stream_runtime, the 'type' member is never set and defaults to PCM. To prepare for the BPT/BRA support, we need to special-case streams and make use of the 'type'.
No functional change for now, the implicit PCM type is now explicit.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- Documentation/driver-api/soundwire/stream.rst | 2 +- drivers/soundwire/stream.c | 7 +++++-- include/linux/soundwire/sdw.h | 3 ++- sound/soc/qcom/sdw.c | 2 +- 4 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/Documentation/driver-api/soundwire/stream.rst b/Documentation/driver-api/soundwire/stream.rst index b432a2de45d3..65b9bb742689 100644 --- a/Documentation/driver-api/soundwire/stream.rst +++ b/Documentation/driver-api/soundwire/stream.rst @@ -291,7 +291,7 @@ per stream. From ASoC DPCM framework, this stream state maybe linked to
.. code-block:: c
- int sdw_alloc_stream(char * stream_name); + int sdw_alloc_stream(char * stream_name, enum sdw_stream_type type);
The SoundWire core provides a sdw_startup_stream() helper function, typically called during a dailink .startup() callback, which performs diff --git a/drivers/soundwire/stream.c b/drivers/soundwire/stream.c index c4ab360e727f..c1e3bb897fb9 100644 --- a/drivers/soundwire/stream.c +++ b/drivers/soundwire/stream.c @@ -1743,12 +1743,14 @@ static int set_stream(struct snd_pcm_substream *substream, * sdw_alloc_stream() - Allocate and return stream runtime * * @stream_name: SoundWire stream name + * @type: stream type (could be PCM, PDM or BPT) * * Allocates a SoundWire stream runtime instance. * sdw_alloc_stream should be called only once per stream. Typically * invoked from ALSA/ASoC machine/platform driver. */ -struct sdw_stream_runtime *sdw_alloc_stream(const char *stream_name) +struct sdw_stream_runtime *sdw_alloc_stream(const char *stream_name, + enum sdw_stream_type type) { struct sdw_stream_runtime *stream;
@@ -1760,6 +1762,7 @@ struct sdw_stream_runtime *sdw_alloc_stream(const char *stream_name) INIT_LIST_HEAD(&stream->master_list); stream->state = SDW_STREAM_ALLOCATED; stream->m_rt_count = 0; + stream->type = type;
return stream; } @@ -1788,7 +1791,7 @@ int sdw_startup_stream(void *sdw_substream) if (!name) return -ENOMEM;
- sdw_stream = sdw_alloc_stream(name); + sdw_stream = sdw_alloc_stream(name, SDW_STREAM_PCM); if (!sdw_stream) { dev_err(rtd->dev, "alloc stream failed for substream DAI %s\n", substream->name); ret = -ENOMEM; diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index 71a7031f7b3a..5cc7fed1bf19 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -1024,7 +1024,8 @@ struct sdw_stream_runtime { int m_rt_count; };
-struct sdw_stream_runtime *sdw_alloc_stream(const char *stream_name); +struct sdw_stream_runtime *sdw_alloc_stream(const char *stream_name, + enum sdw_stream_type type); void sdw_release_stream(struct sdw_stream_runtime *stream);
int sdw_compute_params(struct sdw_bus *bus); diff --git a/sound/soc/qcom/sdw.c b/sound/soc/qcom/sdw.c index 77dbe0c28b29..d91a1bd0c66c 100644 --- a/sound/soc/qcom/sdw.c +++ b/sound/soc/qcom/sdw.c @@ -27,7 +27,7 @@ int qcom_snd_sdw_startup(struct snd_pcm_substream *substream) struct snd_soc_dai *codec_dai; int ret, i;
- sruntime = sdw_alloc_stream(cpu_dai->name); + sruntime = sdw_alloc_stream(cpu_dai->name, SDW_STREAM_PCM); if (!sruntime) return -ENOMEM;
BPT/BRA need to be special cased, i.e. there's no point in using the bandwidth allocation since the entire frame can be used.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- include/linux/soundwire/sdw.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index 5cc7fed1bf19..9f04f3ad044f 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -140,12 +140,14 @@ enum sdw_dpn_pkg_mode { * * @SDW_STREAM_PCM: PCM data stream * @SDW_STREAM_PDM: PDM data stream + * @SDW_STREAM_BPT: BPT data stream * * spec doesn't define this, but is used in implementation */ enum sdw_stream_type { SDW_STREAM_PCM = 0, SDW_STREAM_PDM = 1, + SDW_STREAM_BPT = 2, };
/** @@ -959,7 +961,7 @@ struct sdw_port_config { * @ch_count: Channel count of the stream * @bps: Number of bits per audio sample * @direction: Data direction - * @type: Stream type PCM or PDM + * @type: Stream type PCM, PDM or BPT */ struct sdw_stream_config { unsigned int frame_rate; @@ -1009,7 +1011,7 @@ struct sdw_stream_params { * @name: SoundWire stream name * @params: Stream parameters * @state: Current state of the stream - * @type: Stream type PCM or PDM + * @type: Stream type PCM, PDM or BPT * @master_list: List of Master runtime(s) in this stream. * master_list can contain only one m_rt per Master instance * for a stream
For BPT support, we want to allocate the entire audio payload and bypass the allocation based on PCM/PDM parameters.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- drivers/soundwire/amd_manager.c | 2 +- .../soundwire/generic_bandwidth_allocation.c | 84 ++++++++++++++++++- drivers/soundwire/qcom.c | 2 +- drivers/soundwire/stream.c | 6 +- include/linux/soundwire/sdw.h | 4 +- 5 files changed, 91 insertions(+), 7 deletions(-)
diff --git a/drivers/soundwire/amd_manager.c b/drivers/soundwire/amd_manager.c index a3b1f4e6f0f9..96480f6ddd3f 100644 --- a/drivers/soundwire/amd_manager.c +++ b/drivers/soundwire/amd_manager.c @@ -389,7 +389,7 @@ static u32 amd_sdw_read_ping_status(struct sdw_bus *bus) return slave_stat; }
-static int amd_sdw_compute_params(struct sdw_bus *bus) +static int amd_sdw_compute_params(struct sdw_bus *bus, bool bpt_stream) { struct sdw_transport_data t_data = {0}; struct sdw_master_runtime *m_rt; diff --git a/drivers/soundwire/generic_bandwidth_allocation.c b/drivers/soundwire/generic_bandwidth_allocation.c index c70a63d009ae..04083122acde 100644 --- a/drivers/soundwire/generic_bandwidth_allocation.c +++ b/drivers/soundwire/generic_bandwidth_allocation.c @@ -81,6 +81,82 @@ void sdw_compute_slave_ports(struct sdw_master_runtime *m_rt, } EXPORT_SYMBOL(sdw_compute_slave_ports);
+static void sdw_compute_slave_dp0_ports(struct sdw_master_runtime *m_rt, + struct sdw_transport_data *t_data) +{ + struct sdw_slave_runtime *s_rt = NULL; + struct sdw_port_runtime *p_rt; + int port_bo; + int sample_int; + unsigned int bps; + + port_bo = t_data->block_offset; + bps = m_rt->bus->params.col - 1; + sample_int = m_rt->bus->params.col; + + list_for_each_entry(s_rt, &m_rt->slave_rt_list, m_rt_node) { + list_for_each_entry(p_rt, &s_rt->port_list, port_node) { + sdw_fill_xport_params(&p_rt->transport_params, + p_rt->num, false, + SDW_BLK_GRP_CNT_1, + sample_int, port_bo, port_bo >> 8, + t_data->hstart, + t_data->hstop, + SDW_BLK_PKG_PER_PORT, 0x0); + + sdw_fill_port_params(&p_rt->port_params, + p_rt->num, bps, + SDW_PORT_FLOW_MODE_ISOCH, + SDW_PORT_DATA_MODE_NORMAL); + } + } +} + +static void sdw_compute_dp0_master_ports(struct sdw_master_runtime *m_rt) +{ + struct sdw_transport_data t_data = {0}; + struct sdw_port_runtime *p_rt; + struct sdw_bus *bus = m_rt->bus; + unsigned int bps; + int sample_int; + int port_bo; + int hstart; + int hstop; + + bps = bus->params.col - 1; + sample_int = bus->params.col; + hstart = 1; + hstop = bus->params.col - 1; + port_bo = 0; + + t_data.hstart = hstart; + t_data.hstop = hstop; + t_data.block_offset = port_bo; + t_data.sub_block_offset = 0; + + list_for_each_entry(p_rt, &m_rt->port_list, port_node) { + sdw_fill_xport_params(&p_rt->transport_params, p_rt->num, + false, SDW_BLK_GRP_CNT_1, sample_int, + port_bo, port_bo >> 8, hstart, hstop, + SDW_BLK_PKG_PER_PORT, 0x0); + + sdw_fill_port_params(&p_rt->port_params, + p_rt->num, bps, + SDW_PORT_FLOW_MODE_ISOCH, + SDW_PORT_DATA_MODE_NORMAL); + } + + sdw_compute_slave_dp0_ports(m_rt, &t_data); +} + +static void sdw_compute_dp0_port_params(struct sdw_bus *bus) +{ + struct sdw_master_runtime *m_rt; + + list_for_each_entry(m_rt, &bus->m_rt_list, bus_node) + sdw_compute_dp0_master_ports(m_rt); +} + static void sdw_compute_master_ports(struct sdw_master_runtime *m_rt, struct sdw_group_params *params, int port_bo, int hstop) @@ -392,8 +468,9 @@ static int sdw_compute_bus_params(struct sdw_bus *bus) * sdw_compute_params: Compute bus, transport and port parameters * * @bus: SDW Bus instance + * @bpt_stream: boolean to conditionally use dedicated bit allocation */ -int sdw_compute_params(struct sdw_bus *bus) +int sdw_compute_params(struct sdw_bus *bus, bool bpt_stream) { int ret;
@@ -402,6 +479,11 @@ int sdw_compute_params(struct sdw_bus *bus) if (ret < 0) return ret;
+ if (bpt_stream) { + sdw_compute_dp0_port_params(bus); + return 0; + } + /* Compute transport and port params */ ret = sdw_compute_port_params(bus); if (ret < 0) { diff --git a/drivers/soundwire/qcom.c b/drivers/soundwire/qcom.c index 3c4d6debab1f..8dd3319e8e40 100644 --- a/drivers/soundwire/qcom.c +++ b/drivers/soundwire/qcom.c @@ -1060,7 +1060,7 @@ static const struct sdw_master_ops qcom_swrm_ops = { .pre_bank_switch = qcom_swrm_pre_bank_switch, };
-static int qcom_swrm_compute_params(struct sdw_bus *bus) +static int qcom_swrm_compute_params(struct sdw_bus *bus, bool bpt_stream) { struct qcom_swrm_ctrl *ctrl = to_qcom_sdw(bus); struct sdw_master_runtime *m_rt; diff --git a/drivers/soundwire/stream.c b/drivers/soundwire/stream.c index c1e3bb897fb9..16ee58d4b7a2 100644 --- a/drivers/soundwire/stream.c +++ b/drivers/soundwire/stream.c @@ -1356,6 +1356,7 @@ static void sdw_release_bus_lock(struct sdw_stream_runtime *stream) static int _sdw_prepare_stream(struct sdw_stream_runtime *stream, bool update_params) { + bool bpt_stream = (stream->type == SDW_STREAM_BPT) ? true : false; struct sdw_master_runtime *m_rt; struct sdw_bus *bus; struct sdw_master_prop *prop; @@ -1382,7 +1383,7 @@ static int _sdw_prepare_stream(struct sdw_stream_runtime *stream,
/* Compute params */ if (bus->compute_params) { - ret = bus->compute_params(bus); + ret = bus->compute_params(bus, bpt_stream); if (ret < 0) { dev_err(bus->dev, "Compute params failed: %d\n", ret); @@ -1640,6 +1641,7 @@ EXPORT_SYMBOL(sdw_disable_stream);
static int _sdw_deprepare_stream(struct sdw_stream_runtime *stream) { + bool bpt_stream = (stream->type == SDW_STREAM_BPT) ? true : false; struct sdw_master_runtime *m_rt; struct sdw_bus *bus; int ret = 0; @@ -1660,7 +1662,7 @@ static int _sdw_deprepare_stream(struct sdw_stream_runtime *stream)
/* Compute params */ if (bus->compute_params) { - ret = bus->compute_params(bus); + ret = bus->compute_params(bus, bpt_stream); if (ret < 0) { dev_err(bus->dev, "Compute params failed: %d\n", ret); diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index 9f04f3ad044f..1dda65d91bad 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -918,7 +918,7 @@ struct sdw_bus { struct lock_class_key bus_lock_key; struct mutex msg_lock; struct lock_class_key msg_lock_key; - int (*compute_params)(struct sdw_bus *bus); + int (*compute_params)(struct sdw_bus *bus, bool bpt_stream); const struct sdw_master_ops *ops; const struct sdw_master_port_ops *port_ops; struct sdw_bus_params params; @@ -1030,7 +1030,7 @@ struct sdw_stream_runtime *sdw_alloc_stream(const char *stream_name, enum sdw_stream_type type); void sdw_release_stream(struct sdw_stream_runtime *stream);
-int sdw_compute_params(struct sdw_bus *bus); +int sdw_compute_params(struct sdw_bus *bus, bool bpt_stream);
int sdw_stream_add_master(struct sdw_bus *bus, struct sdw_stream_config *stream_config,
DP0 (Data Port 0) is very similar to regular data ports, with minor tweaks we can reuse the same code.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- drivers/soundwire/stream.c | 94 +++++++++++++++++++++++++++----------- 1 file changed, 68 insertions(+), 26 deletions(-)
diff --git a/drivers/soundwire/stream.c b/drivers/soundwire/stream.c index 16ee58d4b7a2..f9762610f051 100644 --- a/drivers/soundwire/stream.c +++ b/drivers/soundwire/stream.c @@ -87,6 +87,10 @@ static int _sdw_program_slave_port_params(struct sdw_bus *bus, return ret; }
+ /* DP0 does not implement BlockCtrl3 */ + if (!t_params->port_num) + goto skip_block_ctrl3; + /* Program DPN_BlockCtrl3 register */ ret = sdw_write_no_pm(slave, addr2, t_params->blk_pkg_mode); if (ret < 0) { @@ -94,6 +98,7 @@ static int _sdw_program_slave_port_params(struct sdw_bus *bus, return ret; }
+skip_block_ctrl3: /* * Data ports are FULL, SIMPLE and REDUCED. This function handles * FULL and REDUCED only and beyond this point only FULL is @@ -130,18 +135,29 @@ static int sdw_program_slave_port_params(struct sdw_bus *bus, struct sdw_port_params *p_params = &p_rt->port_params; struct sdw_slave_prop *slave_prop = &s_rt->slave->prop; u32 addr1, addr2, addr3, addr4, addr5, addr6; - struct sdw_dpn_prop *dpn_prop; + enum sdw_dpn_type port_type; + bool read_only_wordlength; int ret; u8 wbuf;
if (s_rt->slave->is_mockup_device) return 0;
- dpn_prop = sdw_get_slave_dpn_prop(s_rt->slave, - s_rt->direction, - t_params->port_num); - if (!dpn_prop) - return -EINVAL; + if (t_params->port_num) { + struct sdw_dpn_prop *dpn_prop; + + dpn_prop = sdw_get_slave_dpn_prop(s_rt->slave, + s_rt->direction, + t_params->port_num); + if (!dpn_prop) + return -EINVAL; + + read_only_wordlength = dpn_prop->read_only_wordlength; + port_type = dpn_prop->type; + } else { + read_only_wordlength = false; + port_type = SDW_DPN_FULL; + }
addr1 = SDW_DPN_PORTCTRL(t_params->port_num); addr2 = SDW_DPN_BLOCKCTRL1(t_params->port_num); @@ -171,7 +187,7 @@ static int sdw_program_slave_port_params(struct sdw_bus *bus, return ret; }
- if (!dpn_prop->read_only_wordlength) { + if (!read_only_wordlength) { /* Program DPN_BlockCtrl1 register */ ret = sdw_write_no_pm(s_rt->slave, addr2, (p_params->bps - 1)); if (ret < 0) { @@ -223,9 +239,9 @@ static int sdw_program_slave_port_params(struct sdw_bus *bus, } }
- if (dpn_prop->type != SDW_DPN_SIMPLE) { + if (port_type != SDW_DPN_SIMPLE) { ret = _sdw_program_slave_port_params(bus, s_rt->slave, - t_params, dpn_prop->type); + t_params, port_type); if (ret < 0) dev_err(&s_rt->slave->dev, "Transport reg write failed for port: %d\n", @@ -432,6 +448,9 @@ static int sdw_prep_deprep_slave_ports(struct sdw_bus *bus, struct completion *port_ready; struct sdw_dpn_prop *dpn_prop; struct sdw_prepare_ch prep_ch; + u32 imp_def_interrupts; + bool simple_ch_prep_sm; + u32 ch_prep_timeout; bool intr = false; int ret = 0, val; u32 addr; @@ -439,20 +458,31 @@ static int sdw_prep_deprep_slave_ports(struct sdw_bus *bus, prep_ch.num = p_rt->num; prep_ch.ch_mask = p_rt->ch_mask;
- dpn_prop = sdw_get_slave_dpn_prop(s_rt->slave, - s_rt->direction, - prep_ch.num); - if (!dpn_prop) { - dev_err(bus->dev, - "Slave Port:%d properties not found\n", prep_ch.num); - return -EINVAL; + if (p_rt->num) { + dpn_prop = sdw_get_slave_dpn_prop(s_rt->slave, + s_rt->direction, + prep_ch.num); + if (!dpn_prop) { + dev_err(bus->dev, + "Slave Port:%d properties not found\n", prep_ch.num); + return -EINVAL; + } + + imp_def_interrupts = dpn_prop->imp_def_interrupts; + simple_ch_prep_sm = dpn_prop->simple_ch_prep_sm; + } else { + struct sdw_dp0_prop *dp0_prop = s_rt->slave->prop.dp0_prop; + + imp_def_interrupts = dp0_prop->imp_def_interrupts; + simple_ch_prep_sm = dp0_prop->simple_ch_prep_sm; + ch_prep_timeout = dp0_prop->ch_prep_timeout; }
prep_ch.prepare = prep;
prep_ch.bank = bus->params.next_bank;
- if (dpn_prop->imp_def_interrupts || !dpn_prop->simple_ch_prep_sm || + if (imp_def_interrupts || !simple_ch_prep_sm || bus->params.s_data_mode != SDW_PORT_DATA_MODE_NORMAL) intr = true;
@@ -463,7 +493,7 @@ static int sdw_prep_deprep_slave_ports(struct sdw_bus *bus, */ if (prep && intr) { ret = sdw_configure_dpn_intr(s_rt->slave, p_rt->num, prep, - dpn_prop->imp_def_interrupts); + imp_def_interrupts); if (ret < 0) return ret; } @@ -472,7 +502,7 @@ static int sdw_prep_deprep_slave_ports(struct sdw_bus *bus, sdw_do_port_prep(s_rt, prep_ch, prep ? SDW_OPS_PORT_PRE_PREP : SDW_OPS_PORT_PRE_DEPREP);
/* Prepare Slave port implementing CP_SM */ - if (!dpn_prop->simple_ch_prep_sm) { + if (!simple_ch_prep_sm) { addr = SDW_DPN_PREPARECTRL(p_rt->num);
if (prep) @@ -489,7 +519,7 @@ static int sdw_prep_deprep_slave_ports(struct sdw_bus *bus, /* Wait for completion on port ready */ port_ready = &s_rt->slave->port_ready[prep_ch.num]; wait_for_completion_timeout(port_ready, - msecs_to_jiffies(dpn_prop->ch_prep_timeout)); + msecs_to_jiffies(ch_prep_timeout));
val = sdw_read_no_pm(s_rt->slave, SDW_DPN_PREPARESTATUS(p_rt->num)); if ((val < 0) || (val & p_rt->ch_mask)) { @@ -506,7 +536,7 @@ static int sdw_prep_deprep_slave_ports(struct sdw_bus *bus, /* Disable interrupt after Port de-prepare */ if (!prep && intr) ret = sdw_configure_dpn_intr(s_rt->slave, p_rt->num, prep, - dpn_prop->imp_def_interrupts); + imp_def_interrupts);
return ret; } @@ -970,7 +1000,8 @@ static int sdw_slave_port_is_valid_range(struct device *dev, int num)
static int sdw_slave_port_config(struct sdw_slave *slave, struct sdw_slave_runtime *s_rt, - const struct sdw_port_config *port_config) + const struct sdw_port_config *port_config, + bool is_bpt_stream) { struct sdw_port_runtime *p_rt; int ret; @@ -982,9 +1013,14 @@ static int sdw_slave_port_config(struct sdw_slave *slave, * TODO: Check valid port range as defined by DisCo/ * slave */ - ret = sdw_slave_port_is_valid_range(&slave->dev, port_config[i].num); - if (ret < 0) - return ret; + if (!is_bpt_stream) { + ret = sdw_slave_port_is_valid_range(&slave->dev, port_config[i].num); + if (ret < 0) + return ret; + } else { + if (port_config[i].num) + return -EINVAL; + }
ret = sdw_port_config(p_rt, port_config, i); if (ret < 0) @@ -1293,6 +1329,11 @@ struct sdw_dpn_prop *sdw_get_slave_dpn_prop(struct sdw_slave *slave, u8 num_ports; int i;
+ if (!port_num) { + dev_err(&slave->dev, "%s: port_num is zero\n", __func__); + return NULL; + } + if (direction == SDW_DATA_DIR_TX) { num_ports = hweight32(slave->prop.source_ports); dpn_prop = slave->prop.src_dpn_prop; @@ -2056,7 +2097,8 @@ int sdw_stream_add_slave(struct sdw_slave *slave, if (ret) goto unlock;
- ret = sdw_slave_port_config(slave, s_rt, port_config); + ret = sdw_slave_port_config(slave, s_rt, port_config, + stream->type == SDW_STREAM_BPT ? true : false); if (ret) goto unlock;
On Thu, Dec 07, 2023 at 04:29:34PM -0600, Pierre-Louis Bossart wrote:
DP0 (Data Port 0) is very similar to regular data ports, with minor tweaks we can reuse the same code.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
- dpn_prop = sdw_get_slave_dpn_prop(s_rt->slave,
s_rt->direction,
t_params->port_num);
- if (!dpn_prop)
return -EINVAL;
- if (t_params->port_num) {
struct sdw_dpn_prop *dpn_prop;
dpn_prop = sdw_get_slave_dpn_prop(s_rt->slave,
s_rt->direction,
t_params->port_num);
if (!dpn_prop)
return -EINVAL;
read_only_wordlength = dpn_prop->read_only_wordlength;
port_type = dpn_prop->type;
- } else {
read_only_wordlength = false;
port_type = SDW_DPN_FULL;
- }
Would it be nicer to just add a special case sdw_get_slave_dpn_prop to return dp0_prop and avoid all this special casing in the rest of the code?
Thanks, Charles
On 12/12/23 06:30, Charles Keepax wrote:
On Thu, Dec 07, 2023 at 04:29:34PM -0600, Pierre-Louis Bossart wrote:
DP0 (Data Port 0) is very similar to regular data ports, with minor tweaks we can reuse the same code.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
- dpn_prop = sdw_get_slave_dpn_prop(s_rt->slave,
s_rt->direction,
t_params->port_num);
- if (!dpn_prop)
return -EINVAL;
- if (t_params->port_num) {
struct sdw_dpn_prop *dpn_prop;
dpn_prop = sdw_get_slave_dpn_prop(s_rt->slave,
s_rt->direction,
t_params->port_num);
if (!dpn_prop)
return -EINVAL;
read_only_wordlength = dpn_prop->read_only_wordlength;
port_type = dpn_prop->type;
- } else {
read_only_wordlength = false;
port_type = SDW_DPN_FULL;
- }
Would it be nicer to just add a special case sdw_get_slave_dpn_prop to return dp0_prop and avoid all this special casing in the rest of the code?
There are multiple cases in the same function where we need to check for DP0. We could move the read_only_wordlength and port_type to the sdw_get_slave_dpn_prop() helper, but then we would spread the special cases in multiple locations. It's not pretty either way.
Add definitions and helpers for the BPT/BRA protocol. Peripheral drivers (aka ASoC codec drivers) can use this API to send bulk data such as firmware or tables.
The API is only available when no other audio streams have been allocated, and only one BTP/BRA stream is allowed per link. To avoid the addition of yet another lock, the refcount tests are handled in the stream master_runtime alloc/free routines where the bus_lock is already held. Another benefit of this approach is that the same bus_lock is used to handle runtime and port linked lists, which reduces the potential for misaligned configurations.
In addition to exclusion with audio streams, BPT transfers have a lot of overhead, specifically registers writes are needed to enable transport in DP0. In addition, most DMAs don't handle too well very small data sets.
This patch suggests a minimum bound of 64 bytes, for smaller transfers codec drivers should rely on the regular read/write commands in Column0.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- drivers/soundwire/bus.c | 77 +++++++++++++++++++++++++++++++++++ drivers/soundwire/bus.h | 18 ++++++++ drivers/soundwire/stream.c | 30 ++++++++++++++ include/linux/soundwire/sdw.h | 76 ++++++++++++++++++++++++++++++++++ 4 files changed, 201 insertions(+)
diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c index f3fec15c3112..e5758d2ed88f 100644 --- a/drivers/soundwire/bus.c +++ b/drivers/soundwire/bus.c @@ -2015,3 +2015,80 @@ void sdw_clear_slave_status(struct sdw_bus *bus, u32 request) } } EXPORT_SYMBOL(sdw_clear_slave_status); + +int sdw_bpt_close_stream(struct sdw_bus *bus, + struct sdw_slave *slave, + enum sdw_bpt_type mode, + struct sdw_bpt_msg *msg) +{ + int ret; + + ret = bus->ops->bpt_close_stream(bus, slave, mode, msg); + if (ret < 0) + dev_err(bus->dev, "BPT stream close, err %d\n", ret); + + return ret; +} +EXPORT_SYMBOL(sdw_bpt_close_stream); + +int sdw_bpt_open_stream(struct sdw_bus *bus, + struct sdw_slave *slave, + enum sdw_bpt_type mode, + struct sdw_bpt_msg *msg) +{ + int ret; + + /* only Bulk Register Access (BRA) is supported for now */ + if (mode != SDW_BRA) + return -EINVAL; + + if (msg->len < SDW_BPT_MSG_MIN_BYTES) { + dev_err(bus->dev, "BPT message length %d, min supported %d\n", + msg->len, SDW_BPT_MSG_MIN_BYTES); + return -EINVAL; + } + + if (msg->len % SDW_BPT_MSG_BYTE_ALIGNMENT) { + dev_err(bus->dev, "BPT message length %d is not a multiple of %d bytes\n", + msg->len, SDW_BPT_MSG_BYTE_ALIGNMENT); + return -EINVAL; + } + + /* check device is enumerated */ + if (slave->dev_num == SDW_ENUM_DEV_NUM || + slave->dev_num > SDW_MAX_DEVICES) + return -ENODEV; + + /* make sure all callbacks are defined */ + if (!bus->ops->bpt_open_stream || + !bus->ops->bpt_close_stream || + !bus->ops->bpt_send_async || + !bus->ops->bpt_wait) + return -ENOTSUPP; + + ret = bus->ops->bpt_open_stream(bus, slave, mode, msg); + if (ret < 0) + dev_err(bus->dev, "BPT stream open, err %d\n", ret); + + return ret; +} +EXPORT_SYMBOL(sdw_bpt_open_stream); + +int sdw_bpt_send_async(struct sdw_bus *bus, + struct sdw_slave *slave, + struct sdw_bpt_msg *msg) +{ + if (msg->len > SDW_BPT_MSG_MAX_BYTES) + return -EINVAL; + + return bus->ops->bpt_send_async(bus, slave, msg); +} +EXPORT_SYMBOL(sdw_bpt_send_async); + +int sdw_bpt_wait(struct sdw_bus *bus, + struct sdw_slave *slave, + struct sdw_bpt_msg *msg) +{ + return bus->ops->bpt_wait(bus, slave, msg); +} +EXPORT_SYMBOL(sdw_bpt_wait); diff --git a/drivers/soundwire/bus.h b/drivers/soundwire/bus.h index fda6b24ac2da..936852ab3154 100644 --- a/drivers/soundwire/bus.h +++ b/drivers/soundwire/bus.h @@ -72,6 +72,24 @@ struct sdw_msg { bool page; };
+/** + * struct sdw_btp_msg - Message structure + * @addr: Start Register address accessed in the Slave + * @len: number of bytes to transfer. More than 64Kb can be transferred + * but a practical limit of SDW_BPT_MSG_MAX_BYTES is enforced. + * @dev_num: Slave device number + * @flags: transfer flags, indicate if xfer is read or write + * @buf: message data buffer (filled by host for write, filled + * by Peripheral hardware for reads) + */ +struct sdw_bpt_msg { + u32 addr; + u32 len; + u8 dev_num; + u8 flags; + u8 *buf; +}; + #define SDW_DOUBLE_RATE_FACTOR 2 #define SDW_STRM_RATE_GROUPING 1
diff --git a/drivers/soundwire/stream.c b/drivers/soundwire/stream.c index f9762610f051..b097b1a808f9 100644 --- a/drivers/soundwire/stream.c +++ b/drivers/soundwire/stream.c @@ -1189,6 +1189,20 @@ static struct sdw_master_runtime struct sdw_master_runtime *m_rt, *walk_m_rt; struct list_head *insert_after;
+ if (stream->type == SDW_STREAM_BPT) { + if (bus->stream_refcount > 0 || bus->bpt_stream_refcount > 0) { + dev_err(bus->dev, "%s: invalid configuration %d streams %d BPT streams\n", + __func__, bus->stream_refcount, bus->bpt_stream_refcount); + return ERR_PTR(-EBUSY); + } + } else { + if (bus->bpt_stream_refcount > 0) { + dev_err(bus->dev, "%s: BTP stream already allocated\n", + __func__); + return ERR_PTR(-EAGAIN); + } + } + m_rt = kzalloc(sizeof(*m_rt), GFP_KERNEL); if (!m_rt) return NULL; @@ -1217,6 +1231,8 @@ static struct sdw_master_runtime m_rt->stream = stream;
bus->stream_refcount++; + if (stream->type == SDW_STREAM_BPT) + bus->bpt_stream_refcount++;
return m_rt; } @@ -1265,6 +1281,8 @@ static void sdw_master_rt_free(struct sdw_master_runtime *m_rt, list_del(&m_rt->bus_node); kfree(m_rt);
+ if (stream->type == SDW_STREAM_BPT) + bus->bpt_stream_refcount--; bus->stream_refcount--; }
@@ -1941,6 +1959,12 @@ int sdw_stream_add_master(struct sdw_bus *bus, m_rt = sdw_master_rt_find(bus, stream); if (!m_rt) { m_rt = sdw_master_rt_alloc(bus, stream); + if (IS_ERR(m_rt)) { + ret = PTR_ERR(m_rt); + dev_err(bus->dev, "%s: Master runtime alloc failed for stream:%s: %d\n", + __func__, stream->name, ret); + goto unlock; + } if (!m_rt) { dev_err(bus->dev, "%s: Master runtime alloc failed for stream:%s\n", __func__, stream->name); @@ -2056,6 +2080,12 @@ int sdw_stream_add_slave(struct sdw_slave *slave, * So, allocate m_rt and add Slave to it. */ m_rt = sdw_master_rt_alloc(slave->bus, stream); + if (IS_ERR(m_rt)) { + ret = PTR_ERR(m_rt); + dev_err(&slave->dev, "%s: Master runtime alloc failed for stream:%s: %d\n", + __func__, stream->name, ret); + goto unlock; + } if (!m_rt) { dev_err(&slave->dev, "%s: Master runtime alloc failed for stream:%s\n", __func__, stream->name); diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index 1dda65d91bad..e54c5bbd2b91 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -835,6 +835,46 @@ struct sdw_defer { struct sdw_msg *msg; };
+/* + * Add a practical limit to BPT transfer sizes. BPT is typically used + * to transfer firmware, and larger firmware transfers will increase + * the cold latency beyond typical OS or user requirements. + */ +#define SDW_BPT_MSG_MAX_BYTES (1024 * 1024) + +/* + * Add a minimum number of bytes for BPT transfer sizes. BPT has a lot of + * overhead, enabling/disabling a stream costs 6 write commands, plus the bank + * switch that could be delayed in time. In addition, transferring very small + * data sets over DMA is known to be problematic on multiple platforms. + */ +#define SDW_BPT_MSG_MIN_BYTES 128 + +/* + * Add an alignment requirement for BPT transfer sizes. BPT is + * typically with DMAs which are most-efficient with data aligned to a + * cache line. For example HDaudio DMAs strongly recommend to use + * multiples of 128 bytes. + * This could be a platform-specific value, but that would be problematic for + * codec drivers which are typically not platform-specific by definition. + * It's probably simpler for everyone to align on a baseline and use regular + * read/write commands for the remaining bytes. + */ +#define SDW_BPT_MSG_BYTE_ALIGNMENT 128 + +/** + * sdw_bpt_type - SoundWire Bulk Payload Transfer (BPT) type + * @SDW_BRA: Soundwire Bulk Register Access (BRA) + * + * The SoundWire specification allows for implementation-defined + * solutions, they would be added after BRA. + */ +enum sdw_bpt_type { + SDW_BRA, +}; + +struct sdw_bpt_msg; + /** * struct sdw_master_ops - Master driver ops * @read_prop: Read Master properties @@ -850,6 +890,10 @@ struct sdw_defer { * @get_device_num: Callback for vendor-specific device_number allocation * @put_device_num: Callback for vendor-specific device_number release * @new_peripheral_assigned: Callback to handle enumeration of new peripheral. + * @bpt_open_stream: reserve resources for BPT stream + * @bpt_close_stream: release resources for BPT stream + * @bpt_send_async: send message using BTP protocol + * @bpt_wait: wait for message completion using BTP protocol */ struct sdw_master_ops { int (*read_prop)(struct sdw_bus *bus); @@ -869,6 +913,20 @@ struct sdw_master_ops { void (*new_peripheral_assigned)(struct sdw_bus *bus, struct sdw_slave *slave, int dev_num); + int (*bpt_open_stream)(struct sdw_bus *bus, + struct sdw_slave *slave, + enum sdw_bpt_type mode, + struct sdw_bpt_msg *msg); + int (*bpt_close_stream)(struct sdw_bus *bus, + struct sdw_slave *slave, + enum sdw_bpt_type mode, + struct sdw_bpt_msg *msg); + int (*bpt_send_async)(struct sdw_bus *bus, + struct sdw_slave *slave, + struct sdw_bpt_msg *msg); + int (*bpt_wait)(struct sdw_bus *bus, + struct sdw_slave *slave, + struct sdw_bpt_msg *msg); };
/** @@ -905,6 +963,8 @@ struct sdw_master_ops { * synchronization will be used even if a stream only uses a single * SoundWire segment. * @stream_refcount: number of streams currently using this bus + * @btp_stream_refcount: number of BTP streams currently using this bus (should + * be zero or one, multiple streams per link is not supported). */ struct sdw_bus { struct device *dev; @@ -935,6 +995,7 @@ struct sdw_bus { bool multi_link; int hw_sync_min_links; int stream_refcount; + int bpt_stream_refcount; };
int sdw_bus_master_add(struct sdw_bus *bus, struct device *parent, @@ -1052,6 +1113,21 @@ int sdw_bus_exit_clk_stop(struct sdw_bus *bus); int sdw_compare_devid(struct sdw_slave *slave, struct sdw_slave_id id); void sdw_extract_slave_id(struct sdw_bus *bus, u64 addr, struct sdw_slave_id *id);
+int sdw_bpt_open_stream(struct sdw_bus *bus, + struct sdw_slave *slave, + enum sdw_bpt_type mode, + struct sdw_bpt_msg *msg); +int sdw_bpt_close_stream(struct sdw_bus *bus, + struct sdw_slave *slave, + enum sdw_bpt_type mode, + struct sdw_bpt_msg *msg); +int sdw_bpt_send_async(struct sdw_bus *bus, + struct sdw_slave *slave, + struct sdw_bpt_msg *msg); +int sdw_bpt_wait(struct sdw_bus *bus, + struct sdw_slave *slave, + struct sdw_bpt_msg *msg); + #if IS_ENABLED(CONFIG_SOUNDWIRE)
int sdw_stream_add_slave(struct sdw_slave *slave,
On Thu, Dec 07, 2023 at 04:29:35PM -0600, Pierre-Louis Bossart wrote:
This patch suggests a minimum bound of 64 bytes, for smaller transfers
128 in the code.
+int sdw_bpt_close_stream(struct sdw_bus *bus,
struct sdw_slave *slave,
enum sdw_bpt_type mode,
struct sdw_bpt_msg *msg)
+{
Seems weird to pass the message to close?
+int sdw_bpt_open_stream(struct sdw_bus *bus,
struct sdw_slave *slave,
enum sdw_bpt_type mode,
struct sdw_bpt_msg *msg)
Ditto, here does it make sense to pass the msg to open? I guess the idea is that one only sends a single message in one open->send->wait->close cycle? Would be much nicer if multiple send/waits could be done in a single open/close, or if the limitation is unavoidable why split out open/send into separate calls at all? Just have send and wait and wrap open/close into them.
- if (msg->len % SDW_BPT_MSG_BYTE_ALIGNMENT) {
dev_err(bus->dev, "BPT message length %d is not a multiple of %d bytes\n",
msg->len, SDW_BPT_MSG_BYTE_ALIGNMENT);
return -EINVAL;
- }
Should this really be here? My understanding is this alignment is a limitation of specific hardware so should this check not be in the Cadence master code.
+int sdw_bpt_send_async(struct sdw_bus *bus,
struct sdw_slave *slave,
struct sdw_bpt_msg *msg)
+{
- if (msg->len > SDW_BPT_MSG_MAX_BYTES)
return -EINVAL;
Does this check make sense since this was already checked in open? I guess the user could pass in a different message at this stage, but that I guess is part of what feels weird about passing the message into open.
+/**
- struct sdw_btp_msg - Message structure
- @addr: Start Register address accessed in the Slave
- @len: number of bytes to transfer. More than 64Kb can be transferred
- but a practical limit of SDW_BPT_MSG_MAX_BYTES is enforced.
Where is the 64kb coming from here?
+/*
- Add a minimum number of bytes for BPT transfer sizes. BPT has a lot of
- overhead, enabling/disabling a stream costs 6 write commands, plus the bank
- switch that could be delayed in time. In addition, transferring very small
- data sets over DMA is known to be problematic on multiple platforms.
- */
+#define SDW_BPT_MSG_MIN_BYTES 128
Is it really necessary for the core to enforce a minimum transfer size (well except for the required alignment thing)? Yes small transfers don't obviously make sense, but there is nothing inherently wrong with doing one, which makes me feel it is excessive for the core to block more than it has to here.
I would be of the opinion its up to driver writers if they have some reason to do small BRA transfers. Firstly, since we are so keen on allowing BRA and normal writes to overlap, one could see use-cases when you want to do something through BRA such that it doesn't block the normal command stream. Also there is already 1 feature on cs42l43 that can only be accessed through BRA, yes that is a heroically questionable hardware design choice. Whilst we are mostly ignoring that for now, I can see this being something some other hardware teams decide to heroically do at some point as well.
Thanks, Charles
On 12/12/23 06:19, Charles Keepax wrote:
On Thu, Dec 07, 2023 at 04:29:35PM -0600, Pierre-Louis Bossart wrote:
This patch suggests a minimum bound of 64 bytes, for smaller transfers
128 in the code.
indeed, that's a left-over from an earlier trial. Will fix.
+int sdw_bpt_close_stream(struct sdw_bus *bus,
struct sdw_slave *slave,
enum sdw_bpt_type mode,
struct sdw_bpt_msg *msg)
+{
Seems weird to pass the message to close?
It's not necessary in my current solution, but I opted to keep the arguments aligned.
+int sdw_bpt_open_stream(struct sdw_bus *bus,
struct sdw_slave *slave,
enum sdw_bpt_type mode,
struct sdw_bpt_msg *msg)
Ditto, here does it make sense to pass the msg to open? I guess the idea is that one only sends a single message in one open->send->wait->close cycle? Would be much nicer if multiple send/waits could be done in a single open/close, or if the limitation is unavoidable why split out open/send into separate calls at all? Just have send and wait and wrap open/close into them.
it's needed for the Intel solution, the open() will allocate the required DMA buffers and copy the data from the messsage into the DMA buffers with the required formatting expected by the IP.
An alternative would be to simplify open/close but then we have to add a hw_params/prepare and hw_free steps. I didn't see a need for such a split, unlike for audio the arguments are not going to be variable.
But yes it's a good question, what exactly is an open/startup callback supposed to do in ALSA/ASoC?
- if (msg->len % SDW_BPT_MSG_BYTE_ALIGNMENT) {
dev_err(bus->dev, "BPT message length %d is not a multiple of %d bytes\n",
msg->len, SDW_BPT_MSG_BYTE_ALIGNMENT);
return -EINVAL;
- }
Should this really be here? My understanding is this alignment is a limitation of specific hardware so should this check not be in the Cadence master code.
As discussed earlier, we *could* move this to host-specific parts, but then the codec driver would need to know about host alignment requirements. One of those 'be careful what you ask for' things where you may have more work to do on the codec driver side...
+int sdw_bpt_send_async(struct sdw_bus *bus,
struct sdw_slave *slave,
struct sdw_bpt_msg *msg)
+{
- if (msg->len > SDW_BPT_MSG_MAX_BYTES)
return -EINVAL;
Does this check make sense since this was already checked in open? I guess the user could pass in a different message at this stage, but that I guess is part of what feels weird about passing the message into open.
The error check is a big open, we could assume that all the previous steps were done the right way and skip tests, or we would add a set of paranoia checks.
The primary intent of those checks is to help the integration of codec drivers, it's better to get an error code and a clear error message than a kernel oops because the expected sequence is not followed. Once the integration is done, those tests are probably not very useful indeed.
+/**
- struct sdw_btp_msg - Message structure
- @addr: Start Register address accessed in the Slave
- @len: number of bytes to transfer. More than 64Kb can be transferred
- but a practical limit of SDW_BPT_MSG_MAX_BYTES is enforced.
Where is the 64kb coming from here?
Ah, this is a reference to the 16 bit address limitation in the regular read/write commands.
+/*
- Add a minimum number of bytes for BPT transfer sizes. BPT has a lot of
- overhead, enabling/disabling a stream costs 6 write commands, plus the bank
- switch that could be delayed in time. In addition, transferring very small
- data sets over DMA is known to be problematic on multiple platforms.
- */
+#define SDW_BPT_MSG_MIN_BYTES 128
Is it really necessary for the core to enforce a minimum transfer size (well except for the required alignment thing)? Yes small transfers don't obviously make sense, but there is nothing inherently wrong with doing one, which makes me feel it is excessive for the core to block more than it has to here.
I think it's really better to have a clear design intent that BRA is not meant for small transfers. It would be opening a can of worms if we start seeing uses of this protocol beyond the intended firmware/table download and history buffer retrieval.
I would be of the opinion its up to driver writers if they have some reason to do small BRA transfers. Firstly, since we are so keen on allowing BRA and normal writes to overlap, one could see use-cases when you want to do something through BRA such that it doesn't block the normal command stream. Also there is already 1 feature on cs42l43 that can only be accessed through BRA, yes that is a heroically questionable hardware design choice. Whilst we are mostly ignoring that for now, I can see this being something some other hardware teams decide to heroically do at some point as well.
To be clear, I would like to allow BRA in parallel with normal writes *to a different set of registers* to deal with alerts, etc.
A set of registers only accessible through BRA is a very questionable design indeed. That's not what the SoundWire spec intended and it's not what this patchset had in mind. You'll have to tell us more on what exactly the hardware is and does, and how strongly you want this capability supported...
On 07-12-23, 16:29, Pierre-Louis Bossart wrote:
Add definitions and helpers for the BPT/BRA protocol. Peripheral drivers (aka ASoC codec drivers) can use this API to send bulk data such as firmware or tables.
The API is only available when no other audio streams have been allocated, and only one BTP/BRA stream is allowed per link. To avoid the addition of yet another lock, the refcount tests are handled in the stream master_runtime alloc/free routines where the bus_lock is already held. Another benefit of this approach is that the same bus_lock is used to handle runtime and port linked lists, which reduces the potential for misaligned configurations.
In addition to exclusion with audio streams, BPT transfers have a lot of overhead, specifically registers writes are needed to enable transport in DP0. In addition, most DMAs don't handle too well very small data sets.
This patch suggests a minimum bound of 64 bytes, for smaller transfers codec drivers should rely on the regular read/write commands in Column0.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
drivers/soundwire/bus.c | 77 +++++++++++++++++++++++++++++++++++ drivers/soundwire/bus.h | 18 ++++++++ drivers/soundwire/stream.c | 30 ++++++++++++++ include/linux/soundwire/sdw.h | 76 ++++++++++++++++++++++++++++++++++ 4 files changed, 201 insertions(+)
diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c index f3fec15c3112..e5758d2ed88f 100644 --- a/drivers/soundwire/bus.c +++ b/drivers/soundwire/bus.c @@ -2015,3 +2015,80 @@ void sdw_clear_slave_status(struct sdw_bus *bus, u32 request) } } EXPORT_SYMBOL(sdw_clear_slave_status);
+int sdw_bpt_close_stream(struct sdw_bus *bus,
struct sdw_slave *slave,
enum sdw_bpt_type mode,
struct sdw_bpt_msg *msg)
+{
- int ret;
- ret = bus->ops->bpt_close_stream(bus, slave, mode, msg);
- if (ret < 0)
dev_err(bus->dev, "BPT stream close, err %d\n", ret);
- return ret;
+} +EXPORT_SYMBOL(sdw_bpt_close_stream);
+int sdw_bpt_open_stream(struct sdw_bus *bus,
struct sdw_slave *slave,
enum sdw_bpt_type mode,
struct sdw_bpt_msg *msg)
+{
- int ret;
- /* only Bulk Register Access (BRA) is supported for now */
- if (mode != SDW_BRA)
return -EINVAL;
- if (msg->len < SDW_BPT_MSG_MIN_BYTES) {
dev_err(bus->dev, "BPT message length %d, min supported %d\n",
msg->len, SDW_BPT_MSG_MIN_BYTES);
return -EINVAL;
- }
- if (msg->len % SDW_BPT_MSG_BYTE_ALIGNMENT) {
dev_err(bus->dev, "BPT message length %d is not a multiple of %d bytes\n",
msg->len, SDW_BPT_MSG_BYTE_ALIGNMENT);
return -EINVAL;
- }
Is this a protocol requirement?
- /* check device is enumerated */
- if (slave->dev_num == SDW_ENUM_DEV_NUM ||
slave->dev_num > SDW_MAX_DEVICES)
return -ENODEV;
- /* make sure all callbacks are defined */
- if (!bus->ops->bpt_open_stream ||
!bus->ops->bpt_close_stream ||
!bus->ops->bpt_send_async ||
!bus->ops->bpt_wait)
return -ENOTSUPP;
should this not be checked at probe time, if device declares the support
- ret = bus->ops->bpt_open_stream(bus, slave, mode, msg);
- if (ret < 0)
dev_err(bus->dev, "BPT stream open, err %d\n", ret);
- return ret;
+} +EXPORT_SYMBOL(sdw_bpt_open_stream);
can we open multiple times (i dont see a check preventing that), how do we close..?
Re-iterating my comment on documentation patch, can we do with a async api and wait api, that makes symantics a lot simpler, right..?
+int sdw_bpt_send_async(struct sdw_bus *bus,
struct sdw_slave *slave,
struct sdw_bpt_msg *msg)
+{
- if (msg->len > SDW_BPT_MSG_MAX_BYTES)
return -EINVAL;
- return bus->ops->bpt_send_async(bus, slave, msg);
+} +EXPORT_SYMBOL(sdw_bpt_send_async);
Can we call this multiple times after open, it is unclear to me. Can you please add kernel-doc comments about the APIs here as well
struct sdw_master_ops { int (*read_prop)(struct sdw_bus *bus); @@ -869,6 +913,20 @@ struct sdw_master_ops { void (*new_peripheral_assigned)(struct sdw_bus *bus, struct sdw_slave *slave, int dev_num);
- int (*bpt_open_stream)(struct sdw_bus *bus,
struct sdw_slave *slave,
enum sdw_bpt_type mode,
struct sdw_bpt_msg *msg);
- int (*bpt_close_stream)(struct sdw_bus *bus,
struct sdw_slave *slave,
enum sdw_bpt_type mode,
struct sdw_bpt_msg *msg);
- int (*bpt_send_async)(struct sdw_bus *bus,
struct sdw_slave *slave,
struct sdw_bpt_msg *msg);
- int (*bpt_wait)(struct sdw_bus *bus,
struct sdw_slave *slave,
struct sdw_bpt_msg *msg);
do we need both bus and slave, that was a mistake in orignal design IMO. We should fix that for bpt_ apis
+int sdw_bpt_open_stream(struct sdw_bus *bus,
struct sdw_slave *slave,
enum sdw_bpt_type mode,
struct sdw_bpt_msg *msg)
+{
- int ret;
- /* only Bulk Register Access (BRA) is supported for now */
- if (mode != SDW_BRA)
return -EINVAL;
- if (msg->len < SDW_BPT_MSG_MIN_BYTES) {
dev_err(bus->dev, "BPT message length %d, min supported %d\n",
msg->len, SDW_BPT_MSG_MIN_BYTES);
return -EINVAL;
- }
- if (msg->len % SDW_BPT_MSG_BYTE_ALIGNMENT) {
dev_err(bus->dev, "BPT message length %d is not a multiple of %d bytes\n",
msg->len, SDW_BPT_MSG_BYTE_ALIGNMENT);
return -EINVAL;
- }
Is this a protocol requirement?
No, it's an implementation requirement.
We could move this to host-specific parts but then the codec drivers will have to know about alignment requirements for each host they are use with. IOW, it's more work for codec drivers if we don't have a minimum bar for alignment requirement across all platforms.
- /* check device is enumerated */
- if (slave->dev_num == SDW_ENUM_DEV_NUM ||
slave->dev_num > SDW_MAX_DEVICES)
return -ENODEV;
- /* make sure all callbacks are defined */
- if (!bus->ops->bpt_open_stream ||
!bus->ops->bpt_close_stream ||
!bus->ops->bpt_send_async ||
!bus->ops->bpt_wait)
return -ENOTSUPP;
should this not be checked at probe time, if device declares the support
sdw_bpt_open_stream() would be called by the peripheral driver (or regmap as a proxy). The peripheral driver could also decide to check for those callback during its probe, but that's beyond the scope of this patchset.
These checks are just there for paranoia, in case a peripheral driver uses BTP/BRA on a host where they are not supported.
It's not science-fiction, we see AMD- and INTEL-based platforms using the same SoundWire-based codecs.
- ret = bus->ops->bpt_open_stream(bus, slave, mode, msg);
- if (ret < 0)
dev_err(bus->dev, "BPT stream open, err %d\n", ret);
- return ret;
+} +EXPORT_SYMBOL(sdw_bpt_open_stream);
can we open multiple times (i dont see a check preventing that), how do we close..?
there's a refcount preventing multiples BTP streams from being opened.
Re-iterating my comment on documentation patch, can we do with a async api and wait api, that makes symantics a lot simpler, right..?
see reply in previous email, combining open+send is weird IMHO.
+int sdw_bpt_send_async(struct sdw_bus *bus,
struct sdw_slave *slave,
struct sdw_bpt_msg *msg)
+{
- if (msg->len > SDW_BPT_MSG_MAX_BYTES)
return -EINVAL;
- return bus->ops->bpt_send_async(bus, slave, msg);
+} +EXPORT_SYMBOL(sdw_bpt_send_async);
Can we call this multiple times after open, it is unclear to me. Can you please add kernel-doc comments about the APIs here as well
This can be called multiple times but it's useless: all the buffers are prepared in the open() stage. This is the moral equivalent of a trigger step, just enable data transfers.
struct sdw_master_ops { int (*read_prop)(struct sdw_bus *bus); @@ -869,6 +913,20 @@ struct sdw_master_ops { void (*new_peripheral_assigned)(struct sdw_bus *bus, struct sdw_slave *slave, int dev_num);
- int (*bpt_open_stream)(struct sdw_bus *bus,
struct sdw_slave *slave,
enum sdw_bpt_type mode,
struct sdw_bpt_msg *msg);
- int (*bpt_close_stream)(struct sdw_bus *bus,
struct sdw_slave *slave,
enum sdw_bpt_type mode,
struct sdw_bpt_msg *msg);
- int (*bpt_send_async)(struct sdw_bus *bus,
struct sdw_slave *slave,
struct sdw_bpt_msg *msg);
- int (*bpt_wait)(struct sdw_bus *bus,
struct sdw_slave *slave,
struct sdw_bpt_msg *msg);
do we need both bus and slave, that was a mistake in orignal design IMO. We should fix that for bpt_ apis
No disagreement. All the routines follow the same template, if we change one we should also change the others.
The main question as discussed with Charles is whether we want to pass the 'msg' argument in all routines.
On Mon, Dec 18, 2023 at 02:12:36PM +0100, Pierre-Louis Bossart wrote:
Is this a protocol requirement?
No, it's an implementation requirement.
We could move this to host-specific parts but then the codec drivers will have to know about alignment requirements for each host they are use with. IOW, it's more work for codec drivers if we don't have a minimum bar for alignment requirement across all platforms.
I do certainly see that side of the argument and it does probably warrant some thought as to how a slave might learn the alignment requirements. I guess maybe some sort of core helper function to return the alignment? Or putting it in properties the slave can access? One could even keep the check here, but just pull the value from something system specific.
The danger with putting it in the core is IMHO:
a) It rules out certain use-cases, generally I think its a bad idea if the framework design prohibits stuff the underlying bus could do because someone will, at some point, want to do it.
b) The core limit could get a bit out of hand once more controllers are added. The core limit needs to be a multiple of all the controller limits, if a controller comes along with a weird alignment requirement, that gets problematic fast.
Thanks, Charles
On 12/18/23 08:57, Charles Keepax wrote:
On Mon, Dec 18, 2023 at 02:12:36PM +0100, Pierre-Louis Bossart wrote:
Is this a protocol requirement?
No, it's an implementation requirement.
We could move this to host-specific parts but then the codec drivers will have to know about alignment requirements for each host they are use with. IOW, it's more work for codec drivers if we don't have a minimum bar for alignment requirement across all platforms.
I do certainly see that side of the argument and it does probably warrant some thought as to how a slave might learn the alignment requirements. I guess maybe some sort of core helper function to return the alignment? Or putting it in properties the slave can access? One could even keep the check here, but just pull the value from something system specific.
The danger with putting it in the core is IMHO:
a) It rules out certain use-cases, generally I think its a bad idea if the framework design prohibits stuff the underlying bus could do because someone will, at some point, want to do it.
SoundWire has lots of fancy and borderline nebulous concepts, my take is "let's do few things and do them well". We can always revisit new usages later, for now my main objective is "speed up downloads".
b) The core limit could get a bit out of hand once more controllers are added. The core limit needs to be a multiple of all the controller limits, if a controller comes along with a weird alignment requirement, that gets problematic fast.
I don't have any information on other controllers, but I wouldn't be surprised if most of the quirks are due to peripheral limitations and ambiguous interpretations of what 'ACK' means. I tried writing to reserved parts of the memory or non-existent registers and nothing bad was reported...
On 18-12-23, 14:12, Pierre-Louis Bossart wrote:
+int sdw_bpt_open_stream(struct sdw_bus *bus,
struct sdw_slave *slave,
enum sdw_bpt_type mode,
struct sdw_bpt_msg *msg)
+{
- int ret;
- /* only Bulk Register Access (BRA) is supported for now */
- if (mode != SDW_BRA)
return -EINVAL;
- if (msg->len < SDW_BPT_MSG_MIN_BYTES) {
dev_err(bus->dev, "BPT message length %d, min supported %d\n",
msg->len, SDW_BPT_MSG_MIN_BYTES);
return -EINVAL;
- }
- if (msg->len % SDW_BPT_MSG_BYTE_ALIGNMENT) {
dev_err(bus->dev, "BPT message length %d is not a multiple of %d bytes\n",
msg->len, SDW_BPT_MSG_BYTE_ALIGNMENT);
return -EINVAL;
- }
Is this a protocol requirement?
No, it's an implementation requirement.
We could move this to host-specific parts but then the codec drivers will have to know about alignment requirements for each host they are use with. IOW, it's more work for codec drivers if we don't have a minimum bar for alignment requirement across all platforms.
- /* check device is enumerated */
- if (slave->dev_num == SDW_ENUM_DEV_NUM ||
slave->dev_num > SDW_MAX_DEVICES)
return -ENODEV;
- /* make sure all callbacks are defined */
- if (!bus->ops->bpt_open_stream ||
!bus->ops->bpt_close_stream ||
!bus->ops->bpt_send_async ||
!bus->ops->bpt_wait)
return -ENOTSUPP;
should this not be checked at probe time, if device declares the support
sdw_bpt_open_stream() would be called by the peripheral driver (or regmap as a proxy). The peripheral driver could also decide to check for those callback during its probe, but that's beyond the scope of this patchset.
I would think that it is better to have capablities registered by the driver and those are checked at registration, so we know if bpt is supported or not for a particular platform.
This make more sense to me as some driver, depending on the SoC may or maynot support this, so easy way would be to turn off caps, what do you think?
These checks are just there for paranoia, in case a peripheral driver uses BTP/BRA on a host where they are not supported.
It's not science-fiction, we see AMD- and INTEL-based platforms using the same SoundWire-based codecs.
Ofcourse, it is entrely reasonable thing to do, event across x86/arm64
- ret = bus->ops->bpt_open_stream(bus, slave, mode, msg);
- if (ret < 0)
dev_err(bus->dev, "BPT stream open, err %d\n", ret);
- return ret;
+} +EXPORT_SYMBOL(sdw_bpt_open_stream);
can we open multiple times (i dont see a check preventing that), how do we close..?
there's a refcount preventing multiples BTP streams from being opened.
Re-iterating my comment on documentation patch, can we do with a async api and wait api, that makes symantics a lot simpler, right..?
see reply in previous email, combining open+send is weird IMHO.
+int sdw_bpt_send_async(struct sdw_bus *bus,
struct sdw_slave *slave,
struct sdw_bpt_msg *msg)
+{
- if (msg->len > SDW_BPT_MSG_MAX_BYTES)
return -EINVAL;
- return bus->ops->bpt_send_async(bus, slave, msg);
+} +EXPORT_SYMBOL(sdw_bpt_send_async);
Can we call this multiple times after open, it is unclear to me. Can you please add kernel-doc comments about the APIs here as well
This can be called multiple times but it's useless: all the buffers are prepared in the open() stage. This is the moral equivalent of a trigger step, just enable data transfers.
struct sdw_master_ops { int (*read_prop)(struct sdw_bus *bus); @@ -869,6 +913,20 @@ struct sdw_master_ops { void (*new_peripheral_assigned)(struct sdw_bus *bus, struct sdw_slave *slave, int dev_num);
- int (*bpt_open_stream)(struct sdw_bus *bus,
struct sdw_slave *slave,
enum sdw_bpt_type mode,
struct sdw_bpt_msg *msg);
- int (*bpt_close_stream)(struct sdw_bus *bus,
struct sdw_slave *slave,
enum sdw_bpt_type mode,
struct sdw_bpt_msg *msg);
- int (*bpt_send_async)(struct sdw_bus *bus,
struct sdw_slave *slave,
struct sdw_bpt_msg *msg);
- int (*bpt_wait)(struct sdw_bus *bus,
struct sdw_slave *slave,
struct sdw_bpt_msg *msg);
do we need both bus and slave, that was a mistake in orignal design IMO. We should fix that for bpt_ apis
No disagreement. All the routines follow the same template, if we change one we should also change the others.
The main question as discussed with Charles is whether we want to pass the 'msg' argument in all routines.
Lets revisit when we have new API
Add a convenience pointer to the 'sdw_bus' structure. BPT is a dedicated stream which will typically not be handled by DAIs or dailinks. Since there's only one BPT stream per link, storing the pointer at the link level seems rather natural.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- include/linux/soundwire/sdw.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index e54c5bbd2b91..8db0cd7d0d89 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -965,6 +965,7 @@ struct sdw_master_ops { * @stream_refcount: number of streams currently using this bus * @btp_stream_refcount: number of BTP streams currently using this bus (should * be zero or one, multiple streams per link is not supported). + * @bpt_stream: pointer stored for convenience. */ struct sdw_bus { struct device *dev; @@ -996,6 +997,7 @@ struct sdw_bus { int hw_sync_min_links; int stream_refcount; int bpt_stream_refcount; + struct sdw_stream_runtime *bpt_stream; };
int sdw_bus_master_add(struct sdw_bus *bus, struct device *parent,
On 07-12-23, 16:29, Pierre-Louis Bossart wrote:
Add a convenience pointer to the 'sdw_bus' structure. BPT is a dedicated stream which will typically not be handled by DAIs or dailinks. Since there's only one BPT stream per link, storing the pointer at the link level seems rather natural.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
include/linux/soundwire/sdw.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index e54c5bbd2b91..8db0cd7d0d89 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -965,6 +965,7 @@ struct sdw_master_ops {
- @stream_refcount: number of streams currently using this bus
- @btp_stream_refcount: number of BTP streams currently using this bus (should
- be zero or one, multiple streams per link is not supported).
*/
- @bpt_stream: pointer stored for convenience.
struct sdw_bus { struct device *dev; @@ -996,6 +997,7 @@ struct sdw_bus { int hw_sync_min_links; int stream_refcount; int bpt_stream_refcount;
- struct sdw_stream_runtime *bpt_stream;
So we are limiting to single stream? Can we have multiple transfers queued up, which I guess might imply multiple streams?
On 12/18/23 05:55, Vinod Koul wrote:
On 07-12-23, 16:29, Pierre-Louis Bossart wrote:
Add a convenience pointer to the 'sdw_bus' structure. BPT is a dedicated stream which will typically not be handled by DAIs or dailinks. Since there's only one BPT stream per link, storing the pointer at the link level seems rather natural.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
include/linux/soundwire/sdw.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index e54c5bbd2b91..8db0cd7d0d89 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -965,6 +965,7 @@ struct sdw_master_ops {
- @stream_refcount: number of streams currently using this bus
- @btp_stream_refcount: number of BTP streams currently using this bus (should
- be zero or one, multiple streams per link is not supported).
*/
- @bpt_stream: pointer stored for convenience.
struct sdw_bus { struct device *dev; @@ -996,6 +997,7 @@ struct sdw_bus { int hw_sync_min_links; int stream_refcount; int bpt_stream_refcount;
- struct sdw_stream_runtime *bpt_stream;
So we are limiting to single stream? Can we have multiple transfers queued up, which I guess might imply multiple streams?
Yes for now there is a BTP/BRA single stream active when there are no audio transfers taking place. This is the only way to guarantee predictable download/resume times.
There is no mechanism to queue up transfers, be it from the same peripheral device or different ones. It would be up to the peripheral driver to wait for the BTP stream to be available.
Adding a queuing mechanism is a bridge too far for now, most platforms only have 1 or 2 devices only, and a peripheral driver may or may not be ok with delayed downloads. For now the main ask is to reduce download times, a single stream is already a good start. There are other refinements we need to add as well, such as changing clocks to use the fastest gear. I'd like to proceed with baby steps...
On 18-12-23, 14:20, Pierre-Louis Bossart wrote:
On 12/18/23 05:55, Vinod Koul wrote:
On 07-12-23, 16:29, Pierre-Louis Bossart wrote:
Add a convenience pointer to the 'sdw_bus' structure. BPT is a dedicated stream which will typically not be handled by DAIs or dailinks. Since there's only one BPT stream per link, storing the pointer at the link level seems rather natural.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
include/linux/soundwire/sdw.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index e54c5bbd2b91..8db0cd7d0d89 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -965,6 +965,7 @@ struct sdw_master_ops {
- @stream_refcount: number of streams currently using this bus
- @btp_stream_refcount: number of BTP streams currently using this bus (should
- be zero or one, multiple streams per link is not supported).
*/
- @bpt_stream: pointer stored for convenience.
struct sdw_bus { struct device *dev; @@ -996,6 +997,7 @@ struct sdw_bus { int hw_sync_min_links; int stream_refcount; int bpt_stream_refcount;
- struct sdw_stream_runtime *bpt_stream;
So we are limiting to single stream? Can we have multiple transfers queued up, which I guess might imply multiple streams?
Yes for now there is a BTP/BRA single stream active when there are no audio transfers taking place. This is the only way to guarantee predictable download/resume times.
There is no mechanism to queue up transfers, be it from the same peripheral device or different ones. It would be up to the peripheral driver to wait for the BTP stream to be available.
Adding a queuing mechanism is a bridge too far for now, most platforms only have 1 or 2 devices only, and a peripheral driver may or may not be ok with delayed downloads. For now the main ask is to reduce download times, a single stream is already a good start. There are other refinements we need to add as well, such as changing clocks to use the fastest gear. I'd like to proceed with baby steps...
Since the API is async in nature, I think it is very reasonable that we add the queue support (a simple list would do) and notify when the transfer is complete..
On 12/21/23 15:39, Vinod Koul wrote:
On 18-12-23, 14:20, Pierre-Louis Bossart wrote:
On 12/18/23 05:55, Vinod Koul wrote:
On 07-12-23, 16:29, Pierre-Louis Bossart wrote:
Add a convenience pointer to the 'sdw_bus' structure. BPT is a dedicated stream which will typically not be handled by DAIs or dailinks. Since there's only one BPT stream per link, storing the pointer at the link level seems rather natural.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
include/linux/soundwire/sdw.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index e54c5bbd2b91..8db0cd7d0d89 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -965,6 +965,7 @@ struct sdw_master_ops {
- @stream_refcount: number of streams currently using this bus
- @btp_stream_refcount: number of BTP streams currently using this bus (should
- be zero or one, multiple streams per link is not supported).
*/
- @bpt_stream: pointer stored for convenience.
struct sdw_bus { struct device *dev; @@ -996,6 +997,7 @@ struct sdw_bus { int hw_sync_min_links; int stream_refcount; int bpt_stream_refcount;
- struct sdw_stream_runtime *bpt_stream;
So we are limiting to single stream? Can we have multiple transfers queued up, which I guess might imply multiple streams?
Yes for now there is a BTP/BRA single stream active when there are no audio transfers taking place. This is the only way to guarantee predictable download/resume times.
There is no mechanism to queue up transfers, be it from the same peripheral device or different ones. It would be up to the peripheral driver to wait for the BTP stream to be available.
Adding a queuing mechanism is a bridge too far for now, most platforms only have 1 or 2 devices only, and a peripheral driver may or may not be ok with delayed downloads. For now the main ask is to reduce download times, a single stream is already a good start. There are other refinements we need to add as well, such as changing clocks to use the fastest gear. I'd like to proceed with baby steps...
Since the API is async in nature, I think it is very reasonable that we add the queue support (a simple list would do) and notify when the transfer is complete..
On paper queueing is nice/elegant.
In practice it's likely that there's a bit of configuration needed before/after each download, e.g. restart the DSP.
It would also be an absolute horror to deal with error flow if one of the queued transfers did not succeed.
I would be more than happy if we successfully enable a single transfer across multiple platforms, and look at queueing as a optimization.
BTW even if there are multiple transfers queued, there's still only one stream active at a given time.
Add the lookup table required by crc8(). All configuration values were directly table from the MIPI SoundWire 1.x specification.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- drivers/soundwire/Makefile | 4 +- drivers/soundwire/crc8.c | 277 +++++++++++++++++++++++++++++++++++++ drivers/soundwire/crc8.h | 11 ++ 3 files changed, 291 insertions(+), 1 deletion(-) create mode 100644 drivers/soundwire/crc8.c create mode 100644 drivers/soundwire/crc8.h
diff --git a/drivers/soundwire/Makefile b/drivers/soundwire/Makefile index 657f5888a77b..170128dd9318 100644 --- a/drivers/soundwire/Makefile +++ b/drivers/soundwire/Makefile @@ -5,7 +5,9 @@
#Bus Objs soundwire-bus-y := bus_type.o bus.o master.o slave.o mipi_disco.o stream.o \ - sysfs_slave.o sysfs_slave_dpn.o + sysfs_slave.o sysfs_slave_dpn.o \ + crc8.o + obj-$(CONFIG_SOUNDWIRE) += soundwire-bus.o
soundwire-generic-allocation-objs := generic_bandwidth_allocation.o diff --git a/drivers/soundwire/crc8.c b/drivers/soundwire/crc8.c new file mode 100644 index 000000000000..b6b984d7f39a --- /dev/null +++ b/drivers/soundwire/crc8.c @@ -0,0 +1,277 @@ +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) +// Copyright(c) 2024 Intel Corporation. + +#include <linux/crc8.h> +#include <linux/module.h> +#include "crc8.h" + +/* + * the MIPI SoundWire CRC8 polynomial is X^8 + X^6 + X^3 + X^2 + 1, MSB first + * The value is (1)01001101 = 0x4D + * + * the table below was generated with + * + * u8 crc8_lookup_table[CRC8_TABLE_SIZE]; + * crc8_populate_msb(crc8_lookup_table, SDW_CRC8_POLY); + * + */ + +const u8 sdw_crc8_lookup_msb[CRC8_TABLE_SIZE] = { + 0x00, /* 0 */ + 0x4d, /* 1 */ + 0x9a, /* 2 */ + 0xd7, /* 3 */ + 0x79, /* 4 */ + 0x34, /* 5 */ + 0xe3, /* 6 */ + 0xae, /* 7 */ + 0xf2, /* 8 */ + 0xbf, /* 9 */ + 0x68, /* 10 */ + 0x25, /* 11 */ + 0x8b, /* 12 */ + 0xc6, /* 13 */ + 0x11, /* 14 */ + 0x5c, /* 15 */ + 0xa9, /* 16 */ + 0xe4, /* 17 */ + 0x33, /* 18 */ + 0x7e, /* 19 */ + 0xd0, /* 20 */ + 0x9d, /* 21 */ + 0x4a, /* 22 */ + 0x07, /* 23 */ + 0x5b, /* 24 */ + 0x16, /* 25 */ + 0xc1, /* 26 */ + 0x8c, /* 27 */ + 0x22, /* 28 */ + 0x6f, /* 29 */ + 0xb8, /* 30 */ + 0xf5, /* 31 */ + 0x1f, /* 32 */ + 0x52, /* 33 */ + 0x85, /* 34 */ + 0xc8, /* 35 */ + 0x66, /* 36 */ + 0x2b, /* 37 */ + 0xfc, /* 38 */ + 0xb1, /* 39 */ + 0xed, /* 40 */ + 0xa0, /* 41 */ + 0x77, /* 42 */ + 0x3a, /* 43 */ + 0x94, /* 44 */ + 0xd9, /* 45 */ + 0x0e, /* 46 */ + 0x43, /* 47 */ + 0xb6, /* 48 */ + 0xfb, /* 49 */ + 0x2c, /* 50 */ + 0x61, /* 51 */ + 0xcf, /* 52 */ + 0x82, /* 53 */ + 0x55, /* 54 */ + 0x18, /* 55 */ + 0x44, /* 56 */ + 0x09, /* 57 */ + 0xde, /* 58 */ + 0x93, /* 59 */ + 0x3d, /* 60 */ + 0x70, /* 61 */ + 0xa7, /* 62 */ + 0xea, /* 63 */ + 0x3e, /* 64 */ + 0x73, /* 65 */ + 0xa4, /* 66 */ + 0xe9, /* 67 */ + 0x47, /* 68 */ + 0x0a, /* 69 */ + 0xdd, /* 70 */ + 0x90, /* 71 */ + 0xcc, /* 72 */ + 0x81, /* 73 */ + 0x56, /* 74 */ + 0x1b, /* 75 */ + 0xb5, /* 76 */ + 0xf8, /* 77 */ + 0x2f, /* 78 */ + 0x62, /* 79 */ + 0x97, /* 80 */ + 0xda, /* 81 */ + 0x0d, /* 82 */ + 0x40, /* 83 */ + 0xee, /* 84 */ + 0xa3, /* 85 */ + 0x74, /* 86 */ + 0x39, /* 87 */ + 0x65, /* 88 */ + 0x28, /* 89 */ + 0xff, /* 90 */ + 0xb2, /* 91 */ + 0x1c, /* 92 */ + 0x51, /* 93 */ + 0x86, /* 94 */ + 0xcb, /* 95 */ + 0x21, /* 96 */ + 0x6c, /* 97 */ + 0xbb, /* 98 */ + 0xf6, /* 99 */ + 0x58, /* 100 */ + 0x15, /* 101 */ + 0xc2, /* 102 */ + 0x8f, /* 103 */ + 0xd3, /* 104 */ + 0x9e, /* 105 */ + 0x49, /* 106 */ + 0x04, /* 107 */ + 0xaa, /* 108 */ + 0xe7, /* 109 */ + 0x30, /* 110 */ + 0x7d, /* 111 */ + 0x88, /* 112 */ + 0xc5, /* 113 */ + 0x12, /* 114 */ + 0x5f, /* 115 */ + 0xf1, /* 116 */ + 0xbc, /* 117 */ + 0x6b, /* 118 */ + 0x26, /* 119 */ + 0x7a, /* 120 */ + 0x37, /* 121 */ + 0xe0, /* 122 */ + 0xad, /* 123 */ + 0x03, /* 124 */ + 0x4e, /* 125 */ + 0x99, /* 126 */ + 0xd4, /* 127 */ + 0x7c, /* 128 */ + 0x31, /* 129 */ + 0xe6, /* 130 */ + 0xab, /* 131 */ + 0x05, /* 132 */ + 0x48, /* 133 */ + 0x9f, /* 134 */ + 0xd2, /* 135 */ + 0x8e, /* 136 */ + 0xc3, /* 137 */ + 0x14, /* 138 */ + 0x59, /* 139 */ + 0xf7, /* 140 */ + 0xba, /* 141 */ + 0x6d, /* 142 */ + 0x20, /* 143 */ + 0xd5, /* 144 */ + 0x98, /* 145 */ + 0x4f, /* 146 */ + 0x02, /* 147 */ + 0xac, /* 148 */ + 0xe1, /* 149 */ + 0x36, /* 150 */ + 0x7b, /* 151 */ + 0x27, /* 152 */ + 0x6a, /* 153 */ + 0xbd, /* 154 */ + 0xf0, /* 155 */ + 0x5e, /* 156 */ + 0x13, /* 157 */ + 0xc4, /* 158 */ + 0x89, /* 159 */ + 0x63, /* 160 */ + 0x2e, /* 161 */ + 0xf9, /* 162 */ + 0xb4, /* 163 */ + 0x1a, /* 164 */ + 0x57, /* 165 */ + 0x80, /* 166 */ + 0xcd, /* 167 */ + 0x91, /* 168 */ + 0xdc, /* 169 */ + 0x0b, /* 170 */ + 0x46, /* 171 */ + 0xe8, /* 172 */ + 0xa5, /* 173 */ + 0x72, /* 174 */ + 0x3f, /* 175 */ + 0xca, /* 176 */ + 0x87, /* 177 */ + 0x50, /* 178 */ + 0x1d, /* 179 */ + 0xb3, /* 180 */ + 0xfe, /* 181 */ + 0x29, /* 182 */ + 0x64, /* 183 */ + 0x38, /* 184 */ + 0x75, /* 185 */ + 0xa2, /* 186 */ + 0xef, /* 187 */ + 0x41, /* 188 */ + 0x0c, /* 189 */ + 0xdb, /* 190 */ + 0x96, /* 191 */ + 0x42, /* 192 */ + 0x0f, /* 193 */ + 0xd8, /* 194 */ + 0x95, /* 195 */ + 0x3b, /* 196 */ + 0x76, /* 197 */ + 0xa1, /* 198 */ + 0xec, /* 199 */ + 0xb0, /* 200 */ + 0xfd, /* 201 */ + 0x2a, /* 202 */ + 0x67, /* 203 */ + 0xc9, /* 204 */ + 0x84, /* 205 */ + 0x53, /* 206 */ + 0x1e, /* 207 */ + 0xeb, /* 208 */ + 0xa6, /* 209 */ + 0x71, /* 210 */ + 0x3c, /* 211 */ + 0x92, /* 212 */ + 0xdf, /* 213 */ + 0x08, /* 214 */ + 0x45, /* 215 */ + 0x19, /* 216 */ + 0x54, /* 217 */ + 0x83, /* 218 */ + 0xce, /* 219 */ + 0x60, /* 220 */ + 0x2d, /* 221 */ + 0xfa, /* 222 */ + 0xb7, /* 223 */ + 0x5d, /* 224 */ + 0x10, /* 225 */ + 0xc7, /* 226 */ + 0x8a, /* 227 */ + 0x24, /* 228 */ + 0x69, /* 229 */ + 0xbe, /* 230 */ + 0xf3, /* 231 */ + 0xaf, /* 232 */ + 0xe2, /* 233 */ + 0x35, /* 234 */ + 0x78, /* 235 */ + 0xd6, /* 236 */ + 0x9b, /* 237 */ + 0x4c, /* 238 */ + 0x01, /* 239 */ + 0xf4, /* 240 */ + 0xb9, /* 241 */ + 0x6e, /* 242 */ + 0x23, /* 243 */ + 0x8d, /* 244 */ + 0xc0, /* 245 */ + 0x17, /* 246 */ + 0x5a, /* 247 */ + 0x06, /* 248 */ + 0x4b, /* 249 */ + 0x9c, /* 250 */ + 0xd1, /* 251 */ + 0x7f, /* 252 */ + 0x32, /* 253 */ + 0xe5, /* 254 */ + 0xa8 /* 255 */ +}; +EXPORT_SYMBOL(sdw_crc8_lookup_msb); diff --git a/drivers/soundwire/crc8.h b/drivers/soundwire/crc8.h new file mode 100644 index 000000000000..9a88d3866016 --- /dev/null +++ b/drivers/soundwire/crc8.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */ +/* Copyright(c) 2024 Intel Corporation. */ + +#ifndef __SDW_CRC8_H +#define __SDW_CRC8_H + +#define SDW_CRC8_SEED 0xFF +#define SDW_CRC8_POLY 0x4D +extern const u8 sdw_crc8_lookup_msb[CRC8_TABLE_SIZE]; + +#endif /* __SDW_CRC8_H */
On 07-12-23, 16:29, Pierre-Louis Bossart wrote:
Add the lookup table required by crc8(). All configuration values were directly table from the MIPI SoundWire 1.x specification.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
drivers/soundwire/Makefile | 4 +- drivers/soundwire/crc8.c | 277 +++++++++++++++++++++++++++++++++++++ drivers/soundwire/crc8.h | 11 ++ 3 files changed, 291 insertions(+), 1 deletion(-) create mode 100644 drivers/soundwire/crc8.c create mode 100644 drivers/soundwire/crc8.h
diff --git a/drivers/soundwire/Makefile b/drivers/soundwire/Makefile index 657f5888a77b..170128dd9318 100644 --- a/drivers/soundwire/Makefile +++ b/drivers/soundwire/Makefile @@ -5,7 +5,9 @@
#Bus Objs soundwire-bus-y := bus_type.o bus.o master.o slave.o mipi_disco.o stream.o \
sysfs_slave.o sysfs_slave_dpn.o
sysfs_slave.o sysfs_slave_dpn.o \
crc8.o
obj-$(CONFIG_SOUNDWIRE) += soundwire-bus.o
soundwire-generic-allocation-objs := generic_bandwidth_allocation.o diff --git a/drivers/soundwire/crc8.c b/drivers/soundwire/crc8.c new file mode 100644 index 000000000000..b6b984d7f39a --- /dev/null +++ b/drivers/soundwire/crc8.c @@ -0,0 +1,277 @@ +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) +// Copyright(c) 2024 Intel Corporation.
+#include <linux/crc8.h> +#include <linux/module.h> +#include "crc8.h"
+/*
- the MIPI SoundWire CRC8 polynomial is X^8 + X^6 + X^3 + X^2 + 1, MSB first
- The value is (1)01001101 = 0x4D
- the table below was generated with
- u8 crc8_lookup_table[CRC8_TABLE_SIZE];
- crc8_populate_msb(crc8_lookup_table, SDW_CRC8_POLY);
Good that you found this API, so next question would be why should we have this static table in kernel and not generate on probe if bpt is supported..? Many subsystems use these APIs to generate the tables..
- */
+const u8 sdw_crc8_lookup_msb[CRC8_TABLE_SIZE] = {
- 0x00, /* 0 */
- 0x4d, /* 1 */
- 0x9a, /* 2 */
- 0xd7, /* 3 */
- 0x79, /* 4 */
- 0x34, /* 5 */
- 0xe3, /* 6 */
- 0xae, /* 7 */
- 0xf2, /* 8 */
- 0xbf, /* 9 */
- 0x68, /* 10 */
- 0x25, /* 11 */
- 0x8b, /* 12 */
- 0xc6, /* 13 */
- 0x11, /* 14 */
- 0x5c, /* 15 */
- 0xa9, /* 16 */
- 0xe4, /* 17 */
- 0x33, /* 18 */
- 0x7e, /* 19 */
- 0xd0, /* 20 */
- 0x9d, /* 21 */
- 0x4a, /* 22 */
- 0x07, /* 23 */
- 0x5b, /* 24 */
- 0x16, /* 25 */
- 0xc1, /* 26 */
- 0x8c, /* 27 */
- 0x22, /* 28 */
- 0x6f, /* 29 */
- 0xb8, /* 30 */
- 0xf5, /* 31 */
- 0x1f, /* 32 */
- 0x52, /* 33 */
- 0x85, /* 34 */
- 0xc8, /* 35 */
- 0x66, /* 36 */
- 0x2b, /* 37 */
- 0xfc, /* 38 */
- 0xb1, /* 39 */
- 0xed, /* 40 */
- 0xa0, /* 41 */
- 0x77, /* 42 */
- 0x3a, /* 43 */
- 0x94, /* 44 */
- 0xd9, /* 45 */
- 0x0e, /* 46 */
- 0x43, /* 47 */
- 0xb6, /* 48 */
- 0xfb, /* 49 */
- 0x2c, /* 50 */
- 0x61, /* 51 */
- 0xcf, /* 52 */
- 0x82, /* 53 */
- 0x55, /* 54 */
- 0x18, /* 55 */
- 0x44, /* 56 */
- 0x09, /* 57 */
- 0xde, /* 58 */
- 0x93, /* 59 */
- 0x3d, /* 60 */
- 0x70, /* 61 */
- 0xa7, /* 62 */
- 0xea, /* 63 */
- 0x3e, /* 64 */
- 0x73, /* 65 */
- 0xa4, /* 66 */
- 0xe9, /* 67 */
- 0x47, /* 68 */
- 0x0a, /* 69 */
- 0xdd, /* 70 */
- 0x90, /* 71 */
- 0xcc, /* 72 */
- 0x81, /* 73 */
- 0x56, /* 74 */
- 0x1b, /* 75 */
- 0xb5, /* 76 */
- 0xf8, /* 77 */
- 0x2f, /* 78 */
- 0x62, /* 79 */
- 0x97, /* 80 */
- 0xda, /* 81 */
- 0x0d, /* 82 */
- 0x40, /* 83 */
- 0xee, /* 84 */
- 0xa3, /* 85 */
- 0x74, /* 86 */
- 0x39, /* 87 */
- 0x65, /* 88 */
- 0x28, /* 89 */
- 0xff, /* 90 */
- 0xb2, /* 91 */
- 0x1c, /* 92 */
- 0x51, /* 93 */
- 0x86, /* 94 */
- 0xcb, /* 95 */
- 0x21, /* 96 */
- 0x6c, /* 97 */
- 0xbb, /* 98 */
- 0xf6, /* 99 */
- 0x58, /* 100 */
- 0x15, /* 101 */
- 0xc2, /* 102 */
- 0x8f, /* 103 */
- 0xd3, /* 104 */
- 0x9e, /* 105 */
- 0x49, /* 106 */
- 0x04, /* 107 */
- 0xaa, /* 108 */
- 0xe7, /* 109 */
- 0x30, /* 110 */
- 0x7d, /* 111 */
- 0x88, /* 112 */
- 0xc5, /* 113 */
- 0x12, /* 114 */
- 0x5f, /* 115 */
- 0xf1, /* 116 */
- 0xbc, /* 117 */
- 0x6b, /* 118 */
- 0x26, /* 119 */
- 0x7a, /* 120 */
- 0x37, /* 121 */
- 0xe0, /* 122 */
- 0xad, /* 123 */
- 0x03, /* 124 */
- 0x4e, /* 125 */
- 0x99, /* 126 */
- 0xd4, /* 127 */
- 0x7c, /* 128 */
- 0x31, /* 129 */
- 0xe6, /* 130 */
- 0xab, /* 131 */
- 0x05, /* 132 */
- 0x48, /* 133 */
- 0x9f, /* 134 */
- 0xd2, /* 135 */
- 0x8e, /* 136 */
- 0xc3, /* 137 */
- 0x14, /* 138 */
- 0x59, /* 139 */
- 0xf7, /* 140 */
- 0xba, /* 141 */
- 0x6d, /* 142 */
- 0x20, /* 143 */
- 0xd5, /* 144 */
- 0x98, /* 145 */
- 0x4f, /* 146 */
- 0x02, /* 147 */
- 0xac, /* 148 */
- 0xe1, /* 149 */
- 0x36, /* 150 */
- 0x7b, /* 151 */
- 0x27, /* 152 */
- 0x6a, /* 153 */
- 0xbd, /* 154 */
- 0xf0, /* 155 */
- 0x5e, /* 156 */
- 0x13, /* 157 */
- 0xc4, /* 158 */
- 0x89, /* 159 */
- 0x63, /* 160 */
- 0x2e, /* 161 */
- 0xf9, /* 162 */
- 0xb4, /* 163 */
- 0x1a, /* 164 */
- 0x57, /* 165 */
- 0x80, /* 166 */
- 0xcd, /* 167 */
- 0x91, /* 168 */
- 0xdc, /* 169 */
- 0x0b, /* 170 */
- 0x46, /* 171 */
- 0xe8, /* 172 */
- 0xa5, /* 173 */
- 0x72, /* 174 */
- 0x3f, /* 175 */
- 0xca, /* 176 */
- 0x87, /* 177 */
- 0x50, /* 178 */
- 0x1d, /* 179 */
- 0xb3, /* 180 */
- 0xfe, /* 181 */
- 0x29, /* 182 */
- 0x64, /* 183 */
- 0x38, /* 184 */
- 0x75, /* 185 */
- 0xa2, /* 186 */
- 0xef, /* 187 */
- 0x41, /* 188 */
- 0x0c, /* 189 */
- 0xdb, /* 190 */
- 0x96, /* 191 */
- 0x42, /* 192 */
- 0x0f, /* 193 */
- 0xd8, /* 194 */
- 0x95, /* 195 */
- 0x3b, /* 196 */
- 0x76, /* 197 */
- 0xa1, /* 198 */
- 0xec, /* 199 */
- 0xb0, /* 200 */
- 0xfd, /* 201 */
- 0x2a, /* 202 */
- 0x67, /* 203 */
- 0xc9, /* 204 */
- 0x84, /* 205 */
- 0x53, /* 206 */
- 0x1e, /* 207 */
- 0xeb, /* 208 */
- 0xa6, /* 209 */
- 0x71, /* 210 */
- 0x3c, /* 211 */
- 0x92, /* 212 */
- 0xdf, /* 213 */
- 0x08, /* 214 */
- 0x45, /* 215 */
- 0x19, /* 216 */
- 0x54, /* 217 */
- 0x83, /* 218 */
- 0xce, /* 219 */
- 0x60, /* 220 */
- 0x2d, /* 221 */
- 0xfa, /* 222 */
- 0xb7, /* 223 */
- 0x5d, /* 224 */
- 0x10, /* 225 */
- 0xc7, /* 226 */
- 0x8a, /* 227 */
- 0x24, /* 228 */
- 0x69, /* 229 */
- 0xbe, /* 230 */
- 0xf3, /* 231 */
- 0xaf, /* 232 */
- 0xe2, /* 233 */
- 0x35, /* 234 */
- 0x78, /* 235 */
- 0xd6, /* 236 */
- 0x9b, /* 237 */
- 0x4c, /* 238 */
- 0x01, /* 239 */
- 0xf4, /* 240 */
- 0xb9, /* 241 */
- 0x6e, /* 242 */
- 0x23, /* 243 */
- 0x8d, /* 244 */
- 0xc0, /* 245 */
- 0x17, /* 246 */
- 0x5a, /* 247 */
- 0x06, /* 248 */
- 0x4b, /* 249 */
- 0x9c, /* 250 */
- 0xd1, /* 251 */
- 0x7f, /* 252 */
- 0x32, /* 253 */
- 0xe5, /* 254 */
- 0xa8 /* 255 */
+}; +EXPORT_SYMBOL(sdw_crc8_lookup_msb); diff --git a/drivers/soundwire/crc8.h b/drivers/soundwire/crc8.h new file mode 100644 index 000000000000..9a88d3866016 --- /dev/null +++ b/drivers/soundwire/crc8.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */ +/* Copyright(c) 2024 Intel Corporation. */
+#ifndef __SDW_CRC8_H +#define __SDW_CRC8_H
+#define SDW_CRC8_SEED 0xFF +#define SDW_CRC8_POLY 0x4D +extern const u8 sdw_crc8_lookup_msb[CRC8_TABLE_SIZE];
+#endif /* __SDW_CRC8_H */
2.39.2
On 12/18/23 06:01, Vinod Koul wrote:
On 07-12-23, 16:29, Pierre-Louis Bossart wrote:
Add the lookup table required by crc8(). All configuration values were directly table from the MIPI SoundWire 1.x specification.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
drivers/soundwire/Makefile | 4 +- drivers/soundwire/crc8.c | 277 +++++++++++++++++++++++++++++++++++++ drivers/soundwire/crc8.h | 11 ++ 3 files changed, 291 insertions(+), 1 deletion(-) create mode 100644 drivers/soundwire/crc8.c create mode 100644 drivers/soundwire/crc8.h
diff --git a/drivers/soundwire/Makefile b/drivers/soundwire/Makefile index 657f5888a77b..170128dd9318 100644 --- a/drivers/soundwire/Makefile +++ b/drivers/soundwire/Makefile @@ -5,7 +5,9 @@
#Bus Objs soundwire-bus-y := bus_type.o bus.o master.o slave.o mipi_disco.o stream.o \
sysfs_slave.o sysfs_slave_dpn.o
sysfs_slave.o sysfs_slave_dpn.o \
crc8.o
obj-$(CONFIG_SOUNDWIRE) += soundwire-bus.o
soundwire-generic-allocation-objs := generic_bandwidth_allocation.o diff --git a/drivers/soundwire/crc8.c b/drivers/soundwire/crc8.c new file mode 100644 index 000000000000..b6b984d7f39a --- /dev/null +++ b/drivers/soundwire/crc8.c @@ -0,0 +1,277 @@ +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) +// Copyright(c) 2024 Intel Corporation.
+#include <linux/crc8.h> +#include <linux/module.h> +#include "crc8.h"
+/*
- the MIPI SoundWire CRC8 polynomial is X^8 + X^6 + X^3 + X^2 + 1, MSB first
- The value is (1)01001101 = 0x4D
- the table below was generated with
- u8 crc8_lookup_table[CRC8_TABLE_SIZE];
- crc8_populate_msb(crc8_lookup_table, SDW_CRC8_POLY);
Good that you found this API, so next question would be why should we have this static table in kernel and not generate on probe if bpt is supported..? Many subsystems use these APIs to generate the tables..
The table is going to be the same for all hosts, it's simpler if everyone uses a constant table, no? We're talking about 256 bytes added for the common bus parts, be it with dynamically allocated memory or a static table.
I don't mind reverting to a dynamically allocated table populated at boot if that was the consensus.
On 18-12-23, 14:26, Pierre-Louis Bossart wrote:
On 12/18/23 06:01, Vinod Koul wrote:
On 07-12-23, 16:29, Pierre-Louis Bossart wrote:
Add the lookup table required by crc8(). All configuration values were directly table from the MIPI SoundWire 1.x specification.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
drivers/soundwire/Makefile | 4 +- drivers/soundwire/crc8.c | 277 +++++++++++++++++++++++++++++++++++++ drivers/soundwire/crc8.h | 11 ++ 3 files changed, 291 insertions(+), 1 deletion(-) create mode 100644 drivers/soundwire/crc8.c create mode 100644 drivers/soundwire/crc8.h
diff --git a/drivers/soundwire/Makefile b/drivers/soundwire/Makefile index 657f5888a77b..170128dd9318 100644 --- a/drivers/soundwire/Makefile +++ b/drivers/soundwire/Makefile @@ -5,7 +5,9 @@
#Bus Objs soundwire-bus-y := bus_type.o bus.o master.o slave.o mipi_disco.o stream.o \
sysfs_slave.o sysfs_slave_dpn.o
sysfs_slave.o sysfs_slave_dpn.o \
crc8.o
obj-$(CONFIG_SOUNDWIRE) += soundwire-bus.o
soundwire-generic-allocation-objs := generic_bandwidth_allocation.o diff --git a/drivers/soundwire/crc8.c b/drivers/soundwire/crc8.c new file mode 100644 index 000000000000..b6b984d7f39a --- /dev/null +++ b/drivers/soundwire/crc8.c @@ -0,0 +1,277 @@ +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) +// Copyright(c) 2024 Intel Corporation.
+#include <linux/crc8.h> +#include <linux/module.h> +#include "crc8.h"
+/*
- the MIPI SoundWire CRC8 polynomial is X^8 + X^6 + X^3 + X^2 + 1, MSB first
- The value is (1)01001101 = 0x4D
- the table below was generated with
- u8 crc8_lookup_table[CRC8_TABLE_SIZE];
- crc8_populate_msb(crc8_lookup_table, SDW_CRC8_POLY);
Good that you found this API, so next question would be why should we have this static table in kernel and not generate on probe if bpt is supported..? Many subsystems use these APIs to generate the tables..
The table is going to be the same for all hosts, it's simpler if everyone uses a constant table, no? We're talking about 256 bytes added for the common bus parts, be it with dynamically allocated memory or a static table.
I don't mind reverting to a dynamically allocated table populated at boot if that was the consensus.
Most of the kernel users I looked have dynamically allocated table populated at boot, also out of many users how many would support BTP.? Your older platforms, current qcom dont, so not point is adding for everyone...
On 12/21/23 15:42, Vinod Koul wrote:
On 18-12-23, 14:26, Pierre-Louis Bossart wrote:
On 12/18/23 06:01, Vinod Koul wrote:
On 07-12-23, 16:29, Pierre-Louis Bossart wrote:
Add the lookup table required by crc8(). All configuration values were directly table from the MIPI SoundWire 1.x specification.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
drivers/soundwire/Makefile | 4 +- drivers/soundwire/crc8.c | 277 +++++++++++++++++++++++++++++++++++++ drivers/soundwire/crc8.h | 11 ++ 3 files changed, 291 insertions(+), 1 deletion(-) create mode 100644 drivers/soundwire/crc8.c create mode 100644 drivers/soundwire/crc8.h
diff --git a/drivers/soundwire/Makefile b/drivers/soundwire/Makefile index 657f5888a77b..170128dd9318 100644 --- a/drivers/soundwire/Makefile +++ b/drivers/soundwire/Makefile @@ -5,7 +5,9 @@
#Bus Objs soundwire-bus-y := bus_type.o bus.o master.o slave.o mipi_disco.o stream.o \
sysfs_slave.o sysfs_slave_dpn.o
sysfs_slave.o sysfs_slave_dpn.o \
crc8.o
obj-$(CONFIG_SOUNDWIRE) += soundwire-bus.o
soundwire-generic-allocation-objs := generic_bandwidth_allocation.o diff --git a/drivers/soundwire/crc8.c b/drivers/soundwire/crc8.c new file mode 100644 index 000000000000..b6b984d7f39a --- /dev/null +++ b/drivers/soundwire/crc8.c @@ -0,0 +1,277 @@ +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) +// Copyright(c) 2024 Intel Corporation.
+#include <linux/crc8.h> +#include <linux/module.h> +#include "crc8.h"
+/*
- the MIPI SoundWire CRC8 polynomial is X^8 + X^6 + X^3 + X^2 + 1, MSB first
- The value is (1)01001101 = 0x4D
- the table below was generated with
- u8 crc8_lookup_table[CRC8_TABLE_SIZE];
- crc8_populate_msb(crc8_lookup_table, SDW_CRC8_POLY);
Good that you found this API, so next question would be why should we have this static table in kernel and not generate on probe if bpt is supported..? Many subsystems use these APIs to generate the tables..
The table is going to be the same for all hosts, it's simpler if everyone uses a constant table, no? We're talking about 256 bytes added for the common bus parts, be it with dynamically allocated memory or a static table.
I don't mind reverting to a dynamically allocated table populated at boot if that was the consensus.
Most of the kernel users I looked have dynamically allocated table populated at boot, also out of many users how many would support BTP.? Your older platforms, current qcom dont, so not point is adding for everyone...
All Intel hardware supports BTP/BRA, we just didn't have a compelling reason to enable it so far. I've seen AMD stating that they also have BTP/BRA. That's 2/3 of controllers.
I don't mind reverting to a devm_ allocated table, I am not sure I see the benefits for 256 bytes of constant data.
On 21-12-23, 18:15, Pierre-Louis Bossart wrote:
I don't mind reverting to a dynamically allocated table populated at boot if that was the consensus.
Most of the kernel users I looked have dynamically allocated table populated at boot, also out of many users how many would support BTP.? Your older platforms, current qcom dont, so not point is adding for everyone...
All Intel hardware supports BTP/BRA, we just didn't have a compelling reason to enable it so far. I've seen AMD stating that they also have BTP/BRA. That's 2/3 of controllers.
Not all qcom platforms support this, so this would be enabled selectively for them
The Cadence IP expects a specific format (detailed in the Documentation). Add helpers to copy the data into the DMA buffer.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- drivers/soundwire/Kconfig | 1 + drivers/soundwire/cadence_master.c | 634 +++++++++++++++++++++++++++++ drivers/soundwire/cadence_master.h | 30 ++ 3 files changed, 665 insertions(+)
diff --git a/drivers/soundwire/Kconfig b/drivers/soundwire/Kconfig index 4d8f3b7024ae..18ca21edd552 100644 --- a/drivers/soundwire/Kconfig +++ b/drivers/soundwire/Kconfig @@ -30,6 +30,7 @@ config SOUNDWIRE_AMD
config SOUNDWIRE_CADENCE tristate + select CRC8
config SOUNDWIRE_INTEL tristate "Intel SoundWire Master driver" diff --git a/drivers/soundwire/cadence_master.c b/drivers/soundwire/cadence_master.c index bb623f82826c..2da0c415c125 100644 --- a/drivers/soundwire/cadence_master.c +++ b/drivers/soundwire/cadence_master.c @@ -6,6 +6,7 @@ * Used by Master driver */
+#include <linux/crc8.h> #include <linux/delay.h> #include <linux/device.h> #include <linux/debugfs.h> @@ -20,6 +21,7 @@ #include <sound/soc.h> #include <linux/workqueue.h> #include "bus.h" +#include "crc8.h" #include "cadence_master.h"
static int interrupt_mask; @@ -1890,5 +1892,637 @@ struct sdw_cdns_pdi *sdw_cdns_alloc_pdi(struct sdw_cdns *cdns, } EXPORT_SYMBOL(sdw_cdns_alloc_pdi);
+/* BPT/BRA helpers */ + +#define SDW_CDNS_BRA_HDR 6 /* defined by MIPI */ +#define SDW_CDNS_BRA_HDR_CRC 1 /* defined by MIPI */ +#define SDW_CDNS_BRA_HDR_CRC_PAD 1 /* Cadence only */ +#define SDW_CDNS_BRA_HDR_RESP 1 /* defined by MIPI */ +#define SDW_CDNS_BRA_HDR_RESP_PAD 1 /* Cadence only */ + +#define SDW_CDNS_BRA_DATA_PAD 1 /* Cadence only */ +#define SDW_CDNS_BRA_DATA_CRC 1 /* defined by MIPI */ +#define SDW_CDNS_BRA_DATA_CRC_PAD 1 /* Cadence only */ + +#define SDW_CDNS_BRA_FOOTER_RESP 1 /* defined by MIPI */ +#define SDW_CDNS_BRA_FOOTER_RESP_PAD 1 /* Cadence only */ + +static unsigned int sdw_cdns_bra_actual_data_size(unsigned int allocated_bytes_per_frame) +{ + unsigned int total; + + total = allocated_bytes_per_frame - + SDW_CDNS_BRA_HDR - + SDW_CDNS_BRA_HDR_CRC - + SDW_CDNS_BRA_HDR_RESP; + + total -= SDW_CDNS_BRA_DATA_CRC; + + total -= SDW_CDNS_BRA_FOOTER_RESP; + + return total; +} + +static unsigned int sdw_cdns_write_pdi0_buffer_size(unsigned int actual_data_size) +{ + unsigned int total; + + total = SDW_CDNS_BRA_HDR + + SDW_CDNS_BRA_HDR_CRC + + SDW_CDNS_BRA_HDR_CRC_PAD; + + total += actual_data_size; + if (actual_data_size & 1) + total += SDW_CDNS_BRA_DATA_PAD; + + total += SDW_CDNS_BRA_DATA_CRC + + SDW_CDNS_BRA_DATA_CRC_PAD; + + return total * 2; +} + +static unsigned int sdw_cdns_write_pdi1_buffer_size(unsigned int actual_data_size) +{ + unsigned int total; + + total = SDW_CDNS_BRA_HDR_RESP + + SDW_CDNS_BRA_HDR_RESP_PAD; + + total += SDW_CDNS_BRA_FOOTER_RESP + + SDW_CDNS_BRA_FOOTER_RESP_PAD; + + return total * 2; +} + +static unsigned int sdw_cdns_read_pdi0_buffer_size(unsigned int actual_data_size) +{ + unsigned int total; + + total = SDW_CDNS_BRA_HDR + + SDW_CDNS_BRA_HDR_CRC + + SDW_CDNS_BRA_HDR_CRC_PAD; + + return total * 2; +} + +static unsigned int sdw_cdns_read_pdi1_buffer_size(unsigned int actual_data_size) +{ + unsigned int total; + + total = SDW_CDNS_BRA_HDR_RESP + + SDW_CDNS_BRA_HDR_RESP_PAD; + + total += actual_data_size; + if (actual_data_size & 1) + total += SDW_CDNS_BRA_DATA_PAD; + + total += SDW_CDNS_BRA_HDR_CRC + + SDW_CDNS_BRA_HDR_CRC_PAD; + + total += SDW_CDNS_BRA_FOOTER_RESP + + SDW_CDNS_BRA_FOOTER_RESP_PAD; + + return total * 2; +} + +int sdw_cdns_bpt_find_buffer_sizes(int command, /* 0: write, 1: read */ + int row, int col, + unsigned int data_bytes, + unsigned int requested_bytes_per_frame, + unsigned int *data_per_frame, + unsigned int *pdi0_buffer_size, + unsigned int *pdi1_buffer_size, + unsigned int *num_frames) +{ + unsigned int bpt_bits = row * (col - 1); + unsigned int bpt_bytes = bpt_bits >> 3; + unsigned int actual_bpt_bytes; + unsigned int pdi0_tx_size; + unsigned int pdi1_rx_size; + unsigned int remainder; + + if (!data_bytes) + return -EINVAL; + + actual_bpt_bytes = sdw_cdns_bra_actual_data_size(bpt_bytes); + if (data_bytes < actual_bpt_bytes) + actual_bpt_bytes = data_bytes; + + /* + * the caller may want to set the number of bytes per frame, + * allow when possible + */ + if (requested_bytes_per_frame < actual_bpt_bytes) + actual_bpt_bytes = requested_bytes_per_frame; + + *data_per_frame = actual_bpt_bytes; + + if (command == 0) { + /* + * for writes we need to send all the data_bytes per frame, + * even for the last frame which may only transport fewer bytes + */ + + *num_frames = DIV_ROUND_UP(data_bytes, actual_bpt_bytes); + + pdi0_tx_size = sdw_cdns_write_pdi0_buffer_size(actual_bpt_bytes); + pdi1_rx_size = sdw_cdns_write_pdi1_buffer_size(actual_bpt_bytes); + + *pdi0_buffer_size = pdi0_tx_size * *num_frames; + *pdi1_buffer_size = pdi1_rx_size * *num_frames; + } else { + /* + * for reads we need to retrieve only what is requested in the BTP + * header, so the last frame needs to be special-cased + */ + *num_frames = data_bytes / actual_bpt_bytes; + + pdi0_tx_size = sdw_cdns_read_pdi0_buffer_size(actual_bpt_bytes); + pdi1_rx_size = sdw_cdns_read_pdi1_buffer_size(actual_bpt_bytes); + + *pdi0_buffer_size = pdi0_tx_size * *num_frames; + *pdi1_buffer_size = pdi1_rx_size * *num_frames; + + remainder = data_bytes % actual_bpt_bytes; + if (remainder) { + pdi0_tx_size = sdw_cdns_read_pdi0_buffer_size(remainder); + pdi1_rx_size = sdw_cdns_read_pdi1_buffer_size(remainder); + + *num_frames = *num_frames + 1; + *pdi0_buffer_size += pdi0_tx_size; + *pdi1_buffer_size += pdi1_rx_size; + } + } + + return 0; +} +EXPORT_SYMBOL(sdw_cdns_bpt_find_buffer_sizes); + +static int sdw_cdns_copy_write_data(u8 *data, int data_size, u8 *dma_buffer, int dma_buffer_size) +{ + int i; + int j; + + /* size check to prevent out of bounds access */ + i = data_size - 1; + j = (2 * i) - (i & 1); + if (data_size & 1) + j++; + j += 2; + if (j >= dma_buffer_size) + return -EINVAL; + + /* copy data */ + for (i = 0; i < data_size; i++) { + j = (2 * i) - (i & 1); + dma_buffer[j] = data[i]; + } + /* add required pad */ + if (data_size & 1) + dma_buffer[++j] = 0; + /* skip last two bytes */ + j += 2; + + /* offset and data are off-by-one */ + return j + 1; +} + +static int sdw_cdns_prepare_write_pd0_buffer(u8 *header, unsigned int header_size, + u8 *data, unsigned int data_size, + u8 *dma_buffer, unsigned int dma_buffer_size, + unsigned int *dma_data_written, + unsigned int frame_counter) +{ + u8 crc; + int data_written; + u8 *last_byte; + + /* clear buffer */ + memset(dma_buffer, 0, dma_buffer_size); + *dma_data_written = 0; + + dma_buffer[3] = BIT(7); + dma_buffer[3] |= frame_counter & GENMASK(3, 0); + + data_written = sdw_cdns_copy_write_data(header, header_size, dma_buffer, dma_buffer_size); + if (data_written < 0) + return data_written; + dma_buffer += data_written; + dma_buffer_size -= data_written; + *dma_data_written += data_written; + + crc = SDW_CRC8_SEED; + crc = crc8(sdw_crc8_lookup_msb, header, header_size, crc); + + data_written = sdw_cdns_copy_write_data(&crc, 1, dma_buffer, dma_buffer_size); + if (data_written < 0) + return data_written; + dma_buffer += data_written; + dma_buffer_size -= data_written; + *dma_data_written += data_written; + + data_written = sdw_cdns_copy_write_data(data, data_size, dma_buffer, dma_buffer_size); + if (data_written < 0) + return data_written; + dma_buffer += data_written; + dma_buffer_size -= data_written; + *dma_data_written += data_written; + + crc = SDW_CRC8_SEED; + crc = crc8(sdw_crc8_lookup_msb, data, data_size, crc); + data_written = sdw_cdns_copy_write_data(&crc, 1, dma_buffer, dma_buffer_size); + if (data_written < 0) + return data_written; + dma_buffer += data_written; + dma_buffer_size -= data_written; + *dma_data_written += data_written; + + /* tag last byte */ + last_byte = dma_buffer - 1; + last_byte[0] = BIT(6); + + return 0; +} + +static int sdw_cdns_prepare_read_pd0_buffer(u8 *header, unsigned int header_size, + u8 *dma_buffer, unsigned int dma_buffer_size, + unsigned int *dma_data_written, + unsigned int frame_counter) +{ + u8 crc; + int data_written; + u8 *last_byte; + + /* clear buffer */ + memset(dma_buffer, 0, dma_buffer_size); + *dma_data_written = 0; + + dma_buffer[3] = BIT(7); + dma_buffer[3] |= frame_counter & GENMASK(3, 0); + + data_written = sdw_cdns_copy_write_data(header, header_size, dma_buffer, dma_buffer_size); + if (data_written < 0) + return data_written; + dma_buffer += data_written; + dma_buffer_size -= data_written; + *dma_data_written += data_written; + + crc = SDW_CRC8_SEED; + crc = crc8(sdw_crc8_lookup_msb, header, header_size, crc); + + data_written = sdw_cdns_copy_write_data(&crc, 1, dma_buffer, dma_buffer_size); + if (data_written < 0) + return data_written; + dma_buffer += data_written; + dma_buffer_size -= data_written; + *dma_data_written += data_written; + + /* tag last byte */ + last_byte = dma_buffer - 1; + last_byte[0] = BIT(6); + + return 0; +} + +#define CDNS_BTP_ROLLING_COUNTER_START 1 + +int sdw_cdns_prepare_write_dma_buffer(u8 dev_num, u32 start_register, + u8 *data, int data_size, + int data_per_frame, + u8 *dma_buffer, int dma_buffer_size, + int *dma_buffer_total_bytes) +{ + u8 header[SDW_CDNS_BRA_HDR]; + u8 *p_dma_buffer; + u8 *p_data; + int dma_data_written; + int total_dma_data_written; + u8 counter; + int remainder; + int ret; + + counter = CDNS_BTP_ROLLING_COUNTER_START; + + header[0] = BIT(1); /* write command: BIT(1) set */ + header[0] |= GENMASK(7, 6); /* header is active */ + header[0] |= (dev_num << 2); + + p_data = &data[0]; + p_dma_buffer = &dma_buffer[0]; + total_dma_data_written = 0; + + while (data_size / data_per_frame) { + header[1] = data_per_frame; + header[2] = start_register >> 24 & 0xFF; + header[3] = start_register >> 16 & 0xFF; + header[4] = start_register >> 8 & 0xFF; + header[5] = start_register >> 0 & 0xFF; + + ret = sdw_cdns_prepare_write_pd0_buffer(header, SDW_CDNS_BRA_HDR, + p_data, data_per_frame, + p_dma_buffer, dma_buffer_size, + &dma_data_written, + counter); + if (ret < 0) + return ret; + + counter++; + + p_data += data_per_frame; + data_size -= data_per_frame; + + p_dma_buffer += dma_data_written; + dma_buffer_size -= dma_data_written; + total_dma_data_written += dma_data_written; + + start_register += data_per_frame; + } + + remainder = data_size % data_per_frame; + if (remainder) { + header[1] = remainder; + header[2] = start_register >> 24 & 0xFF; + header[3] = start_register >> 16 & 0xFF; + header[4] = start_register >> 8 & 0xFF; + header[5] = start_register >> 0 & 0xFF; + + ret = sdw_cdns_prepare_write_pd0_buffer(header, SDW_CDNS_BRA_HDR, + p_data, remainder, + p_dma_buffer, dma_buffer_size, + &dma_data_written, + counter); + if (ret < 0) + return ret; + + total_dma_data_written += dma_data_written; + } + + *dma_buffer_total_bytes = total_dma_data_written; + + return 0; +} +EXPORT_SYMBOL(sdw_cdns_prepare_write_dma_buffer); + +int sdw_cdns_prepare_read_dma_buffer(u8 dev_num, u32 start_register, + int data_size, + int data_per_frame, + u8 *dma_buffer, int dma_buffer_size, + int *dma_buffer_total_bytes) +{ + u8 header[SDW_CDNS_BRA_HDR]; + u8 *p_dma_buffer; + int dma_data_written; + int total_dma_data_written; + u8 counter; + int remainder; + int ret; + + counter = CDNS_BTP_ROLLING_COUNTER_START; + + header[0] = 0; /* read command: BIT(1) cleared */ + header[0] |= GENMASK(7, 6); /* header is active */ + header[0] |= (dev_num << 2); + + p_dma_buffer = &dma_buffer[0]; + total_dma_data_written = 0; + + while (data_size / data_per_frame) { + header[1] = data_per_frame; + header[2] = start_register >> 24 & 0xFF; + header[3] = start_register >> 16 & 0xFF; + header[4] = start_register >> 8 & 0xFF; + header[5] = start_register >> 0 & 0xFF; + + ret = sdw_cdns_prepare_read_pd0_buffer(header, SDW_CDNS_BRA_HDR, + p_dma_buffer, dma_buffer_size, + &dma_data_written, + counter); + if (ret < 0) + return ret; + + counter++; + + data_size -= data_per_frame; + + p_dma_buffer += dma_data_written; + dma_buffer_size -= dma_data_written; + total_dma_data_written += dma_data_written; + + start_register += data_per_frame; + } + + remainder = data_size % data_per_frame; + if (remainder) { + header[1] = remainder; + header[2] = start_register >> 24 & 0xFF; + header[3] = start_register >> 16 & 0xFF; + header[4] = start_register >> 8 & 0xFF; + header[5] = start_register >> 0 & 0xFF; + + ret = sdw_cdns_prepare_read_pd0_buffer(header, SDW_CDNS_BRA_HDR, + p_dma_buffer, dma_buffer_size, + &dma_data_written, + counter); + if (ret < 0) + return ret; + + total_dma_data_written += dma_data_written; + } + + *dma_buffer_total_bytes = total_dma_data_written; + + return 0; +} +EXPORT_SYMBOL(sdw_cdns_prepare_read_dma_buffer); + +static int check_counter(u32 val, u8 counter) +{ + u8 frame; + + frame = (val >> 24) & GENMASK(3, 0); + if (counter != frame) + return -EIO; + return 0; +} + +static int check_response(u32 val) +{ + u8 response; + + response = (val >> 3) & GENMASK(1, 0); + if (response == 0) /* Ignored */ + return -ENODATA; + if (response != 1) /* ACK */ + return -EIO; + + return 0; +} + +static int check_frame_start(u32 header, u8 counter) +{ + int ret; + + /* check frame_start marker */ + if (!(header & BIT(31))) + return -EIO; + + ret = check_counter(header, counter); + if (ret < 0) + return ret; + + return check_response(header); +} + +static int check_frame_end(u32 footer) +{ + /* check frame_end marker */ + if (!(footer & BIT(30))) + return -EIO; + + return check_response(footer); +} + +int sdw_cdns_check_write_response(struct device *dev, + u8 *dma_buffer, int dma_buffer_size, + int num_frames) +{ + u32 *p_data; + int counter; + u32 header; + u32 footer; + int ret; + int i; + + /* paranoia check on buffer size */ + if (dma_buffer_size != num_frames * 8) + return -EINVAL; + + counter = CDNS_BTP_ROLLING_COUNTER_START; + p_data = (u32 *)dma_buffer; + + for (i = 0; i < num_frames; i++) { + header = *p_data++; + footer = *p_data++; + + ret = check_frame_start(header, counter); + if (ret < 0) { + dev_err(dev, "%s: bad frame %d/%d start header %x\n", + __func__, i, num_frames, header); + return ret; + } + + ret = check_frame_end(footer); + if (ret < 0) { + dev_err(dev, "%s: bad frame %d/%d end footer %x\n", + __func__, i, num_frames, footer); + return ret; + } + + counter++; + counter &= GENMASK(3, 0); + } + return 0; +} +EXPORT_SYMBOL(sdw_cdns_check_write_response); + +static u8 extract_read_data(u32 *data, int num_bytes, u8 *buffer) +{ + u32 val; + int i; + u8 crc; + u8 b0; + u8 b1; + + crc = SDW_CRC8_SEED; + + /* process two bytes at a time */ + for (i = 0; i < num_bytes / 2; i++) { + val = *data++; + + b0 = val & 0xff; + b1 = (val >> 8) & 0xff; + + *buffer++ = b0; + crc = crc8(sdw_crc8_lookup_msb, &b0, 1, crc); + + *buffer++ = b1; + crc = crc8(sdw_crc8_lookup_msb, &b1, 1, crc); + } + /* handle remaining byte if it exists */ + if (num_bytes & 1) { + val = *data; + + b0 = val & 0xff; + + *buffer++ = b0; + crc = crc8(sdw_crc8_lookup_msb, &b0, 1, crc); + } + return crc; +} + +int sdw_cdns_check_read_response(struct device *dev, + u8 *dma_buffer, int dma_buffer_size, + u8 *buffer, int buffer_size, + int num_frames, int data_per_frame) +{ + int total_num_bytes = 0; + u32 *p_data; + u8 *p_buf; + int counter; + u32 header; + u32 footer; + u8 expected_crc; + u8 crc; + int len; + int ret; + int i; + + counter = CDNS_BTP_ROLLING_COUNTER_START; + p_data = (u32 *)dma_buffer; + p_buf = buffer; + + for (i = 0; i < num_frames; i++) { + header = *p_data++; + + ret = check_frame_start(header, counter); + if (ret < 0) { + dev_err(dev, "%s: bad frame %d/%d start header %x\n", + __func__, i, num_frames, header); + return ret; + } + + len = data_per_frame; + if (total_num_bytes + data_per_frame > buffer_size) + len = buffer_size - total_num_bytes; + + crc = extract_read_data(p_data, len, p_buf); + + p_data += (len + 1) / 2; + expected_crc = *p_data++ & 0xff; + + if (crc != expected_crc) { + dev_err(dev, "%s: bad frame %d/%d crc %#x expected %#x\n", + __func__, i, num_frames, crc, expected_crc); + return -EIO; + } + + p_buf += len; + total_num_bytes += len; + + footer = *p_data++; + ret = check_frame_end(footer); + if (ret < 0) { + dev_err(dev, "%s: bad frame %d/%d end footer %x\n", + __func__, i, num_frames, footer); + return ret; + } + + counter++; + counter &= GENMASK(3, 0); + } + return 0; +} +EXPORT_SYMBOL(sdw_cdns_check_read_response); + MODULE_LICENSE("Dual BSD/GPL"); MODULE_DESCRIPTION("Cadence Soundwire Library"); diff --git a/drivers/soundwire/cadence_master.h b/drivers/soundwire/cadence_master.h index bc84435e420f..0463440d8486 100644 --- a/drivers/soundwire/cadence_master.h +++ b/drivers/soundwire/cadence_master.h @@ -202,4 +202,34 @@ void sdw_cdns_check_self_clearing_bits(struct sdw_cdns *cdns, const char *string void sdw_cdns_config_update(struct sdw_cdns *cdns); int sdw_cdns_config_update_set_wait(struct sdw_cdns *cdns);
+/* SoundWire BPT/BRA helpers to format data */ +int sdw_cdns_bpt_find_buffer_sizes(int command, /* 0: write, 1: read */ + int row, int col, + unsigned int data_bytes, + unsigned int requested_bytes_per_frame, + unsigned int *data_per_frame, + unsigned int *pdi0_buffer_size, + unsigned int *pdi1_buffer_size, + unsigned int *num_frames); + +int sdw_cdns_prepare_write_dma_buffer(u8 dev_num, u32 start_register, + u8 *data, int data_size, + int data_per_frame, + u8 *dma_buffer, int dma_buffer_size, + int *dma_buffer_total_bytes); + +int sdw_cdns_prepare_read_dma_buffer(u8 dev_num, u32 start_register, + int data_size, + int data_per_frame, + u8 *dma_buffer, int dma_buffer_size, + int *dma_buffer_total_bytes); + +int sdw_cdns_check_write_response(struct device *dev, + u8 *dma_buffer, int dma_buffer_size, + int num_frames); + +int sdw_cdns_check_read_response(struct device *dev, + u8 *dma_buffer, int dma_buffer_size, + u8 *buffer, int buffer_size, + int num_frames, int data_per_frame); #endif /* __SDW_CADENCE_H */
Mirror abstraction added for master ops.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- drivers/soundwire/intel_auxdevice.c | 55 +++++++++++++++++++++++++++++ include/linux/soundwire/sdw_intel.h | 16 +++++++++ 2 files changed, 71 insertions(+)
diff --git a/drivers/soundwire/intel_auxdevice.c b/drivers/soundwire/intel_auxdevice.c index 93698532deac..1cdf4699cf2a 100644 --- a/drivers/soundwire/intel_auxdevice.c +++ b/drivers/soundwire/intel_auxdevice.c @@ -74,6 +74,56 @@ static bool is_wake_capable(struct sdw_slave *slave) return false; }
+static int generic_bpt_stream_open(struct sdw_bus *bus, + struct sdw_slave *slave, + enum sdw_bpt_type mode, + struct sdw_bpt_msg *msg) +{ + struct sdw_cdns *cdns = bus_to_cdns(bus); + struct sdw_intel *sdw = cdns_to_intel(cdns); + + if (sdw->link_res->hw_ops->bpt_open_stream) + return sdw->link_res->hw_ops->bpt_open_stream(sdw, slave, mode, msg); + return -ENOTSUPP; +} + +static int generic_bpt_stream_close(struct sdw_bus *bus, + struct sdw_slave *slave, + enum sdw_bpt_type mode, + struct sdw_bpt_msg *msg) +{ + struct sdw_cdns *cdns = bus_to_cdns(bus); + struct sdw_intel *sdw = cdns_to_intel(cdns); + + if (sdw->link_res->hw_ops->bpt_close_stream) + return sdw->link_res->hw_ops->bpt_close_stream(sdw, slave, mode, msg); + return -ENOTSUPP; +} + +static int generic_bpt_send_async(struct sdw_bus *bus, + struct sdw_slave *slave, + struct sdw_bpt_msg *msg) +{ + struct sdw_cdns *cdns = bus_to_cdns(bus); + struct sdw_intel *sdw = cdns_to_intel(cdns); + + if (sdw->link_res->hw_ops->bpt_send_async) + return sdw->link_res->hw_ops->bpt_send_async(sdw, slave, msg); + return -ENOTSUPP; +} + +static int generic_bpt_wait(struct sdw_bus *bus, + struct sdw_slave *slave, + struct sdw_bpt_msg *msg) +{ + struct sdw_cdns *cdns = bus_to_cdns(bus); + struct sdw_intel *sdw = cdns_to_intel(cdns); + + if (sdw->link_res->hw_ops->bpt_wait) + return sdw->link_res->hw_ops->bpt_wait(sdw, slave, msg); + return -ENOTSUPP; +} + static int generic_pre_bank_switch(struct sdw_bus *bus) { struct sdw_cdns *cdns = bus_to_cdns(bus); @@ -203,6 +253,11 @@ static struct sdw_master_ops sdw_intel_ops = { .get_device_num = intel_get_device_num_ida, .put_device_num = intel_put_device_num_ida, .new_peripheral_assigned = generic_new_peripheral_assigned, + + .bpt_open_stream = generic_bpt_stream_open, + .bpt_close_stream = generic_bpt_stream_close, + .bpt_send_async = generic_bpt_send_async, + .bpt_wait = generic_bpt_wait, };
/* diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index 00bb22d96ae5..9e114df37a5b 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -423,6 +423,22 @@ struct sdw_intel_hw_ops { bool (*sync_check_cmdsync_unlocked)(struct sdw_intel *sdw);
void (*program_sdi)(struct sdw_intel *sdw, int dev_num); + + int (*bpt_open_stream)(struct sdw_intel *sdw, + struct sdw_slave *slave, + enum sdw_bpt_type mode, + struct sdw_bpt_msg *msg); + int (*bpt_close_stream)(struct sdw_intel *sdw, + struct sdw_slave *slave, + enum sdw_bpt_type mode, + struct sdw_bpt_msg *msg); + int (*bpt_send_async)(struct sdw_intel *sdw, + struct sdw_slave *slave, + struct sdw_bpt_msg *msg); + int (*bpt_wait)(struct sdw_intel *sdw, + struct sdw_slave *slave, + struct sdw_bpt_msg *msg); + };
extern const struct sdw_intel_hw_ops sdw_intel_cnl_hw_ops;
Add SoundWire BPT DMA helpers as a separate module to avoid circular dependencies.
For now this assumes no link DMA, only coupled mode.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- include/sound/hda-sdw-bpt.h | 76 +++++++ sound/soc/sof/intel/Kconfig | 8 +- sound/soc/sof/intel/Makefile | 4 + sound/soc/sof/intel/hda-sdw-bpt.c | 328 ++++++++++++++++++++++++++++++ 4 files changed, 415 insertions(+), 1 deletion(-) create mode 100644 include/sound/hda-sdw-bpt.h create mode 100644 sound/soc/sof/intel/hda-sdw-bpt.c
diff --git a/include/sound/hda-sdw-bpt.h b/include/sound/hda-sdw-bpt.h new file mode 100644 index 000000000000..163c4513c26f --- /dev/null +++ b/include/sound/hda-sdw-bpt.h @@ -0,0 +1,76 @@ +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */ +/* + * This file is provided under a dual BSD/GPLv2 license. When using or + * redistributing this file, you may do so under either license. + * + * Copyright(c) 2023 Intel Corporation. All rights reserved. + */ + +#include <linux/device.h> + +struct hdac_ext_stream; +struct snd_dma_buffer; + +#if IS_ENABLED(CONFIG_SND_SOF_SOF_HDA_SDW_BPT) +int hda_sdw_bpt_open(struct device *dev, + int link_id, + struct hdac_ext_stream **bpt_tx_stream, + struct snd_dma_buffer *dmab_tx_bdl, + u32 bpt_tx_num_bytes, + u32 tx_dma_bandwidth, + struct hdac_ext_stream **bpt_rx_stream, + struct snd_dma_buffer *dmab_rx_bdl, + u32 bpt_rx_num_bytes, + u32 rx_dma_bandwidth); + +int hda_sdw_bpt_send_async(struct device *dev, + struct hdac_ext_stream *bpt_tx_stream, + struct hdac_ext_stream *bpt_rx_stream); + +int hda_sdw_bpt_wait(struct device *dev, + struct hdac_ext_stream *bpt_tx_stream, + struct hdac_ext_stream *bpt_rx_stream); + +int hda_sdw_bpt_close(struct device *dev, + struct hdac_ext_stream *bpt_tx_stream, + struct snd_dma_buffer *dmab_tx_bdl, + struct hdac_ext_stream *bpt_rx_stream, + struct snd_dma_buffer *dmab_rx_bdl); +#else +static inline int hda_sdw_bpt_open(struct device *dev, + int link_id, + struct hdac_ext_stream **bpt_tx_stream, + struct snd_dma_buffer *dmab_tx_bdl, + u32 bpt_tx_num_bytes, + u32 tx_dma_bandwidth, + struct hdac_ext_stream **bpt_rx_stream, + struct snd_dma_buffer *dmab_rx_bdl, + u32 bpt_rx_num_bytes, + u32 rx_dma_bandwidth) +{ + return 0; +} + +static inline int hda_sdw_bpt_send_async(struct device *dev, + struct hdac_ext_stream *bpt_tx_stream, + struct hdac_ext_stream *bpt_rx_stream) +{ + return 0; +} + +static inline int hda_sdw_bpt_wait(struct device *dev, + struct hdac_ext_stream *bpt_tx_stream, + struct hdac_ext_stream *bpt_rx_stream) +{ + return 0; +} + +static inline int hda_sdw_bpt_close(struct device *dev, + struct hdac_ext_stream *bpt_tx_stream, + struct snd_dma_buffer *dmab_tx_bdl, + struct hdac_ext_stream *bpt_rx_stream, + struct snd_dma_buffer *dmab_rx_bdl) +{ + return 0; +} +#endif diff --git a/sound/soc/sof/intel/Kconfig b/sound/soc/sof/intel/Kconfig index 3396bd46b778..d2650348987b 100644 --- a/sound/soc/sof/intel/Kconfig +++ b/sound/soc/sof/intel/Kconfig @@ -268,6 +268,7 @@ config SND_SOC_SOF_INTEL_LNL tristate select SND_SOC_SOF_HDA_GENERIC select SND_SOC_SOF_INTEL_SOUNDWIRE_LINK_BASELINE + select SND_SOF_SOF_HDA_SDW_BPT if SND_SOC_SOF_INTEL_SOUNDWIRE select SND_SOC_SOF_IPC4 select SND_SOC_SOF_INTEL_MTL
@@ -325,12 +326,17 @@ config SND_SOC_SOF_HDA_AUDIO_CODEC
endif ## SND_SOC_SOF_HDA_GENERIC
+config SND_SOF_SOF_HDA_SDW_BPT + tristate + help + This option is not user-selectable but automagically handled by + 'select' statements at a higher level. + config SND_SOC_SOF_HDA_LINK_BASELINE tristate select SND_SOC_SOF_HDA if SND_SOC_SOF_HDA_LINK help This option is not user-selectable but automagically handled by - 'select' statements at a higher level.
config SND_SOC_SOF_HDA tristate diff --git a/sound/soc/sof/intel/Makefile b/sound/soc/sof/intel/Makefile index 806df08e3fd5..de1e37a76c29 100644 --- a/sound/soc/sof/intel/Makefile +++ b/sound/soc/sof/intel/Makefile @@ -12,6 +12,8 @@ snd-sof-intel-hda-generic-objs := hda.o hda-common-ops.o
snd-sof-intel-hda-mlink-objs := hda-mlink.o
+snd-sof-intel-hda-sdw-bpt-objs := hda-sdw-bpt.o + snd-sof-intel-hda-common-$(CONFIG_SND_SOC_SOF_HDA_PROBES) += hda-probes.o
snd-sof-intel-hda-objs := hda-codec.o @@ -26,6 +28,8 @@ obj-$(CONFIG_SND_SOC_SOF_HDA_GENERIC) += snd-sof-intel-hda-generic.o obj-$(CONFIG_SND_SOC_SOF_HDA_MLINK) += snd-sof-intel-hda-mlink.o obj-$(CONFIG_SND_SOC_SOF_HDA) += snd-sof-intel-hda.o
+obj-$(CONFIG_SND_SOF_SOF_HDA_SDW_BPT) += snd-sof-intel-hda-sdw-bpt.o + snd-sof-pci-intel-tng-objs := pci-tng.o snd-sof-pci-intel-skl-objs := pci-skl.o skl.o hda-loader-skl.o snd-sof-pci-intel-apl-objs := pci-apl.o apl.o diff --git a/sound/soc/sof/intel/hda-sdw-bpt.c b/sound/soc/sof/intel/hda-sdw-bpt.c new file mode 100644 index 000000000000..980c75c3a6bd --- /dev/null +++ b/sound/soc/sof/intel/hda-sdw-bpt.c @@ -0,0 +1,328 @@ +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) +// +// This file is provided under a dual BSD/GPLv2 license. When using or +// redistributing this file, you may do so under either license. +// +// Copyright(c) 2024 Intel Corporation. All rights reserved. +// + +/* + * Hardware interface for SoundWire BPT support with HDA DMA + */ + +#include <sound/hdaudio_ext.h> +#include <sound/hda-mlink.h> +#include <sound/hda-sdw-bpt.h> +#include <sound/sof.h> +#include "../ops.h" +#include "../sof-priv.h" +#include "hda.h" + +static int hda_sdw_bpt_dma_prepare(struct device *dev, + struct hdac_ext_stream **sdw_bpt_stream, + struct snd_dma_buffer *dmab_bdl, + u32 bpt_num_bytes, + unsigned int num_channels, + int direction) +{ + struct snd_sof_dev *sdev = dev_get_drvdata(dev); + struct hdac_ext_stream *bpt_stream; + unsigned int format = HDA_CL_STREAM_FORMAT; + + /* + * the baseline format needs to be adjusted to + * bandwidth requirements + */ + format |= (num_channels - 1); + + dev_dbg(dev, "direction %d format_val %x\n", direction, format); + + bpt_stream = hda_cl_prepare(dev, format, + bpt_num_bytes, + dmab_bdl, + direction, + false); + if (IS_ERR(bpt_stream)) { + dev_err(sdev->dev, "%s: SDW BPT DMA prepare failed: dir %d\n", + __func__, direction); + return PTR_ERR(bpt_stream); + } + *sdw_bpt_stream = bpt_stream; + + if (hdac_stream(bpt_stream)->direction == SNDRV_PCM_STREAM_PLAYBACK) { + struct hdac_bus *bus = sof_to_bus(sdev); + struct hdac_ext_link *hlink; + int stream_tag; + + stream_tag = hdac_stream(bpt_stream)->stream_tag; + hlink = hdac_bus_eml_sdw_get_hlink(bus); + + snd_hdac_ext_bus_link_set_stream_id(hlink, stream_tag); + } + return 0; +} + +static int hda_sdw_bpt_dma_deprepare(struct device *dev, + struct hdac_ext_stream *sdw_bpt_stream, + struct snd_dma_buffer *dmab_bdl) +{ + struct snd_sof_dev *sdev = dev_get_drvdata(dev); + int ret; + + ret = hda_cl_cleanup(sdev->dev, dmab_bdl, sdw_bpt_stream); + if (ret < 0) { + dev_err(sdev->dev, "%s: SDW BPT DMA cleanup failed\n", + __func__); + return ret; + } + + if (hdac_stream(sdw_bpt_stream)->direction == SNDRV_PCM_STREAM_PLAYBACK) { + struct hdac_bus *bus = sof_to_bus(sdev); + struct hdac_ext_link *hlink; + int stream_tag; + + stream_tag = hdac_stream(sdw_bpt_stream)->stream_tag; + hlink = hdac_bus_eml_sdw_get_hlink(bus); + + snd_hdac_ext_bus_link_clear_stream_id(hlink, stream_tag); + } + + return 0; +} + +static int hda_sdw_bpt_dma_enable(struct device *dev, + struct hdac_ext_stream *sdw_bpt_stream) +{ + struct snd_sof_dev *sdev = dev_get_drvdata(dev); + int ret; + + ret = hda_cl_trigger(sdev->dev, sdw_bpt_stream, SNDRV_PCM_TRIGGER_START); + if (ret < 0) { + dev_err(sdev->dev, "%s: SDW BPT DMA trigger start failed\n", __func__); + return ret; + } + return 0; +} + +static int hda_sdw_bpt_dma_disable(struct device *dev, + struct hdac_ext_stream *sdw_bpt_stream) +{ + struct snd_sof_dev *sdev = dev_get_drvdata(dev); + int ret; + + ret = hda_cl_trigger(sdev->dev, sdw_bpt_stream, SNDRV_PCM_TRIGGER_STOP); + if (ret < 0) { + dev_err(sdev->dev, "%s: SDW BPT DMA trigger stop failed\n", __func__); + return ret; + } + return 0; +} + +int hda_sdw_bpt_open(struct device *dev, + int link_id, + struct hdac_ext_stream **bpt_tx_stream, + struct snd_dma_buffer *dmab_tx_bdl, + u32 bpt_tx_num_bytes, + u32 tx_dma_bandwidth, + struct hdac_ext_stream **bpt_rx_stream, + struct snd_dma_buffer *dmab_rx_bdl, + u32 bpt_rx_num_bytes, + u32 rx_dma_bandwidth) +{ + struct snd_sof_dev *sdev = dev_get_drvdata(dev); + unsigned int num_channels_tx; + unsigned int num_channels_rx; + int ret2; + int ret1; + int ret; + + num_channels_tx = DIV_ROUND_UP(tx_dma_bandwidth, 48000 * 32); + + ret = hda_sdw_bpt_dma_prepare(dev, + bpt_tx_stream, + dmab_tx_bdl, + bpt_tx_num_bytes, + num_channels_tx, + SNDRV_PCM_STREAM_PLAYBACK); + if (ret < 0) { + dev_err(dev, "%s: hda_sdw_bpt_dma_prepare failed for TX: %d\n", + __func__, ret); + return ret; + } + + num_channels_rx = DIV_ROUND_UP(rx_dma_bandwidth, 48000 * 32); + + ret = hda_sdw_bpt_dma_prepare(dev, + bpt_rx_stream, + dmab_rx_bdl, + bpt_rx_num_bytes, + num_channels_rx, + SNDRV_PCM_STREAM_CAPTURE); + if (ret < 0) { + dev_err(dev, "%s: hda_sdw_bpt_dma_prepare failed for RX: %d\n", + __func__, ret); + + ret1 = hda_sdw_bpt_dma_deprepare(dev, + *bpt_tx_stream, + dmab_tx_bdl); + if (ret1 < 0) + dev_err(dev, "%s: hda_sdw_bpt_dma_deprepare failed for TX: %d\n", + __func__, ret1); + return ret; + } + + /* we need to map the channels in PCMSyCM registers */ + ret = hdac_bus_eml_sdw_map_stream_ch(sof_to_bus(sdev), + link_id, + 0, /* cpu_dai->id -> PDI0 */ + GENMASK(num_channels_tx - 1, 0), + hdac_stream(*bpt_tx_stream)->stream_tag, + SNDRV_PCM_STREAM_PLAYBACK); + if (ret < 0) + dev_err(dev, "%s: hdac_bus_eml_sdw_map_stream_ch failed for TX: %d\n", + __func__, ret); + + ret1 = hdac_bus_eml_sdw_map_stream_ch(sof_to_bus(sdev), + link_id, + 1, /* cpu_dai->id -> PDI1 */ + GENMASK(num_channels_rx - 1, 0), + hdac_stream(*bpt_rx_stream)->stream_tag, + SNDRV_PCM_STREAM_CAPTURE); + if (ret1 < 0) + dev_err(dev, "%s: hdac_bus_eml_sdw_map_stream_ch failed for RX: %d\n", + __func__, ret1); + + if (!ret) + ret = ret1; + + if (ret < 0) { + ret2 = hda_sdw_bpt_close(dev, + *bpt_tx_stream, + dmab_tx_bdl, + *bpt_rx_stream, + dmab_rx_bdl); + if (ret2 < 0) + dev_err(dev, "%s: hda_sdw_bpt_close failed: %d\n", + __func__, ret2); + return ret; + } + + return 0; +} +EXPORT_SYMBOL_NS(hda_sdw_bpt_open, SND_SOC_SOF_INTEL_HDA_SDW_BPT); + +int hda_sdw_bpt_send_async(struct device *dev, + struct hdac_ext_stream *bpt_tx_stream, + struct hdac_ext_stream *bpt_rx_stream) +{ + int ret1; + int ret; + + ret = hda_sdw_bpt_dma_enable(dev, bpt_tx_stream); + if (ret < 0) { + dev_err(dev, "%s: hda_sdw_bpt_dma_enable failed for TX: %d\n", + __func__, ret); + return ret; + } + + ret = hda_sdw_bpt_dma_enable(dev, bpt_rx_stream); + if (ret < 0) { + dev_err(dev, "%s: hda_sdw_bpt_dma_enable failed for RX: %d\n", + __func__, ret); + + ret1 = hda_sdw_bpt_dma_disable(dev, bpt_tx_stream); + if (ret1 < 0) + dev_err(dev, "%s: hda_sdw_bpt_dma_disable failed for TX: %d\n", + __func__, ret1); + return ret; + } + + return 0; +} +EXPORT_SYMBOL_NS(hda_sdw_bpt_send_async, SND_SOC_SOF_INTEL_HDA_SDW_BPT); + +/* + * 3s is several orders of magnitude larger than what is needed for a + * typical firmware download. + */ +#define HDA_BPT_IOC_TIMEOUT_MS 3000 + +int hda_sdw_bpt_wait(struct device *dev, + struct hdac_ext_stream *bpt_tx_stream, + struct hdac_ext_stream *bpt_rx_stream) +{ + struct sof_intel_hda_stream *hda_tx_stream; + struct sof_intel_hda_stream *hda_rx_stream; + unsigned long time_tx_left; + unsigned long time_rx_left; + snd_pcm_uframes_t tx_position; + snd_pcm_uframes_t rx_position; + int ret = 0; + int ret1; + + hda_tx_stream = container_of(bpt_tx_stream, struct sof_intel_hda_stream, + hext_stream); + hda_rx_stream = container_of(bpt_rx_stream, struct sof_intel_hda_stream, + hext_stream); + + time_tx_left = wait_for_completion_timeout(&hda_tx_stream->ioc, + msecs_to_jiffies(HDA_BPT_IOC_TIMEOUT_MS)); + if (!time_tx_left) { + tx_position = hda_dsp_stream_get_position(hdac_stream(bpt_tx_stream), + SNDRV_PCM_STREAM_PLAYBACK, + false); + dev_err(dev, "%s: SDW BPT TX DMA did not complete: %ld\n", + __func__, tx_position); + ret = -ETIMEDOUT; + } + + /* the wait should be minimal here */ + time_rx_left = wait_for_completion_timeout(&hda_rx_stream->ioc, + msecs_to_jiffies(HDA_BPT_IOC_TIMEOUT_MS)); + if (!time_rx_left) { + rx_position = hda_dsp_stream_get_position(hdac_stream(bpt_rx_stream), + SNDRV_PCM_STREAM_CAPTURE, + false); + dev_err(dev, "%s: SDW BPT RX DMA did not complete: %ld\n", + __func__, rx_position); + ret = -ETIMEDOUT; + } + + ret1 = hda_sdw_bpt_dma_disable(dev, bpt_rx_stream); + if (!ret) + ret = ret1; + + ret1 = hda_sdw_bpt_dma_disable(dev, bpt_tx_stream); + if (!ret) + ret = ret1; + + return ret; +} +EXPORT_SYMBOL_NS(hda_sdw_bpt_wait, SND_SOC_SOF_INTEL_HDA_SDW_BPT); + +int hda_sdw_bpt_close(struct device *dev, + struct hdac_ext_stream *bpt_tx_stream, + struct snd_dma_buffer *dmab_tx_bdl, + struct hdac_ext_stream *bpt_rx_stream, + struct snd_dma_buffer *dmab_rx_bdl) +{ + int ret; + int ret1; + + ret = hda_sdw_bpt_dma_deprepare(dev, + bpt_tx_stream, + dmab_tx_bdl); + + ret1 = hda_sdw_bpt_dma_deprepare(dev, + bpt_rx_stream, + dmab_rx_bdl); + if (!ret) + ret = ret1; + + return ret; +} +EXPORT_SYMBOL_NS(hda_sdw_bpt_close, SND_SOC_SOF_INTEL_HDA_SDW_BPT); + +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_IMPORT_NS(SND_SOC_SOF_INTEL_HDA_COMMON); +MODULE_IMPORT_NS(SND_SOC_SOF_HDA_MLINK);
This is needed to be shared between open/send_async/close.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- drivers/soundwire/intel.h | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/drivers/soundwire/intel.h b/drivers/soundwire/intel.h index 511932c55216..2289719f318c 100644 --- a/drivers/soundwire/intel.h +++ b/drivers/soundwire/intel.h @@ -48,11 +48,23 @@ struct sdw_intel_link_res { struct hdac_bus *hbus; };
+struct sdw_intel_bpt { + struct hdac_ext_stream *bpt_tx_stream; + struct snd_dma_buffer dmab_tx_bdl; + struct hdac_ext_stream *bpt_rx_stream; + struct snd_dma_buffer dmab_rx_bdl; + unsigned int pdi0_buffer_size; + unsigned int pdi1_buffer_size; + unsigned int num_frames; + unsigned int data_per_frame; +}; + struct sdw_intel { struct sdw_cdns cdns; int instance; struct sdw_intel_link_res *link_res; bool startup_done; + struct sdw_intel_bpt bpt_ctx; #ifdef CONFIG_DEBUG_FS struct dentry *debugfs; #endif
Add support for BTP API using Cadence and hda-sdw-bpt helpers.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- drivers/soundwire/intel_ace2x.c | 377 ++++++++++++++++++++++++++++++++ 1 file changed, 377 insertions(+)
diff --git a/drivers/soundwire/intel_ace2x.c b/drivers/soundwire/intel_ace2x.c index fb64c6d37ab0..f2f5dd503572 100644 --- a/drivers/soundwire/intel_ace2x.c +++ b/drivers/soundwire/intel_ace2x.c @@ -12,12 +12,383 @@ #include <linux/soundwire/sdw_intel.h> #include <sound/hdaudio.h> #include <sound/hda-mlink.h> +#include <sound/hda-sdw-bpt.h> #include <sound/hda_register.h> #include <sound/pcm_params.h> #include "cadence_master.h" #include "bus.h" #include "intel.h"
+static int sdw_slave_bpt_stream_add(struct sdw_slave *slave, + struct sdw_stream_runtime *stream) +{ + struct sdw_stream_config sconfig = {0}; + struct sdw_port_config pconfig = {0}; + int ret; + + /* arbitrary configuration */ + sconfig.frame_rate = 16000; + sconfig.ch_count = 1; + sconfig.bps = 32; /* this is required for BPT/BRA */ + sconfig.direction = SDW_DATA_DIR_RX; + sconfig.type = SDW_STREAM_BPT; + + pconfig.num = 0; + pconfig.ch_mask = BIT(0); + + ret = sdw_stream_add_slave(slave, &sconfig, + &pconfig, 1, stream); + if (ret) { + dev_err(&slave->dev, "%s: failed: %d\n", __func__, ret); + return ret; + } + + return 0; +} + +static int intel_ace2x_bpt_open_stream(struct sdw_intel *sdw, + struct sdw_slave *slave, + enum sdw_bpt_type mode, + struct sdw_bpt_msg *msg) +{ + struct sdw_cdns *cdns = &sdw->cdns; + struct sdw_bus *bus = &cdns->bus; + struct sdw_master_prop *prop = &bus->prop; + struct sdw_stream_runtime *stream; + struct sdw_stream_config sconfig; + struct sdw_port_config *pconfig; + unsigned int pdi0_buffer_size; + unsigned int tx_dma_bandwidth; + unsigned int pdi1_buffer_size; + unsigned int rx_dma_bandwidth; + unsigned int num_frames; + unsigned int data_per_frame; + unsigned int tx_total_bytes; + struct sdw_cdns_pdi *pdi0; + struct sdw_cdns_pdi *pdi1; + const int ch = 1; + const int num_pdi = 2; + int command; + int ret1; + int ret; + int dir; + int i; + + if (mode != SDW_BRA) + return -EINVAL; + + stream = sdw_alloc_stream("BPT", SDW_STREAM_BPT); + if (!stream) + return -ENOMEM; + cdns->bus.bpt_stream = stream; + + ret = sdw_slave_bpt_stream_add(slave, stream); + if (ret < 0) + goto release_stream; + + /* handle PDI0 first */ + dir = SDW_DATA_DIR_TX; + + pdi0 = sdw_cdns_alloc_pdi(cdns, &cdns->pcm, ch, dir, 0); + if (!pdi0) { + dev_err(cdns->dev, "%s: sdw_cdns_alloc_pdi0 failed\n", __func__); + ret = -EINVAL; + goto remove_slave; + } + + sdw_cdns_config_stream(cdns, ch, dir, pdi0); + + /* handle PDI1 */ + dir = SDW_DATA_DIR_RX; + + pdi1 = sdw_cdns_alloc_pdi(cdns, &cdns->pcm, ch, dir, 1); + if (!pdi1) { + dev_err(cdns->dev, "%s: sdw_cdns_alloc_pdi1 failed\n", __func__); + ret = -EINVAL; + goto remove_slave; + } + + sdw_cdns_config_stream(cdns, ch, dir, pdi1); + + /* + * the port config direction, number of channels and frame + * rate is totally arbitrary + */ + sconfig.direction = dir; + sconfig.ch_count = ch; + sconfig.frame_rate = 16000; + sconfig.type = SDW_STREAM_BPT; + sconfig.bps = 32; /* this is required for BPT/BRA */ + + /* Port configuration */ + pconfig = kcalloc(num_pdi, sizeof(*pconfig), GFP_KERNEL); + if (!pconfig) { + ret = -ENOMEM; + goto remove_slave; + } + + for (i = 0; i < num_pdi; i++) { + pconfig[i].num = i; + pconfig[i].ch_mask = ch; + } + + ret = sdw_stream_add_master(&cdns->bus, &sconfig, + pconfig, num_pdi, stream); + kfree(pconfig); + + if (ret < 0) { + dev_err(cdns->dev, "add master to stream failed:%d\n", ret); + goto remove_slave; + } + + ret = sdw_prepare_stream(cdns->bus.bpt_stream); + if (ret < 0) + goto remove_master; + + command = (msg->flags & SDW_MSG_FLAG_WRITE) ? 0 : 1; + + ret = sdw_cdns_bpt_find_buffer_sizes(command, + cdns->bus.params.row, + cdns->bus.params.col, + msg->len, + SDW_BPT_MSG_MAX_BYTES, + &data_per_frame, + &pdi0_buffer_size, + &pdi1_buffer_size, + &num_frames); + if (ret < 0) + goto deprepare_stream; + + sdw->bpt_ctx.pdi0_buffer_size = pdi0_buffer_size; + sdw->bpt_ctx.pdi1_buffer_size = pdi1_buffer_size; + sdw->bpt_ctx.num_frames = num_frames; + sdw->bpt_ctx.data_per_frame = data_per_frame; + tx_dma_bandwidth = div_u64((u64)pdi0_buffer_size * 8 * (u64)prop->default_frame_rate, + num_frames); + rx_dma_bandwidth = div_u64((u64)pdi1_buffer_size * 8 * (u64)prop->default_frame_rate, + num_frames); + + dev_dbg(cdns->dev, "Message len %d transferred in %d frames (%d per frame)\n", + msg->len, num_frames, data_per_frame); + dev_dbg(cdns->dev, "sizes pdi0 %d pdi1 %d tx_bandwidth %d rx_bandwidth %d\n", + pdi0_buffer_size, pdi1_buffer_size, + tx_dma_bandwidth, + rx_dma_bandwidth); + + ret = hda_sdw_bpt_open(cdns->dev->parent, /* PCI device */ + sdw->instance, + &sdw->bpt_ctx.bpt_tx_stream, + &sdw->bpt_ctx.dmab_tx_bdl, + pdi0_buffer_size, + tx_dma_bandwidth, + &sdw->bpt_ctx.bpt_rx_stream, + &sdw->bpt_ctx.dmab_rx_bdl, + pdi1_buffer_size, + rx_dma_bandwidth); + if (ret < 0) { + dev_err(cdns->dev, "%s: hda_sdw_bpt_open failed %d\n", __func__, ret); + goto deprepare_stream; + } + + if (!command) { + ret = sdw_cdns_prepare_write_dma_buffer( + msg->dev_num, msg->addr, + msg->buf, msg->len, + data_per_frame, + sdw->bpt_ctx.dmab_tx_bdl.area, + pdi0_buffer_size, + &tx_total_bytes); + } else { + ret = sdw_cdns_prepare_read_dma_buffer( + msg->dev_num, msg->addr, + msg->len, + data_per_frame, + sdw->bpt_ctx.dmab_tx_bdl.area, + pdi0_buffer_size, + &tx_total_bytes); + + } + if (ret < 0) { + dev_err(cdns->dev, "%s: sdw_prepare_%s_dma_buffer failed %d\n", + __func__, command ? "read" : "write", ret); + goto bpt_close; + } + + return 0; + +bpt_close: + ret1 = hda_sdw_bpt_close(cdns->dev->parent, /* PCI device */ + sdw->bpt_ctx.bpt_tx_stream, + &sdw->bpt_ctx.dmab_tx_bdl, + sdw->bpt_ctx.bpt_rx_stream, + &sdw->bpt_ctx.dmab_rx_bdl); + if (ret1 < 0) + dev_err(cdns->dev, "%s: hda_sdw_bpt_close failed: ret %d\n", + __func__, ret1); + +deprepare_stream: + sdw_deprepare_stream(cdns->bus.bpt_stream); + +remove_master: + ret1 = sdw_stream_remove_master(&cdns->bus, cdns->bus.bpt_stream); + if (ret1 < 0) + dev_err(cdns->dev, "%s: remove master failed: %d\n", + __func__, ret1); + +remove_slave: + ret1 = sdw_stream_remove_slave(slave, cdns->bus.bpt_stream); + if (ret1 < 0) + dev_err(cdns->dev, "%s: remove slave failed: %d\n", + __func__, ret1); + +release_stream: + sdw_release_stream(cdns->bus.bpt_stream); + cdns->bus.bpt_stream = NULL; + + return ret; +} + +static int intel_ace2x_bpt_close_stream(struct sdw_intel *sdw, + struct sdw_slave *slave, + enum sdw_bpt_type mode, + struct sdw_bpt_msg *msg) +{ + struct sdw_cdns *cdns = &sdw->cdns; + int ret1; + int ret; + + if (mode != SDW_BRA) + return -EINVAL; + + ret = hda_sdw_bpt_close(cdns->dev->parent, /* PCI device */ + sdw->bpt_ctx.bpt_tx_stream, + &sdw->bpt_ctx.dmab_tx_bdl, + sdw->bpt_ctx.bpt_rx_stream, + &sdw->bpt_ctx.dmab_rx_bdl); + if (ret < 0) + dev_err(cdns->dev, "%s: hda_sdw_bpt_close failed: ret %d\n", + __func__, ret); + + ret1 = sdw_deprepare_stream(cdns->bus.bpt_stream); + if (ret1 < 0) { + dev_err(cdns->dev, "%s: sdw_deprepare_stream failed: ret %d\n", + __func__, ret1); + if (!ret) + ret = ret1; + } + + ret1 = sdw_stream_remove_master(&cdns->bus, cdns->bus.bpt_stream); + if (ret1 < 0) { + dev_err(cdns->dev, "%s: remove master failed: %d\n", + __func__, ret1); + if (!ret) + ret = ret1; + } + + ret1 = sdw_stream_remove_slave(slave, cdns->bus.bpt_stream); + if (ret1 < 0) { + dev_err(cdns->dev, "%s: remove slave failed: %d\n", + __func__, ret1); + if (!ret) + ret = ret1; + } + + cdns->bus.bpt_stream = NULL; + + return ret; +} + +static int intel_ace2x_bpt_send_async(struct sdw_intel *sdw, + struct sdw_slave *slave, + struct sdw_bpt_msg *msg) +{ + struct sdw_cdns *cdns = &sdw->cdns; + int ret; + + dev_dbg(cdns->dev, "BPT Transfer start\n"); + + ret = hda_sdw_bpt_send_async(cdns->dev->parent, /* PCI device */ + sdw->bpt_ctx.bpt_tx_stream, + sdw->bpt_ctx.bpt_rx_stream); + if (ret < 0) + return ret; + + ret = sdw_enable_stream(cdns->bus.bpt_stream); + if (ret < 0) { + dev_err(cdns->dev, "%s: sdw_stream_enable failed: %d\n", + __func__, ret); + return ret; + } + + dev_info(cdns->dev, "%s: done\n", __func__); + + return 0; +} + +static int intel_ace2x_bpt_wait(struct sdw_intel *sdw, + struct sdw_slave *slave, + struct sdw_bpt_msg *msg) +{ + struct sdw_cdns *cdns = &sdw->cdns; + int ret; + + dev_dbg(cdns->dev, "BTP Transfer wait\n"); + + ret = hda_sdw_bpt_wait(cdns->dev->parent, /* PCI device */ + sdw->bpt_ctx.bpt_tx_stream, + sdw->bpt_ctx.bpt_rx_stream); + if (ret < 0) + dev_err(cdns->dev, "%s: hda_sdw_bpt_wait failed: %d\n", __func__, ret); + + ret = sdw_disable_stream(cdns->bus.bpt_stream); + if (ret < 0) { + dev_err(cdns->dev, "%s: sdw_stream_enable failed: %d\n", + __func__, ret); + return ret; + } + + if (0) { + /* dump DMA TX buffer */ + print_hex_dump_debug("DMA out: ", DUMP_PREFIX_OFFSET, + 4, 4, + sdw->bpt_ctx.dmab_tx_bdl.area, + sdw->bpt_ctx.pdi0_buffer_size, + false); + + print_hex_dump_debug("DMA in: ", DUMP_PREFIX_OFFSET, + 4, 4, + sdw->bpt_ctx.dmab_rx_bdl.area, + sdw->bpt_ctx.pdi1_buffer_size, + false); + } + + if (msg->flags & SDW_MSG_FLAG_WRITE) { + ret = sdw_cdns_check_write_response(cdns->dev, + sdw->bpt_ctx.dmab_rx_bdl.area, + sdw->bpt_ctx.pdi1_buffer_size, + sdw->bpt_ctx.num_frames); + if (ret < 0) { + dev_err(cdns->dev, "%s: BPT Write failed %d\n", __func__, ret); + return ret; + } + } else { + ret = sdw_cdns_check_read_response(cdns->dev, + sdw->bpt_ctx.dmab_rx_bdl.area, + sdw->bpt_ctx.pdi1_buffer_size, + msg->buf, msg->len, + sdw->bpt_ctx.num_frames, + sdw->bpt_ctx.data_per_frame); + if (ret < 0) { + dev_err(cdns->dev, "%s: BPT Read failed %d\n", __func__, ret); + return ret; + } + } + + dev_dbg(cdns->dev, "BPT Transfer done\n"); + + return 0; +} + /* * shim vendor-specific (vs) ops */ @@ -692,7 +1063,13 @@ const struct sdw_intel_hw_ops sdw_intel_lnl_hw_ops = { .sync_check_cmdsync_unlocked = intel_check_cmdsync_unlocked,
.program_sdi = intel_program_sdi, + + .bpt_open_stream = intel_ace2x_bpt_open_stream, + .bpt_close_stream = intel_ace2x_bpt_close_stream, + .bpt_send_async = intel_ace2x_bpt_send_async, + .bpt_wait = intel_ace2x_bpt_wait, }; EXPORT_SYMBOL_NS(sdw_intel_lnl_hw_ops, SOUNDWIRE_INTEL);
MODULE_IMPORT_NS(SND_SOC_SOF_HDA_MLINK); +MODULE_IMPORT_NS(SND_SOC_SOF_INTEL_HDA_SDW_BPT);
Add code to show what codec drivers will need to do to enable BPT/BRA transfers. The only difference is to set the 'command_type' file to 1. A zero-value will rely on regular read/write commands in Column0.
For now the code will only return error codes since the Manager side is not enabled yet.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- drivers/soundwire/debugfs.c | 122 +++++++++++++++++++++++++++++++----- 1 file changed, 106 insertions(+), 16 deletions(-)
diff --git a/drivers/soundwire/debugfs.c b/drivers/soundwire/debugfs.c index 6d253d69871d..de35f5beec4d 100644 --- a/drivers/soundwire/debugfs.c +++ b/drivers/soundwire/debugfs.c @@ -138,9 +138,10 @@ static int sdw_slave_reg_show(struct seq_file *s_file, void *data) } DEFINE_SHOW_ATTRIBUTE(sdw_slave_reg);
-#define MAX_CMD_BYTES 256 +#define MAX_CMD_BYTES (1024 * 1024)
static int cmd; +static int cmd_type; static u32 start_addr; static size_t num_bytes; static u8 read_buffer[MAX_CMD_BYTES]; @@ -164,6 +165,25 @@ static int set_command(void *data, u64 value) DEFINE_DEBUGFS_ATTRIBUTE(set_command_fops, NULL, set_command, "%llu\n");
+static int set_command_type(void *data, u64 value) +{ + struct sdw_slave *slave = data; + + if (value > 1) + return -EINVAL; + + /* Userspace changed the hardware state behind the kernel's back */ + add_taint(TAINT_USER, LOCKDEP_STILL_OK); + + dev_dbg(&slave->dev, "command type: %s\n", value ? "BRA" : "Column0"); + + cmd_type = value; + + return 0; +} +DEFINE_DEBUGFS_ATTRIBUTE(set_command_type_fops, NULL, + set_command_type, "%llu\n"); + static int set_start_address(void *data, u64 value) { struct sdw_slave *slave = data; @@ -199,9 +219,64 @@ static int set_num_bytes(void *data, u64 value) DEFINE_DEBUGFS_ATTRIBUTE(set_num_bytes_fops, NULL, set_num_bytes, "%llu\n");
+static int do_bpt_sequence(struct sdw_slave *slave, bool write, + u8 *buffer) +{ + struct sdw_bpt_msg msg = {0}; + int ret1; + int ret; + + msg.addr = start_addr; + msg.len = num_bytes; + msg.dev_num = slave->dev_num; + if (write) + msg.flags = SDW_MSG_FLAG_WRITE; + else + msg.flags = SDW_MSG_FLAG_READ; + msg.buf = buffer; + + ret = sdw_bpt_open_stream(slave->bus, slave, SDW_BRA, &msg); + if (ret < 0) { + dev_err(&slave->dev, "bpt_open_stream failed: %d\n", ret); + return ret; + } + + ret = sdw_bpt_send_async(slave->bus, + slave, + &msg); + if (ret < 0) { + dev_err(&slave->dev, "bpt_send_async failed: %d\n", ret); + goto close; + } + + dev_dbg(&slave->dev, "send_async done\n"); + + ret = sdw_bpt_wait(slave->bus, + slave, + &msg); + if (ret < 0) + dev_err(&slave->dev, "%s: bpt_wait failed: %d\n", __func__, ret); + + dev_dbg(&slave->dev, "wait done\n"); + +close: + ret1 = sdw_bpt_close_stream(slave->bus, slave, SDW_BRA, &msg); + if (ret1 < 0) { + dev_err(&slave->dev, "%s: bpt_close_stream failed: %d\n", + __func__, ret1); + if (!ret) + ret = ret1; + } + + return ret; +} + static int cmd_go(void *data, u64 value) { + const struct firmware *fw = NULL; struct sdw_slave *slave = data; + ktime_t start_t; + ktime_t finish_t; int ret;
if (value != 1) @@ -218,40 +293,54 @@ static int cmd_go(void *data, u64 value) return ret; }
- /* Userspace changed the hardware state behind the kernel's back */ - add_taint(TAINT_USER, LOCKDEP_STILL_OK); - - dev_dbg(&slave->dev, "starting command\n"); - if (cmd == 0) { - const struct firmware *fw; - ret = request_firmware(&fw, firmware_file, &slave->dev); if (ret < 0) { dev_err(&slave->dev, "firmware %s not found\n", firmware_file); goto out; } - - if (fw->size != num_bytes) { + if (fw->size < num_bytes) { dev_err(&slave->dev, - "firmware %s: unexpected size %zd, desired %zd\n", + "firmware %s: firmware size %zd, desired %zd\n", firmware_file, fw->size, num_bytes); - release_firmware(fw); goto out; } + }
- ret = sdw_nwrite_no_pm(slave, start_addr, num_bytes, fw->data); - release_firmware(fw); + /* Userspace changed the hardware state behind the kernel's back */ + add_taint(TAINT_USER, LOCKDEP_STILL_OK); + + dev_dbg(&slave->dev, "starting command\n"); + start_t = ktime_get(); + + if (cmd == 0) { + if (cmd_type) { + ret = do_bpt_sequence(slave, true, (u8 *)fw->data); + } else { + ret = sdw_nwrite_no_pm(slave, start_addr, num_bytes, fw->data); + } } else { - ret = sdw_nread_no_pm(slave, start_addr, num_bytes, read_buffer); + memset(read_buffer, 0, sizeof(read_buffer)); + + if (cmd_type) { + ret = do_bpt_sequence(slave, false, read_buffer); + } else { + ret = sdw_nread_no_pm(slave, start_addr, num_bytes, read_buffer); + } }
- dev_dbg(&slave->dev, "command completed %d\n", ret); + finish_t = ktime_get();
out: + if (fw) + release_firmware(fw); + pm_runtime_mark_last_busy(&slave->dev); pm_runtime_put(&slave->dev);
+ dev_dbg(&slave->dev, "command completed, status %d, time %lld ms\n", + ret, div_u64(finish_t - start_t, NSEC_PER_MSEC)); + return ret; } DEFINE_DEBUGFS_ATTRIBUTE(cmd_go_fops, NULL, @@ -293,6 +382,7 @@ void sdw_slave_debugfs_init(struct sdw_slave *slave)
/* interface to send arbitrary commands */ debugfs_create_file("command", 0200, d, slave, &set_command_fops); + debugfs_create_file("command_type", 0200, d, slave, &set_command_type_fops); debugfs_create_file("start_address", 0200, d, slave, &set_start_address_fops); debugfs_create_file("num_bytes", 0200, d, slave, &set_num_bytes_fops); debugfs_create_file("go", 0200, d, slave, &cmd_go_fops);
DP0 is required for BPT/BRA transport.
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com --- sound/soc/codecs/rt711-sdca-sdw.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/sound/soc/codecs/rt711-sdca-sdw.c b/sound/soc/codecs/rt711-sdca-sdw.c index 935e597022d3..583c839e0e49 100644 --- a/sound/soc/codecs/rt711-sdca-sdw.c +++ b/sound/soc/codecs/rt711-sdca-sdw.c @@ -225,6 +225,14 @@ static int rt711_sdca_read_prop(struct sdw_slave *slave) j++; }
+ prop->dp0_prop = devm_kzalloc(&slave->dev, sizeof(*prop->dp0_prop), + GFP_KERNEL); + if (!prop->dp0_prop) + return -ENOMEM; + + prop->dp0_prop->simple_ch_prep_sm = true; + prop->dp0_prop->ch_prep_timeout = 10; + /* set the timeout values */ prop->clk_stop_timeout = 700;
On Thu, Dec 07, 2023 at 04:29:28PM -0600, Pierre-Louis Bossart wrote:
The MIPI specification and most of the new codecs support the Bulk Transfer Protocol (BTP) and specifically the Bulk Register Access (BRA) configuration. This mode reclaims the 'audio' data space of the SoundWire frame to send firmware/coefficients over the DataPort 0 (DP0).
So the bulk register access is accessing registers that are also visible through the one register at at time interface, just faster?
On 12/7/23 16:56, Mark Brown wrote:
On Thu, Dec 07, 2023 at 04:29:28PM -0600, Pierre-Louis Bossart wrote:
The MIPI specification and most of the new codecs support the Bulk Transfer Protocol (BTP) and specifically the Bulk Register Access (BRA) configuration. This mode reclaims the 'audio' data space of the SoundWire frame to send firmware/coefficients over the DataPort 0 (DP0).
So the bulk register access is accessing registers that are also visible through the one register at at time interface, just faster?
Yes, each frame can transmit a packet with a start address, length and a bunch of data bytes protected with a CRC. With the default 50x4 frame size we use, we can send 8 contiguous bytes per frame instead of 1. With a larger frame you get even more bytes per frame.
Also because we program a large buffer with all the packets pre-formatted by software, we don't have much software overhead. The packets are streamed over DMA and inserted in the frame by hardware at the relevant time. That means waiting for one DMA complete event instead of dealing with thousands of command/responses with interrupts.
There are limitations though, if the frame is already transmitting audio data then obviously we have a conflict.
participants (4)
-
Charles Keepax
-
Mark Brown
-
Pierre-Louis Bossart
-
Vinod Koul