Re: [PATCH v11 07/10] mtd: spi-nor: Add stacked memories support in spi-nor
Hi,
Based on the inputs/suggestions from Tudor, I am planning to add a new layer between the SPI-NOR and MTD layers to support stacked and parallel configurations. This new layer will be part of spi-nor and located in mtd/spi-nor/.
Will AMD commit to maintaining this layer? What happens if the maintainer leaves AMD? TBH, personally, I don't like to maintain such a niche feature. I'd really like to see some use cases and performance reports for this, like actual boards (and no, evaluation boards don't count). Why wouldn't someone just use an octal flash?
And as already mentioned, there is also mtdcat, which seems to duplicate some of these features?
-michael
Hi Michael,
On 7/26/24 14:55, Michael Walle wrote:
Hi,
Based on the inputs/suggestions from Tudor, I am planning to add a new layer between the SPI-NOR and MTD layers to support stacked and parallel configurations. This new layer will be part of spi-nor and located in mtd/spi-nor/.
Will AMD commit to maintaining this layer? What happens if the maintainer leaves AMD? TBH, personally, I don't like to maintain such a niche feature. I'd really like to see some use cases and performance reports for this, like actual boards (and no, evaluation boards don't count). Why wouldn't someone just use an octal flash?
AMD/Xilinx does not create end products, which is why we don't have data from actual boards, but I don't really understand why evaluation boards don't count. A lot of customers take our schematics, remove the parts they don't need, and add their own custom parts.
But one product which publicly says it uses the parallel configuration is, for example, this SOM: https://shop.trenz-electronic.de/en/TE0820-05-2AI21MA-MPSoC-Module-with-AMD-...
I am not a marketing guy, so I can't say whether anyone else publicly states they are using this feature, but we can only develop/support/maintain these configurations on our evaluation boards, because that's what we have access to and what we know.
Also, performance numbers from us can only be provided against our evaluation boards.
Thanks, Michal
Hi Michal,
Based on the inputs/suggestions from Tudor, I am planning to add a new layer between the SPI-NOR and MTD layers to support stacked and parallel configurations. This new layer will be part of spi-nor and located in mtd/spi-nor/.
Will AMD commit to maintaining this layer? What happens if the maintainer leaves AMD? TBH, personally, I don't like to maintain such a niche feature. I'd really like to see some use cases and performance reports for this, like actual boards (and no, evaluation boards don't count). Why wouldn't someone just use an octal flash?
AMD/Xilinx does not create end products, which is why we don't have data from actual boards, but I don't really understand why evaluation boards don't count.
Because on an eval board the vendor just puts everything possible on the board.
A lot of customers take our schematics, remove the parts they don't need, and add their own custom parts.
But one product which publicly says it uses the parallel configuration is, for example, this SOM: https://shop.trenz-electronic.de/en/TE0820-05-2AI21MA-MPSoC-Module-with-AMD-...
I am not a marketing guy, so I can't say whether anyone else publicly states they are using this feature, but we can only develop/support/maintain these configurations on our evaluation boards, because that's what we have access to and what we know.
Also, performance numbers from us can only be provided against our evaluation boards.
Which is good enough.
All I'm saying is that you shouldn't put a burden on us (the SPI NOR maintainers) for what seems, to me at least, like a niche. Thus I was asking for performance numbers and users. Convince me that I'm wrong and that it is worth our time.
The first round of patches was really invasive regarding the core code. So if there is a clean layering approach which can be enabled as a module and you are maintaining it, I'm fine with that (even if the core code then needs some changes, like hooks or so, not sure).
-michael
On 7/31/24 11:19, Michael Walle wrote:
Hi Michal,
Based on the inputs/suggestions from Tudor, I am planning to add a new layer between the SPI-NOR and MTD layers to support stacked and parallel configurations. This new layer will be part of spi-nor and located in mtd/spi-nor/.
Will AMD commit to maintaining this layer? What happens if the maintainer leaves AMD? TBH, personally, I don't like to maintain such a niche feature. I'd really like to see some use cases and performance reports for this, like actual boards (and no, evaluation boards don't count). Why wouldn't someone just use an octal flash?
AMD/Xilinx does not create end products, which is why we don't have data from actual boards, but I don't really understand why evaluation boards don't count.
Because on an eval board the vendor just puts everything possible on the board.
A lot of customers take our schematics, remove the parts they don't need, and add their own custom parts.
But one product which publicly says it uses the parallel configuration is, for example, this SOM: https://shop.trenz-electronic.de/en/TE0820-05-2AI21MA-MPSoC-Module-with-AMD-...
I am not a marketing guy, so I can't say whether anyone else publicly states they are using this feature, but we can only develop/support/maintain these configurations on our evaluation boards, because that's what we have access to and what we know.
Also, performance numbers from us can only be provided against our evaluation boards.
Which is good enough.
All I'm saying is that you shouldn't put a burden on us (the SPI NOR maintainers) for what seems, to me at least, like a niche. Thus I was asking for performance numbers and users. Convince me that I'm wrong and that it is worth our time.
No. It is not really just a feature of our evaluation boards. Customers are using it. I was talking to one guy from the field, and he confirmed to me that these configurations are used by multiple of his customers in real products.
The first round of patches was really invasive regarding the core code. So if there is a clean layering approach which can be enabled as a module and you are maintaining it, I'm fine with that (even if the core code then needs some changes, like hooks or so, not sure).
That discussion started with Miquel some years ago, when he was trying to solve the DT description, which has been merged in the kernel for a while. And Amit is trying to figure out which way to go. I don't want to predict where that code should go or how it should look, because I don't have SPI-NOR experience. But I hope we finally move forward on this topic and see a path for upstreaming support for it.
Thanks, Michal
All I'm saying is that you shouldn't put a burden on us (the SPI NOR maintainers) for what seems, to me at least, like a niche. Thus I was asking for performance numbers and users. Convince me that I'm wrong and that it is worth our time.
No. It is not really just a feature of our evaluation boards. Customers are using it. I was talking to one guy from the field, and he confirmed to me that these configurations are used by multiple of his customers in real products.
Which begs the question: do we really have to support every feature in the core (I'd like to hear Tudor's and Pratyush's opinions here)? Honestly, this just looks like a concatenation of two QSPI controllers. Why didn't you just use a normal octal controller, which is a protocol also backed by the JEDEC standard? Is it any faster? Do you get more capacity? Does anyone really use large SPI-NOR flashes? If so, why? I mean, you've put that controller on your SoC; you must have some convincing arguments for why a customer should use it.
The first round of patches was really invasive regarding the core code. So if there is a clean layering approach which can be enabled as a module and you are maintaining it, I'm fine with that (even if the core code then needs some changes, like hooks or so, not sure).
That discussion started with Miquel some years ago, when he was trying to solve the DT description, which has been merged in the kernel for a while.
What's your point here? From what I can tell the DT binding is wrong and needs to be reworked anyway.
And Amit is trying to figure out which way to go. I don't want to predict where that code should go or how it should look, because I don't have SPI-NOR experience. But I hope we finally move forward on this topic and see a path for upstreaming support for it.
You still didn't answer my initial question. Will AMD support and maintain that code? I was pushing you towards putting that code into your own driver, because then what you do there is up to you.
So how do we move forward? I'd like to see as few core changes as possible to get your dual flash setup working. I'm fine with having a layer in spi-nor/ (not sure how that will work with mtdcat, which looks similar to your stacked flash) as long as it can be a module (selected by the driver).
-michael
On 7/31/24 16:11, Michael Walle wrote:
All I'm saying is that you shouldn't put a burden on us (the SPI NOR maintainers) for what seems, to me at least, like a niche. Thus I was asking for performance numbers and users. Convince me that I'm wrong and that it is worth our time.
No. It is not really just a feature of our evaluation boards. Customers are using it. I was talking to one guy from the field, and he confirmed to me that these configurations are used by multiple of his customers in real products.
Which begs the question: do we really have to support every feature in the core (I'd like to hear Tudor's and Pratyush's opinions here)? Honestly, this just looks like a concatenation of two QSPI controllers.
Based on my understanding: for stacked, yes. For parallel, no.
Why didn't you just use a normal octal controller, which is a protocol also backed by the JEDEC standard?
On newer SoCs an octal IP core is used. Amit, please comment.
Is it any faster?
Amit: please provide numbers.
Do you get more capacity? Does anyone really use large SPI-NOR flashes? If so, why?
You get twice the capacity with that configuration. I can't answer the second question because I don't work with the field. But both of these configurations are used by customers. Adding Neal in case he wants to add something more.
I mean, you've put that controller on your SoC; you must have some convincing arguments for why a customer should use it.
I expect the recommendation is to use a single-flash configuration, but if you need more space for your application, the only way to extend it is the stacked configuration with two identical flashes next to each other. If you want both a bigger size and higher speed, the answer is the parallel configuration.
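To make the difference concrete, here is a minimal userspace model of the two layouts (an illustration only: the names and the 64 MiB die size are made up, and parallel mode is shown with byte-wise striping, while real controllers may stripe at bit rather than byte granularity):

/*
 * Stacked: the two dies are concatenated in the address space, so a given
 * byte lives on exactly one die. Capacity doubles, speed stays the same.
 * Parallel: consecutive bytes are striped across both dies, so both dies
 * work on every transfer. Capacity doubles and throughput roughly doubles.
 */
#include <stdint.h>
#include <stdio.h>

#define DIE_SIZE 0x4000000ULL /* assume two identical 64 MiB dies */

/* Stacked: offset -> (die, offset within die). */
static void stacked_map(uint64_t off, int *die, uint64_t *die_off)
{
    *die = off >= DIE_SIZE;
    *die_off = *die ? off - DIE_SIZE : off;
}

/* Parallel (byte-striped): byte i -> byte i/2 of die i%2. */
static void parallel_map(uint64_t off, int *die, uint64_t *die_off)
{
    *die = off & 1;
    *die_off = off >> 1;
}

int main(void)
{
    int die;
    uint64_t off;

    stacked_map(0x4000010, &die, &off);
    printf("stacked  0x4000010 -> die %d, off 0x%llx\n", die,
           (unsigned long long)off);
    parallel_map(0x101, &die, &off);
    printf("parallel 0x101     -> die %d, off 0x%llx\n", die,
           (unsigned long long)off);
    return 0;
}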
The first round of patches was really invasive regarding the core code. So if there is a clean layering approach which can be enabled as a module and you are maintaining it, I'm fine with that (even if the core code then needs some changes, like hooks or so, not sure).
That discussion started with Miquel some years ago, when he was trying to solve the DT description, which has been merged in the kernel for a while.
What's your point here? From what I can tell the DT binding is wrong and needs to be reworked anyway.
I am just saying that this is not some ad-hoc new feature but a configuration which has already been discussed, with some steps already made. If the DT binding is wrong, it can be deprecated and a new one used, but for that it has to be clear which way to go.
And Amit is trying to figure out which way to go. I don't want to predict where that code should go or how it should look, because I don't have SPI-NOR experience. But I hope we finally move forward on this topic and see a path for upstreaming support for it.
You still didn't answer my initial question. Will AMD support and maintain that code? I was pushing you towards putting that code into your own driver, because then what you do there is up to you.
Of course. We care about our code and about supporting our SoCs and the features related to them. That means yes, we will be regularly testing it and taking care of it.
So how do we move forward? I'd like to see as few core changes as possible to get your dual flash setup working. I'm fine with having a layer in spi-nor/ (not sure how that will work with mtdcat, which looks similar to your stacked flash) as long as it can be a module (selected by the driver).
ok.
Thanks, Michal
Hi Michael,
All I'm saying is that you shouldn't put a burden on us (the SPI NOR maintainers) for what seems, to me at least, like a niche. Thus I was asking for performance numbers and users. Convince me that I'm wrong and that it is worth our time.
No. It is not really just a feature of our evaluation boards. Customers are using it. I was talking to one guy from the field, and he confirmed to me that these configurations are used by multiple of his customers in real products.
Which begs the question: do we really have to support every feature in the core (I'd like to hear Tudor's and Pratyush's opinions here)? Honestly, this just looks like a concatenation of two QSPI controllers.
Based on my understanding: for stacked, yes. For parallel, no.
Why didn't you just use a normal octal controller, which is a protocol also backed by the JEDEC standard?
On newer SoCs an octal IP core is used. Amit, please comment.
Is it any faster?
Amit: please provide numbers.
Do you get more capacity? Does anyone really use large SPI-NOR flashes? If so, why?
You get twice the capacity with that configuration. I can't answer the second question because I don't work with the field. But both of these configurations are used by customers. Adding Neal in case he wants to add something more.
Just to add a comment as I work directly with our customers. The main reason this support is important is for our older SoCs, zynq and zynqmp.
Most of our customers are using QSPI flash as the first boot memory to get from the boot ROM to u-boot. They then typically use other memories, such as eMMC for the Linux kernel, OS and file system.
The issue we have on the zynq and zynqmp SoCs is that the boot ROM (code that cannot be changed) will not boot from an OSPI flash. It will only boot from a QSPI flash. This is what is forcing many of our customers down the QSPI path. Since many of these customers are interested in additional speed and memory size, they then end up using a parallel or stacked configuration because they cannot use an OSPI with zynq or zynqmp.
All of our newer SoCs can boot from OSPI. I agree with you that if someone could choose OSPI for performance, they would, so I do not expect parallel or stacked configurations with our newer SoCs.
I get why you see this configuration as a niche, but for us, it is a very large niche because zynq and zynqmp are two of our most successful SoC families.
I mean, you've put that controller on your SoC; you must have some convincing arguments for why a customer should use it.
I expect the recommendation is to use a single-flash configuration, but if you need more space for your application, the only way to extend it is the stacked configuration with two identical flashes next to each other. If you want both a bigger size and higher speed, the answer is the parallel configuration.
The first round of patches was really invasive regarding the core code. So if there is a clean layering approach which can be enabled as a module and you are maintaining it, I'm fine with that (even if the core code then needs some changes, like hooks or so, not sure).
That discussion started with Miquel some years ago, when he was trying to solve the DT description, which has been merged in the kernel for a while.
What's your point here? From what I can tell the DT binding is wrong and needs to be reworked anyway.
I am just saying that this is not some ad-hoc new feature but a configuration which has already been discussed, with some steps already made. If the DT binding is wrong, it can be deprecated and a new one used, but for that it has to be clear which way to go.
And Amit is trying to figure out which way to go. I don't want to predict where that code should go or how it should look, because I don't have SPI-NOR experience. But I hope we finally move forward on this topic and see a path for upstreaming support for it.
You still didn't answer my initial question. Will AMD support and maintain that code? I was pushing you towards putting that code into your own driver, because then what you do there is up to you.
Of course. We care about our code and about supporting our SoCs and the features related to them. That means yes, we will be regularly testing it and taking care of it.
So how do we move forward? I'd like to see as few core changes as possible to get your dual flash setup working. I'm fine with having a layer in spi-nor/ (not sure how that will work with mtdcat, which looks similar to your stacked flash) as long as it can be a module (selected by the driver).
ok.
Best regards, Neal Frager AMD
Hello,
On 8/1/24 12:07, Frager, Neal wrote:
Hi Michael,
All I'm saying is that you shouldn't put a burden on us (the SPI NOR maintainers) for what seems, to me at least, like a niche. Thus I was asking for performance numbers and users. Convince me that I'm wrong and that it is worth our time.
No. It is not really just a feature of our evaluation boards. Customers are using it. I was talking to one guy from the field, and he confirmed to me that these configurations are used by multiple of his customers in real products.
Which begs the question: do we really have to support every feature in the core (I'd like to hear Tudor's and Pratyush's opinions here)? Honestly, this just looks like a concatenation of two QSPI controllers.
Based on my understanding: for stacked, yes. For parallel, no.
Why didn't you just use a normal octal controller, which is a protocol also backed by the JEDEC standard?
On newer SoCs an octal IP core is used. Amit, please comment.
Is it any faster?
Amit: please provide numbers.
Here are some QSPI performance numbers comparing single flash mode with two flashes connected in parallel mode. I ran the test on a VCK190 Eval Board https://www.xilinx.com/products/boards-and-kits/vck190.html, measuring the timing of mtd_debug erase, write, and read operations. The QSPI bus clock was set to 150 MHz, and the transfer size was 31457280 bytes (30 MiB).
Single Flash mode:
xilinx-vck190-20242:/home/petalinux# time mtd_debug erase /dev/mtd2 0x00 0x1e00000
Erased 31457280 bytes from address 0x00000000 in flash

real    0m54.713s
user    0m0.000s
sys     0m32.639s

xilinx-vck190-20242:/home/petalinux# time mtd_debug write /dev/mtd2 0x00 0x1e00000 test.bin
Copied 31457280 bytes from test.bin to address 0x00000000 in flash

real    0m30.187s
user    0m0.000s
sys     0m16.359s

xilinx-vck190-20242:/home/petalinux# time mtd_debug read /dev/mtd2 0x00 0x1e00000 test-read.bin
Copied 31457280 bytes from address 0x00000000 in flash to test-read.bin

real    0m0.472s
user    0m0.001s
sys     0m0.040s
Two flashes connected in parallel mode:
xilinx-vck190-20242:/home/petalinux# time mtd_debug erase /dev/mtd2 0x00 0x1e00000
Erased 31457280 bytes from address 0x00000000 in flash

real    0m27.727s
user    0m0.004s
sys     0m14.923s

xilinx-vck190-20242:/home/petalinux# time mtd_debug write /dev/mtd2 0x00 0x1e00000 test.bin
Copied 31457280 bytes from test.bin to address 0x00000000 in flash

real    0m16.538s
user    0m0.000s
sys     0m8.512s

xilinx-vck190-20242:/home/petalinux# time mtd_debug read /dev/mtd2 0x00 0x1e00000 test-read.bin
Copied 31457280 bytes from address 0x00000000 in flash to test-read.bin

real    0m0.263s
user    0m0.000s
sys     0m0.044s
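Converting those timings into throughput is plain arithmetic on the numbers above; the throwaway C snippet below just automates it (nothing in it comes from the kernel or from mtd_debug):

/* Derive MiB/s from the mtd_debug timings above (31457280 bytes = 30 MiB). */
#include <stdio.h>

int main(void)
{
    const double mib = 31457280.0 / (1024 * 1024); /* 30 MiB */
    const struct { const char *op; double single_s, parallel_s; } t[] = {
        { "erase", 54.713, 27.727 },
        { "write", 30.187, 16.538 },
        { "read",   0.472,  0.263 },
    };

    for (int i = 0; i < 3; i++)
        printf("%-5s: single %6.2f MiB/s, parallel %6.2f MiB/s (%.2fx)\n",
               t[i].op, mib / t[i].single_s, mib / t[i].parallel_s,
               t[i].single_s / t[i].parallel_s);
    return 0;
}

In other words, the parallel configuration is roughly 1.8-2x faster across erase, write, and read.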
Regards, Amit
Do you get more capacity? Does anyone really use large SPI-NOR flashes? If so, why?
You get twice the capacity with that configuration. I can't answer the second question because I don't work with the field. But both of these configurations are used by customers. Adding Neal in case he wants to add something more.
Just to add a comment as I work directly with our customers. The main reason this support is important is for our older SoCs, zynq and zynqmp.
Most of our customers are using QSPI flash as the first boot memory to get from the boot ROM to u-boot. They then typically use other memories, such as eMMC for the Linux kernel, OS and file system.
The issue we have on the zynq and zynqmp SoCs is that the boot ROM (code that cannot be changed) will not boot from an OSPI flash. It will only boot from a QSPI flash. This is what is forcing many of our customers down the QSPI path. Since many of these customers are interested in additional speed and memory size, they then end up using a parallel or stacked configuration because they cannot use an OSPI with zynq or zynqmp.
All of our newer SoCs can boot from OSPI. I agree with you that if someone could choose OSPI for performance, they would, so I do not expect parallel or stacked configurations with our newer SoCs.
I get why you see this configuration as a niche, but for us, it is a very large niche because zynq and zynqmp are two of our most successful SoC families.
I mean, you've put that controller on your SoC; you must have some convincing arguments for why a customer should use it.
I expect the recommendation is to use a single-flash configuration, but if you need more space for your application, the only way to extend it is the stacked configuration with two identical flashes next to each other. If you want both a bigger size and higher speed, the answer is the parallel configuration.
The first round of patches was really invasive regarding the core code. So if there is a clean layering approach which can be enabled as a module and you are maintaining it, I'm fine with that (even if the core code then needs some changes, like hooks or so, not sure).
That discussion started with Miquel some years ago, when he was trying to solve the DT description, which has been merged in the kernel for a while.
What's your point here? From what I can tell the DT binding is wrong and needs to be reworked anyway.
I am just saying that this is not some ad-hoc new feature but a configuration which has already been discussed, with some steps already made. If the DT binding is wrong, it can be deprecated and a new one used, but for that it has to be clear which way to go.
And Amit is trying to figure out which way to go. I don't want to predict where that code should go or how it should look, because I don't have SPI-NOR experience. But I hope we finally move forward on this topic and see a path for upstreaming support for it.
You still didn't answer my initial question. Will AMD support and maintain that code? I was pushing you towards putting that code into your own driver, because then what you do there is up to you.
Of course. We care about our code and about supporting our SoCs and the features related to them. That means yes, we will be regularly testing it and taking care of it.
So how do we move forward? I'd like to see as few core changes as possible to get your dual flash setup working. I'm fine with having a layer in spi-nor/ (not sure how that will work with mtdcat, which looks similar to your stacked flash) as long as it can be a module (selected by the driver).
ok.
Best regards, Neal Frager AMD
Hi,
All I'm saying is that you shouldn't put a burden on us (the SPI NOR maintainers) for what seems, to me at least, like a niche. Thus I was asking for performance numbers and users. Convince me that I'm wrong and that it is worth our time.
No. It is not really just a feature of our evaluation boards. Customers are using it. I was talking to one guy from the field, and he confirmed to me that these configurations are used by multiple of his customers in real products.
Which begs the question: do we really have to support every feature in the core (I'd like to hear Tudor's and Pratyush's opinions here)? Honestly, this just looks like a concatenation of two QSPI controllers.
Based on my understanding: for stacked, yes. For parallel, no.
See below.
Why didn't you just use a normal octal controller, which is a protocol also backed by the JEDEC standard?
On newer SoCs an octal IP core is used. Amit, please comment.
Is it any faster?
Amit: please provide numbers.
Do you get more capacity? Does anyone really use large SPI-NOR flashes? If so, why?
You get twice the capacity with that configuration. I can't answer the second question because I don't work with the field. But both of these configurations are used by customers. Adding Neal in case he wants to add something more.
I mean, you've put that controller on your SoC; you must have some convincing arguments for why a customer should use it.
I expect the recommendation is to use a single-flash configuration, but if you need more space for your application, the only way to extend it is the stacked configuration with two identical flashes next to each other. If you want both a bigger size and higher speed, the answer is the parallel configuration.
But who is using expensive NOR flash for bulk storage anyway? You're only mentioning parallel mode, and the performance numbers were also only about parallel mode. What about stacked mode? Because there's a chance that parallel mode works without modifying the core (?).
The first round of patches was really invasive regarding the core code. So if there is a clean layering approach which can be enabled as a module and you are maintaining it, I'm fine with that (even if the core code then needs some changes, like hooks or so, not sure).
That discussion started with Miquel some years ago, when he was trying to solve the DT description, which has been merged in the kernel for a while.
What's your point here? From what I can tell the DT binding is wrong and needs to be reworked anyway.
I am just saying that this is not some ad-hoc new feature but a configuration which has already been discussed, with some steps already made. If the DT binding is wrong, it can be deprecated and a new one used, but for that it has to be clear which way to go.
Well, AMD could have sidestepped all this if they had just integrated a normal OSPI flash controller, which would have the same requirements regarding the pins (if not fewer), and it would have been *easy* to integrate into the already available ecosystem. That was what my initial question was about: why did you choose two QSPI ports instead of one OSPI port?
And Amit is trying to figure out which way to go. I don't want to predict where that code should go or how it should look, because I don't have SPI-NOR experience. But I hope we finally move forward on this topic and see a path for upstreaming support for it.
You still didn't answer my initial question. Will AMD support and maintain that code? I was pushing you towards putting that code into your own driver, because then what you do there is up to you.
Of course. We care about our code and about supporting our SoCs and the features related to them. That means yes, we will be regularly testing it and taking care of it.
Great!
-michael
Hi,
On 8/5/24 10:27, Michael Walle wrote:
Hi,
All I'm saying is that you shouldn't put a burden on us (the SPI NOR maintainers) for what seems, to me at least, like a niche. Thus I was asking for performance numbers and users. Convince me that I'm wrong and that it is worth our time.
No. It is not really just a feature of our evaluation boards. Customers are using it. I was talking to one guy from the field, and he confirmed to me that these configurations are used by multiple of his customers in real products.
Which begs the question: do we really have to support every feature in the core (I'd like to hear Tudor's and Pratyush's opinions here)? Honestly, this just looks like a concatenation of two QSPI controllers.
Based on my understanding: for stacked, yes. For parallel, no.
See below.
Why didn't you just use a normal octal controller, which is a protocol also backed by the JEDEC standard?
On newer SoCs an octal IP core is used. Amit, please comment.
Is it any faster?
Amit: please provide numbers.
Do you get more capacity? Does anyone really use large SPI-NOR flashes? If so, why?
You get twice the capacity with that configuration. I can't answer the second question because I don't work with the field. But both of these configurations are used by customers. Adding Neal in case he wants to add something more.
I mean, you've put that controller on your SoC; you must have some convincing arguments for why a customer should use it.
I expect the recommendation is to use a single-flash configuration, but if you need more space for your application, the only way to extend it is the stacked configuration with two identical flashes next to each other. If you want both a bigger size and higher speed, the answer is the parallel configuration.
But who is using expensive NOR flash for bulk storage anyway?
I expect you understand that even if I know companies which do it, I am not allowed to share their names.
But customers may not have other free pins to connect, for example, eMMC. That's why adding one more "expensive flash" can be their only option.
Also, I bet the price of one more QSPI flash is nothing compared to the chip itself and the other related expenses of low-volume production.
You're only mentioning parallel mode, and the performance numbers were also only about parallel mode. What about stacked mode? Because there's a chance that parallel mode works without modifying the core (?).
I will let Amit comment on it.
The first round of patches was really invasive regarding the core code. So if there is a clean layering approach which can be enabled as a module and you are maintaining it, I'm fine with that (even if the core code then needs some changes, like hooks or so, not sure).
That discussion started with Miquel some years ago, when he was trying to solve the DT description, which has been merged in the kernel for a while.
What's your point here? From what I can tell the DT binding is wrong and needs to be reworked anyway.
I am just saying that this is not some ad-hoc new feature but a configuration which has already been discussed, with some steps already made. If the DT binding is wrong, it can be deprecated and a new one used, but for that it has to be clear which way to go.
Well, AMD could have sidestepped all this if they had just integrated a normal OSPI flash controller, which would have the same requirements regarding the pins (if not fewer), and it would have been *easy* to integrate into the already available ecosystem. That was what my initial question was about: why did you choose two QSPI ports instead of one OSPI port?
Keep in mind that ZynqMP is a 9-year-old SoC, and Zynq is 12+ years old, with a lot of internal development happening before that. I am not sure OSPI even existed at that time, or whether any IP was available at the price point they were targeting. I don't think it makes sense to discuss OSPI in this context because it's not in these SoCs. I have never worked with SPI, so I don't know the historical context to provide more details.
Thanks, Michal
Hello Michael,
On 8/5/24 10:27, Michael Walle wrote:
Hi,
All I'm saying is that you shouldn't put a burden on us (the SPI NOR maintainers) for what seems, to me at least, like a niche. Thus I was asking for performance numbers and users. Convince me that I'm wrong and that it is worth our time.
No. It is not really just a feature of our evaluation boards. Customers are using it. I was talking to one guy from the field, and he confirmed to me that these configurations are used by multiple of his customers in real products.
Which begs the question: do we really have to support every feature in the core (I'd like to hear Tudor's and Pratyush's opinions here)? Honestly, this just looks like a concatenation of two QSPI controllers.
Based on my understanding: for stacked, yes. For parallel, no.
See below.
Why didn't you just use a normal octal controller, which is a protocol also backed by the JEDEC standard?
On newer SoCs an octal IP core is used. Amit, please comment.
Is it any faster?
Amit: please provide numbers.
Do you get more capacity? Does anyone really use large SPI-NOR flashes? If so, why?
You get twice the capacity with that configuration. I can't answer the second question because I don't work with the field. But both of these configurations are used by customers. Adding Neal in case he wants to add something more.
I mean, you've put that controller on your SoC; you must have some convincing arguments for why a customer should use it.
I expect the recommendation is to use a single-flash configuration, but if you need more space for your application, the only way to extend it is the stacked configuration with two identical flashes next to each other. If you want both a bigger size and higher speed, the answer is the parallel configuration.
But who is using expensive NOR flash for bulk storage anyway?
I expect you understand that even if I know companies which do it, I am not allowed to share their names.
But customers may not have other free pins to connect, for example, eMMC. That's why adding one more "expensive flash" can be their only option.
Also, I bet the price of one more QSPI flash is nothing compared to the chip itself and the other related expenses of low-volume production.
You're only mentioning parallel mode, and the performance numbers were also only about parallel mode. What about stacked mode? Because there's a chance that parallel mode works without modifying the core (?).
I will let Amit comment on it.
The performance of the stacked configuration will be the same as that of the single mode. As Michal mentioned earlier, stacked mode is used for scenarios where the customer requires larger flash space while maintaining the same performance.
I want to provide some background on why I choose to handle stacked and parallel modes through an additional layer or file, such as /mtd/spi-nor/stacked.c, rather than mtd-concat. Initially, when Miquel began upstreaming stacked support by extending the mtd-concat driver, the DT binding was not accepted. He proposed a couple of DT bindings [1] & [2] to support stacking through mtd-concat, but none were accepted. Additionally, after reviewing the MTD core code, he found that adding stacked support through mtd-concat could be complicated and involve many corner cases, which he mentioned in his RFC [3]. He then suggested concatenating the flashes instead of the mtd partitions, and eventually, the current DT bindings were added. This is why I propose handling the stacked and parallel configurations through an additional layer or file, as the mtd-concat approach was already discussed and rejected.
[1] https://lore.kernel.org/all/20191113171505.26128-4-miquel.raynal@bootlin.com... [2] https://lore.kernel.org/all/20191127105522.31445-5-miquel.raynal@bootlin.com... [3]https://lore.kernel.org/all/20211112152411.818321-1-miquel.raynal@bootlin.co...
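To illustrate the main corner case such a layer has to get right, here is a minimal self-contained sketch (this is not the proposed stacked.c; the types and names are invented, and a real implementation would dispatch to the underlying struct spi_nor / struct mtd_info instances instead): a request on the concatenated device may cross the boundary between the two flashes and must be split into per-flash requests.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct fake_flash {
    const char *name;
    uint64_t size;
    uint8_t mem[64];              /* tiny backing store for the demo */
};

/* Read from the virtual concatenated device, splitting at the boundary. */
static int stacked_read(struct fake_flash f[2], uint64_t from, size_t len,
                        uint8_t *buf)
{
    for (int i = 0; i < 2 && len; i++) {
        if (from >= f[i].size) {      /* region lies entirely further up */
            from -= f[i].size;
            continue;
        }
        size_t chunk = len;
        if (from + chunk > f[i].size) /* request crosses the boundary */
            chunk = f[i].size - from;
        printf("  -> %s: off 0x%llx len %zu\n", f[i].name,
               (unsigned long long)from, chunk);
        memcpy(buf, f[i].mem + from, chunk);
        buf += chunk;
        len -= chunk;
        from = 0;                     /* continue at start of next flash */
    }
    return len ? -1 : 0;              /* -1: read past end of device */
}

int main(void)
{
    struct fake_flash f[2] = {
        { .name = "flash0", .size = 64 },
        { .name = "flash1", .size = 64 },
    };
    uint8_t buf[32];

    /* A 24-byte read starting 8 bytes before the boundary is split in two. */
    return stacked_read(f, 56, 24, buf);
}

Erase and write need the same splitting; parallel mode would additionally stripe the data across both devices.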
Regards, Amit
The first round of patches was really invasive regarding the core code. So if there is a clean layering approach which can be enabled as a module and you are maintaining it, I'm fine with that (even if the core code then needs some changes, like hooks or so, not sure).
That discussion started with Miquel some years ago, when he was trying to solve the DT description, which has been merged in the kernel for a while.
What's your point here? From what I can tell the DT binding is wrong and needs to be reworked anyway.
I am just saying that this is not some ad-hoc new feature but a configuration which has already been discussed, with some steps already made. If the DT binding is wrong, it can be deprecated and a new one used, but for that it has to be clear which way to go.
Well, AMD could have sidestepped all this if they had just integrated a normal OSPI flash controller, which would have the same requirements regarding the pins (if not fewer), and it would have been *easy* to integrate into the already available ecosystem. That was what my initial question was about: why did you choose two QSPI ports instead of one OSPI port?
Keep in mind that ZynqMP is a 9-year-old SoC, and Zynq is 12+ years old, with a lot of internal development happening before that. I am not sure OSPI even existed at that time, or whether any IP was available at the price point they were targeting. I don't think it makes sense to discuss OSPI in this context because it's not in these SoCs. I have never worked with SPI, so I don't know the historical context to provide more details.
Thanks, Michal
Hi Michael,
The first round of patches was really invasive regarding the core code. So if there is a clean layering approach which can be enabled as a module and you are maintaining it, I'm fine with that (even if the core code then needs some changes, like hooks or so, not sure).
That discussion started with Miquel some years ago, when he was trying to solve the DT description, which has been merged in the kernel for a while.
What's your point here? From what I can tell the DT binding is wrong and needs to be reworked anyway.
I'm sorry I'm now catching up, can you point at the thread explaining what is wrong in the bindings? I didn't find where this was detailed. Or otherwise summarize quickly what needs to change?
Thanks! Miquèl
Hi,
The first round of patches was really invasive regarding the core code. So if there is a clean layering approach which can be enabled as a module and you are maintaining it, I'm fine with that (even if the core code then needs some changes, like hooks or so, not sure).
That discussion started with Miquel some years ago, when he was trying to solve the DT description, which has been merged in the kernel for a while.
What's your point here? From what I can tell the DT binding is wrong and needs to be reworked anyway.
I'm sorry I'm now catching up, can you point at the thread explaining what is wrong in the bindings? I didn't find where this was detailed. Or otherwise summarize quickly what needs to change?
Somewhere in this mega thread Tudor had some remarks about the bindings. Amit also mentioned it here:
https://lore.kernel.org/r/IA0PR12MB769944254171C39FF4171B52DCB42@IA0PR12MB76...
-michael
Hi Michael,
michael@walle.cc wrote on Mon, 12 Aug 2024 09:37:06 +0200:
Hi,
The first round of patches was really invasive regarding the core code. So if there is a clean layering approach which can be enabled as a module and you are maintaining it, I'm fine with that (even if the core code then needs some changes, like hooks or so, not sure).
That discussion started with Miquel some years ago, when he was trying to solve the DT description, which has been merged in the kernel for a while.
What's your point here? From what I can tell the DT binding is wrong and needs to be reworked anyway.
I'm sorry I'm now catching up, can you point at the thread explaining what is wrong in the bindings? I didn't find where this was detailed. Or otherwise summarize quickly what needs to change?
Somewhere in this mega thread Tudor had some remarks about the bindings. Amit also mentioned it here:
https://lore.kernel.org/r/IA0PR12MB769944254171C39FF4171B52DCB42@IA0PR12MB76...
Great. I jumped in there. Thanks!
Miquèl
participants (5)
- Frager, Neal
- Mahapatra, Amit Kumar
- Michael Walle
- Michal Simek
- Miquel Raynal