Enable Big/Super Pages for v3d #7285
Open
mairacanal wants to merge 702 commits into raspberrypi:rpi-6.18.y
There are no MEDIA_BUS_FMT_* defines for GRB or BRG, and adding them is a pain. Add a DT override to allow setting the order. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Seeing as the HVS can be configured with regard to the scaling filter, and DRM now supports selecting scaling filters at a per-CRTC or per-plane level, we can implement it. The default remains the Mitchell/Netravali filter, but nearest neighbour is now also implemented. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The documentation says that the TPZ filter can not upscale, and requesting a scaling factor > 1:1 will output the original image in the top left, and repeat the right/bottom most pixels thereafter. That fits perfectly with upscaling a 1x1 image which is done a fair amount by some compositors to give solid colour, and it saves a large amount of LBM (TPZ is based on src size, whilst PPF is based on dest size). Select TPZ filter for images with source rectangle <=1. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
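The filter choice described above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: the enum names, the `pick_filter()` helper, the use of whole pixels rather than fixed-point coordinates, and the 2/3 downscale threshold are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical filter identifiers; the real driver has its own enum. */
enum scaler_filter {
	FILTER_NONE,	/* 1:1, no scaling needed */
	FILTER_PPF,	/* polyphase filter: handles upscaling, LBM sized by destination */
	FILTER_TPZ,	/* trapezoidal filter: downscale only, LBM sized by source */
};

/*
 * Pick a filter for one axis, given source and destination sizes in pixels.
 * The 2/3 threshold between PPF and TPZ is an assumption for this sketch.
 */
static enum scaler_filter pick_filter(uint32_t src, uint32_t dst)
{
	if (src == dst)
		return FILTER_NONE;

	/*
	 * A source of at most one pixel: TPZ cannot really upscale, but its
	 * "repeat the last pixel" behaviour is exactly a solid-colour fill,
	 * and it needs far less LBM than PPF would.
	 */
	if (src <= 1)
		return FILTER_TPZ;

	/* Upscaling and mild downscaling go to PPF, heavy downscaling to TPZ. */
	if (3 * (uint64_t)dst >= 2 * (uint64_t)src)
		return FILTER_PPF;

	return FILTER_TPZ;
}
```

With these assumptions, a 1x1 buffer stretched to a full-screen rectangle selects TPZ on both axes, while ordinary upscaling still selects PPF.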
The register to enable/disable background fill was being set from atomic flush, however that will be applied immediately and can be a while before the vblank. If it was required for the current frame but not for the next one, that can result in corruption for part of the current frame. Store the state in vc4_hvs, and update it on vblank. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The HVS can accept an arbitrary number of planes, provided that the overall pixel read load is within limits, and the display list can fit into the dlist memory. Now that DRM will support 64 planes per device, increase the number of overlay planes from 16 to 48 so that the dlist complexity can be increased (eg 4x4 video wall on each of 3 displays). Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Instead of having 48 generic overlay planes, assign 32 to the writeback connector so that there is no ambiguity in wlroots when trying to find a plane for composition using the writeback connector vs display. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The transposer/writeback connector should be running with a lower priority, so shouldn't be factored into the load calculations. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
As the writeback connector doesn't have the same realtime constraints of a live display, drop the panic priority for it. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The txp block can implement transpose as it writes out the image data, so expose that through the new connector rotation property. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com> drm: vc4: txp: Do not allow 24bpp formats when transposing The hardware doesn't support transposing to 24bpp (RGB888/BGR888) formats. There's no way to advertise this through DRM, so block it from atomic_check instead. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Currently, booting with no hdmi connected has: pi@pi4:~ $ vcgencmd measure_clock hdmi pixel frequency(9)=120010256 frequency(29)=74988280 After connecting hdmi we get: pi@pi4:~ $ vcgencmd measure_clock hdmi pixel frequency(9)=300005856 frequency(29)=149989744 and that persists after disconnecting hdmi. I can measure this on a power supply as 10mA@5.2V (52mW). We should always remove clk_set_min_rate requests when we no longer need them. Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Whilst BCM2712 does fix using odd horizontal timings, it doesn't work with interlaced modes. Drop the workaround for interlaced modes and revert to the same behaviour as BCM2711. raspberrypi#6281 Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
This is a squash of all firmware-kms related patches from previous branches, up to and including "drm/vc4: Set the possible crtcs mask correctly for planes with FKMS" plus a couple of minor fixups for the 5.9 branch. Please refer to earlier branches for full history. This patch includes work by Eric Anholt, James Hughes, Phil Elwell, Dave Stevenson, Dom Cobley, and Jonathan Bell. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com> drm/vc4: Fixup firmware-kms after "drm/atomic: Pass the full state to CRTC atomic enable/disable" Prototype for those calls changed, so amend fkms (which isn't upstream) to match. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com> drm/vc4: Fixup fkms for API change Atomic flush and check changed API, so fix up the downstream-only FKMS driver. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com> drm/vc4: Make normalize_zpos conditional on using fkms Eric's view was that there was no point in having zpos support on vc4 as all the planes had the same functionality. Can be later squashed into (and fixes): drm/vc4: Add firmware-kms mode Signed-off-by: Dom Cobley <popcornmix@gmail.com> drm/vc4: FKMS: Change of Broadcast RGB mode needs a mode change The Broadcast RGB (aka HDMI limited/full range) property is only notified to the firmware on mode change, so this needs to be signalled when set. raspberrypi/firmware#1580 Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com> vc4/drv: Only notify firmware of display done with kms fkms driver still wants firmware display to be active Signed-off-by: Dom Cobley <popcornmix@gmail.com> drm/vc4: fkms: Fix margin calculations for the right/bottom edges The calculations clipped the right/bottom edge of the clipped range based on the left/top margins.
raspberrypi#4447 Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com> drm/vc4: fkms: Use new devm_rpi_firmware_get api drm/kms: Add allow_fb_modifiers Signed-off-by: Dom Cobley <popcornmix@gmail.com> drm/vc4: Add async update support for cursor planes Now that cursors are implemented as regular planes, all cursor movements result in atomic updates. As the firmware-kms driver doesn't support asynchronous updates, these are synchronous, which limits the update rate to the screen refresh rate. Xorg seems unaware of this (or at least of the effect of this), because if the mouse is configured with a higher update rate than the screen then continuous mouse movement results in an increasing backlog of mouse events - cue extreme lag. Add minimal support for asynchronous updates - limited to cursor planes - to eliminate the lag. See: raspberrypi#4971 raspberrypi#4988 Signed-off-by: Phil Elwell <phil@raspberrypi.com> drivers/gpu/drm/vc4: Add missing 32-bit RGB formats The missing 32-bit per pixel ABGR and various "RGB with an X value" formats are added. Change sent by Dave Stevenson. Signed-off-by: David Plowman <david.plowman@raspberrypi.com> drm: vc4: Fixup duplicated macro definition in vc4_firmware_kms Both vc4_drv.h and vc4_firmware_kms.c had definitions for to_vc4_crtc. Rename the fkms one to make it unique, and drop the magic define vc4_crtc vc4_kms_crtc define to_vc4_crtc to_vc4_kms_crtc that renamed half the variable and function names in a slightly unexpected way. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com> drm/vc4: Fix FKMS for when the YUV chroma planes are different buffers The code was assuming that it was a single buffer with offsets, when kmstest uses separate buffers and 0 offsets for each plane. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com> drm/vc4: fkms: Rename plane related functions The names collide with the Full KMS functions that are going to be made public.
Signed-off-by: Maxime Ripard <maxime@cerno.tech> drm/vc4_fkms: Fix up interrupt handler for both 2835/2711 and 2712 2712 has switched from using the SMI peripheral to another interrupt source for the vsync interrupt, so handle both sources cleanly. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com> drm/vc4: fkms: No SMI abuse needed on BCM2712 Since we don't use the (absent) SMI block to create interrupts on BCM2712, there's no need to map any registers. Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Testing whether the VideoCore generation we want to mock is vc5 or vc4 worked so far, but will be difficult to extend to support BCM2712 (VC6). Convert to a switch. Signed-off-by: Maxime Ripard <maxime@cerno.tech>
The DRM device pointer and the DRM encoder pointer are redundant, since the latter is attached to the former and we can just follow the drm_encoder->dev pointer. Let's remove the drm_device pointer argument. Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Some tests will need to retrieve the output that was just allocated by vc4_mock_atomic_add_output(). Instead of making them look them up in the DRM device, we can simply make vc4_mock_atomic_add_output() return an error pointer that holds the allocated output instead of the error code. Signed-off-by: Maxime Ripard <maxime@cerno.tech>
The BCM2712 has a simpler pipeline that can only output to a writeback connector and two HDMI controllers. Let's allow our kunit tests to create a mock of that pipeline. Signed-off-by: Maxime Ripard <maxime@cerno.tech>
The BCM2712 has a simpler pipeline than the BCM2711, and thus the muxing requirements are different. Create some tests to make sure we get proper muxing decisions. Signed-off-by: Maxime Ripard <maxime@cerno.tech>
The current mock planes were just using the regular drm_plane_state, while the driver expects struct vc4_plane_state, which subclasses drm_plane_state. Hook the proper implementations of reset, duplicate_state, destroy and atomic_check to create vc4_plane_state. Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Some tests will need to find a plane to run a test on for a given CRTC. Let's create a small helper to do that. Signed-off-by: Maxime Ripard <maxime@cerno.tech>
We'll start to add some tests for the plane state logic, so let's create a helper to add a plane to an existing atomic state. Signed-off-by: Maxime Ripard <maxime@cerno.tech>
We'll start testing our planes code in situations where we will use more than XRGB8888, so let's add a few common pixel formats. Signed-off-by: Maxime Ripard <maxime@cerno.tech>
The BCM2712 comes with a different LBM size computation than the previous generations, so let's add the few examples provided as kunit tests to make sure we always satisfy those requirements. Signed-off-by: Maxime Ripard <maxime@cerno.tech>
The tests on vc4 (BCM2835-7) were checking for DSI1 muxing being restricted to channel 2, and therefore muxing with TXP was impossible. As we no longer have that restriction, update the capabilities defined for DSI1, move the tests that used to be impossible to the valid list, and extend for additional combinations that are now possible. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Following the example of [1], move the state allocation out of the init function to make it thread safe. [1] commit 7e0351a ("drm/vc4: tests: Stop allocating the state in test init") Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Get the KUnit tests passing. Signed-off-by: Maíra Canal <mcanal@igalia.com>
LBM is only relevant for each active dlist, so there is no need to double-buffer the allocations. Cache the allocations per plane so that we can ensure the allocations are possible. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
On pi this was getting set to 0 which was hanging the firmware Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Commit de9e2b3d88af upstream. Currently DIV_ROUND_CLOSEST() is only available for the kernel via include/linux/math.h. Expose it to userland as well by adding __KERNEL_DIV_ROUND_CLOSEST() as a common definition in uapi. Additionally, ensure it allows building ISO C applications by switching from the 'typeof' GNU extension to the ISO-friendly __typeof__. Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Tested-by: Diederik de Haas <diederik@cknow-tech.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://patch.msgid.link/20260303-rk3588-bgcolor-v8-1-fee377037ad1@collabora.com Signed-off-by: Daniel Stone <daniels@collabora.com>
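To illustrate the `typeof` versus `__typeof__` point, here is a simplified round-to-nearest division macro. It is only a sketch: `MY_DIV_ROUND_CLOSEST` is a hypothetical name, and unlike the kernel macro it assumes non-negative operands and a non-zero divisor. In strict ISO modes before C23 (for example `-std=c99`), GCC and Clang do not treat `typeof` as a keyword, whereas the `__typeof__` spelling is always accepted. The statement expression is still a GNU extension, hence the `__extension__` marker.

```c
/*
 * Simplified round-to-nearest integer division.
 * Assumptions for this sketch: both operands are non-negative and the
 * divisor is non-zero. Each argument is evaluated exactly once.
 */
#define MY_DIV_ROUND_CLOSEST(x, divisor) __extension__ ({	\
	__typeof__(x) x_ = (x);					\
	__typeof__(divisor) d_ = (divisor);			\
	(x_ + d_ / 2) / d_;					\
})
```

For example, `MY_DIV_ROUND_CLOSEST(10, 4)` evaluates `(10 + 2) / 4`, rounding 2.5 up to 3, while `MY_DIV_ROUND_CLOSEST(9, 4)` evaluates `(9 + 2) / 4` and gives 2.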
Commit 4c684596cde4 upstream. Some display controllers can be hardware programmed to show non-black colors for pixels that are either not covered by any plane or are exposed through transparent regions of higher planes. This feature can help reduce memory bandwidth usage, e.g. in compositors managing a UI with a solid background color while using smaller planes to render the remaining content. To support this capability, introduce the BACKGROUND_COLOR standard DRM mode property, which can be attached to a CRTC through the drm_crtc_attach_background_color_property() helper function. Additionally, define a 64-bit ARGB format value to be built with the help of a couple of dedicated DRM_ARGB64_PREP*() helpers. Individual color components can be extracted with desired precision using the corresponding DRM_ARGB64_GET*() macros. Co-developed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Tested-by: Diederik de Haas <diederik@cknow-tech.com> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://patch.msgid.link/20260303-rk3588-bgcolor-v8-2-fee377037ad1@collabora.com Signed-off-by: Daniel Stone <daniels@collabora.com>
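As a rough picture of what a 64-bit ARGB value with precision-aware extraction could look like, consider the sketch below. The function names are hypothetical stand-ins rather than the DRM_ARGB64_* macros themselves, and both the component layout (16 bits each, alpha in the top bits) and the rounding rule are assumptions made for illustration.

```c
#include <stdint.h>

/* Pack four 16-bit components; assumed layout: A[63:48] R[47:32] G[31:16] B[15:0]. */
static uint64_t argb64_pack(uint16_t a, uint16_t r, uint16_t g, uint16_t b)
{
	return ((uint64_t)a << 48) | ((uint64_t)r << 32) |
	       ((uint64_t)g << 16) | (uint64_t)b;
}

/* Extract the red component at full 16-bit precision. */
static uint16_t argb64_red16(uint64_t argb)
{
	return (uint16_t)((argb >> 32) & 0xffff);
}

/*
 * Rescale a 16-bit component to bpc bits (1..16), rounding to nearest.
 * The largest intermediate value is 0xffff * 0xffff + 0x7fff, which
 * still fits in 32 bits.
 */
static uint16_t scale16(uint16_t c, unsigned int bpc)
{
	uint32_t max = (1u << bpc) - 1;

	return (uint16_t)(((uint32_t)c * max + 0x7fff) / 0xffff);
}
```

A driver with 8 bits per channel in its background register could then, under these assumptions, program `scale16(argb64_red16(value), 8)` for the red channel.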
When adding the register definitions for the GEN_6D hardware, 6 defines managed to get added twice. Remove that duplication. Fixes: 3ca2940 ("drm/vc4: hvs: Add in support for 2712 D-step.") Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Since a previous patch introduced the BACKGROUND_COLOR CRTC property, which defaults to solid black, take it into account when programming the hardware. The exact registers used vary between the hardware generations, but the feature is supported by all of them. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The downstream implementation of power management for v3d used a 100ms delay and it has been tested for many years with success. Use the same delay with the runtime PM implementation. Although the shorter 50ms delay is not problematic in RPi 5, it can cause occasional GPU resets on RPi 4 during intensive workloads, due to the overhead of negotiating with the ASB bridge during frequent power domain transitions. Signed-off-by: Maíra Canal <mcanal@igalia.com>
The PHY supports swapping the pairs within a lane, so expose this. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The driver uses a couple of crypto_shash_* functions, which are not always available, potentially leading to build errors: arm-linux-ld: drivers/spi/spi-rp2040-gpio-bridge.o: in function `rp2040_gbdg_block_hash': spi-rp2040-gpio-bridge.c:(.text+0x274): undefined reference to `crypto_shash_update' spi-rp2040-gpio-bridge.c:(.text+0x2c4): undefined reference to `crypto_shash_update' spi-rp2040-gpio-bridge.c:(.text+0x2e4): undefined reference to `crypto_shash_final' spi-rp2040-gpio-bridge.c:(.text+0x2ec): undefined reference to `crypto_shash_digest' spi-rp2040-gpio-bridge.c:(.text+0x2fc): undefined reference to `crypto_shash_update' arm-linux-ld: drivers/spi/spi-rp2040-gpio-bridge.o: in function `rp2040_gbdg_probe': spi-rp2040-gpio-bridge.c:(.text+0x510): undefined reference to `crypto_alloc_shash' Fixes: fe24eda ("spi: Add a driver for the RPI RP2040 GPIO bridge") Signed-off-by: Corubba Smith <corubba@gmx.de>
PHY devices lacked a hwtstamp_get callback even though most of them track configuration info. Introduce a new callback in mii_timestamper. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20251124181151.277256-3-vadim.fedorenko@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit f467777)
The driver stores configuration information and can technically report it. Implement hwtstamp_get callback to report the configuration. Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20251124181151.277256-4-vadim.fedorenko@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit 1cff839)
Some hub hardware does not differentiate pipe direction in the non-periodic split handler, so transfers to the same endpoint index will collide. A simple fix is to limit the non-periodic masquerade only to IN transfers, which are also the most affected by interrupt latency. A recurrence of raspberrypi#2024 Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
…PHYs" This reverts commit bd61974.
Add brcm,powerdown-enable to the external PHY nodes on Pi 4B and CM4. This puts the BCM54210PE PHY into a low-power state when the link is down, reducing power consumption when no cable is connected. This is the same approach already used by Pi 5 and CM5. Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
This reverts commit 712989f.
This reverts commit 09d1901.
This reverts commit dfad8d6.
* Fixes build with Clang.
drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c:1513:3: error: variable 'chan_flags_all' is uninitialized when used here [-Werror,-Wuninitialized]
1513 | chan_flags_all |= dw->chan_flags[i];
| ^~~~~~~~~~~~~~
drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c:1502:25: note: initialize the variable 'chan_flags_all' to silence this warning
1502 | uint32_t chan_flags_all;
| ^
| = 0
1 error generated.
* Fixes build with Clang.
drivers/gpu/drm/vc4/vc4_plane.c:2799:2: error: variable 'txp_crtc' is used uninitialized whenever 'for' loop exits because its condition is false [-Werror,-Wsometimes-uninitialized]
2799 | drm_for_each_crtc(crtc, drm) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/drm/drm_crtc.h:1312:2: note: expanded from macro 'drm_for_each_crtc'
1312 | list_for_each_entry(crtc, &(dev)->mode_config.crtc_list, head)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/list.h:783:7: note: expanded from macro 'list_for_each_entry'
783 | !list_entry_is_head(pos, head, member); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/vc4/vc4_plane.c:2809:20: note: uninitialized use occurs here
2809 | drm_crtc_mask(txp_crtc);
| ^~~~~~~~
drivers/gpu/drm/vc4/vc4_plane.c:2799:2: note: remove the condition if it is always true
2799 | drm_for_each_crtc(crtc, drm) {
| ^
include/drm/drm_crtc.h:1312:2: note: expanded from macro 'drm_for_each_crtc'
1312 | list_for_each_entry(crtc, &(dev)->mode_config.crtc_list, head)
| ^
include/linux/list.h:783:7: note: expanded from macro 'list_for_each_entry'
783 | !list_entry_is_head(pos, head, member); \
| ^
drivers/gpu/drm/vc4/vc4_plane.c:2796:27: note: initialize the variable 'txp_crtc' to silence this warning
2796 | struct drm_crtc *txp_crtc;
| ^
| = NULL
1 error generated.
* Fixes build with Clang.
drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c:461:10: error: array index 18 is past the end of the array (that has type 'u8[1]' (aka 'unsigned char[1]')) [-Werror,-Warray-bounds]
461 | setbit(eventmask_msg->mask, BRCMF_E_ULP);
| ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/wireless/broadcom/brcm80211/brcmfmac/../include/brcmu_utils.h:37:30: note: expanded from macro 'setbit'
37 | #define setbit(a, i) (((u8 *)a)[(i)/NBBY] |= 1<<((i)%NBBY))
| ^ ~~~~~~~~
drivers/net/wireless/broadcom/brcm80211/brcmfmac/fweh.h:313:2: note: array 'mask' declared here
313 | u8 mask[1];
| ^
1 error generated.
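The warning above comes from indexing byte 18 of an array declared as `u8 mask[1]`. How the commit resolves it is not shown here, so the following is only a general sketch of the usual remedy: declare the trailing storage as a flexible array member and size the allocation explicitly. The struct, the helper names and the event number 146 used in the usage note are all hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define NBBY 8	/* bits per byte, as in the quoted setbit() macro */

/* Hypothetical message layout; only the trailing mask matters here. */
struct event_mask_msg {
	uint32_t len;		/* number of bytes available in mask[] */
	uint8_t mask[];		/* flexible array member instead of mask[1] */
};

/* Allocate a zeroed message with room for nbytes of mask; NULL on failure. */
static struct event_mask_msg *event_mask_alloc(uint32_t nbytes)
{
	struct event_mask_msg *msg = calloc(1, sizeof(*msg) + nbytes);

	if (msg)
		msg->len = nbytes;
	return msg;
}

/* Set bit i, refusing indices beyond the allocated mask. Returns 0 or -1. */
static int event_mask_set(struct event_mask_msg *msg, uint32_t i)
{
	if (i / NBBY >= msg->len)
		return -1;

	msg->mask[i / NBBY] |= (uint8_t)(1u << (i % NBBY));
	return 0;
}
```

With a 20-byte mask, `event_mask_set(msg, 146)` touches byte 18 without tripping the bounds check, because the compiler no longer believes the array has exactly one element.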
…nk-up
* Fixes build with Clang.
drivers/pci/controller/pcie-brcmstb.c:1595:3: error: variable 'clkreq_cntl' is uninitialized when used here [-Werror,-Wuninitialized]
1595 | clkreq_cntl |= PCIE_MISC_HARD_PCIE_HARD_DEBUG_CLKREQ_DEBUG_ENABLE_MASK;
| ^~~~~~~~~~~
drivers/pci/controller/pcie-brcmstb.c:1578:17: note: initialize the variable 'clkreq_cntl' to silence this warning
1578 | u32 clkreq_cntl;
| ^
| = 0
1 error generated.
* Change SND_PIMIDI and SND_PISOUND_MICRO to select CRC8 instead of depend
on it.
* Fixes build with Clang.
sound/drivers/upisnd/upisnd_codec.c:741:6: error: variable 'ret' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
741 | if (adau->clk_src != ADAU1961_CLK_SRC_MCLK)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sound/drivers/upisnd/upisnd_codec.c:744:9: note: uninitialized use occurs here
744 | return ret;
| ^~~
sound/drivers/upisnd/upisnd_codec.c:741:2: note: remove the 'if' if its condition is always true
741 | if (adau->clk_src != ADAU1961_CLK_SRC_MCLK)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
742 | ret = snd_soc_dapm_add_routes(dapm, &adau1961_dapm_pll_route, 1);
sound/drivers/upisnd/upisnd_codec.c:739:9: note: initialize the variable 'ret' to silence this warning
739 | int ret;
| ^
| = 0
1 error generated.
* Fixes build with Clang.
sound/soc/bcm/hifiberry_studio_dac8x.c:770:41: error: equality comparison with extraneous parentheses [-Werror,-Wparentheses-equality]
770 | if ((priv->card_info.card_clk_options == 0x02)) {
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
sound/soc/bcm/hifiberry_studio_dac8x.c:770:41: note: remove extraneous parentheses around the comparison to silence this warning
770 | if ((priv->card_info.card_clk_options == 0x02)) {
| ~ ^ ~
sound/soc/bcm/hifiberry_studio_dac8x.c:770:41: note: use '=' to turn this equality comparison into an assignment
770 | if ((priv->card_info.card_clk_options == 0x02)) {
| ^~
| =
1 error generated.
Avoid some compiler warnings by adding explicit narrowing casts. See: raspberrypi#7309 Signed-off-by: Phil Elwell <phil@raspberrypi.com>
…tomic_check Since an incorrect conditional operator was used in vc4_txp_atomic_check(), the check may be bypassed if only one of the width or height does not match. To prevent this, correct the conditional operator. Fixes: c5d3a57 ("drm/vc4: txp: Add a rotation property to the writeback connector") Signed-off-by: Jeongjun Park <aha310510@gmail.com>
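The class of bug described here is easy to reproduce in isolation. The sketch below is not the driver code; the function names and parameters are hypothetical, and it assumes the faulty check joined the two comparisons with `&&` where `||` was needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy form: only rejects the buffer when BOTH dimensions differ. */
static bool size_mismatch_buggy(uint32_t fb_w, uint32_t fb_h,
				uint32_t mode_w, uint32_t mode_h)
{
	return fb_w != mode_w && fb_h != mode_h;
}

/* Fixed form: rejects the buffer when EITHER dimension differs. */
static bool size_mismatch_fixed(uint32_t fb_w, uint32_t fb_h,
				uint32_t mode_w, uint32_t mode_h)
{
	return fb_w != mode_w || fb_h != mode_h;
}
```

A 1920x1000 framebuffer against a 1920x1080 mode slips through the buggy version, since the widths agree, but is caught by the fixed one.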
…m function In the previous commit, we added a rotation parameter to be used in the connector, but because we are still using the default reset function without implementing a custom reset function to properly initialize it, the rotation variable remains NULL until it is initialized directly in userspace. To prevent this, we must implement a custom reset function that properly initializes the rotation parameter. Fixes: 30c7044 ("drm: Add a rotation parameter to connectors.") Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Considering that the Raspberry Pi is an embedded device with limited memory, memory fragmentation is an important aspect for performance. Using Big/Super Pages has clear benefits when it comes to reducing TLB misses, but also has an impact on memory fragmentation as we need to allocate aligned contiguous memory, increasing compaction pressure and memory waste for small BOs. As Big/Super Pages only have benefits for larger BOs, create a minimum BO size to use the THP partition. After testing different thresholds, 512KB provides the most balanced results with clear improvements and no significant regressions. This means that Big/Super Pages will only be used for BOs of at least 512KB. Signed-off-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Considering that the Raspberry Pi is an embedded device with limited memory, memory fragmentation is an important aspect for performance. Using Big/Super Pages has clear benefits when it comes to reducing TLB misses, but also has an impact on memory fragmentation as we need to allocate aligned contiguous memory, increasing compaction pressure and memory waste for small BOs.
As Big/Super Pages only have benefits for larger BOs, create a minimum BO size to use the THP partition. After testing different thresholds, 512KB provides the most balanced results with clear improvements and no significant regressions. This means that Big/Super Pages will only be used for BOs of at least 512KB.
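The size threshold can be pictured with a small helper like the one below. This is a simplified sketch, not the patch itself: the helper name is hypothetical, it assumes 64KB big pages and 1MB super pages alongside regular 4KB pages, and it ignores the alignment and contiguity checks a real allocator would also need.

```c
#include <stddef.h>

#define SZ_4K		((size_t)4 * 1024)
#define SZ_64K		((size_t)64 * 1024)
#define SZ_1M		((size_t)1024 * 1024)

/* Minimum BO size that is allowed to use Big/Super Pages (512KB). */
#define BIG_PAGE_MIN_BO_SIZE	((size_t)512 * 1024)

/*
 * Return the largest page size worth trying for a BO of the given size.
 * BOs below the threshold always fall back to regular 4KB pages.
 */
static size_t pick_page_size(size_t bo_size)
{
	if (bo_size < BIG_PAGE_MIN_BO_SIZE)
		return SZ_4K;
	if (bo_size >= SZ_1M)
		return SZ_1M;	/* super page */
	return SZ_64K;		/* big page */
}
```

Under these assumptions a 256KB BO stays on 4KB pages, a 512KB BO becomes eligible for 64KB big pages, and anything of 1MB or more can be considered for super pages.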
Here are some benchmark results. Each trace has been run twice to gather the results.