[linux-6.6.y][Feature]Backport some mm related patches from recent (202504) Linux upstream #775

Conversation

wojiaohanliyang

@wojiaohanliyang wojiaohanliyang commented Apr 30, 2025

Backport 7 mm related patches from Linux upstream:

2eaa6c2abb9d mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes
7e066cb9b71a KVM: SEV: Use long-term pin when registering encrypted memory regions
24ac6fb6e364 mm/cma: using per-CMA locks to improve concurrent allocation performance
04f13d241b8b mm: replace free hugepage folios after migration
67bab13307c8 mm/hugetlb: wait for hugetlb folios to be freed
e19a3f595ae4 mm/compaction: factor out code to test if we should run compaction for target order
6268f0a166eb mm: compaction: use the proper flag to determine watermarks

Revert 3 non-upstream patches which conflict with the backported patches:

Revert "KVM: SEV: Pin SEV guest memory out of CMA area"
Revert "x86/mm: CSV allows CMA allocation concurrently"
Revert "mm/cma: add API to enable concurrent allocation from the CMA"

Summary by Sourcery

Backport memory management (mm) related patches from Linux upstream, improving memory allocation, compaction, and handling of huge pages and encrypted memory regions

New Features:

  • Introduced a new function to replace free hugepage folios
  • Added a wait mechanism for freeing hugepage folios

Bug Fixes:

  • Fixed hugetlb page number decrease on movable nodes
  • Corrected watermark determination for compaction
  • Resolved issues with memory isolation and migration

Enhancements:

  • Improved compaction logic for memory allocation
  • Enhanced KVM SEV memory pinning mechanism
  • Optimized CMA (Contiguous Memory Allocator) allocation performance

Chores:

  • Reverted non-upstream patches that conflicted with backported changes

Yuan Can and others added 10 commits April 30, 2025 10:58
mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes

mainline inclusion
from mainline-v6.7-rc1
category: bugfix

---------------------------

commit 2eaa6c2 upstream.

Decreasing the number of hugetlb pages failed with the following message:

 sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
 CPU: 1 PID: 112 Comm: sh Not tainted 6.5.0-rc7-... deepin-community#45
 Hardware name: linux,dummy-virt (DT)
 Call trace:
  dump_backtrace.part.6+0x84/0xe4
  show_stack+0x18/0x24
  dump_stack_lvl+0x48/0x60
  dump_stack+0x18/0x24
  warn_alloc+0x100/0x1bc
  __alloc_pages_slowpath.constprop.107+0xa40/0xad8
  __alloc_pages+0x244/0x2d0
  hugetlb_vmemmap_restore+0x104/0x1e4
  __update_and_free_hugetlb_folio+0x44/0x1f4
  update_and_free_hugetlb_folio+0x20/0x68
  update_and_free_pages_bulk+0x4c/0xac
  set_max_huge_pages+0x198/0x334
  nr_hugepages_store_common+0x118/0x178
  nr_hugepages_store+0x18/0x24
  kobj_attr_store+0x18/0x2c
  sysfs_kf_write+0x40/0x54
  kernfs_fop_write_iter+0x164/0x1dc
  vfs_write+0x3a8/0x460
  ksys_write+0x6c/0x100
  __arm64_sys_write+0x1c/0x28
  invoke_syscall+0x44/0x100
  el0_svc_common.constprop.1+0x6c/0xe4
  do_el0_svc+0x38/0x94
  el0_svc+0x28/0x74
  el0t_64_sync_handler+0xa0/0xc4
  el0t_64_sync+0x174/0x178
 Mem-Info:
  ...

The reason is that the hugetlb pages being released are allocated from
movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
need to be allocated from the same node during the hugetlb pages
releasing. With GFP_KERNEL and __GFP_THISNODE set, allocating from movable
node is always failed. Fix this problem by removing __GFP_THISNODE.
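The node-fallback behavior behind this fix can be sketched with a toy model (plain user-space C, not kernel code; `toy_alloc`, `GFP_THISNODE`, and the node table are all hypothetical stand-ins): a movable node has no pages usable for kernel allocations, so an allocation pinned to it with a THISNODE-style flag must fail, while dropping the flag allows fallback to another node.

```c
#include <stdbool.h>

/* Toy model, not kernel code: each node has a count of pages usable for
 * kernel allocations. Node 1 is "movable" and has none, mirroring why a
 * vmemmap allocation with __GFP_THISNODE fails there. */
#define GFP_THISNODE 0x1u /* hypothetical stand-in for __GFP_THISNODE */

static int kernel_pages[2] = { 100, 0 }; /* node 1 is movable: 0 usable */

/* Returns the node the page came from, or -1 on failure. */
static int toy_alloc(unsigned flags, int preferred_node)
{
    if (kernel_pages[preferred_node] > 0)
        return preferred_node;
    if (flags & GFP_THISNODE)
        return -1;              /* pinned to one node: hard failure */
    for (int n = 0; n < 2; n++) /* fallback: try the other nodes */
        if (kernel_pages[n] > 0)
            return n;
    return -1;
}
```

With the flag set the allocation fails outright; without it, the request falls back to node 0, which is exactly the effect of removing __GFP_THISNODE from alloc_vmemmap_page_list.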

Link: https://lkml.kernel.org/r/[email protected]
Fixes: ad2fa37 ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Yuan Can <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
hygon inclusion
category: bugfix
CVE: NA

---------------------------

The commit 31314f6 ("KVM: SEV: Pin SEV guest memory out of CMA
area") has its corresponding upstream version. We'll backport the
upstream version to this repo.

Fixes: 31314f6 ("KVM: SEV: Pin SEV guest memory out of CMA area")
Signed-off-by: hanliyang <[email protected]>
mainline inclusion
from mainline-v6.15-rc1
category: feature
CVE: NA

---------------------------

commit 7e066cb upstream.

When registering an encrypted memory region for SEV-MEM/SEV-ES guests,
pin the pages with FOLL_LONGTERM so that the pages are migrated out of
MIGRATE_CMA/ZONE_MOVABLE.  Failure to do so violates the CMA/MOVABLE
mechanisms and can result in fragmentation due to unmovable pages, e.g.
can make CMA allocations fail.

Signed-off-by: Ge Yang <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[sean: massage changelog, make @flags an unsigned int]
Signed-off-by: Sean Christopherson <[email protected]>
hygon inclusion
category: bugfix
CVE: NA

---------------------------

The commit 4d71a47 ("x86/mm: CSV allows CMA allocation
concurrently") has its corresponding upstream version. We'll backport the
upstream version to this repo.

Fixes: 4d71a47 ("x86/mm: CSV allows CMA allocation concurrently")
Signed-off-by: hanliyang <[email protected]>
hygon inclusion
category: bugfix
CVE: NA

---------------------------

The commit 83138d4 ("mm/cma: add API to enable concurrent allocation
from the CMA") has its corresponding upstream version. We'll backport the
upstream version to this repo.

Fixes: 83138d4 ("mm/cma: add API to enable concurrent allocation from the CMA")
Signed-off-by: hanliyang <[email protected]>
mainline inclusion
from mainline-v6.15-rc1
category: feature
CVE: NA

---------------------------

commit 24ac6fb upstream.

For different CMAs, concurrent allocation of CMA memory ideally should not
require synchronization using locks.  Currently, a global cma_mutex lock
is employed to synchronize all CMA allocations, which can impact the
performance of concurrent allocations across different CMAs.

To test the performance impact, follow these steps:
1. Boot the kernel with the command line argument hugetlb_cma=30G to
   allocate a 30GB CMA area specifically for huge page allocations. (note:
   on my machine, which has 3 nodes, each node is initialized with 10G of
   CMA)
2. Use the dd command with parameters if=/dev/zero of=/dev/shm/file bs=1G
   count=30 to fully utilize the CMA area by writing zeroes to a file in
   /dev/shm.
3. Open three terminals and execute the following commands simultaneously:
   (Note: Each of these commands attempts to allocate 10GB [2621440 * 4KB
   pages] of CMA memory.)
   On Terminal 1: time echo 2621440 > /sys/kernel/debug/cma/hugetlb1/alloc
   On Terminal 2: time echo 2621440 > /sys/kernel/debug/cma/hugetlb2/alloc
   On Terminal 3: time echo 2621440 > /sys/kernel/debug/cma/hugetlb3/alloc

We attempt to allocate pages through the CMA debug interface and use the
time command to measure the duration of each allocation.
Performance comparison:
             Without this patch      With this patch
Terminal1        ~7s                     ~7s
Terminal2       ~14s                     ~8s
Terminal3       ~21s                     ~7s

To solve the problem above, we can use per-CMA locks to improve concurrent
allocation performance.  This would allow each CMA to be managed
independently, reducing the need for a global lock and thus improving
scalability and performance.
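The locking change can be sketched as follows (a minimal user-space model, not the real mm/cma.c; the `toy_*` names are hypothetical): instead of every allocation taking one shared global mutex, each CMA area embeds its own alloc_mutex, so allocations from different areas never serialize on each other.

```c
#include <pthread.h>

/* Sketch of the per-area locking scheme from the patch, not real kernel
 * code: each CMA area carries its own alloc_mutex instead of sharing a
 * single global cma_mutex. */
struct toy_cma {
    const char *name;
    long free_pages;
    pthread_mutex_t alloc_mutex; /* per-area lock */
};

static void toy_cma_init(struct toy_cma *cma, const char *name, long pages)
{
    cma->name = name;
    cma->free_pages = pages;
    pthread_mutex_init(&cma->alloc_mutex, NULL);
}

/* Returns the number of pages actually allocated (all or nothing). */
static long toy_cma_alloc(struct toy_cma *cma, long count)
{
    long got = 0;
    pthread_mutex_lock(&cma->alloc_mutex); /* only this area is held */
    if (cma->free_pages >= count) {
        cma->free_pages -= count;
        got = count;
    }
    pthread_mutex_unlock(&cma->alloc_mutex);
    return got;
}
```

Because each area holds only its own mutex, the three hugetlb areas in the benchmark above can allocate in parallel, which is why terminals 2 and 3 drop from ~14s/~21s to ~8s/~7s.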

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ge Yang <[email protected]>
Reviewed-by: Barry Song <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Aisheng Dong <[email protected]>
Cc: Baolin Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
mainline inclusion
from mainline-v6.14-rc1
category: feature
CVE: NA

---------------------------

commit 04f13d2 upstream.

My machine has 4 NUMA nodes, each equipped with 32GB of memory.  I have
configured each NUMA node with 16GB of CMA and 16GB of in-use hugetlb
pages.  The allocation of contiguous memory via cma_alloc() can fail
probabilistically.

When there are free hugetlb folios in the hugetlb pool, during the
migration of in-use hugetlb folios, new folios are allocated from the free
hugetlb pool.  After the migration is completed, the old folios are
released back to the free hugetlb pool instead of being returned to the
buddy system.  This can cause test_pages_isolated() check to fail,
ultimately leading to the failure of cma_alloc().

Call trace:

cma_alloc()
    __alloc_contig_migrate_range() // migrate in-use hugepage
    test_pages_isolated()
        __test_page_isolated_in_pageblock()
             PageBuddy(page) // check if the page is in buddy

To address this issue, we introduce a function named
replace_free_hugepage_folios().  This function will replace the hugepage
in the free hugepage pool with a new one and release the old one to the
buddy system.  After the migration of in-use hugetlb pages is completed,
we will invoke replace_free_hugepage_folios() to ensure that these
hugepages are properly released to the buddy system.  Following this step,
when test_pages_isolated() is executed for inspection, it will
successfully pass.

Additionally, when alloc_contig_range() is used to migrate multiple in-use
hugetlb pages, it can result in some in-use hugetlb pages being released
back to the free hugetlb pool and subsequently being reallocated and used
again.  For example:

[huge 0] [huge 1]

To migrate huge 0, we obtain huge x from the pool.  After the migration is
completed, we return the now-freed huge 0 back to the pool.  When it's
time to migrate huge 1, we can simply reuse the now-freed huge 0 from the
pool.  As a result, when replace_free_hugepage_folios() is executed, it
cannot release huge 0 back to the buddy system.  To address this issue, we
should prevent the reuse of isolated free hugepages during the migration
process.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: yangge <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Barry Song <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
mainline inclusion
from mainline-v6.14-rc6
category: feature
CVE: NA

---------------------------

commit 67bab13 upstream.

Since the introduction of commit c77c0a8 ("mm/hugetlb: defer freeing
of huge pages if in non-task context"), which supports deferring the
freeing of hugetlb pages, the allocation of contiguous memory through
cma_alloc() may fail probabilistically.

In the CMA allocation process, if it is found that the CMA area is
occupied by in-use hugetlb folios, these in-use hugetlb folios need to be
migrated to another location.  When there are no available hugetlb folios
in the free hugetlb pool during the migration of in-use hugetlb folios,
new folios are allocated from the buddy system.  A temporary state is set
on the newly allocated folio.  Upon completion of the hugetlb folio
migration, the temporary state is transferred from the new folios to the
old folios.  Normally, when the old folios with the temporary state are
freed, they are released directly back to the buddy system.  However, due to
the deferred freeing of hugetlb pages, the PageBuddy() check fails,
ultimately leading to the failure of cma_alloc().

Here is a simplified call trace illustrating the process:
cma_alloc()
    ->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios
        ->unmap_and_move_huge_page()
            ->folio_putback_hugetlb() // Free old folios
    ->test_pages_isolated()
        ->__test_page_isolated_in_pageblock()
             ->PageBuddy(page) // Check if the page is in buddy

To resolve this issue, we have implemented a function named
wait_for_freed_hugetlb_folios().  This function ensures that the hugetlb
folios are properly released back to the buddy system after their
migration is completed.  By invoking wait_for_freed_hugetlb_folios()
before calling PageBuddy(), we ensure that PageBuddy() will succeed.
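The deferred-freeing race and the wait step can be sketched as a toy state machine (user-space C, hypothetical `toy_*` names, not kernel code): a freed folio first lands on a deferred list and only reaches the buddy system after the wait/flush step runs, which is why a PageBuddy()-style check must be preceded by that wait.

```c
#include <stdbool.h>

/* Toy model of deferred hugetlb freeing: freed folios first sit on a
 * deferred list; a PageBuddy()-style check only succeeds after a
 * wait/flush step drains that list into the buddy system. */
#define NPAGES 4
static bool deferred[NPAGES]; /* freed but not yet in buddy */
static bool in_buddy[NPAGES];

static void toy_free_hugetlb_folio(int i)
{
    deferred[i] = true; /* freeing is deferred, not immediate */
}

static void toy_wait_for_freed_hugetlb_folios(void)
{
    for (int i = 0; i < NPAGES; i++)
        if (deferred[i]) {
            deferred[i] = false;
            in_buddy[i] = true; /* now really in the buddy system */
        }
}

static bool toy_page_buddy(int i) { return in_buddy[i]; }
```

Checking toy_page_buddy() immediately after the free fails, exactly as in the cma_alloc() trace above; checking it after the wait succeeds.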

Link: https://lkml.kernel.org/r/[email protected]
Fixes: c77c0a8 ("mm/hugetlb: defer freeing of huge pages if in non-task context")
Signed-off-by: Ge Yang <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
mm/compaction: factor out code to test if we should run compaction for target order

mainline inclusion
from mainline-v6.7-rc1
category: feature
CVE: NA

--------------------------------

commit e19a3f5 upstream.

We always do zone_watermark_ok check and compaction_suitable check
together to test if compaction for the target order should be run.  Factor
this code out to remove the duplication.
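The shape of the factored-out helper can be illustrated with a self-contained sketch (toy predicates and `toy_*` names are hypothetical; the real function is compaction_suit_allocation_order in mm/compaction.c): the watermark check and the suitability check, previously repeated at each call site, collapse into one helper returning a single verdict.

```c
/* Sketch of the refactor: combine the "would the allocation already
 * succeed?" watermark check with the "is compaction worth running?"
 * suitability check into one helper with a three-way result. */
enum toy_compact_result { TOY_SKIPPED, TOY_CONTINUE, TOY_SUCCESS };

struct toy_zone { long free_pages; long watermark; long frag_score; };

static int toy_watermark_ok(const struct toy_zone *z, unsigned order)
{
    return z->free_pages >= z->watermark + (1L << order);
}

static int toy_compaction_suitable(const struct toy_zone *z)
{
    return z->frag_score > 0; /* enough movable pages to migrate */
}

static enum toy_compact_result
toy_suit_allocation_order(const struct toy_zone *z, unsigned order)
{
    if (toy_watermark_ok(z, order))
        return TOY_SUCCESS;   /* allocation would already succeed */
    if (!toy_compaction_suitable(z))
        return TOY_SKIPPED;   /* compaction is unlikely to help */
    return TOY_CONTINUE;      /* worth running compaction */
}
```

Call sites such as compact_zone and the kcompactd paths then branch on the one return value instead of repeating both checks inline.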

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
mainline inclusion
from mainline-v6.14-rc1
category: feature
CVE: NA

--------------------------------

commit 6268f0a upstream.

There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of
memory.  I have configured 16GB of CMA memory on each NUMA node, and
starting a 32GB virtual machine with device passthrough is extremely slow,
taking almost an hour.

Long-term GUP cannot allocate memory from the CMA area, so at most 16GB
of non-CMA memory on a NUMA node can be used as virtual machine memory.
There is 16GB of free CMA memory on a NUMA node, which is sufficient to
pass the order-0 watermark check, causing the __compaction_suitable()
function to consistently return true.

For costly allocations, if the __compaction_suitable() function always
returns true, it causes the __alloc_pages_slowpath() function to fail to
exit at the appropriate point.  This prevents timely fallback to
allocating memory on other nodes, ultimately resulting in excessively long
virtual machine startup times.

Call trace:
__alloc_pages_slowpath
    if (compact_result == COMPACT_SKIPPED ||
        compact_result == COMPACT_DEFERRED)
        goto nopage; // should exit __alloc_pages_slowpath() from here

We could use the real unmovable allocation context to have
__zone_watermark_unusable_free() subtract CMA pages, and thus we won't
pass the order-0 check anymore once the non-CMA part is exhausted.  There
is some risk that in some different scenario the compaction could in fact
migrate pages from the exhausted non-CMA part of the zone to the CMA part
and succeed, and we'll skip it instead.  But only __GFP_NORETRY
allocations should be affected in the immediate "goto nopage" when
compaction is skipped, others will attempt with DEF_COMPACT_PRIORITY
anyway and won't fail without trying to compact-migrate the non-CMA
pageblocks into CMA pageblocks first, so it should be fine.
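The watermark adjustment can be shown with a minimal sketch (user-space C with hypothetical `toy_*` names; the real logic lives in __zone_watermark_unusable_free): when the allocation context cannot use CMA, free CMA pages are subtracted before the check, so a zone whose non-CMA part is exhausted no longer looks allocatable at order 0.

```c
#include <stdbool.h>

/* Toy model of the watermark fix: for an allocation that cannot use
 * CMA, subtract free CMA pages before comparing against the watermark,
 * so plentiful free CMA memory cannot mask an exhausted non-CMA part. */
struct toy_zone2 { long free_pages; long free_cma_pages; long watermark; };

static bool toy_watermark_ok2(const struct toy_zone2 *z, bool can_use_cma)
{
    long usable = z->free_pages;
    if (!can_use_cma)
        usable -= z->free_cma_pages; /* CMA pages don't count here */
    return usable > z->watermark;
}
```

In the scenario above (16GB free, all of it CMA), the unmovable-context check now fails, __compaction_suitable() stops returning true unconditionally, and __alloc_pages_slowpath() can fall back to another node promptly.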

After this fix, it only takes a few tens of seconds to start a 32GB
virtual machine with device passthrough functionality.

Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: yangge <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Barry Song <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

sourcery-ai bot commented Apr 30, 2025

Reviewer's Guide

This pull request backports seven memory management (mm) patches from upstream Linux and reverts three conflicting local patches. Implementation involves introducing new functions for compaction suitability checks and hugetlb folio handling after migration, modifying KVM/SEV memory pinning to use standard flags, replacing the global CMA mutex with per-CMA mutexes, and adjusting memory allocation flags.

File-Level Changes

Change Details Files
Refactored compaction suitability logic.
  • Introduced compaction_suit_allocation_order function.
  • Replaced existing suitability checks in compact_zone, kcompactd_node_suitable, and kcompactd_do_work with calls to the new function.
mm/compaction.c
Improved handling of hugetlb folios during migration and isolation.
  • Added replace_free_hugepage_folios to handle freed folios post-migration in alloc_contig_range.
  • Added wait_for_freed_hugetlb_folios to wait for deferred freeing before isolation checks in test_pages_isolated.
  • Added check for isolated pages (is_migrate_isolate_page) in dequeue_hugetlb_folio_node_exact.
  • Added function declarations and stubs to header.
mm/hugetlb.c
include/linux/hugetlb.h
mm/page_alloc.c
mm/page_isolation.c
Modified KVM/SEV/CSV memory pinning to use standard flags.
  • Changed sev_pin_memory signature to accept unsigned int flags instead of int write.
  • Updated callers to pass appropriate FOLL_* flags (e.g., FOLL_WRITE, FOLL_LONGTERM).
  • Updated corresponding function pointer signature in hygon_kvm_hooks_table.
arch/x86/kvm/svm/sev.c
arch/x86/kvm/svm/csv.c
arch/x86/kvm/svm/csv.h
Replaced global CMA lock with per-CMA locks to improve concurrency.
  • Removed global cma_mutex.
  • Added alloc_mutex (struct mutex) to struct cma.
  • Initialized and used per-CMA alloc_mutex in cma_alloc.
  • Removed no_mutex field.
  • Removed cma_enable_concurrency function and its callers.
mm/cma.c
include/linux/cma.h
mm/cma.h
arch/x86/mm/mem_encrypt_hygon.c
Adjusted hugetlb vmemmap page allocation flags.
  • Removed __GFP_THISNODE flag from gfp_mask in alloc_vmemmap_page_list.
mm/hugetlb_vmemmap.c
Reverted three conflicting non-upstream patches.
  • Removed code related to the reverted patches concerning KVM/SEV pinning out of CMA and enabling CMA concurrency via cma_enable_concurrency.
mm/cma.c
include/linux/cma.h
arch/x86/mm/mem_encrypt_hygon.c


@deepin-ci-robot

Hi @wojiaohanliyang. Thanks for your PR.

I'm waiting for a deepin-community member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


@sourcery-ai sourcery-ai bot left a comment

Hey @wojiaohanliyang - I've reviewed your changes - here's some feedback:

  • Consider splitting these backports into smaller PRs based on the subsystem (e.g., compaction, hugetlb, CMA/KVM) for easier review and potential bisection.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


* Return COMPACT_SKIPPED if compaction for target order is likely to fail
* Return COMPACT_CONTINUE if compaction for target order should be ran
*/
static enum compact_result

issue (complexity): Consider flattening the conditionals in compaction_suit_allocation_order by extracting the async condition checks into a helper function and using early returns.

Consider flattening the conditionals by extracting the async condition checks into a small helper and using early returns. This can make the control flow clearer without inlining every check. For example:

static inline enum compact_result
check_async_compaction(struct zone *zone, unsigned int order, int highest_zoneidx,
                       unsigned int alloc_flags, bool async)
{
    if (order > PAGE_ALLOC_COSTLY_ORDER && async && !(alloc_flags & ALLOC_CMA)) {
        unsigned long watermark = low_wmark_pages(zone) + compact_gap(order);
        if (!__zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
                                  0, zone_page_state(zone, NR_FREE_PAGES)))
            return COMPACT_SKIPPED;
    }
    return COMPACT_CONTINUE;
}

Then simplify compaction_suit_allocation_order by using early returns:

static enum compact_result compaction_suit_allocation_order(struct zone *zone,
        unsigned int order, int highest_zoneidx, unsigned int alloc_flags, bool async)
{
    unsigned long watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
    if (zone_watermark_ok(zone, order, watermark, highest_zoneidx, alloc_flags))
        return COMPACT_SUCCESS;

    if (check_async_compaction(zone, order, highest_zoneidx, alloc_flags, async) != COMPACT_CONTINUE)
        return COMPACT_SKIPPED;

    if (!compaction_suitable(zone, order, highest_zoneidx))
        return COMPACT_SKIPPED;

    return COMPACT_CONTINUE;
}

This approach retains all functionality while clarifying the flow.

@deepin-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: opsiff

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@opsiff opsiff merged commit 1d45126 into deepin-community:linux-6.6.y May 8, 2025
5 of 6 checks passed