[ThinLTO] Use a set rather than a map to track exported ValueInfos. #97360

Merged
merged 4 commits into from
Jul 3, 2024

Conversation

mingmingl-llvm
Contributor

@mingmingl-llvm mingmingl-llvm commented Jul 1, 2024

#95482 is a reland of #88024. #95482 keeps indexing memory usage reasonable by using unordered_map and doesn't make other changes to the originally reviewed code.

While discussing possible ways to minimize indexing memory usage, Teresa asked whether ExportSetTy needs to be a map or whether a set is sufficient. This PR implements that idea: it uses a set rather than a map to track exported ValueInfos.

Currently, ExportLists has two use cases, and neither needs to track a ValueInfo's import/export status. So using a set is sufficient and correct.

  1. In both in-process and distributed ThinLTO, it's used to decide whether a function or global variable is visible from another module after importing creates additional cross-module references.
    • If a cross-module call edge already exists today, the callee must already be visible to another module, so there is no need to track its export status. For instance, this is how callees of direct calls get exported.
  2. For in-process ThinLTO, it's used to compute the LTO cache key.
    • The cache key computation already hashes 'ImportList', and 'ExportList' is determined by 'ImportList'. So it's fine not to track the import type for the export list.
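
The argument in the two use cases above is that only membership matters. A minimal sketch of that idea, under stated assumptions: a plain 64-bit GUID stands in for ValueInfo, std::unordered_map / std::unordered_set stand in for DenseMap / DenseSet, and the names ExportMapBefore, ExportSetAfter, isExported and sameVisibility are hypothetical, not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-ins: a plain GUID plays the role of ValueInfo, and the
// std::unordered_* containers replace LLVM's DenseMap / DenseSet.
using GUID = std::uint64_t;
enum class ImportKind : std::uint8_t { Definition, Declaration };

using ExportMapBefore = std::unordered_map<GUID, ImportKind>;
using ExportSetAfter = std::unordered_set<GUID>;

// Both use cases only ask "is this value exported?", i.e. a membership test.
inline bool isExported(const ExportMapBefore &Exports, GUID G) {
  return Exports.count(G) != 0;
}
inline bool isExported(const ExportSetAfter &Exports, GUID G) {
  return Exports.count(G) != 0;
}

// Record the same two exports in both representations and return true when
// every membership query gives the same answer.
inline bool sameVisibility() {
  std::unordered_map<std::string, ExportMapBefore> Before;
  std::unordered_map<std::string, ExportSetAfter> After;

  Before["a.o"][1] = ImportKind::Definition;
  Before["a.o"].try_emplace(2, ImportKind::Declaration);

  After["a.o"].insert(1);
  After["a.o"].insert(2);
  After["a.o"].insert(2); // Re-inserting an existing key is a no-op.
  assert(After["a.o"].size() == 2);

  for (GUID G : {GUID{1}, GUID{2}, GUID{3}})
    if (isExported(Before["a.o"], G) != isExported(After["a.o"], G))
      return false;
  return true;
}
```

In this sketch the map's value is never read by the membership query, which is the property the description relies on when dropping it.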

@mingmingl-llvm mingmingl-llvm changed the title [ThinLTO] Track definitions only in export-set [ThinLTO] Use a set rather than a map to track exported VIs. Jul 2, 2024
@mingmingl-llvm mingmingl-llvm changed the title [ThinLTO] Use a set rather than a map to track exported VIs. [ThinLTO] Use a set rather than a map to track exported ValueInfos. Jul 2, 2024
@mingmingl-llvm mingmingl-llvm marked this pull request as ready for review July 2, 2024 00:20
@llvmbot llvmbot added LTO Link time optimization (regular/full LTO or ThinLTO) llvm:transforms labels Jul 2, 2024
@llvmbot
Member

llvmbot commented Jul 2, 2024

@llvm/pr-subscribers-lto

@llvm/pr-subscribers-llvm-transforms

Author: Mingming Liu (minglotus-6)

Full diff: https://github.com/llvm/llvm-project/pull/97360.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Transforms/IPO/FunctionImport.h (+4-7)
  • (modified) llvm/lib/LTO/LTO.cpp (+6-8)
  • (modified) llvm/lib/Transforms/IPO/FunctionImport.cpp (+31-21)
diff --git a/llvm/include/llvm/Transforms/IPO/FunctionImport.h b/llvm/include/llvm/Transforms/IPO/FunctionImport.h
index d8c142ec89d82..3b03ba82b9272 100644
--- a/llvm/include/llvm/Transforms/IPO/FunctionImport.h
+++ b/llvm/include/llvm/Transforms/IPO/FunctionImport.h
@@ -104,13 +104,10 @@ class FunctionImporter {
   /// index's module path string table).
   using ImportMapTy = DenseMap<StringRef, FunctionsToImportTy>;
 
-  /// The map contains an entry for every global value the module exports.
-  /// The key is ValueInfo, and the value indicates whether the definition
-  /// or declaration is visible to another module. If a function's definition is
-  /// visible to other modules, the global values this function referenced are
-  /// visible and shouldn't be internalized.
-  /// TODO: Rename to `ExportMapTy`.
-  using ExportSetTy = DenseMap<ValueInfo, GlobalValueSummary::ImportKind>;
+  /// The set contains an entry for every global value that the module exports.
+  /// Depending on the user context, this container is allowed to contain
+  /// definitions, declarations or a mix of both.
+  using ExportSetTy = DenseSet<ValueInfo>;
 
   /// A function of this type is used to load modules referenced by the index.
   using ModuleLoaderTy =
diff --git a/llvm/lib/LTO/LTO.cpp b/llvm/lib/LTO/LTO.cpp
index 6bbec535d8e98..5382b1158cb04 100644
--- a/llvm/lib/LTO/LTO.cpp
+++ b/llvm/lib/LTO/LTO.cpp
@@ -161,19 +161,17 @@ void llvm::computeLTOCacheKey(
   auto ModHash = Index.getModuleHash(ModuleID);
   Hasher.update(ArrayRef<uint8_t>((uint8_t *)&ModHash[0], sizeof(ModHash)));
 
-  std::vector<std::pair<uint64_t, uint8_t>> ExportsGUID;
+  // TODO: `ExportList` is determined by `ImportList`. Since `ImportList` is
+  // used to compute cache key, we could omit hashing `ExportList` here.
+  std::vector<uint64_t> ExportsGUID;
   ExportsGUID.reserve(ExportList.size());
-  for (const auto &[VI, ExportType] : ExportList)
-    ExportsGUID.push_back(
-        std::make_pair(VI.getGUID(), static_cast<uint8_t>(ExportType)));
+  for (const auto &VI : ExportList)
+    ExportsGUID.push_back(VI.getGUID());
 
   // Sort the export list elements GUIDs.
   llvm::sort(ExportsGUID);
-  for (auto [GUID, ExportType] : ExportsGUID) {
-    // The export list can impact the internalization, be conservative here
+  for (auto GUID : ExportsGUID)
     Hasher.update(ArrayRef<uint8_t>((uint8_t *)&GUID, sizeof(GUID)));
-    AddUint8(ExportType);
-  }
 
   // Include the hash for every module we import functions from. The set of
   // imported symbols for each module may affect code generation and is
diff --git a/llvm/lib/Transforms/IPO/FunctionImport.cpp b/llvm/lib/Transforms/IPO/FunctionImport.cpp
index ec5294b9512cf..f2e67aa998606 100644
--- a/llvm/lib/Transforms/IPO/FunctionImport.cpp
+++ b/llvm/lib/Transforms/IPO/FunctionImport.cpp
@@ -400,8 +400,7 @@ class GlobalsImporter final {
         // later, in ComputeCrossModuleImport, after import decisions are
         // complete, which is more efficient than adding them here.
         if (ExportLists)
-          (*ExportLists)[RefSummary->modulePath()][VI] =
-              GlobalValueSummary::Definition;
+          (*ExportLists)[RefSummary->modulePath()].insert(VI);
 
         // If variable is not writeonly we attempt to recursively analyze
         // its references in order to import referenced constants.
@@ -582,7 +581,7 @@ class WorkloadImportsManager : public ModuleImportsManager {
           GlobalValueSummary::Definition;
       GVI.onImportingSummary(*GVS);
       if (ExportLists)
-        (*ExportLists)[ExportingModule][VI] = GlobalValueSummary::Definition;
+        (*ExportLists)[ExportingModule].insert(VI);
     }
     LLVM_DEBUG(dbgs() << "[Workload] Done\n");
   }
@@ -818,10 +817,8 @@ static void computeImportForFunction(
           // Since definition takes precedence over declaration for the same VI,
           // try emplace <VI, declaration> pair without checking insert result.
           // If insert doesn't happen, there must be an existing entry keyed by
-          // VI.
-          if (ExportLists)
-            (*ExportLists)[DeclSourceModule].try_emplace(
-                VI, GlobalValueSummary::Declaration);
+          // VI. Note `ExportLists` only keeps track of definitions so VI won't
+          // be inserted.
           ImportList[DeclSourceModule].try_emplace(
               VI.getGUID(), GlobalValueSummary::Declaration);
         }
@@ -892,7 +889,7 @@ static void computeImportForFunction(
       // later, in ComputeCrossModuleImport, after import decisions are
       // complete, which is more efficient than adding them here.
       if (ExportLists)
-        (*ExportLists)[ExportModulePath][VI] = GlobalValueSummary::Definition;
+        (*ExportLists)[ExportModulePath].insert(VI);
     }
 
     auto GetAdjustedThreshold = [](unsigned Threshold, bool IsHotCallsite) {
@@ -998,14 +995,29 @@ static bool isGlobalVarSummary(const ModuleSummaryIndex &Index,
   return false;
 }
 
-template <class T>
-static unsigned numGlobalVarSummaries(const ModuleSummaryIndex &Index, T &Cont,
+static unsigned numGlobalVarSummaries(const ModuleSummaryIndex &Index,
+                                      FunctionImporter::ExportSetTy &ExportSet,
                                       unsigned &DefinedGVS,
                                       unsigned &DefinedFS) {
+  DefinedGVS = 0;
+  DefinedFS = 0;
+  for (auto &VI : ExportSet) {
+    if (isGlobalVarSummary(Index, VI.getGUID())) {
+      ++DefinedGVS;
+    } else
+      ++DefinedFS;
+  }
+  return DefinedGVS;
+}
+
+static unsigned
+numGlobalVarSummaries(const ModuleSummaryIndex &Index,
+                      FunctionImporter::FunctionsToImportTy &ImportMap,
+                      unsigned &DefinedGVS, unsigned &DefinedFS) {
   unsigned NumGVS = 0;
   DefinedGVS = 0;
   DefinedFS = 0;
-  for (auto &[GUID, Type] : Cont) {
+  for (auto &[GUID, Type] : ImportMap) {
     if (isGlobalVarSummary(Index, GUID)) {
       if (Type == GlobalValueSummary::Definition)
         ++DefinedGVS;
@@ -1046,7 +1058,7 @@ static bool checkVariableImport(
   };
 
   for (auto &ExportPerModule : ExportLists)
-    for (auto &[VI, Unused] : ExportPerModule.second)
+    for (auto &VI : ExportPerModule.second)
       if (!FlattenedImports.count(VI.getGUID()) &&
           IsReadOrWriteOnlyVarNeedingImporting(ExportPerModule.first, VI))
         return false;
@@ -1079,14 +1091,12 @@ void llvm::ComputeCrossModuleImport(
   // since we may import the same values multiple times into different modules
   // during the import computation.
   for (auto &ELI : ExportLists) {
+    // `NewExports` tracks the VI that gets exported because the full definition
+    // of its user/referencer gets exported.
     FunctionImporter::ExportSetTy NewExports;
     const auto &DefinedGVSummaries =
         ModuleToDefinedGVSummaries.lookup(ELI.first);
-    for (auto &[EI, Type] : ELI.second) {
-      // If a variable is exported as a declaration, its 'refs' and 'calls' are
-      // not further exported.
-      if (Type == GlobalValueSummary::Declaration)
-        continue;
+    for (auto &EI : ELI.second) {
       // Find the copy defined in the exporting module so that we can mark the
       // values it references in that specific definition as exported.
       // Below we will add all references and called values, without regard to
@@ -1108,19 +1118,19 @@ void llvm::ComputeCrossModuleImport(
           for (const auto &VI : GVS->refs()) {
             // Try to emplace the declaration entry. If a definition entry
             // already exists for key `VI`, this is a no-op.
-            NewExports.try_emplace(VI, GlobalValueSummary::Declaration);
+            NewExports.insert(VI);
           }
       } else {
         auto *FS = cast<FunctionSummary>(S);
         for (const auto &Edge : FS->calls()) {
           // Try to emplace the declaration entry. If a definition entry
           // already exists for key `VI`, this is a no-op.
-          NewExports.try_emplace(Edge.first, GlobalValueSummary::Declaration);
+          NewExports.insert(Edge.first);
         }
         for (const auto &Ref : FS->refs()) {
           // Try to emplace the declaration entry. If a definition entry
           // already exists for key `VI`, this is a no-op.
-          NewExports.try_emplace(Ref, GlobalValueSummary::Declaration);
+          NewExports.insert(Ref);
         }
       }
     }
@@ -1129,7 +1139,7 @@ void llvm::ComputeCrossModuleImport(
     // the same ref/call target multiple times in above loop, and it is more
     // efficient to avoid a set lookup each time.
     for (auto EI = NewExports.begin(); EI != NewExports.end();) {
-      if (!DefinedGVSummaries.count(EI->first.getGUID()))
+      if (!DefinedGVSummaries.count(EI->getGUID()))
         NewExports.erase(EI++);
       else
         ++EI;
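
The computeLTOCacheKey hunk in the diff above collects the exported GUIDs, sorts them, and feeds them to the hasher. A rough sketch of why the sort matters once the container is a hash set with unspecified iteration order, under stated assumptions: 64-bit FNV-1a is used here as a stand-in for the real hasher, and fnv1aUpdate and hashExports are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

using GUID = std::uint64_t;

// Stand-in for the real hasher: 64-bit FNV-1a over the bytes of one GUID,
// least significant byte first.
inline std::uint64_t fnv1aUpdate(std::uint64_t State, GUID Value) {
  for (int Byte = 0; Byte < 8; ++Byte) {
    State ^= (Value >> (8 * Byte)) & 0xffu;
    State *= 1099511628211ull; // FNV prime
  }
  return State;
}

// Hash an export set independently of its iteration order by sorting the
// GUIDs before they are fed to the hasher.
inline std::uint64_t hashExports(const std::unordered_set<GUID> &Exports) {
  std::vector<GUID> Sorted(Exports.begin(), Exports.end());
  std::sort(Sorted.begin(), Sorted.end());
  std::uint64_t State = 14695981039346656037ull; // FNV offset basis
  for (GUID G : Sorted)
    State = fnv1aUpdate(State, G);
  return State;
}
```

Two sets holding the same GUIDs produce the same value regardless of insertion order, so the key stays stable across runs.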

Contributor

@jvoung jvoung left a comment

LGTM

@@ -1108,19 +1118,19 @@ void llvm::ComputeCrossModuleImport(
for (const auto &VI : GVS->refs()) {
// Try to emplace the declaration entry. If a definition entry
Contributor

nit: can remove the comment "// Try to emplace the declaration entry ..."?

Same below at lines 1128 and 1133

Contributor Author

done.

Contributor

@teresajohnson teresajohnson left a comment

In your description, the link you have for "For instance, this is how callees of direct calls get exported." is not really correct: it links to CFI handling that needs to do some additional marking of exported GUIDs due to its jump tables. By definition, anything that had a direct cross-module call before importing must have had external linkage, and that is handled here:

for (auto &Res : *GlobalResolutions) {
  // If the symbol does not have external references or it is not prevailing,
  // then not need to mark it as exported from a ThinLTO partition.
  if (Res.second.Partition != GlobalResolution::External ||
      !Res.second.isPrevailingIRSymbol())
    continue;
  auto GUID = GlobalValue::getGUID(
      GlobalValue::dropLLVMManglingEscape(Res.second.IRName));
  // Mark exported unless index-based analysis determined it to be dead.
  if (ThinLTO.CombinedIndex.isGUIDLive(GUID))
    ExportedGUIDs.insert(GUID);
}

Also, I think it would be clearer to make this point "In both in-process and distributed ThinLTO, it's used to decide if a function or global variable is visible from another module after importing creates additional cross-module references" (note the addition at the end of the sentence).

@@ -998,14 +995,29 @@ static bool isGlobalVarSummary(const ModuleSummaryIndex &Index,
   return false;
 }

-template <class T>
-static unsigned numGlobalVarSummaries(const ModuleSummaryIndex &Index, T &Cont,
+static unsigned numGlobalVarSummaries(const ModuleSummaryIndex &Index,
Contributor

I think this can be simplified more - simply return the number that are GVs (i.e. no output reference parameters needed). The number that are functions can be deduced by the caller from the size of the ExportSet minus the returned GVs count.

Contributor Author

Done. Given that the export containers don't track def or decl, I took the liberty of updating this function so that it counts the total number of imported or exported global variables without breaking the count down by def or decl.
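
The simplification discussed in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: GUIDs stand in for ValueInfo, the hypothetical GlobalVarGUIDs set replaces the isGlobalVarSummary(Index, GUID) lookup, and the signatures differ from the real functions.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

using GUID = std::uint64_t;

// Hypothetical stand-in for isGlobalVarSummary(Index, GUID): the GUIDs that
// belong to global variables are passed in explicitly.
using GlobalVarGUIDs = std::unordered_set<GUID>;

// Return the number of global variable summaries in ExportSet. No output
// reference parameters are needed.
inline unsigned
numGlobalVarSummaries(const GlobalVarGUIDs &GlobalVars,
                      const std::unordered_set<GUID> &ExportSet) {
  unsigned NumGVS = 0;
  for (GUID G : ExportSet)
    if (GlobalVars.count(G))
      ++NumGVS;
  return NumGVS;
}

// The caller derives the function count from the set size instead of
// receiving it through an output parameter.
inline unsigned
numFunctionSummaries(const GlobalVarGUIDs &GlobalVars,
                     const std::unordered_set<GUID> &ExportSet) {
  return static_cast<unsigned>(ExportSet.size()) -
         numGlobalVarSummaries(GlobalVars, ExportSet);
}
```

Because every entry is either a global variable or a function in this sketch, the two counts always add up to the size of the export set.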

-          if (ExportLists)
-            (*ExportLists)[DeclSourceModule].try_emplace(
-                VI, GlobalValueSummary::Declaration);
+          // VI. Note `ExportLists` only keeps track of definitions so VI won't
Contributor

Maybe "only keeps track of exports due to imported definitions" ?

Contributor Author

done.

@mingmingl-llvm
Contributor Author

I see. I corrected the description and the link. PTAL, thanks!

} else
if (isGlobalVarSummary(Index, VI.getGUID()))
++NumGVS;
else
++DefinedFS;
Contributor

Can't DefinedFS be deduced from ExportSet.size() - NumGVS?

Contributor Author

Yes, the number of function summaries can be deduced. I simplified the function.

unsigned NumGVS =
numGlobalVarSummaries(Index, Exports, DefinedGVS, DefinedFS);
unsigned DefinedFS = 0;
unsigned NumGVS = numGlobalVarSummaries(Index, Exports, DefinedFS);
LLVM_DEBUG(dbgs() << "* Module " << ModName << " exports " << DefinedFS
<< " function as definitions, "
<< Exports.size() - NumGVS - DefinedFS
Contributor

I think this will result in 0, since I believe NumGVS + DefinedFS should be the same as Exports.size() (see my comment above).

Contributor Author

Ah I see. Simplified the function and updated function comment.

@mingmingl-llvm
Contributor Author

@teresajohnson @jvoung I'm also experimenting with a change to make ExportSetTy an unordered_set rather than a DenseSet (by implementing std::hash for ValueInfo). If it reduces the memory usage of indexing across multiple binaries (because the distribution of exported values per module shows that most containers hold fewer than 64 entries), I'm considering that change as a follow-up to this one.
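
For readers unfamiliar with the mechanism mentioned here, a minimal sketch of what "implementing std::hash" for a key type looks like. It assumes a hypothetical, simplified ValueInfoSketch that holds only a GUID; the real ValueInfo is different, and this is not the follow-up change itself. No memory-usage claim is implied by the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hypothetical, simplified key type: only a GUID is kept so that the example
// is self-contained. It is not LLVM's ValueInfo.
struct ValueInfoSketch {
  std::uint64_t GUID;
  bool operator==(const ValueInfoSketch &Other) const {
    return GUID == Other.GUID;
  }
};

// Specializing std::hash for a user-defined type lets std::unordered_set use
// it as a key without passing a custom hasher.
namespace std {
template <> struct hash<ValueInfoSketch> {
  std::size_t operator()(const ValueInfoSketch &VI) const noexcept {
    return std::hash<std::uint64_t>{}(VI.GUID);
  }
};
} // namespace std

using ExportSetSketch = std::unordered_set<ValueInfoSketch>;
```

With the specialization in place, ExportSetSketch supports the same insert and count calls that the diff uses on the DenseSet-based ExportSetTy.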

Contributor

@teresajohnson teresajohnson left a comment

lgtm

static unsigned numGlobalVarSummaries(const ModuleSummaryIndex &Index,
FunctionImporter::ExportSetTy &ExportSet,
unsigned &DefinedFS) {
// Return the number of global summaries in ExportSet.
Contributor

"global variable summaries" (to distinguish from global value summaries).

Contributor Author

done.

// summaries as output parameter. This is the same as `numGlobalVarSummaries`
// except that it takes `FunctionImporter::FunctionsToImportTy` as input
// parameter.
// Given ImportMap, return the number of global summaries and record the number
Contributor

ditto

Contributor Author

done.

@mingmingl-llvm mingmingl-llvm merged commit af784a5 into llvm:main Jul 3, 2024
5 of 6 checks passed
@mingmingl-llvm mingmingl-llvm deleted the userblock branch July 3, 2024 20:15
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024
…lvm#97360)

llvm#95482 is a reland of
llvm#88024.
llvm#95482 keeps indexing memory
usage reasonable by using unordered_map and doesn't make other changes
to originally reviewed code.

While discussing possible ways to minimize indexing memory usage, Teresa
asked whether I need `ExportSetTy` as a map or a set is sufficient. This
PR implements the idea. It uses a set rather than a map to track exposed
ValueInfos.

Currently, `ExportLists` has two use cases, and neither needs to track a
ValueInfo's import/export status. So using a set is sufficient and
correct.
1) In both in-process and distributed ThinLTO, it's used to decide if a
function or global variable is visible [1] from another module after importing
creates additional cross-module references.
     * If a cross-module call edge is seen today, the callee must be visible
       to another module without keeping track of its export status already.
       For instance, this [2] is how callees of direct calls get exported.
2) For in-process ThinLTO [3], it's used to compute lto cache key.
     * The cache key computation already hashes [4] 'ImportList' , and 'ExportList' is
        determined by 'ImportList'. So it's fine to not track 'import type' for export list.

[1] https://github.com/llvm/llvm-project/blob/66cd8ec4c08252ebc73c82e4883a8da247ed146b/llvm/lib/LTO/LTO.cpp#L1815-L1819
[2] https://github.com/llvm/llvm-project/blob/66cd8ec4c08252ebc73c82e4883a8da247ed146b/llvm/lib/LTO/LTO.cpp#L1783-L1794
[3] https://github.com/llvm/llvm-project/blob/66cd8ec4c08252ebc73c82e4883a8da247ed146b/llvm/lib/LTO/LTO.cpp#L1494-L1496
[4] https://github.com/llvm/llvm-project/blob/b76100e220591fab2bf0a4917b216439f7aa4b09/llvm/lib/LTO/LTO.cpp#L194-L222