Compare commits

...

132 commits

Author SHA1 Message Date
XMRig
f9e990d0f0
v6.22.2 2024-11-03 14:38:44 +07:00
XMRig
200f23bba7
Merge branch 'dev' 2024-11-03 14:38:00 +07:00
xmrig
4234b20e21
Update CHANGELOG.md 2024-11-03 14:31:17 +07:00
xmrig
c5d8b8265b
Merge pull request from SChernykh/dev
Fix number of threads on the new Intel Core Ultra CPUs
2024-10-25 20:55:35 +07:00
SChernykh
77c14c8362 Fix number of threads on the new Intel Core Ultra CPUs 2024-10-25 13:44:24 +02:00
xmrig
8b03750806
Merge pull request from SChernykh/dev
Fix: don't use NaN in hashrate calculations
2024-10-23 17:18:36 +07:00
SChernykh
40949f2767 Fix: don't use NaN in hashrate calculations 2024-10-23 11:40:27 +02:00
XMRig
56c447e02a
v6.22.2-dev 2024-10-23 13:36:56 +07:00
XMRig
21c206f05d
Merge branch 'master' into dev 2024-10-23 13:36:19 +07:00
XMRig
ee65b3d159
v6.22.1 2024-10-23 12:53:06 +07:00
XMRig
1f75d198d8
Merge branch 'dev' 2024-10-23 12:52:16 +07:00
xmrig
5cf2422766
Update CHANGELOG.md 2024-10-22 17:34:07 +07:00
XMRig
a32f9b5b04
Fixed --version output on ARM. 2024-10-21 08:48:58 +07:00
XMRig
8a4792f638
Update hwloc for MSVC. 2024-10-21 08:31:52 +07:00
XMRig
e32731b60b
Update deps 2024-10-20 09:49:06 +07:00
xmrig
e1ae367084
Merge pull request from SChernykh/dev
Detect AMD engineering samples in randomx_boost.sh
2024-08-29 19:50:43 +07:00
SChernykh
bc1c8358c4 Detect AMD engineering samples in randomx_boost.sh 2024-08-29 14:47:30 +02:00
xmrig
e0af8f0c6b
Merge pull request from SChernykh/dev
Added Zen5 to randomx_boost.sh
2024-08-28 18:51:39 +07:00
SChernykh
29f9c8cf4c Added Zen5 to randomx_boost.sh 2024-08-28 13:49:27 +02:00
xmrig
26f4936f6f
Merge pull request from SChernykh/dev
RandomX: tweaks for Zen5
2024-08-20 06:47:30 +07:00
SChernykh
a411ee3565 RandomX: tweaks for Zen5 2024-08-19 21:01:49 +02:00
xmrig
01bd0d48a1
Merge pull request from SChernykh/dev
Fixed threads auto-config on Zen5
2024-08-17 06:23:49 +07:00
SChernykh
20d555668b Fixed threads auto-config on Zen5 2024-08-16 23:36:22 +02:00
xmrig
56baec762f
Merge pull request from SChernykh/dev
Always reset nonce on RandomX dataset change
2024-08-14 22:16:34 +07:00
SChernykh
17a52fb418 Always reset nonce on RandomX dataset change
Also never get a new job when mining is paused
2024-08-14 16:41:03 +02:00
XMRig
7e4caa8929
Merge remote-tracking branch 'remotes/origin/master' into dev 2024-08-12 03:02:19 +07:00
xmrig
ef14d55aa5
Merge pull request from eltociear/patch-1
docs: update ghostrider/README.md
2024-08-12 03:01:13 +07:00
XMRig
5776fdcc20
v6.22.1-dev 2024-08-12 02:15:08 +07:00
XMRig
fe0f69031b
Merge branch 'master' into dev 2024-08-12 02:14:40 +07:00
Ikko Eltociear Ashimine
e682f89298
docs: update ghostrider/README.md
nubmer -> number
2024-08-12 03:54:26 +09:00
XMRig
544c393f78
v6.22.0 2024-08-12 01:13:51 +07:00
XMRig
9da6ea07bd
Merge branch 'dev' 2024-08-12 01:13:29 +07:00
XMRig
62bcd6e5dc
v6.22.0-dev 2024-08-10 22:00:42 +07:00
xmrig
c5f98fc5c7
Merge pull request from SChernykh/dev
Added rx/yada OpenCL support
2024-08-07 13:36:55 +07:00
SChernykh
ecb3ec0317 Added rx/yada OpenCL support 2024-08-07 00:18:51 +02:00
XMRig
3dfeed475f
Sync changes with the proxy. 2024-08-06 23:32:20 +07:00
XMRig
98c775703e
Don't generate "rx/yada" profile, use the "rx" profile by default. 2024-08-04 20:00:12 +07:00
XMRig
8da49f2650
More clean target parse. 2024-08-04 19:51:11 +07:00
xmrig
4570187459
Merge pull request from SChernykh/dev
Added Zen5 detection
2024-08-03 22:58:00 +07:00
SChernykh
748365d6e3 Added Zen5 detection
Preliminary Zen5 support, MSR mod is not ready yet.
2024-08-03 11:01:18 +02:00
xmrig
dd7e0e520d
Merge pull request from SChernykh/dev
Fixed ARMv8 compilation
2024-08-02 23:47:21 +07:00
SChernykh
ef6fb728b5 Fixed ARMv8 compilation 2024-08-02 17:51:08 +02:00
xmrig
92ffcd34d6
Merge pull request from pdxwebdev/feature/yadacoin
Added support for Yada (rx/yada algorithm)
2024-08-02 16:22:50 +07:00
Matthew Vogel
b108845627 fix yada nonce offset 2024-08-01 15:10:20 -07:00
Matthew Vogel
046b2a17d3 finish updating for yadacoin 2024-08-01 00:01:09 -07:00
Matthew Vogel
5342f25fbf update constants for yadacoin 2024-07-31 23:45:34 -07:00
Matthew Vogel
5f6bcfe949 add yada constants 2024-07-31 23:26:37 -07:00
xmrig
ecef382326
Merge pull request from SChernykh/dev
Removed rx/keva
2024-07-31 15:41:25 +07:00
SChernykh
86f5db19d2 Removed rx/keva
Keva coin is too small now.
2024-07-31 08:28:05 +02:00
xmrig
b4a47d6ed0
Merge pull request from SChernykh/dev
Make Json::normalize more strict
2024-07-29 22:27:29 +07:00
SChernykh
f5095247e8 Make Json::normalize more strict
Rounding a regular FP value can give an invalid result - check the result too.
2024-07-29 17:14:21 +02:00
XMRig
2bb07fe633
Update build scripts for OpenSSL. 2024-07-24 21:02:53 +07:00
XMRig
a7be8cb80c
Remove chdir call after fork. 2024-06-05 03:45:37 +07:00
XMRig
2ce16df423
Create signal handlers after fork() call, replace . 2024-06-05 03:23:58 +07:00
XMRig
5eaa6c152e
v6.21.4-dev 2024-04-23 16:51:58 +07:00
XMRig
6972f727c1
Merge branch 'master' into dev 2024-04-23 16:50:58 +07:00
XMRig
7897f10c48
v6.21.3 2024-04-23 16:27:24 +07:00
XMRig
da2fb331b3
Merge branch 'dev' 2024-04-23 16:26:18 +07:00
xmrig
57f3e9c3da
Update CHANGELOG.md 2024-04-23 16:17:26 +07:00
xmrig
1efe7e9562
Merge pull request from SChernykh/dev
RandomX: correct memcpy size for JIT initialization
2024-04-14 17:01:16 +07:00
SChernykh
caae7c64f0 RandomX: correct memcpy size for JIT initialization
No buffer overflow, better fix for `_FORTIFY_SOURCE`
2024-04-14 09:13:00 +02:00
xmrig
9fbdcc0ef0
Merge pull request from SChernykh/dev
RandomX: check pointer sizes during JIT initialization
2024-04-14 05:38:53 +07:00
SChernykh
c7c26d97fe RandomX: check pointer sizes during JIT initialization 2024-04-13 20:32:16 +02:00
XMRig
1f7e635b04
Use internal logger for error message. 2024-03-26 21:46:18 +07:00
XMRig
1c5786e3c5
v6.21.3-dev 2024-03-23 16:21:54 +07:00
XMRig
44eb4f0038
Merge branch 'master' into dev 2024-03-23 16:20:24 +07:00
XMRig
4ab9329dda
v6.21.2 2024-03-23 13:38:42 +07:00
XMRig
0c2ee013a7
Merge branch 'dev' 2024-03-23 13:38:05 +07:00
xmrig
3347537635
Update CHANGELOG.md 2024-03-23 00:46:15 +07:00
XMRig
7a85257ad4
Update hwloc for MSVC builds. 2024-03-22 18:14:39 +07:00
XMRig
850b43c079
Fix build with recent libuv. 2024-03-22 01:22:54 +07:00
XMRig
b8e4eaac87
Fix rapidjson assert. 2024-03-21 21:03:35 +07:00
xmrig
b9dd5e3eae
Merge pull request from SChernykh/dev
Fix RandomX crash when compiled with fortify_source
2024-03-21 04:09:05 +07:00
SChernykh
032c28d50a Merge remote-tracking branch 'upstream/dev' into dev 2024-03-20 21:24:58 +01:00
SChernykh
f6c50b5393 Fix RandomX crash when compiled with fortify_source 2024-03-20 21:24:02 +01:00
SChernykh
e65e283aac Merge remote-tracking branch 'upstream/dev' into dev 2024-03-20 21:22:11 +01:00
XMRig
5552e1f864
Fix scripts for systems without bash. 2024-03-21 02:13:01 +07:00
XMRig
3beccae136
Merge branch 'goodmost-master' into dev 2024-03-20 14:11:53 +07:00
XMRig
ef9bf2aa8c
Merge branch 'master' of https://github.com/goodmost/xmrig into goodmost-master 2024-03-20 14:11:28 +07:00
XMRig
42f645fa3b
Merge branch 'dev' of github.com:xmrig/xmrig into dev 2024-03-20 00:25:21 +07:00
XMRig
1fb5be6c1d
Update deps. 2024-03-20 00:24:46 +07:00
goodmost
08c43b7e58 chore: remove repetitive words
Signed-off-by: goodmost <zhaohaiyang@outlook.com>
2024-03-19 23:19:36 +08:00
xmrig
7b016fd9ce
Merge pull request from SChernykh/dev
Thread-safe FileLogWriter
2024-03-14 21:46:45 +07:00
SChernykh
688d4f5ee1 Thread-safe FileLogWriter 2024-03-04 08:45:22 +01:00
xmrig
64913e3163
Merge pull request from SChernykh/dev
Update bug_report.md
2024-02-29 14:33:07 +07:00
SChernykh
48fa095e3e Update bug_report.md 2024-02-29 08:31:16 +01:00
XMRig
c9b9ef51ee
Fixed donation with ghostrider algorithm for builds without KawPow algorithm. 2024-02-29 09:38:47 +07:00
xmrig
dd782c7001
Merge pull request from SChernykh/dev
Stratum: better check of the login response
2024-02-28 11:25:34 +07:00
SChernykh
b49197f808 Stratum: better check of the login response 2024-02-27 23:39:23 +01:00
XMRig
f9c4c57216
v6.21.2-dev 2024-02-25 23:00:45 +07:00
XMRig
a5b8b85967
Merge branch 'master' into dev 2024-02-25 23:00:11 +07:00
XMRig
a5aa2c9042
v6.21.1 2024-02-25 22:26:52 +07:00
XMRig
fa35a32eee
Merge branch 'dev' 2024-02-25 22:25:41 +07:00
XMRig
7b6ce59821
Update CHANGELOG.md. 2024-02-22 03:26:41 +07:00
XMRig
33315ba2ef
Merge branch 'Daviey-HTTPRebindSegFault' into dev 2024-02-12 14:51:34 +07:00
XMRig
2c9c40d623
Merge branch 'HTTPRebindSegFault' of https://github.com/Daviey/xmrig into Daviey-HTTPRebindSegFault 2024-02-12 14:50:48 +07:00
Dave Walker (Daviey)
daa6328418 Fix segfault in HTTP API rebind
Previously, with the HTTP API enabled during a benchmarking run, it was
possible to cause a segfault due to an issue handling the m_httpd pointer
and rebinding.

  - Initialize m_httpd to nullptr to indicate when it's not in use.
  - Safely delete m_httpd in Api's destructor to prevent use-after-free
    issues.
  - Add checks to ensure m_httpd is not nullptr before usage in start,
    stop, and tick methods.
  - Log errors for HTTP server start failures to aid in debugging.

Fixes 

Signed-off-by: Dave Walker (Daviey) <email@daviey.com>
2024-02-11 17:52:36 +00:00
XMRig
8afd4d5f2f
Cleanup. 2024-01-17 00:31:16 +07:00
xmrig
77e2f3a028
Merge pull request from SChernykh/dev
Fixed Zephyr mining (OpenCL)
2024-01-14 09:01:44 +07:00
SChernykh
206295c6cb Fixed Zephyr mining (OpenCL) 2024-01-13 20:14:08 +01:00
XMRig
07e1e77c4f
Code style cleanup. 2023-12-29 21:17:19 +07:00
xmrig
50a98a4bb1
Merge pull request from moneromooo-monero/tf-dev
add support for townforge (monero fork using randomx)
2023-12-27 23:13:54 +07:00
moneromooo-monero
c50369d65d
add support for townforge (monero fork using randomx) 2023-12-23 15:31:05 +00:00
XMRig
592b0c9c76
v6.21.1-dev 2023-11-23 21:19:36 +07:00
XMRig
89eab0eff2
Merge branch 'master' into dev 2023-11-23 21:18:21 +07:00
XMRig
8084ff37a5
v6.21.0 2023-11-23 20:40:58 +07:00
XMRig
7cf3db7750
Merge branch 'dev' 2023-11-23 20:40:34 +07:00
XMRig
4bda6e054d
v6.21.0-dev 2023-11-23 19:51:41 +07:00
xmrig
64a0ed413b
Merge pull request from SChernykh/dev
Zephyr solo mining: handle multiple outputs
2023-11-15 22:36:35 +07:00
SChernykh
0b59b7eb43 Zephyr solo mining: handle multiple outputs 2023-11-15 16:18:05 +01:00
xmrig
ae6b10b5a4
Merge pull request from SChernykh/dev
Updated pricing record size for Zephyr solo mining
2023-11-15 08:27:02 +07:00
SChernykh
705a7eac0c Updated pricing record size for Zephyr solo mining 2023-11-14 13:06:10 +01:00
xmrig
10bfffe033
Merge pull request from SChernykh/dev
Update to latest sse2neon.h
2023-10-31 11:52:38 +07:00
SChernykh
4131aa4754 Update sse2neon.h 2023-10-30 20:07:03 +01:00
xmrig
fee51b20fa
Merge pull request from SChernykh/dev
ARM64 JIT: don't use `x18` register
2023-10-20 07:36:12 +07:00
SChernykh
5e66efabcf ARM64 JIT: don't use x18 register
From https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms
> The platforms reserve register x18. Don’t use this register.

This PR fixes invalid hashes when running on Apple silicon with the latest macOS SDK.
2023-10-19 17:45:15 +02:00
XMRig
08901a9a4b
Merge branch 'JacksonZ03-main' into dev 2023-10-09 15:15:32 +07:00
XMRig
a19f590ee6
Merge branch 'main' of https://github.com/JacksonZ03/xmrig into JacksonZ03-main 2023-10-09 15:14:50 +07:00
Jackson Zheng
2fa754825d
Update cn_main_loop.asm
Found this line to be missing. I looked through the history and it seemed like the original author of the commit missed it out.
2023-10-08 23:29:52 +01:00
Jackson Zheng
f3446c0a94
Update cn_main_loop.asm
I was scanning the code and found this line to be missing. Not sure if this was a mistake or if it was intentionally left out?
2023-10-08 23:12:58 +01:00
xmrig
71209d4cd7
Merge pull request from SChernykh/dev
Added SNI option for TLS connections
2023-09-29 19:15:29 +07:00
SChernykh
0a3313cb76 Added SNI option for TLS connections
Disabled by default, add `"sni": true,` to pool config to enable it.
2023-09-29 08:33:49 +02:00
xmrig
e855723cd9
Merge pull request from SChernykh/dev
Add "built for OS/architecture/bits" to "ABOUT"
2023-08-21 19:00:14 +07:00
SChernykh
6e294bd046 Add "built for OS/architecture/bits" to "ABOUT"
To make it clearer which binary is shown in an XMRig screenshot.
2023-08-21 13:49:21 +02:00
XMRig
dfe70d9ea7
Fixed huge pages availability info on Linux. 2023-08-08 17:48:44 +07:00
XMRig
2ecf10cdcb
Make Platform::hasKeepalive() constexpr where always supported and code cleanup. 2023-08-06 20:26:07 +07:00
xmrig
b55ca8e547
Merge pull request from SChernykh/dev
Disable TCP keepalive before closing socket
2023-08-06 20:14:37 +07:00
SChernykh
12577df7ba Disable TCP keepalive before closing socket 2023-08-06 14:51:25 +02:00
xmrig
64f5bb467a
Merge pull request from SChernykh/dev
Enabled keepalive for Windows (>= Vista)
2023-07-17 17:17:39 +07:00
SChernykh
5717e72367 Enabled keepalive for Windows (>= Vista) 2023-07-17 09:49:10 +02:00
XMRig
e7de104d88
v6.20.1-dev 2023-07-03 18:47:55 +07:00
XMRig
3b5e04b1b7
Merge branch 'master' into dev 2023-07-03 18:47:22 +07:00
138 changed files with 6950 additions and 4508 deletions

View file

@ -17,6 +17,9 @@ Steps to reproduce the behavior.
A clear and concise description of what you expected to happen.
**Required data**
- XMRig version
- Either the exact link to a release you downloaded from https://github.com/xmrig/xmrig/releases
- Or the exact command lines that you used to build XMRig
- Miner log as text or screenshot
- Config file or command line (without wallets)
- OS: [e.g. Windows]

View file

@ -1,3 +1,46 @@
# v6.22.2
- [#3569](https://github.com/xmrig/xmrig/pull/3569) Fixed corrupted API output in some rare conditions.
- [#3571](https://github.com/xmrig/xmrig/pull/3571) Fixed number of threads on the new Intel Core Ultra CPUs.
# v6.22.1
- [#3531](https://github.com/xmrig/xmrig/pull/3531) Always reset nonce on RandomX dataset change.
- [#3534](https://github.com/xmrig/xmrig/pull/3534) Fixed threads auto-config on Zen5.
- [#3535](https://github.com/xmrig/xmrig/pull/3535) RandomX: tweaks for Zen5.
- [#3539](https://github.com/xmrig/xmrig/pull/3539) Added Zen5 to `randomx_boost.sh`.
- [#3540](https://github.com/xmrig/xmrig/pull/3540) Detect AMD engineering samples in `randomx_boost.sh`.
# v6.22.0
- [#2411](https://github.com/xmrig/xmrig/pull/2411) Added support for [Yada](https://yadacoin.io/) (`rx/yada` algorithm).
- [#3492](https://github.com/xmrig/xmrig/pull/3492) Fixed `--background` option on Unix systems.
- [#3518](https://github.com/xmrig/xmrig/pull/3518) Possible fix for corrupted API output in rare cases.
- [#3522](https://github.com/xmrig/xmrig/pull/3522) Removed `rx/keva` algorithm.
- [#3525](https://github.com/xmrig/xmrig/pull/3525) Added Zen5 detection.
- [#3528](https://github.com/xmrig/xmrig/pull/3528) Added `rx/yada` OpenCL support.
# v6.21.3
- [#3462](https://github.com/xmrig/xmrig/pull/3462) RandomX: correct memcpy size for JIT initialization.
# v6.21.2
- The dependencies of all prebuilt releases have been updated. Support for old Ubuntu releases has been dropped.
- [#2800](https://github.com/xmrig/xmrig/issues/2800) Fixed donation with GhostRider algorithm for builds without KawPow algorithm.
- [#3436](https://github.com/xmrig/xmrig/pull/3436) Fixed: the file log writer was not thread-safe.
- [#3450](https://github.com/xmrig/xmrig/pull/3450) Fixed RandomX crash when compiled with fortify_source.
# v6.21.1
- [#3391](https://github.com/xmrig/xmrig/pull/3391) Added support for townforge (monero fork using randomx).
- [#3399](https://github.com/xmrig/xmrig/pull/3399) Fixed Zephyr mining (OpenCL).
- [#3420](https://github.com/xmrig/xmrig/pull/3420) Fixed segfault in HTTP API rebind.
# v6.21.0
- [#3302](https://github.com/xmrig/xmrig/pull/3302) [#3312](https://github.com/xmrig/xmrig/pull/3312) Enabled keepalive for Windows (>= Vista).
- [#3320](https://github.com/xmrig/xmrig/pull/3320) Added "built for OS/architecture/bits" to "ABOUT".
- [#3339](https://github.com/xmrig/xmrig/pull/3339) Added SNI option for TLS connections.
- [#3342](https://github.com/xmrig/xmrig/pull/3342) Update `cn_main_loop.asm`.
- [#3346](https://github.com/xmrig/xmrig/pull/3346) ARM64 JIT: don't use `x18` register.
- [#3348](https://github.com/xmrig/xmrig/pull/3348) Update to latest `sse2neon.h`.
- [#3356](https://github.com/xmrig/xmrig/pull/3356) Updated pricing record size for **Zephyr** solo mining.
- [#3358](https://github.com/xmrig/xmrig/pull/3358) **Zephyr** solo mining: handle multiple outputs.
# v6.20.0
- Added new ARM CPU names.
- [#2394](https://github.com/xmrig/xmrig/pull/2394) Added new CMake options `ARM_V8` and `ARM_V7`.

View file

@ -1,4 +1,4 @@
cmake_minimum_required(VERSION 3.1)
cmake_minimum_required(VERSION 3.5)
project(xmrig)
option(WITH_HWLOC "Enable hwloc support" ON)
@ -162,7 +162,7 @@ if (XMRIG_OS_WIN)
src/crypto/common/VirtualMemory_win.cpp
)
set(EXTRA_LIBS ws2_32 psapi iphlpapi userenv)
set(EXTRA_LIBS ws2_32 psapi iphlpapi userenv dbghelp)
elseif (XMRIG_OS_APPLE)
list(APPEND SOURCES_OS
src/App_unix.cpp

View file

@ -13,7 +13,6 @@ Option `coin` is useful for pools without [algorithm negotiation](https://xmrig.com
| Name | Memory | Version | Description | Notes |
|------|--------|---------|-------------|-------|
| `kawpow` | - | 6.0.0+ | KawPow (Ravencoin) | GPU only |
| `rx/keva` | 1 MB | 5.9.0+ | RandomKEVA (RandomX variant for Keva). | |
| `astrobwt` | 20 MB | 5.8.0+ | AstroBWT (Dero). | |
| `cn-pico/tlo` | 256 KB | 5.5.0+ | CryptoNight-Pico (Talleo). | |
| `rx/sfx` | 2 MB | 5.4.0+ | RandomSFX (RandomX variant for Safex). | |

View file

@ -256,7 +256,7 @@
# v2.8.0
- **[#753](https://github.com/xmrig/xmrig/issues/753) Added new algorithm [CryptoNight variant 2](https://github.com/xmrig/xmrig/issues/753) for Monero fork, thanks [@SChernykh](https://github.com/SChernykh).**
- Added global and per thread option `"asm"` and and command line equivalent.
- Added global and per thread option `"asm"` and command line equivalent.
- **[#758](https://github.com/xmrig/xmrig/issues/758) Added SSL/TLS support for secure connections to pools.**
- Added per pool options `"tls"` and `"tls-fingerprint"` and command line equivalents.
- [#767](https://github.com/xmrig/xmrig/issues/767) Added config autosave feature, same with GPU miners.

View file

@ -1,8 +1,8 @@
#!/bin/bash -e
#!/bin/sh -e
HWLOC_VERSION_MAJOR="2"
HWLOC_VERSION_MINOR="9"
HWLOC_VERSION_PATCH="0"
HWLOC_VERSION_MINOR="11"
HWLOC_VERSION_PATCH="2"
HWLOC_VERSION="${HWLOC_VERSION_MAJOR}.${HWLOC_VERSION_MINOR}.${HWLOC_VERSION_PATCH}"

View file

@ -1,4 +1,4 @@
#!/bin/bash -e
#!/bin/sh -e
HWLOC_VERSION="1.11.13"

View file

@ -1,4 +1,4 @@
#!/bin/bash -e
#!/bin/sh -e
LIBRESSL_VERSION="3.5.2"

View file

@ -1,6 +1,6 @@
#!/bin/bash -e
#!/bin/sh -e
OPENSSL_VERSION="1.1.1s"
OPENSSL_VERSION="1.1.1u"
mkdir -p deps
mkdir -p deps/include
@ -8,7 +8,7 @@ mkdir -p deps/lib
mkdir -p build && cd build
wget https://www.openssl.org/source/openssl-${OPENSSL_VERSION}.tar.gz -O openssl-${OPENSSL_VERSION}.tar.gz
wget https://openssl.org/source/old/1.1.1/openssl-${OPENSSL_VERSION}.tar.gz -O openssl-${OPENSSL_VERSION}.tar.gz
tar -xzf openssl-${OPENSSL_VERSION}.tar.gz
cd openssl-${OPENSSL_VERSION}

View file

@ -1,6 +1,6 @@
#!/bin/bash -e
#!/bin/sh -e
OPENSSL_VERSION="3.0.7"
OPENSSL_VERSION="3.0.15"
mkdir -p deps
mkdir -p deps/include
@ -8,7 +8,7 @@ mkdir -p deps/lib
mkdir -p build && cd build
wget https://www.openssl.org/source/openssl-${OPENSSL_VERSION}.tar.gz -O openssl-${OPENSSL_VERSION}.tar.gz
wget https://github.com/openssl/openssl/releases/download/openssl-${OPENSSL_VERSION}/openssl-${OPENSSL_VERSION}.tar.gz -O openssl-${OPENSSL_VERSION}.tar.gz
tar -xzf openssl-${OPENSSL_VERSION}.tar.gz
cd openssl-${OPENSSL_VERSION}

View file

@ -1,6 +1,6 @@
#!/bin/bash -e
#!/bin/sh -e
UV_VERSION="1.44.2"
UV_VERSION="1.49.2"
mkdir -p deps
mkdir -p deps/include
@ -8,10 +8,10 @@ mkdir -p deps/lib
mkdir -p build && cd build
wget https://github.com/libuv/libuv/archive/v${UV_VERSION}.tar.gz -O v${UV_VERSION}.tar.gz
wget https://dist.libuv.org/dist/v${UV_VERSION}/libuv-v${UV_VERSION}.tar.gz -O v${UV_VERSION}.tar.gz
tar -xzf v${UV_VERSION}.tar.gz
cd libuv-${UV_VERSION}
cd libuv-v${UV_VERSION}
sh autogen.sh
./configure --disable-shared
make -j$(nproc || sysctl -n hw.ncpu || sysctl -n hw.logicalcpu)

View file

@ -1,5 +1,5 @@
#!/bin/bash -e
#!/bin/sh -e
./build.uv.sh
./build.hwloc.sh
./build.openssl.sh
./build.openssl3.sh

View file

@ -1,4 +1,4 @@
#!/bin/bash -e
#!/bin/sh -e
# https://xmrig.com/docs/miner/hugepages#onegb-huge-pages

View file

@ -50,7 +50,6 @@ function rx()
'randomx_constants_monero.h',
'randomx_constants_wow.h',
'randomx_constants_arqma.h',
'randomx_constants_keva.h',
'randomx_constants_graft.h',
'aes.cl',
'blake2b.cl',

View file

@ -8,7 +8,7 @@ else
modprobe msr allow_writes=on
fi
if grep -E 'AMD Ryzen|AMD EPYC' /proc/cpuinfo > /dev/null;
if grep -E 'AMD Ryzen|AMD EPYC|AuthenticAMD' /proc/cpuinfo > /dev/null;
then
if grep "cpu family[[:space:]]\{1,\}:[[:space:]]25" /proc/cpuinfo > /dev/null;
then
@ -28,6 +28,14 @@ if grep -E 'AMD Ryzen|AMD EPYC' /proc/cpuinfo > /dev/null;
wrmsr -a 0xc001102b 0x2000cc10
echo "MSR register values for Zen3 applied"
fi
elif grep "cpu family[[:space:]]\{1,\}:[[:space:]]26" /proc/cpuinfo > /dev/null;
then
echo "Detected Zen5 CPU"
wrmsr -a 0xc0011020 0x4400000000000
wrmsr -a 0xc0011021 0x4000000000040
wrmsr -a 0xc0011022 0x8680000401570000
wrmsr -a 0xc001102b 0x2040cc10
echo "MSR register values for Zen5 applied"
else
echo "Detected Zen1/Zen2 CPU"
wrmsr -a 0xc0011020 0
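Each `wrmsr -a <register> <value>` line above writes a 64-bit value into a model-specific register on all CPUs. As a hedged illustration of what this does at the OS level — a sketch using the Linux msr driver, which maps the MSR address to the file offset; the single-CPU helper name is hypothetical, and the register/value pair is copied from the Zen5 branch above:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Write one MSR on cpu0, roughly what `wrmsr -p 0 <reg> <value>` does.
 * Requires root and the msr kernel module. */
static int wrmsr_cpu0(uint32_t reg, uint64_t value)
{
    int fd = open("/dev/cpu/0/msr", O_WRONLY);
    if (fd < 0) { perror("open /dev/cpu/0/msr"); return -1; }
    ssize_t n = pwrite(fd, &value, sizeof(value), reg); /* offset = MSR address */
    close(fd);
    return n == (ssize_t)sizeof(value) ? 0 : -1;
}

int main(void)
{
    /* One of the Zen5 values applied by randomx_boost.sh above. */
    return wrmsr_cpu0(0xc0011020, 0x4400000000000ULL) ? 1 : 0;
}

The script's `-a` flag repeats this write for every /dev/cpu/N/msr device.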

View file

@ -1,4 +1,4 @@
cmake_minimum_required(VERSION 3.1)
cmake_minimum_required(VERSION 3.5)
project(argon2 C)
set(CMAKE_C_STANDARD 99)

View file

@ -1 +1 @@
epee - is a small library of helpers, wrappers, tools and and so on, used to make my life easier.
epee - is a small library of helpers, wrappers, tools and so on, used to make my life easier.

View file

@ -1,4 +1,4 @@
cmake_minimum_required(VERSION 3.1)
cmake_minimum_required(VERSION 3.5)
project (hwloc C)
include_directories(include)

View file

@ -1,5 +1,5 @@
Copyright © 2009 CNRS
Copyright © 2009-2022 Inria. All rights reserved.
Copyright © 2009-2024 Inria. All rights reserved.
Copyright © 2009-2013 Université Bordeaux
Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
Copyright © 2020 Hewlett Packard Enterprise. All rights reserved.
@ -17,6 +17,168 @@ bug fixes (and other actions) for each version of hwloc since version
0.9.
Version 2.11.2
--------------
* Add missing CPU info attrs on aarch64 on Linux.
* Use ACPI CPPC on Linux to get better information about cpukinds,
at least on AMD CPUs.
* Fix crash when manipulating cpukinds after topology
duplication, thanks to Hadrien Grasland for the report.
* Fix missing input target checks in memattr functions,
thanks to Hadrien Grasland for the report.
* Fix a memory leak when ignoring NUMA distances on FreeBSD.
* Fix build failure on old Linux distributions without faccessat().
* Fix non-Windows importing of XML topologies and CPUID dumps exported
on Windows.
* hwloc-calc --cpuset-output-format systemd-dbus-api can now generate
AllowedCPUs information for systemd slices.
See the hwloc-calc manpage for examples. Thanks to Pierre Neyron.
* Some fixes in manpage EXAMPLES and split them into subsections.
Version 2.11.1
--------------
* Fix bash completions, thanks Tavis Rudd.
Version 2.11.0
--------------
* API
+ Add HWLOC_MEMBIND_WEIGHTED_INTERLEAVE memory binding policy on
Linux 6.9+. Thanks to Honggyu Kim for the patch.
- weighted_interleave_membind is added to membind support bits.
- The "weighted" policy is added to the hwloc-bind tool.
+ Add hwloc_obj_set_subtype(). Thanks to Hadrien Grasland for the report.
* GPU support
+ Don't hide the GPU NUMA node on NVIDIA Grace Hopper.
+ Get Intel GPU OpenCL device locality.
+ Add bandwidths between subdevices in the LevelZero XeLinkBandwidth
matrix.
+ Fix PCI Gen4+ link speed of NVIDIA GPU obtained from NVML,
thanks to Akram Sbaih for the report.
* Windows support
+ Fix Windows support when UNICODE is enabled, several hwloc features
were missing, thanks to Martin for the report.
+ Fix the enabling of CUDA in Windows CMake build,
thanks to Moritz Kreutzer for the patch.
+ Fix CUDA/OpenCL test source path in Windows CMake.
* Tools
+ Option --best-memattr may now return multiple nodes. Additional
configuration flags may be given to tweak its behavior.
+ hwloc-info has a new --get-attr option to get a single attribute.
+ hwloc-info now supports "levels", "support" and "topology"
special keywords for backward compatibility for hwloc 3.0.
+ The --taskset command-line option is superseded by the new
--cpuset-output-format, which also allows exporting as a list.
+ hwloc-calc may now import bitmasks described as a list of bits
with the new "--cpuset-input-format list".
* Misc
+ The MemoryTiersNr info attribute in the root object now says how many
memory tiers were built. Thanks to Antoine Morvan for the report.
+ Fix the management of infinite cpusets in the bitmap printf/sscanf
API as well as in command-line tools.
+ Add section "Compiling software on top of hwloc's C API" in the
documentation with examples for GNU Make and CMake,
thanks to Florent Pruvost for the help.
Version 2.10.0
--------------
* Heterogeneous Memory core improvements
+ Better heuristics to identify the subtype of memory such as HBM,
DRAM, NVM, CXL-DRAM, etc.
+ Build memory tiers, i.e. sets of NUMA nodes with the same subtype
and similar performance.
- NUMA node tier ranks are exposed in the new MemoryTier info
attribute (starts from 0 for highest bandwidth tier).
+ See the new Heterogeneous Memory section in the documentation.
* API
+ Add hwloc_topology_free_group_object() to discard a Group created
by hwloc_topology_alloc_group_object().
* Linux backend
+ Fix cpukinds on NVIDIA Grace to report identical cores even if they
actually have very small frequency differences.
Thanks to John C. Linford for the report.
+ Add CXLDevice attributes to CXL DAX objects and NUMA nodes to show
which PCI device implements which window.
+ Ignore buggy memory-side caches and memory attributes when fake NUMA
emulation is enabled on the Linux kernel command-line.
+ Add more info attributes in MemoryModule Misc objects,
thanks to Zubiao Xiong for the patch.
+ Get CPUModel and CPUFamily info attributes on LoongArch platforms.
* x86 backend
+ Add support for new AMD CPUID leaf 0x80000026 for better detection
of Core Complex and Die on Zen4 processors.
+ Improve Zhaoxin CPU topology detection.
* Tools
+ Input locations and many command-line options (e.g. hwloc-calc -I -N -H,
lstopo --only) now accept filters such as "NUMA[HBM]" so that only
objects of that type and subtype are considered.
- NUMA[tier=1] is also accepted for selecting NUMA nodes depending
on their MemoryTier info attribute.
+ Add --object-output to hwloc-calc to report the type as a prefix to
object indexes, e.g. Core:2 instead of 2 in the output of -I.
+ hwloc-info --ancestor and --descendants now accept kinds of objects
instead of single types.
- The new --first option only shows the first matching object.
+ Add --children-of-pid to hwloc-ps to show a hierarchy of processes.
Thanks to Antoine Morvan for the suggestion.
+ Add --misc-from to lstopo to add Misc objects described in a file.
- To be combined with the new hwloc-ps --lstopo-misc for a customizable
lstopo --top replacement.
* Misc
+ lstopo may now configure the layout of memory objects placed above,
for instance with --children-order memory:above:vert.
+ Fix XML import from memory or stdin when using libxml2 2.12.
+ Fix installation failures when configuring with --target,
thanks to Clement Foyer for the patch.
+ Fix support for 128bit pointer architectures.
+ Remove Netloc.
Version 2.9.3
-------------
* Handle Linux glibc allocation errors in binding routines (CVE-2022-47022).
* Fix hwloc-calc when searching objects on heterogeneous memory platforms,
thanks to Antoine Morvan for the report.
* Fix hwloc_get_next_child() when there are some memory-side caches.
* Don't crash if the topology is empty because Linux cgroups are wrong.
* Improve some hwloc-bind warnings in case of command-line parsing errors.
* Many documentation improvements all over the place, including:
+ hwloc_topology_restrict() and hwloc_topology_insert_group() may reorder
children, causing the logical indexes of objects to change.
Version 2.9.2
-------------
* Don't forget L3i when defining filters for multiple levels of caches
with hwloc_topology_set_cache/icache_types_filter().
* Fix object total_memory after hwloc_topology_insert_group_object().
* Fix the (non-yet) exporting in synthetic description for complex memory
hierarchies with memory-side caches, etc.
* Fix some default size attributes when building synthetic topologies.
* Fix size units in hwloc-annotate.
* Improve bitmap reallocation error management in many functions.
* Documentation improvements:
+ Better document return values of functions.
+ Add "Error reporting" section (in hwloc.h and in the doxygen doc).
+ Add FAQ entry "What may I disable to make hwloc faster?"
+ Improve FAQ entries "Why is lstopo slow?" and
"I only need ..., why should I use hwloc?"
+ Clarify how to deal with cpukinds in hwloc-calc and hwloc-bind
manpages.
Version 2.9.1
-------------
* Don't forget to apply object type filters to "perflevel" caches detected
on recent Mac OS X releases, thanks to Michel Lesoinne for the report.
* Fix a failed assertion in hwloc_topology_restrict() when some NUMA nodes
are removed because of HWLOC_RESTRICT_FLAG_REMOVE_CPULESS but no PUs are.
Thanks to Mark Grondona for reporting the issue.
* Mark HPE Cray Slingshot NICs with subtype "Slingshot".
Version 2.9.0
-------------
* Backends
@ -61,6 +223,14 @@ Version 2.8.0
file from the documentation.
Version 2.7.2
-------------
* Fix a crash when LevelZero devices have multiple subdevices,
e.g. on PonteVecchio GPUs, thanks to Jonathan Peyton.
* Fix a leak when importing cpukinds from XML,
thanks to Hui Zhou.
Version 2.7.1
-------------
* Workaround crashes when virtual machines report incoherent x86 CPUID

View file

@ -1,4 +1,8 @@
Introduction
This is a truncated and poorly-formatted version of the documentation main page.
See https://www.open-mpi.org/projects/hwloc/doc/ for more.
hwloc Overview
The Hardware Locality (hwloc) software project aims at easing the process of
discovering hardware resources in parallel architectures. It offers
@ -8,66 +12,450 @@ high-performance computing (HPC) applications, but is also applicable to any
project seeking to exploit code and/or data locality on modern computing
platforms.
hwloc is actually made of two subprojects distributed together:
hwloc provides command line tools and a C API to obtain the hierarchical map of
key computing elements within a node, such as: NUMA memory nodes, shared
caches, processor packages, dies and cores, processing units (logical
processors or "threads") and even I/O devices. hwloc also gathers various
attributes such as cache and memory information, and is portable across a
variety of different operating systems and platforms.
* The original hwloc project for describing the internals of computing nodes.
It is described in details starting at section Hardware Locality (hwloc)
Introduction.
* The network-oriented companion called netloc (Network Locality), described
in details starting with section Network Locality (netloc).
hwloc primarily aims at helping high-performance computing (HPC) applications,
but is also applicable to any project seeking to exploit code and/or data
locality on modern computing platforms.
See also the Related pages tab above for links to other sections.
hwloc supports the following operating systems:
Netloc may be disabled, but the original hwloc cannot. Both hwloc and netloc
APIs are documented after these sections.
* Linux (with knowledge of cgroups and cpusets, memory targets/initiators,
etc.) on all supported hardware, including Intel Xeon Phi, ScaleMP vSMP,
and NumaScale NumaConnect.
* Solaris (with support for processor sets and logical domains)
* AIX
* Darwin / OS X
* FreeBSD and its variants (such as kFreeBSD/GNU)
* NetBSD
* HP-UX
* Microsoft Windows
* IBM BlueGene/Q Compute Node Kernel (CNK)
Installation
Since it uses standard Operating System information, hwloc's support is mostly
independent from the processor type (x86, powerpc, ...) and just relies on the
Operating System support. The main exception is BSD operating systems (NetBSD,
FreeBSD, etc.) because they do not provide topology information, hence
hwloc uses an x86-only CPUID-based backend (which can be used for other OSes
too, see the Components and plugins section).
hwloc (https://www.open-mpi.org/projects/hwloc/) is available under the BSD
license. It is hosted as a sub-project of the overall Open MPI project (https:/
/www.open-mpi.org/). Note that hwloc does not require any functionality from
Open MPI -- it is a wholly separate (and much smaller!) project and code base.
It just happens to be hosted as part of the overall Open MPI project.
To check whether hwloc works on a particular machine, just try to build it and
run lstopo or lstopo-no-graphics. If some things do not look right (e.g. bogus
or missing cache information), see Questions and Bugs.
Basic Installation
hwloc only reports the number of processors on unsupported operating systems;
no topology information is available.
Installation is the fairly common GNU-based process:
For development and debugging purposes, hwloc also offers the ability to work
on "fake" topologies:
shell$ ./configure --prefix=...
shell$ make
shell$ make install
* Symmetrical tree of resources generated from a list of level arities, see
Synthetic topologies.
* Remote machine simulation through the gathering of topology as XML files,
see Importing and exporting topologies from/to XML files.
hwloc- and netloc-specific configure options and requirements are documented in
sections hwloc Installation and Netloc Installation respectively.
hwloc can display the topology in a human-readable format, either in graphical
mode (X11), or by exporting in one of several different formats, including:
plain text, LaTeX tikzpicture, PDF, PNG, and FIG (see Command-line Examples
below). Note that some of the export formats require additional support
libraries.
Also note that if you install supplemental libraries in non-standard locations,
hwloc's configure script may not be able to find them without some help. You
may need to specify additional CPPFLAGS, LDFLAGS, or PKG_CONFIG_PATH values on
the configure command line.
hwloc offers a programming interface for manipulating topologies and objects.
It also brings a powerful CPU bitmap API that is used to describe topology
objects location on physical/logical processors. See the Programming Interface
below. It may also be used to bind applications onto certain cores or memory
nodes. Several utility programs are also provided to ease command-line
manipulation of topology objects, binding of processes, and so on.
For example, if libpciaccess was installed into /opt/pciaccess, hwloc's
configure script may not find it by default. Try adding PKG_CONFIG_PATH to the
./configure command line, like this:
Bindings for several other languages are available from the project website.
./configure PKG_CONFIG_PATH=/opt/pciaccess/lib/pkgconfig ...
Command-line Examples
Running the "lstopo" tool is a good way to check as a graphical output whether
hwloc properly detected the architecture of your node. Netloc command-line
tools can be used to display the network topology interconnecting your nodes.
On a 4-package 2-core machine with hyper-threading, the lstopo tool may show
the following graphical output:
Installing from a Git clone
[dudley]
Additionally, the code can be directly cloned from Git:
Here's the equivalent output in textual form:
shell$ git clone https://github.com/open-mpi/hwloc.git
shell$ cd hwloc
shell$ ./autogen.sh
Machine
NUMANode L#0 (P#0)
Package L#0 + L3 L#0 (4096KB)
L2 L#0 (1024KB) + L1 L#0 (16KB) + Core L#0
PU L#0 (P#0)
PU L#1 (P#8)
L2 L#1 (1024KB) + L1 L#1 (16KB) + Core L#1
PU L#2 (P#4)
PU L#3 (P#12)
Package L#1 + L3 L#1 (4096KB)
L2 L#2 (1024KB) + L1 L#2 (16KB) + Core L#2
PU L#4 (P#1)
PU L#5 (P#9)
L2 L#3 (1024KB) + L1 L#3 (16KB) + Core L#3
PU L#6 (P#5)
PU L#7 (P#13)
Package L#2 + L3 L#2 (4096KB)
L2 L#4 (1024KB) + L1 L#4 (16KB) + Core L#4
PU L#8 (P#2)
PU L#9 (P#10)
L2 L#5 (1024KB) + L1 L#5 (16KB) + Core L#5
PU L#10 (P#6)
PU L#11 (P#14)
Package L#3 + L3 L#3 (4096KB)
L2 L#6 (1024KB) + L1 L#6 (16KB) + Core L#6
PU L#12 (P#3)
PU L#13 (P#11)
L2 L#7 (1024KB) + L1 L#7 (16KB) + Core L#7
PU L#14 (P#7)
PU L#15 (P#15)
Note that GNU Autoconf >=2.63, Automake >=1.11 and Libtool >=2.2.6 are required
when building from a Git clone.
Note that there is also an equivalent output in XML that is meant for exporting
/importing topologies but it is hardly readable to human-beings (see Importing
and exporting topologies from/to XML files for details).
Nightly development snapshots are available on the web site, they can be
configured and built without any need for Git or GNU Autotools.
On a 4-package 2-core Opteron NUMA machine (with two cores disallowed by
the administrator), the lstopo tool may show the following graphical output
(with --disallowed for displaying disallowed objects):
[hagrid]
Here's the equivalent output in textual form:
Machine (32GB total)
Package L#0
NUMANode L#0 (P#0 8190MB)
L2 L#0 (1024KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#0)
L2 L#1 (1024KB) + L1 L#1 (64KB) + Core L#1 + PU L#1 (P#1)
Package L#1
NUMANode L#1 (P#1 8192MB)
L2 L#2 (1024KB) + L1 L#2 (64KB) + Core L#2 + PU L#2 (P#2)
L2 L#3 (1024KB) + L1 L#3 (64KB) + Core L#3 + PU L#3 (P#3)
Package L#2
NUMANode L#2 (P#2 8192MB)
L2 L#4 (1024KB) + L1 L#4 (64KB) + Core L#4 + PU L#4 (P#4)
L2 L#5 (1024KB) + L1 L#5 (64KB) + Core L#5 + PU L#5 (P#5)
Package L#3
NUMANode L#3 (P#3 8192MB)
L2 L#6 (1024KB) + L1 L#6 (64KB) + Core L#6 + PU L#6 (P#6)
L2 L#7 (1024KB) + L1 L#7 (64KB) + Core L#7 + PU L#7 (P#7)
On a 2-package quad-core Xeon (pre-Nehalem, with 2 dual-core dies in each
package):
[emmett]
Here's the same output in textual form:
Machine (total 16GB)
NUMANode L#0 (P#0 16GB)
Package L#0
L2 L#0 (4096KB)
L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
L1 L#1 (32KB) + Core L#1 + PU L#1 (P#4)
L2 L#1 (4096KB)
L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2)
L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
Package L#1
L2 L#2 (4096KB)
L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5)
L2 L#3 (4096KB)
L1 L#6 (32KB) + Core L#6 + PU L#6 (P#3)
L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)
Programming Interface
The basic interface is available in hwloc.h. Some higher-level functions are
available in hwloc/helper.h to reduce the need to manually manipulate objects
and follow links between them. Documentation for all these is provided later in
this document. Developers may also want to look at hwloc/inlines.h which
contains the actual inline code of some hwloc.h routines, and at this document,
which provides good higher-level topology traversal examples.
To precisely define the vocabulary used by hwloc, a Terms and Definitions
section is available and should probably be read first.
Each hwloc object contains a cpuset describing the list of processing units
that it contains. These bitmaps may be used for CPU binding and Memory binding.
hwloc offers an extensive bitmap manipulation interface in hwloc/bitmap.h.
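As a small illustration of how object cpusets and the bitmap interface fit together (a hedged sketch, not part of the official example set; it can be compiled with the flags from pkg-config, e.g. cc demo.c $(pkg-config --cflags --libs hwloc)):

#include "hwloc.h"
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    hwloc_topology_t topology;
    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    /* The cpuset of the first core lists the PUs it contains. */
    hwloc_obj_t core = hwloc_get_obj_by_type(topology, HWLOC_OBJ_CORE, 0);
    if (core) {
        char *s;
        hwloc_bitmap_asprintf(&s, core->cpuset);
        printf("core 0 cpuset: %s (%d PUs)\n",
               s, hwloc_bitmap_weight(core->cpuset));
        free(s);
    }

    hwloc_topology_destroy(topology);
    return 0;
}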
Moreover, hwloc also comes with additional helpers for interoperability with
several commonly used environments. See the Interoperability With Other
Software section for details.
The complete API documentation is available in a full set of HTML pages, man
pages, and self-contained PDF files (formatted for both US letter and A4
formats) in the source tarball in doc/doxygen-doc/.
NOTE: If you are building the documentation from a Git clone, you will need to
have Doxygen and pdflatex installed -- the documentation will be built during
the normal "make" process. The documentation is installed during "make install"
to $prefix/share/doc/hwloc/ and your system's default man page tree (under
$prefix, of course).
Portability
Operating Systems have varying support for CPU and memory binding, e.g. while
some Operating Systems provide interfaces for all kinds of CPU and memory
bindings, some others provide only interfaces for a limited number of kinds of
CPU and memory binding, and some do not provide any binding interface at all.
Hwloc's binding functions would then simply return the ENOSYS error (Function
not implemented), meaning that the underlying Operating System does not provide
any interface for them. CPU binding and Memory binding provide more information
on which hwloc binding functions should be preferred because interfaces for
them are usually available on the supported Operating Systems.
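A minimal sketch of the error check this implies, assuming only the documented behavior of hwloc_set_cpubind() (the exact set of errno values is platform-dependent):

#include "hwloc.h"
#include <errno.h>
#include <stdio.h>

/* Try to bind the current process; treat missing OS support as a
 * soft failure rather than a fatal error. */
static void try_bind(hwloc_topology_t topology, hwloc_const_cpuset_t set)
{
    if (hwloc_set_cpubind(topology, set, HWLOC_CPUBIND_PROCESS) < 0) {
        if (errno == ENOSYS)
            fprintf(stderr, "CPU binding is not supported on this OS\n");
        else
            perror("hwloc_set_cpubind");
    }
}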
Similarly, the ability of reporting topology information varies from one
platform to another. As shown in Command-line Examples, hwloc can obtain
information on a wide variety of hardware topologies. However, some platforms
and/or operating system versions will only report a subset of this information.
For example, on a PPC64-based system with 8 cores (each with 2 hardware
threads) running a default 2.6.18-based kernel from RHEL 5.4, hwloc is only
able to glean information about NUMA nodes and processor units (PUs). No
information about caches, packages, or cores is available.
Here's the graphical output from lstopo on this platform when Simultaneous
Multi-Threading (SMT) is enabled:
[ppc64-with]
And here's the graphical output from lstopo on this platform when SMT is
disabled:
[ppc64-with]
Notice that hwloc only sees half the PUs when SMT is disabled. PU L#6, for
example, seems to change location from NUMA node #0 to #1. In reality, no PUs
"moved" -- they were simply re-numbered when hwloc only saw half as many (see
also Logical index in Indexes and Sets). Hence, PU L#6 in the SMT-disabled
picture probably corresponds to PU L#12 in the SMT-enabled picture.
This same "PUs have disappeared" effect can be seen on other platforms -- even
platforms / OSs that provide much more information than the above PPC64 system.
This is an unfortunate side-effect of how operating systems report information
to hwloc.
Note that after upgrading the Linux kernel on the same PPC64 system mentioned above
to 2.6.34, hwloc is able to discover all the topology information. The
following picture shows the entire topology layout when SMT is enabled:
[ppc64-full]
Developers using the hwloc API or XML output for portable applications should
therefore be extremely careful to not make any assumptions about the structure
of data that is returned. For example, per the above reported PPC topology, it
is not safe to assume that PUs will always be descendants of cores.
Additionally, future hardware may insert new topology elements that are not
available in this version of hwloc. Long-lived applications that are meant to
span multiple different hardware platforms should also be careful about making
structure assumptions. For example, a new element may someday exist between a
core and a PU.
API Example
The following small C example (available in the source tree as doc/examples/
hwloc-hello.c) prints the topology of the machine and performs some thread
and memory binding. More examples are available in the doc/examples/ directory
of the source tree.
/* Example hwloc API program.
*
* See other examples under doc/examples/ in the source tree
* for more details.
*
* Copyright (c) 2009-2016 Inria. All rights reserved.
* Copyright (c) 2009-2011 Université Bordeaux
* Copyright (c) 2009-2010 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
*
* hwloc-hello.c
*/
#include "hwloc.h"
#include <errno.h>
#include <stdio.h>
#include <string.h>
static void print_children(hwloc_topology_t topology, hwloc_obj_t obj,
int depth)
{
char type[32], attr[1024];
unsigned i;
hwloc_obj_type_snprintf(type, sizeof(type), obj, 0);
printf("%*s%s", 2*depth, "", type);
if (obj->os_index != (unsigned) -1)
printf("#%u", obj->os_index);
hwloc_obj_attr_snprintf(attr, sizeof(attr), obj, " ", 0);
if (*attr)
printf("(%s)", attr);
printf("\n");
for (i = 0; i < obj->arity; i++) {
print_children(topology, obj->children[i], depth + 1);
}
}
int main(void)
{
int depth;
unsigned i, n;
unsigned long size;
int levels;
char string[128];
int topodepth;
void *m;
hwloc_topology_t topology;
hwloc_cpuset_t cpuset;
hwloc_obj_t obj;
/* Allocate and initialize topology object. */
hwloc_topology_init(&topology);
/* ... Optionally, put detection configuration here to ignore
some objects types, define a synthetic topology, etc....
The default is to detect all the objects of the machine that
the caller is allowed to access. See Configure Topology
Detection. */
/* Perform the topology detection. */
hwloc_topology_load(topology);
/* Optionally, get some additional topology information
in case we need the topology depth later. */
topodepth = hwloc_topology_get_depth(topology);
/*****************************************************************
* First example:
* Walk the topology with an array style, from level 0 (always
* the system level) to the lowest level (always the proc level).
*****************************************************************/
for (depth = 0; depth < topodepth; depth++) {
printf("*** Objects at level %d\n", depth);
for (i = 0; i < hwloc_get_nbobjs_by_depth(topology, depth);
i++) {
hwloc_obj_type_snprintf(string, sizeof(string),
hwloc_get_obj_by_depth(topology, depth, i), 0);
printf("Index %u: %s\n", i, string);
}
}
/*****************************************************************
* Second example:
* Walk the topology with a tree style.
*****************************************************************/
printf("*** Printing overall tree\n");
print_children(topology, hwloc_get_root_obj(topology), 0);
/*****************************************************************
* Third example:
* Print the number of packages.
*****************************************************************/
depth = hwloc_get_type_depth(topology, HWLOC_OBJ_PACKAGE);
if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) {
printf("*** The number of packages is unknown\n");
} else {
printf("*** %u package(s)\n",
hwloc_get_nbobjs_by_depth(topology, depth));
}
/*****************************************************************
* Fourth example:
* Compute the amount of cache that the first logical processor
* has above it.
*****************************************************************/
levels = 0;
size = 0;
for (obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0);
obj;
obj = obj->parent)
if (hwloc_obj_type_is_cache(obj->type)) {
levels++;
size += obj->attr->cache.size;
}
printf("*** Logical processor 0 has %d caches totaling %luKB\n",
levels, size / 1024);
/*****************************************************************
* Fifth example:
* Bind to only one thread of the last core of the machine.
*
* First find out where cores are, or else smaller sets of CPUs if
* the OS doesn't have the notion of a "core".
*****************************************************************/
depth = hwloc_get_type_or_below_depth(topology, HWLOC_OBJ_CORE);
/* Get last core. */
obj = hwloc_get_obj_by_depth(topology, depth,
hwloc_get_nbobjs_by_depth(topology, depth) - 1);
if (obj) {
/* Get a copy of its cpuset that we may modify. */
cpuset = hwloc_bitmap_dup(obj->cpuset);
/* Get only one logical processor (in case the core is
SMT/hyper-threaded). */
hwloc_bitmap_singlify(cpuset);
/* And try to bind ourself there. */
if (hwloc_set_cpubind(topology, cpuset, 0)) {
char *str;
int error = errno;
hwloc_bitmap_asprintf(&str, obj->cpuset);
printf("Couldn't bind to cpuset %s: %s\n", str, strerror(error));
free(str);
}
/* Free our cpuset copy */
hwloc_bitmap_free(cpuset);
}
/*****************************************************************
* Sixth example:
* Allocate some memory on the last NUMA node, bind some existing
* memory to the last NUMA node.
*****************************************************************/
/* Get last node. There's always at least one. */
n = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NUMANODE);
obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, n - 1);
size = 1024*1024;
m = hwloc_alloc_membind(topology, size, obj->nodeset,
HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_BYNODESET);
hwloc_free(topology, m, size);
m = malloc(size);
hwloc_set_area_membind(topology, m, size, obj->nodeset,
HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_BYNODESET);
free(m);
/* Destroy topology object. */
hwloc_topology_destroy(topology);
return 0;
}
hwloc provides a pkg-config executable to obtain relevant compiler and linker
flags. See Compiling software on top of hwloc's C API for details on building
program on top of hwloc's API using GNU Make or CMake.
On a machine with 2 processor packages -- each package of which has two processing
cores -- the output from running hwloc-hello could be something like the
following:
shell$ ./hwloc-hello
*** Objects at level 0
Index 0: Machine
*** Objects at level 1
Index 0: Package#0
Index 1: Package#1
*** Objects at level 2
Index 0: Core#0
Index 1: Core#1
Index 2: Core#3
Index 3: Core#2
*** Objects at level 3
Index 0: PU#0
Index 1: PU#1
Index 2: PU#2
Index 3: PU#3
*** Printing overall tree
Machine
Package#0
Core#0
PU#0
Core#1
PU#1
Package#1
Core#3
PU#2
Core#2
PU#3
*** 2 package(s)
*** Logical processor 0 has 0 caches totaling 0KB
shell$
Questions and Bugs
@ -80,6 +468,20 @@ www.open-mpi.org/community/lists/hwloc.php).
There is also a #hwloc IRC channel on Libera Chat (irc.libera.chat).
History / Credits
hwloc is the evolution and merger of the libtopology project and the Portable
Linux Processor Affinity (PLPA) (https://www.open-mpi.org/projects/plpa/)
project. Because of functional and ideological overlap, these two code bases
and ideas were merged and released under the name "hwloc" as an Open MPI
sub-project.
libtopology was initially developed by the Inria Runtime Team-Project. PLPA was
initially developed by the Open MPI development team as a sub-project. Both are
now deprecated in favor of hwloc, which is distributed as an Open MPI
sub-project.
See https://www.open-mpi.org/projects/hwloc/doc/ for more hwloc documentation.
See https://www.open-mpi.org/projects/hwloc/doc/ for more hwloc documentation,
actual links to related pages, images, etc.

View file

@ -8,8 +8,8 @@
# Please update HWLOC_VERSION* in contrib/windows/hwloc_config.h too.
major=2
minor=9
release=0
minor=11
release=2
# greek is used for alpha or beta release tags. If it is non-empty,
# it will be appended to the version number. It does not have to be
@ -22,7 +22,7 @@ greek=
# The date when this release was created
date="Dec 14, 2022"
date="Sep 26, 2024"
# If snapshot=1, then use the value from snapshot_version as the
# entire hwloc version (i.e., ignore major, minor, release, and
@ -41,7 +41,6 @@ snapshot_version=${major}.${minor}.${release}${greek}-git
# 2. Version numbers are described in the Libtool current:revision:age
# format.
libhwloc_so_version=21:1:6
libnetloc_so_version=0:0:0
libhwloc_so_version=23:1:8
# Please also update the <TargetName> lines in contrib/windows/libhwloc.vcxproj

File diff suppressed because it is too large

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
* Copyright © 2009-2022 Inria. All rights reserved.
* Copyright © 2009-2024 Inria. All rights reserved.
* Copyright © 2009-2012 Université Bordeaux
* Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
@ -11,10 +11,10 @@
#ifndef HWLOC_CONFIG_H
#define HWLOC_CONFIG_H
#define HWLOC_VERSION "2.9.0"
#define HWLOC_VERSION "2.11.2"
#define HWLOC_VERSION_MAJOR 2
#define HWLOC_VERSION_MINOR 9
#define HWLOC_VERSION_RELEASE 0
#define HWLOC_VERSION_MINOR 11
#define HWLOC_VERSION_RELEASE 2
#define HWLOC_VERSION_GREEK ""
#define __hwloc_restrict

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
* Copyright © 2009-2022 Inria. All rights reserved.
* Copyright © 2009-2023 Inria. All rights reserved.
* Copyright © 2009-2012 Université Bordeaux
* Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
@ -50,9 +50,10 @@ extern "C" {
* hwloc_bitmap_free(set);
* \endcode
*
* \note Most functions below return an int that may be negative in case of
* error. The usual error case would be an internal failure to realloc/extend
* \note Most functions below return 0 on success and -1 on error.
* The usual error case would be an internal failure to realloc/extend
* the storage of the bitmap (\p errno would be set to \c ENOMEM).
* See also \ref hwlocality_api_error_reporting.
*
* \note Several examples of using the bitmap API are available under the
* doc/examples/ directory in the source tree.
@ -83,7 +84,13 @@ typedef const struct hwloc_bitmap_s * hwloc_const_bitmap_t;
*/
HWLOC_DECLSPEC hwloc_bitmap_t hwloc_bitmap_alloc(void) __hwloc_attribute_malloc;
/** \brief Allocate a new full bitmap. */
/** \brief Allocate a new full bitmap.
*
* \returns A valid bitmap or \c NULL.
*
* The bitmap should be freed by a corresponding call to
* hwloc_bitmap_free().
*/
HWLOC_DECLSPEC hwloc_bitmap_t hwloc_bitmap_alloc_full(void) __hwloc_attribute_malloc;
/** \brief Free bitmap \p bitmap.
@ -119,11 +126,13 @@ HWLOC_DECLSPEC int hwloc_bitmap_snprintf(char * __hwloc_restrict buf, size_t buf
/** \brief Stringify a bitmap into a newly allocated string.
*
* \return -1 on error.
* \return 0 on success, -1 on error.
*/
HWLOC_DECLSPEC int hwloc_bitmap_asprintf(char ** strp, hwloc_const_bitmap_t bitmap);
/** \brief Parse a bitmap string and stores it in bitmap \p bitmap.
*
* \return 0 on success, -1 on error.
*/
HWLOC_DECLSPEC int hwloc_bitmap_sscanf(hwloc_bitmap_t bitmap, const char * __hwloc_restrict string);
@ -144,11 +153,13 @@ HWLOC_DECLSPEC int hwloc_bitmap_list_snprintf(char * __hwloc_restrict buf, size_
/** \brief Stringify a bitmap into a newly allocated list string.
*
* \return -1 on error.
* \return 0 on success, -1 on error.
*/
HWLOC_DECLSPEC int hwloc_bitmap_list_asprintf(char ** strp, hwloc_const_bitmap_t bitmap);
/** \brief Parse a list string and stores it in bitmap \p bitmap.
*
* \return 0 on success, -1 on error.
*/
HWLOC_DECLSPEC int hwloc_bitmap_list_sscanf(hwloc_bitmap_t bitmap, const char * __hwloc_restrict string);
@ -168,11 +179,13 @@ HWLOC_DECLSPEC int hwloc_bitmap_taskset_snprintf(char * __hwloc_restrict buf, si
/** \brief Stringify a bitmap into a newly allocated taskset-specific string.
*
* \return -1 on error.
* \return 0 on success, -1 on error.
*/
HWLOC_DECLSPEC int hwloc_bitmap_taskset_asprintf(char ** strp, hwloc_const_bitmap_t bitmap);
/** \brief Parse a taskset-specific bitmap string and stores it in bitmap \p bitmap.
*
* \return 0 on success, -1 on error.
*/
HWLOC_DECLSPEC int hwloc_bitmap_taskset_sscanf(hwloc_bitmap_t bitmap, const char * __hwloc_restrict string);
@ -279,6 +292,7 @@ HWLOC_DECLSPEC int hwloc_bitmap_to_ulongs(hwloc_const_bitmap_t bitmap, unsigned
* When called on the output of hwloc_topology_get_topology_cpuset(),
* the returned number is large enough for all cpusets of the topology.
*
* \return the number of unsigned longs required.
* \return -1 if \p bitmap is infinite.
*/
HWLOC_DECLSPEC int hwloc_bitmap_nr_ulongs(hwloc_const_bitmap_t bitmap) __hwloc_attribute_pure;
@ -305,21 +319,23 @@ HWLOC_DECLSPEC int hwloc_bitmap_isfull(hwloc_const_bitmap_t bitmap) __hwloc_attr
/** \brief Compute the first index (least significant bit) in bitmap \p bitmap
*
* \return -1 if no index is set in \p bitmap.
* \return the first index set in \p bitmap.
* \return -1 if \p bitmap is empty.
*/
HWLOC_DECLSPEC int hwloc_bitmap_first(hwloc_const_bitmap_t bitmap) __hwloc_attribute_pure;
/** \brief Compute the next index in bitmap \p bitmap which is after index \p prev
*
* If \p prev is -1, the first index is returned.
*
* \return the first index set in \p bitmap if \p prev is \c -1.
* \return the next index set in \p bitmap if \p prev is not \c -1.
* \return -1 if no index with higher index is set in \p bitmap.
*/
HWLOC_DECLSPEC int hwloc_bitmap_next(hwloc_const_bitmap_t bitmap, int prev) __hwloc_attribute_pure;
/** \brief Compute the last index (most significant bit) in bitmap \p bitmap
*
* \return -1 if no index is set in \p bitmap, or if \p bitmap is infinitely set.
* \return the last index set in \p bitmap.
* \return -1 if \p bitmap is empty, or if \p bitmap is infinitely set.
*/
HWLOC_DECLSPEC int hwloc_bitmap_last(hwloc_const_bitmap_t bitmap) __hwloc_attribute_pure;
@ -327,28 +343,29 @@ HWLOC_DECLSPEC int hwloc_bitmap_last(hwloc_const_bitmap_t bitmap) __hwloc_attrib
* indexes that are in the bitmap).
*
* \return the number of indexes that are in the bitmap.
*
* \return -1 if \p bitmap is infinitely set.
*/
HWLOC_DECLSPEC int hwloc_bitmap_weight(hwloc_const_bitmap_t bitmap) __hwloc_attribute_pure;
/** \brief Compute the first unset index (least significant bit) in bitmap \p bitmap
*
* \return -1 if no index is unset in \p bitmap.
* \return the first unset index in \p bitmap.
* \return -1 if \p bitmap is full.
*/
HWLOC_DECLSPEC int hwloc_bitmap_first_unset(hwloc_const_bitmap_t bitmap) __hwloc_attribute_pure;
/** \brief Compute the next unset index in bitmap \p bitmap which is after index \p prev
*
* If \p prev is -1, the first unset index is returned.
*
* \return the first index unset in \p bitmap if \p prev is \c -1.
* \return the next index unset in \p bitmap if \p prev is not \c -1.
* \return -1 if no index higher than \p prev is unset in \p bitmap.
*/
HWLOC_DECLSPEC int hwloc_bitmap_next_unset(hwloc_const_bitmap_t bitmap, int prev) __hwloc_attribute_pure;
/** \brief Compute the last unset index (most significant bit) in bitmap \p bitmap
*
* \return -1 if no index is unset in \p bitmap, or if \p bitmap is infinitely set.
* \return the last index unset in \p bitmap.
* \return -1 if \p bitmap is full, or if \p bitmap is not infinitely set.
*/
HWLOC_DECLSPEC int hwloc_bitmap_last_unset(hwloc_const_bitmap_t bitmap) __hwloc_attribute_pure;
@ -428,6 +445,8 @@ HWLOC_DECLSPEC int hwloc_bitmap_not (hwloc_bitmap_t res, hwloc_const_bitmap_t bi
/** \brief Test whether bitmaps \p bitmap1 and \p bitmap2 intersect.
*
* \return 1 if bitmaps intersect, 0 otherwise.
*
* \note The empty bitmap does not intersect any other bitmap.
*/
HWLOC_DECLSPEC int hwloc_bitmap_intersects (hwloc_const_bitmap_t bitmap1, hwloc_const_bitmap_t bitmap2) __hwloc_attribute_pure;
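A small sketch of the intersection test (values made up):

hwloc_bitmap_t a = hwloc_bitmap_alloc(), b = hwloc_bitmap_alloc();
hwloc_bitmap_list_sscanf(a, "0-3");
hwloc_bitmap_list_sscanf(b, "4-7");
printf("%d\n", hwloc_bitmap_intersects(a, b));  /* disjoint ranges: prints 0 */
hwloc_bitmap_set(b, 2);
printf("%d\n", hwloc_bitmap_intersects(a, b));  /* now share index 2: prints 1 */
hwloc_bitmap_free(a);
hwloc_bitmap_free(b);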

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2010-2021 Inria. All rights reserved.
* Copyright © 2010-2023 Inria. All rights reserved.
* Copyright © 2010-2011 Université Bordeaux
* Copyright © 2011 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
@ -42,6 +42,9 @@ extern "C" {
/** \brief Return the domain, bus and device IDs of the CUDA device \p cudevice.
*
* Device \p cudevice must match the local machine.
*
* \return 0 on success.
* \return -1 on error, for instance if device information could not be found.
*/
static __hwloc_inline int
hwloc_cuda_get_device_pci_ids(hwloc_topology_t topology __hwloc_attribute_unused,
@ -87,6 +90,9 @@ hwloc_cuda_get_device_pci_ids(hwloc_topology_t topology __hwloc_attribute_unused
*
* This function is currently only implemented in a meaningful way for
* Linux; other systems will simply get a full cpuset.
*
* \return 0 on success.
* \return -1 on error, for instance if device information could not be found.
*/
static __hwloc_inline int
hwloc_cuda_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
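For illustration, a hedged usage sketch of hwloc_cuda_get_device_cpuset() (assumes a loaded topology and an already-initialized CUDA driver; device ordinal 0 is arbitrary):

CUdevice dev;
hwloc_bitmap_t set = hwloc_bitmap_alloc();
cuDeviceGet(&dev, 0);                       /* after cuInit(0) */
if (hwloc_cuda_get_device_cpuset(topology, dev, set) == 0) {
    char *s;
    hwloc_bitmap_asprintf(&s, set);
    printf("GPU 0 is close to PUs %s\n", s);
    free(s);
}
hwloc_bitmap_free(set);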

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2010-2021 Inria. All rights reserved.
* Copyright © 2010-2023 Inria. All rights reserved.
* Copyright © 2010-2011 Université Bordeaux
* Copyright © 2011 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
@ -43,6 +43,9 @@ extern "C" {
/** \brief Return the domain, bus and device IDs of the CUDA device whose index is \p idx.
*
* Device index \p idx must match the local machine.
*
* \return 0 on success.
* \return -1 on error, for instance if device information could not be found.
*/
static __hwloc_inline int
hwloc_cudart_get_device_pci_ids(hwloc_topology_t topology __hwloc_attribute_unused,
@ -84,6 +87,9 @@ hwloc_cudart_get_device_pci_ids(hwloc_topology_t topology __hwloc_attribute_unus
*
* This function is currently only implemented in a meaningful way for
* Linux; other systems will simply get a full cpuset.
*
* \return 0 on success.
* \return -1 on error, for instance if device information could not be found.
*/
static __hwloc_inline int
hwloc_cudart_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2013-2020 Inria. All rights reserved.
* Copyright © 2013-2023 Inria. All rights reserved.
* See COPYING in top-level directory.
*/
@ -222,6 +222,8 @@ enum hwloc_topology_diff_apply_flags_e {
HWLOC_DECLSPEC int hwloc_topology_diff_apply(hwloc_topology_t topology, hwloc_topology_diff_t diff, unsigned long flags);
/** \brief Destroy a list of topology differences.
*
* \return 0.
*/
HWLOC_DECLSPEC int hwloc_topology_diff_destroy(hwloc_topology_diff_t diff);
@ -233,6 +235,8 @@ HWLOC_DECLSPEC int hwloc_topology_diff_destroy(hwloc_topology_diff_t diff);
* This identifier is usually the name of the other XML file
* that contains the reference topology.
*
* \return 0 on success, -1 on error.
*
* \note the pointer returned in refname should later be freed
* by the caller.
*/
@ -246,10 +250,17 @@ HWLOC_DECLSPEC int hwloc_topology_diff_load_xml(const char *xmlpath, hwloc_topol
* This identifier is usually the name of the other XML file
* that contains the reference topology.
* This attribute is given back when reading the diff from XML.
*
* \return 0 on success, -1 on error.
*/
HWLOC_DECLSPEC int hwloc_topology_diff_export_xml(hwloc_topology_diff_t diff, const char *refname, const char *xmlpath);
/** \brief Load a list of topology differences from a XML buffer.
*
* Build a list of differences from the XML memory buffer given
* at \p xmlbuffer and of length \p buflen (including an ending \0).
* This buffer may have been filled earlier with
* hwloc_topology_diff_export_xmlbuffer().
*
* If not \c NULL, \p refname will be filled with the identifier
* string of the reference topology for the difference file,
@ -257,6 +268,8 @@ HWLOC_DECLSPEC int hwloc_topology_diff_export_xml(hwloc_topology_diff_t diff, co
* This identifier is usually the name of the other XML file
* that contains the reference topology.
*
* \return 0 on success, -1 on error.
*
* \note the pointer returned in refname should later be freed
* by the caller.
*/
@ -274,6 +287,8 @@ HWLOC_DECLSPEC int hwloc_topology_diff_load_xmlbuffer(const char *xmlbuffer, int
* The returned buffer ends with a \0 that is included in the returned
* length.
*
* \return 0 on success, -1 on error.
*
* \note The XML buffer should later be freed with hwloc_free_xmlbuffer().
*/
HWLOC_DECLSPEC int hwloc_topology_diff_export_xmlbuffer(hwloc_topology_diff_t diff, const char *refname, char **xmlbuffer, int *buflen);
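A sketch of exporting a diff to an in-memory buffer and releasing it; diff, topology and the "reference.xml" identifier are assumptions:

char *xmlbuffer;
int buflen;
if (hwloc_topology_diff_export_xmlbuffer(diff, "reference.xml", &xmlbuffer, &buflen) == 0) {
    /* buflen includes the terminating \0, as documented above */
    fwrite(xmlbuffer, 1, buflen - 1, stdout);
    hwloc_free_xmlbuffer(topology, xmlbuffer);
}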

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2010-2022 Inria. All rights reserved.
* Copyright © 2010-2024 Inria. All rights reserved.
* See COPYING in top-level directory.
*/
@ -28,18 +28,18 @@ extern "C" {
/** \brief Matrix of distances between a set of objects.
*
* This matrix often contains latencies between NUMA nodes
* The most common matrix contains latencies between NUMA nodes
* (as reported in the System Locality Distance Information Table (SLIT)
* in the ACPI specification), which may or may not be physically accurate.
* It corresponds to the latency for accessing the memory of one node
* from a core in another node.
* The corresponding kind is ::HWLOC_DISTANCES_KIND_FROM_OS | ::HWLOC_DISTANCES_KIND_FROM_USER.
* The corresponding kind is ::HWLOC_DISTANCES_KIND_MEANS_LATENCY | ::HWLOC_DISTANCES_KIND_FROM_USER.
* The name of this distances structure is "NUMALatency".
* Other distance structures include "XGMIBandwidth", "XGMIHops",
* "XeLinkBandwidth" and "NVLinkBandwidth".
*
* The matrix may also contain bandwidths between random sets of objects,
* possibly provided by the user, as specified in the \p kind attribute.
* Other common distance structures include "XGMIBandwidth", "XGMIHops",
* "XeLinkBandwidth" and "NVLinkBandwidth".
*
* Pointers \p objs and \p values should not be replaced, reallocated, freed, etc.
* However callers are allowed to modify \p kind as well as the contents
@ -70,11 +70,10 @@ struct hwloc_distances_s {
* The \p kind attribute of struct hwloc_distances_s is an OR'ed set
* of kinds.
*
* A kind of format HWLOC_DISTANCES_KIND_FROM_* specifies where the
* distance information comes from, if known.
*
* A kind of format HWLOC_DISTANCES_KIND_MEANS_* specifies whether
* values are latencies or bandwidths, if applicable.
* Each distance matrix may have only one kind among HWLOC_DISTANCES_KIND_FROM_*
* specifying where distance information comes from,
* and one kind among HWLOC_DISTANCES_KIND_MEANS_* specifying
* whether values are latencies or bandwidths.
*/
enum hwloc_distances_kind_e {
/** \brief These distances were obtained from the operating system or hardware.
@ -131,6 +130,8 @@ enum hwloc_distances_kind_e {
*
* Each distance matrix returned in the \p distances array should be released
* by the caller using hwloc_distances_release().
*
* \return 0 on success, -1 on error.
*/
HWLOC_DECLSPEC int
hwloc_distances_get(hwloc_topology_t topology,
@ -140,6 +141,8 @@ hwloc_distances_get(hwloc_topology_t topology,
/** \brief Retrieve distance matrices for object at a specific depth in the topology.
*
* Identical to hwloc_distances_get() with the additional \p depth filter.
*
* \return 0 on success, -1 on error.
*/
HWLOC_DECLSPEC int
hwloc_distances_get_by_depth(hwloc_topology_t topology, int depth,
@ -149,6 +152,8 @@ hwloc_distances_get_by_depth(hwloc_topology_t topology, int depth,
/** \brief Retrieve distance matrices for object of a specific type.
*
* Identical to hwloc_distances_get() with the additional \p type filter.
*
* \return 0 on success, -1 on error.
*/
HWLOC_DECLSPEC int
hwloc_distances_get_by_type(hwloc_topology_t topology, hwloc_obj_type_t type,
@ -162,6 +167,8 @@ hwloc_distances_get_by_type(hwloc_topology_t topology, hwloc_obj_type_t type,
* The name of the most common structure is "NUMALatency".
* Others include "XGMIBandwidth", "XGMIHops", "XeLinkBandwidth",
* and "NVLinkBandwidth".
*
* \return 0 on success, -1 on error.
*/
HWLOC_DECLSPEC int
hwloc_distances_get_by_name(hwloc_topology_t topology, const char *name,
@ -171,7 +178,12 @@ hwloc_distances_get_by_name(hwloc_topology_t topology, const char *name,
/** \brief Get a description of what a distances structure contains.
*
* For instance "NUMALatency" for hardware-provided NUMA distances (ACPI SLIT),
* or NULL if unknown.
* or \c NULL if unknown.
*
* \return the constant string with the name of the distance structure.
*
* \note The returned name should not be freed by the caller,
* it belongs to the hwloc library.
*/
HWLOC_DECLSPEC const char *
hwloc_distances_get_name(hwloc_topology_t topology, struct hwloc_distances_s *distances);
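A sketch of fetching the "NUMALatency" matrix by name and printing one value (topology assumed; a single matrix requested):

struct hwloc_distances_s *dist;
unsigned nr = 1;
if (hwloc_distances_get_by_name(topology, "NUMALatency", &nr, &dist, 0) == 0 && nr == 1) {
    printf("%s: %u objects, values[0] = %llu\n",
           hwloc_distances_get_name(topology, dist),
           dist->nbobjs, (unsigned long long) dist->values[0]);
    hwloc_distances_release(topology, dist);
}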
@ -252,6 +264,8 @@ enum hwloc_distances_transform_e {
*
* \p flags must be \c 0 for now.
*
* \return 0 on success, -1 on error, for instance if flags are invalid.
*
* \note Objects in distances array \p objs may be directly modified
* in place without using hwloc_distances_transform().
* One may use hwloc_get_obj_with_same_locality() to easily convert
@ -272,6 +286,7 @@ HWLOC_DECLSPEC int hwloc_distances_transform(hwloc_topology_t topology, struct h
/** \brief Find the index of an object in a distances structure.
*
* \return the index of the object in the distances structure if any.
* \return -1 if object \p obj is not involved in structure \p distances.
*/
static __hwloc_inline int
@ -289,6 +304,7 @@ hwloc_distances_obj_index(struct hwloc_distances_s *distances, hwloc_obj_t obj)
* The distance from \p obj1 to \p obj2 is stored in the value pointed to by
* \p value1to2, and reciprocally.
*
* \return 0 on success.
* \return -1 if object \p obj1 or \p obj2 is not involved in structure \p distances.
*/
static __hwloc_inline int
@ -340,6 +356,8 @@ typedef void * hwloc_distances_add_handle_t;
* Otherwise, it will be copied internally and may later be freed by the caller.
*
* \p kind specifies the kind of distance as an OR'ed set of ::hwloc_distances_kind_e.
* Only one kind of meaning and one kind of provenance may be given if appropriate
* (e.g. ::HWLOC_DISTANCES_KIND_MEANS_BANDWIDTH and ::HWLOC_DISTANCES_KIND_FROM_USER).
* Kind ::HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES will be automatically set
* according to objects having different types in hwloc_distances_add_values().
*
@ -374,8 +392,8 @@ hwloc_distances_add_create(hwloc_topology_t topology,
*
* \p flags must be \c 0 for now.
*
* \return \c 0 on success.
* \return \c -1 on error.
* \return 0 on success.
* \return -1 on error.
*/
HWLOC_DECLSPEC int hwloc_distances_add_values(hwloc_topology_t topology,
hwloc_distances_add_handle_t handle,
@ -386,7 +404,8 @@ HWLOC_DECLSPEC int hwloc_distances_add_values(hwloc_topology_t topology,
/** \brief Flags for adding new distances to a topology. */
enum hwloc_distances_add_flag_e {
/** \brief Try to group objects based on the newly provided distance information.
* This is ignored for distances between objects of different types.
* Grouping is only performed when the distances structure contains latencies,
* and when all objects are of the same type.
* \hideinitializer
*/
HWLOC_DISTANCES_ADD_FLAG_GROUP = (1UL<<0),
@ -411,8 +430,8 @@ enum hwloc_distances_add_flag_e {
*
* On error, the temporary distances structure and its content are destroyed.
*
* \return \c 0 on success.
* \return \c -1 on error.
* \return 0 on success.
* \return -1 on error.
*/
HWLOC_DECLSPEC int hwloc_distances_add_commit(hwloc_topology_t topology,
hwloc_distances_add_handle_t handle,
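A sketch of the three-step flow (create a handle, attach objects and values, commit); the "MyBandwidth" name and the 2x2 values are made up:

hwloc_obj_t objs[2] = {
    hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, 0),
    hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, 1)
};
hwloc_uint64_t values[4] = { 0, 10, 10, 0 };    /* 2x2 row-major matrix */
hwloc_distances_add_handle_t handle =
    hwloc_distances_add_create(topology, "MyBandwidth",
        HWLOC_DISTANCES_KIND_MEANS_BANDWIDTH | HWLOC_DISTANCES_KIND_FROM_USER, 0);
if (handle
    && hwloc_distances_add_values(topology, handle, 2, objs, values, 0) == 0
    && hwloc_distances_add_commit(topology, handle, 0) == 0)   /* no grouping: these are bandwidths */
    printf("distances added\n");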
@ -433,18 +452,24 @@ HWLOC_DECLSPEC int hwloc_distances_add_commit(hwloc_topology_t topology,
*
* If these distances were used to group objects, these additional
* Group objects are not removed from the topology.
*
* \return 0 on success, -1 on error.
*/
HWLOC_DECLSPEC int hwloc_distances_remove(hwloc_topology_t topology);
/** \brief Remove distance matrices for objects at a specific depth in the topology.
*
* Identical to hwloc_distances_remove() but only applies to one level of the topology.
*
* \return 0 on success, -1 on error.
*/
HWLOC_DECLSPEC int hwloc_distances_remove_by_depth(hwloc_topology_t topology, int depth);
/** \brief Remove distance matrices for objects of a specific type in the topology.
*
* Identical to hwloc_distances_remove() but only applies to one level of the topology.
*
* \return 0 on success, -1 on error.
*/
static __hwloc_inline int
hwloc_distances_remove_by_type(hwloc_topology_t topology, hwloc_obj_type_t type)
@ -458,6 +483,8 @@ hwloc_distances_remove_by_type(hwloc_topology_t topology, hwloc_obj_type_t type)
/** \brief Release and remove the given distance matrix from the topology.
*
* This function includes a call to hwloc_distances_release().
*
* \return 0 on success, -1 on error.
*/
HWLOC_DECLSPEC int hwloc_distances_release_remove(hwloc_topology_t topology, struct hwloc_distances_s *distances);

View file

@ -55,7 +55,7 @@ enum hwloc_topology_export_xml_flags_e {
*
* \p flags is an OR'ed set of ::hwloc_topology_export_xml_flags_e.
*
* \return -1 if a failure occured.
* \return 0 on success, or -1 on error.
*
* \note See also hwloc_topology_set_userdata_export_callback()
* for exporting application-specific object userdata.
@ -91,7 +91,7 @@ HWLOC_DECLSPEC int hwloc_topology_export_xml(hwloc_topology_t topology, const ch
*
* \p flags is an OR'ed set of ::hwloc_topology_export_xml_flags_e.
*
* \return -1 if a failure occured.
* \return 0 on success, or -1 on error.
*
* \note See also hwloc_topology_set_userdata_export_callback()
* for exporting application-specific object userdata.
@ -145,13 +145,15 @@ HWLOC_DECLSPEC void hwloc_topology_set_userdata_export_callback(hwloc_topology_t
* that were given to the export callback.
*
* Only printable characters may be exported to XML string attributes.
* If a non-printable character is passed in \p name or \p buffer,
* the function returns -1 with errno set to EINVAL.
*
* If exporting binary data, the application should first encode into
* printable characters only (or use hwloc_export_obj_userdata_base64()).
* It should also take care of portability issues if the export may
* be reimported on a different architecture.
*
* \return 0 on success.
* \return -1 with errno set to \c EINVAL if a non-printable character is
* passed in \p name or \p buffer.
*/
HWLOC_DECLSPEC int hwloc_export_obj_userdata(void *reserved, hwloc_topology_t topology, hwloc_obj_t obj, const char *name, const void *buffer, size_t length);
@ -165,8 +167,14 @@ HWLOC_DECLSPEC int hwloc_export_obj_userdata(void *reserved, hwloc_topology_t to
* This function may only be called from within the export() callback passed
* to hwloc_topology_set_userdata_export_callback().
*
* The name must be made of printable characters for export to XML string attributes.
*
* The function does not take care of portability issues if the export
* may be reimported on a different architecture.
*
* \return 0 on success.
* \return -1 with errno set to \c EINVAL if a non-printable character is
* passed in \p name.
*/
HWLOC_DECLSPEC int hwloc_export_obj_userdata_base64(void *reserved, hwloc_topology_t topology, hwloc_obj_t obj, const char *name, const void *buffer, size_t length);
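A sketch of an export callback built on the functions above; the "mykey" name and the use of obj->userdata as a C string are assumptions:

static void my_export_cb(void *reserved, hwloc_topology_t topology, hwloc_obj_t obj)
{
    const char *data = (const char *) obj->userdata;
    if (data)
        hwloc_export_obj_userdata(reserved, topology, obj, "mykey", data, strlen(data));
}
/* later: */
hwloc_topology_set_userdata_export_callback(topology, my_export_cb);
hwloc_topology_export_xml(topology, "out.xml", 0);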

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2012 Blue Brain Project, EPFL. All rights reserved.
* Copyright © 2012-2021 Inria. All rights reserved.
* Copyright © 2012-2023 Inria. All rights reserved.
* See COPYING in top-level directory.
*/
@ -102,7 +102,8 @@ hwloc_gl_get_display_osdev_by_name(hwloc_topology_t topology,
* Retrieves the OpenGL display port (server) in \p port and device (screen)
* in \p screen that correspond to the given hwloc OS device object.
*
* \return \c -1 if none could be found.
* \return 0 on success.
* \return -1 if none could be found.
*
* The topology \p topology does not necessarily have to match the current
* machine. For instance the topology may be an XML import of a remote host.

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
* Copyright © 2009-2020 Inria. All rights reserved.
* Copyright © 2009-2023 Inria. All rights reserved.
* Copyright © 2009-2011 Université Bordeaux
* Copyright © 2011 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
@ -52,6 +52,8 @@ extern "C" {
* that takes a cpu_set_t as input parameter.
*
* \p schedsetsize should be sizeof(cpu_set_t) unless \p schedset was dynamically allocated with CPU_ALLOC
*
* \return 0.
*/
static __hwloc_inline int
hwloc_cpuset_to_glibc_sched_affinity(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_const_cpuset_t hwlocset,
@ -80,6 +82,9 @@ hwloc_cpuset_to_glibc_sched_affinity(hwloc_topology_t topology __hwloc_attribute
* that takes a cpu_set_t as input parameter.
*
* \p schedsetsize should be sizeof(cpu_set_t) unless \p schedset was dynamically allocated with CPU_ALLOC
*
* \return 0 on success.
* \return -1 with errno set to \c ENOMEM if some internal reallocation failed.
*/
static __hwloc_inline int
hwloc_cpuset_from_glibc_sched_affinity(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_cpuset_t hwlocset,
@ -95,7 +100,8 @@ hwloc_cpuset_from_glibc_sched_affinity(hwloc_topology_t topology __hwloc_attribu
cpu = 0;
while (count) {
if (CPU_ISSET_S(cpu, schedsetsize, schedset)) {
hwloc_bitmap_set(hwlocset, cpu);
if (hwloc_bitmap_set(hwlocset, cpu) < 0)
return -1;
count--;
}
cpu++;
@ -107,7 +113,8 @@ hwloc_cpuset_from_glibc_sched_affinity(hwloc_topology_t topology __hwloc_attribu
assert(schedsetsize == sizeof(cpu_set_t));
for(cpu=0; cpu<CPU_SETSIZE; cpu++)
if (CPU_ISSET(cpu, schedset))
hwloc_bitmap_set(hwlocset, cpu);
if (hwloc_bitmap_set(hwlocset, cpu) < 0)
return -1;
#endif /* !CPU_ZERO_S */
return 0;
}
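A sketch of the round trip with glibc (requires _GNU_SOURCE and <sched.h>; topology assumed):

cpu_set_t schedset;
hwloc_bitmap_t set = hwloc_bitmap_alloc();
if (sched_getaffinity(0, sizeof(schedset), &schedset) == 0
    && hwloc_cpuset_from_glibc_sched_affinity(topology, set, &schedset, sizeof(schedset)) == 0) {
    /* set now holds the current thread's affinity as a hwloc cpuset */
}
hwloc_bitmap_free(set);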

File diff suppressed because it is too large

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2021 Inria. All rights reserved.
* Copyright © 2021-2023 Inria. All rights reserved.
* See COPYING in top-level directory.
*/
@ -44,8 +44,9 @@ extern "C" {
* the Level Zero device \p device.
*
* Topology \p topology and device \p device must match the local machine.
* The Level Zero must have been initialized with Sysman enabled
* (ZES_ENABLE_SYSMAN=1 in the environment).
* The Level Zero library must have been initialized with Sysman enabled
* (by calling zesInit(0) if supported,
* or by setting ZES_ENABLE_SYSMAN=1 in the environment).
* I/O devices detection and the Level Zero component are not needed in the
* topology.
*
@ -55,6 +56,9 @@ extern "C" {
*
* This function is currently only implemented in a meaningful way for
* Linux; other systems will simply get a full cpuset.
*
* \return 0 on success.
* \return -1 on error, for instance if device information could not be found.
*/
static __hwloc_inline int
hwloc_levelzero_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
* Copyright © 2009-2017 Inria. All rights reserved.
* Copyright © 2009-2023 Inria. All rights reserved.
* Copyright © 2009-2010, 2012 Université Bordeaux
* See COPYING in top-level directory.
*/
@ -50,6 +50,8 @@ extern "C" {
* This function may be used before calling set_mempolicy, mbind, migrate_pages
* or any other function that takes an array of unsigned long and a maximal
* node number as input parameter.
*
* \return 0.
*/
static __hwloc_inline int
hwloc_cpuset_to_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_const_cpuset_t cpuset,
@ -84,6 +86,8 @@ hwloc_cpuset_to_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_const_cpus
* This function may be used before calling set_mempolicy, mbind, migrate_pages
* or any other function that takes an array of unsigned long and a maximal
* node number as input parameter.
*
* \return 0.
*/
static __hwloc_inline int
hwloc_nodeset_to_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset,
@ -119,6 +123,9 @@ hwloc_nodeset_to_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_const_nod
* This function may be used after calling get_mempolicy or any other function
* that takes an array of unsigned long as output parameter (and possibly
* a maximal node number as input parameter).
*
* \return 0 on success.
* \return -1 on error, for instance if an internal reallocation failed.
*/
static __hwloc_inline int
hwloc_cpuset_from_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_cpuset_t cpuset,
@ -130,7 +137,8 @@ hwloc_cpuset_from_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_cpuset_t
while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL)
if (node->os_index < maxnode
&& (mask[node->os_index/sizeof(*mask)/8] & (1UL << (node->os_index % (sizeof(*mask)*8)))))
hwloc_bitmap_or(cpuset, cpuset, node->cpuset);
if (hwloc_bitmap_or(cpuset, cpuset, node->cpuset) < 0)
return -1;
return 0;
}
@ -142,6 +150,9 @@ hwloc_cpuset_from_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_cpuset_t
* This function may be used after calling get_mempolicy or any other function
* that takes an array of unsigned long as output parameter (and possibly
* a maximal node number as input parameter).
*
* \return 0 on success.
* \return -1 with errno set to \c ENOMEM if some internal reallocation failed.
*/
static __hwloc_inline int
hwloc_nodeset_from_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_nodeset_t nodeset,
@ -153,7 +164,8 @@ hwloc_nodeset_from_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_nodeset
while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL)
if (node->os_index < maxnode
&& (mask[node->os_index/sizeof(*mask)/8] & (1UL << (node->os_index % (sizeof(*mask)*8)))))
hwloc_bitmap_set(nodeset, node->os_index);
if (hwloc_bitmap_set(nodeset, node->os_index) < 0)
return -1;
return 0;
}
@ -184,7 +196,7 @@ hwloc_nodeset_from_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_nodeset
* This function may be used before calling many numa_ functions
* that use a struct bitmask as an input parameter.
*
* \return newly allocated struct bitmask.
* \return newly allocated struct bitmask, or \c NULL on error.
*/
static __hwloc_inline struct bitmask *
hwloc_cpuset_to_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_const_cpuset_t cpuset) __hwloc_attribute_malloc;
@ -209,7 +221,7 @@ hwloc_cpuset_to_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_const_cpu
* This function may be used before calling many numa_ functions
* that use a struct bitmask as an input parameter.
*
* \return newly allocated struct bitmask.
* \return newly allocated struct bitmask, or \c NULL on error.
*/
static __hwloc_inline struct bitmask *
hwloc_nodeset_to_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset) __hwloc_attribute_malloc;
@ -231,6 +243,9 @@ hwloc_nodeset_to_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_const_no
*
* This function may be used after calling many numa_ functions
* that use a struct bitmask as an output parameter.
*
* \return 0 on success.
* \return -1 with errno set to \c ENOMEM if some internal reallocation failed.
*/
static __hwloc_inline int
hwloc_cpuset_from_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_cpuset_t cpuset,
@ -241,7 +256,8 @@ hwloc_cpuset_from_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_cpuset_
hwloc_bitmap_zero(cpuset);
while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL)
if (numa_bitmask_isbitset(bitmask, node->os_index))
hwloc_bitmap_or(cpuset, cpuset, node->cpuset);
if (hwloc_bitmap_or(cpuset, cpuset, node->cpuset) < 0)
return -1;
return 0;
}
@ -249,6 +265,9 @@ hwloc_cpuset_from_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_cpuset_
*
* This function may be used after calling many numa_ functions
* that use a struct bitmask as an output parameter.
*
* \return 0 on success.
* \return -1 with errno set to \c ENOMEM if some internal reallocation failed.
*/
static __hwloc_inline int
hwloc_nodeset_from_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_nodeset_t nodeset,
@ -259,7 +278,8 @@ hwloc_nodeset_from_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_nodese
hwloc_bitmap_zero(nodeset);
while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL)
if (numa_bitmask_isbitset(bitmask, node->os_index))
hwloc_bitmap_set(nodeset, node->os_index);
if (hwloc_bitmap_set(nodeset, node->os_index) < 0)
return -1;
return 0;
}
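A sketch of the bitmask direction (nodeset and topology assumed; numa_available() should have been checked first):

struct bitmask *bm = hwloc_nodeset_to_linux_libnuma_bitmask(topology, nodeset);
if (bm) {
    numa_set_membind(bm);   /* bind future allocations to these nodes */
    numa_bitmask_free(bm);
}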

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
* Copyright © 2009-2021 Inria. All rights reserved.
* Copyright © 2009-2023 Inria. All rights reserved.
* Copyright © 2009-2011 Université Bordeaux
* See COPYING in top-level directory.
*/
@ -38,6 +38,8 @@ extern "C" {
* The behavior is exactly the same as the Linux sched_setaffinity system call,
* but uses a hwloc cpuset.
*
* \return 0 on success, -1 on error.
*
* \note This is equivalent to calling hwloc_set_proc_cpubind() with
* HWLOC_CPUBIND_THREAD as flags.
*/
@ -52,6 +54,8 @@ HWLOC_DECLSPEC int hwloc_linux_set_tid_cpubind(hwloc_topology_t topology, pid_t
* The behavior is exactly the same as the Linux sched_getaffinity system call,
* but uses a hwloc cpuset.
*
* \return 0 on success, -1 on error.
*
* \note This is equivalent to calling hwloc_get_proc_cpubind() with
* ::HWLOC_CPUBIND_THREAD as flags.
*/
@ -62,6 +66,8 @@ HWLOC_DECLSPEC int hwloc_linux_get_tid_cpubind(hwloc_topology_t topology, pid_t
* The CPU-set \p set (previously allocated by the caller)
* is filled with the PU which the thread last ran on.
*
* \return 0 on success, -1 on error.
*
* \note This is equivalent to calling hwloc_get_proc_last_cpu_location() with
* ::HWLOC_CPUBIND_THREAD as flags.
*/
@ -72,6 +78,8 @@ HWLOC_DECLSPEC int hwloc_linux_get_tid_last_cpu_location(hwloc_topology_t topolo
* Might be used when reading CPU set from sysfs attributes such as topology
* and caches for processors, or local_cpus for devices.
*
* \return 0 on success, -1 on error.
*
* \note This function ignores the HWLOC_FSROOT environment variable.
*/
HWLOC_DECLSPEC int hwloc_linux_read_path_as_cpumask(const char *path, hwloc_bitmap_t set);
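A sketch combining the last two functions; the sysfs path is only an example, and <sys/syscall.h> plus <unistd.h> are needed for the tid:

hwloc_bitmap_t set = hwloc_bitmap_alloc();
if (hwloc_linux_read_path_as_cpumask("/sys/devices/system/cpu/cpu0/topology/core_cpus", set) == 0)
    hwloc_linux_set_tid_cpubind(topology, syscall(SYS_gettid), set);
hwloc_bitmap_free(set);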

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2019-2022 Inria. All rights reserved.
* Copyright © 2019-2024 Inria. All rights reserved.
* See COPYING in top-level directory.
*/
@ -54,6 +54,10 @@ extern "C" {
* Attribute values for these nodes, if any, may then be obtained with
* hwloc_memattr_get_value() and manually compared with the desired criteria.
*
* Memory attributes are also used internally to build Memory Tiers which provide
* an easy way to distinguish NUMA nodes of different kinds, as explained
* in \ref heteromem.
*
* \sa An example is available in doc/examples/memory-attributes.c in the source tree.
*
* \note The API also supports specific objects as initiator,
@ -65,7 +69,10 @@ extern "C" {
* @{
*/
/** \brief Memory node attributes. */
/** \brief Predefined memory attribute IDs.
* See ::hwloc_memattr_id_t for the generic definition of IDs
* for predefined or custom attributes.
*/
enum hwloc_memattr_id_e {
/** \brief
* The \"Capacity\" is returned in bytes (local_memory attribute in objects).
@ -74,6 +81,8 @@ enum hwloc_memattr_id_e {
*
* No initiator is involved when looking at this attribute.
* The corresponding attribute flags are ::HWLOC_MEMATTR_FLAG_HIGHER_FIRST.
*
* Capacity values may not be modified using hwloc_memattr_set_value().
* \hideinitializer
*/
HWLOC_MEMATTR_ID_CAPACITY = 0,
@ -89,6 +98,8 @@ enum hwloc_memattr_id_e {
*
* No initiator is involved when looking at this attribute.
* The corresponding attribute flags are ::HWLOC_MEMATTR_FLAG_HIGHER_FIRST.
* Locality values may not be modified using hwloc_memattr_set_value().
* \hideinitializer
*/
HWLOC_MEMATTR_ID_LOCALITY = 1,
@ -169,15 +180,26 @@ enum hwloc_memattr_id_e {
/* TODO persistence? */
HWLOC_MEMATTR_ID_MAX /**< \private Sentinel value */
HWLOC_MEMATTR_ID_MAX /**< \private
* Sentinel value for predefined attributes.
* Dynamically registered custom attributes start here.
*/
};
/** \brief A memory attribute identifier.
* May be either one of ::hwloc_memattr_id_e or a new id returned by hwloc_memattr_register().
*
* hwloc predefines some commonly-used attributes in ::hwloc_memattr_id_e.
* One may then dynamically register custom ones with hwloc_memattr_register();
* they will be assigned IDs immediately after the predefined ones.
* See \ref hwlocality_memattrs_manage for more information about
* existing attribute IDs.
*/
typedef unsigned hwloc_memattr_id_t;
/** \brief Return the identifier of the memory attribute with the given name.
*
* \return 0 on success.
* \return -1 with errno set to \c EINVAL if no such attribute exists.
*/
HWLOC_DECLSPEC int
hwloc_memattr_get_by_name(hwloc_topology_t topology,
@ -247,6 +269,8 @@ enum hwloc_local_numanode_flag_e {
* or the number of nodes that would have been stored if there were
* enough room.
*
* \return 0 on success or -1 on error.
*
* \note Some of these NUMA nodes may not have any memory attribute
* values and hence not be reported as actual targets in other functions.
*
@ -274,8 +298,16 @@ hwloc_get_local_numanode_objs(hwloc_topology_t topology,
* (it does not have the flag ::HWLOC_MEMATTR_FLAG_NEED_INITIATOR),
* location \p initiator is ignored and may be \c NULL.
*
* \p target_node cannot be \c NULL. If \p attribute is ::HWLOC_MEMATTR_ID_CAPACITY,
* \p target_node must be a NUMA node. If it is ::HWLOC_MEMATTR_ID_LOCALITY,
* \p target_node must have a CPU set.
*
* \p flags must be \c 0 for now.
*
* \return 0 on success.
* \return -1 on error, for instance with errno set to \c EINVAL if flags
* are invalid or no such attribute exists.
*
* \note The initiator \p initiator should be of type ::HWLOC_LOCATION_TYPE_CPUSET
* when referring to accesses performed by CPU cores.
* ::HWLOC_LOCATION_TYPE_OBJECT is currently unused internally by hwloc,
@ -307,7 +339,10 @@ hwloc_memattr_get_value(hwloc_topology_t topology,
*
* \p flags must be \c 0 for now.
*
* If there are no matching targets, \c -1 is returned with \p errno set to \c ENOENT;
* \return 0 on success.
* \return -1 with errno set to \c ENOENT if there are no matching targets.
* \return -1 with errno set to \c EINVAL if flags are invalid,
* or no such attribute exists.
*
* \note The initiator \p initiator should be of type ::HWLOC_LOCATION_TYPE_CPUSET
* when referring to accesses performed by CPU cores.
@ -323,10 +358,6 @@ hwloc_memattr_get_best_target(hwloc_topology_t topology,
hwloc_obj_t *best_target, hwloc_uint64_t *value);
/** \brief Return the best initiator for the given attribute and target NUMA node.
*
* If the attribute does not relate to a specific initiator
* (it does not have the flag ::HWLOC_MEMATTR_FLAG_NEED_INITIATOR),
* \c -1 is returned and \p errno is set to \c EINVAL.
*
* If \p value is non \c NULL, the corresponding value is returned there.
*
@ -340,96 +371,22 @@ hwloc_memattr_get_best_target(hwloc_topology_t topology,
* The returned initiator should not be modified or freed,
* it belongs to the topology.
*
* \p target_node cannot be \c NULL.
*
* \p flags must be \c 0 for now.
*
* If there are no matching initiators, \c -1 is returned with \p errno set to \c ENOENT;
* \return 0 on success.
* \return -1 with errno set to \c ENOENT if there are no matching initiators.
* \return -1 with errno set to \c EINVAL if the attribute does not relate to a specific initiator
* (it does not have the flag ::HWLOC_MEMATTR_FLAG_NEED_INITIATOR).
*/
HWLOC_DECLSPEC int
hwloc_memattr_get_best_initiator(hwloc_topology_t topology,
hwloc_memattr_id_t attribute,
hwloc_obj_t target,
hwloc_obj_t target_node,
unsigned long flags,
struct hwloc_location *best_initiator, hwloc_uint64_t *value);
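A sketch of a best-target query for bandwidth; using the whole topology cpuset as the initiator is an arbitrary choice for the example:

struct hwloc_location initiator;
hwloc_obj_t best;
hwloc_uint64_t value;
initiator.type = HWLOC_LOCATION_TYPE_CPUSET;
initiator.location.cpuset = (hwloc_cpuset_t) hwloc_topology_get_topology_cpuset(topology);
if (hwloc_memattr_get_best_target(topology, HWLOC_MEMATTR_ID_BANDWIDTH,
                                  &initiator, 0, &best, &value) == 0)
    printf("best node: L#%u (bandwidth %llu)\n",
           best->logical_index, (unsigned long long) value);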
/** @} */
/** \defgroup hwlocality_memattrs_manage Managing memory attributes
* @{
*/
/** \brief Return the name of a memory attribute.
*/
HWLOC_DECLSPEC int
hwloc_memattr_get_name(hwloc_topology_t topology,
hwloc_memattr_id_t attribute,
const char **name);
/** \brief Return the flags of the given attribute.
*
* Flags are an OR'ed set of ::hwloc_memattr_flag_e.
*/
HWLOC_DECLSPEC int
hwloc_memattr_get_flags(hwloc_topology_t topology,
hwloc_memattr_id_t attribute,
unsigned long *flags);
/** \brief Memory attribute flags.
* Given to hwloc_memattr_register() and returned by hwloc_memattr_get_flags().
*/
enum hwloc_memattr_flag_e {
/** \brief The best nodes for this memory attribute are those with the higher values.
* For instance Bandwidth.
*/
HWLOC_MEMATTR_FLAG_HIGHER_FIRST = (1UL<<0),
/** \brief The best nodes for this memory attribute are those with the lower values.
* For instance Latency.
*/
HWLOC_MEMATTR_FLAG_LOWER_FIRST = (1UL<<1),
/** \brief The value returned for this memory attribute depends on the given initiator.
* For instance Bandwidth and Latency, but not Capacity.
*/
HWLOC_MEMATTR_FLAG_NEED_INITIATOR = (1UL<<2)
};
/** \brief Register a new memory attribute.
*
* Add a specific memory attribute that is not defined in ::hwloc_memattr_id_e.
* Flags are an OR'ed set of ::hwloc_memattr_flag_e. It must contain at least
* one of ::HWLOC_MEMATTR_FLAG_HIGHER_FIRST or ::HWLOC_MEMATTR_FLAG_LOWER_FIRST.
*/
HWLOC_DECLSPEC int
hwloc_memattr_register(hwloc_topology_t topology,
const char *name,
unsigned long flags,
hwloc_memattr_id_t *id);
/** \brief Set an attribute value for a specific target NUMA node.
*
* If the attribute does not relate to a specific initiator
* (it does not have the flag ::HWLOC_MEMATTR_FLAG_NEED_INITIATOR),
* location \p initiator is ignored and may be \c NULL.
*
* The initiator will be copied into the topology,
* the caller should free anything allocated to store the initiator,
* for instance the cpuset.
*
* \p flags must be \c 0 for now.
*
* \note The initiator \p initiator should be of type ::HWLOC_LOCATION_TYPE_CPUSET
* when referring to accesses performed by CPU cores.
* ::HWLOC_LOCATION_TYPE_OBJECT is currently unused internally by hwloc,
* but users may for instance use it to provide custom information about
* host memory accesses performed by GPUs.
*/
HWLOC_DECLSPEC int
hwloc_memattr_set_value(hwloc_topology_t topology,
hwloc_memattr_id_t attribute,
hwloc_obj_t target_node,
struct hwloc_location *initiator,
unsigned long flags,
hwloc_uint64_t value);
/** \brief Return the target NUMA nodes that have some values for a given attribute.
*
* Return targets for the given attribute in the \p targets array
@ -460,6 +417,8 @@ hwloc_memattr_set_value(hwloc_topology_t topology,
* NUMA nodes with hwloc_get_local_numanode_objs() and then look at their attribute
* values.
*
* \return 0 on success or -1 on error.
*
* \note The initiator \p initiator should be of type ::HWLOC_LOCATION_TYPE_CPUSET
* when referring to accesses performed by CPU cores.
* ::HWLOC_LOCATION_TYPE_OBJECT is currently unused internally by hwloc,
@ -491,12 +450,16 @@ hwloc_memattr_get_targets(hwloc_topology_t topology,
* The returned initiators should not be modified or freed,
* they belong to the topology.
*
* \p target_node cannot be \c NULL.
*
* \p flags must be \c 0 for now.
*
* If the attribute does not relate to a specific initiator
* (it does not have the flag ::HWLOC_MEMATTR_FLAG_NEED_INITIATOR),
* no initiator is returned.
*
* \return 0 on success or -1 on error.
*
* \note This function is meant for tools and debugging (listing internal information)
* rather than for application queries. Applications should rather select useful
* NUMA nodes with hwloc_get_local_numanode_objs() and then look at their attribute
@ -508,6 +471,131 @@ hwloc_memattr_get_initiators(hwloc_topology_t topology,
hwloc_obj_t target_node,
unsigned long flags,
unsigned *nr, struct hwloc_location *initiators, hwloc_uint64_t *values);
/** @} */
/** \defgroup hwlocality_memattrs_manage Managing memory attributes
*
* Memory attributes are identified by an ID (::hwloc_memattr_id_t)
* and a name. hwloc_memattr_get_name() and hwloc_memattr_get_by_name()
* convert between them (or return an error if the attribute does not exist).
*
* The set of valid ::hwloc_memattr_id_t is a contiguous set starting at \c 0.
* It first contains predefined attributes, as listed
* in ::hwloc_memattr_id_e (from \c 0 to \c HWLOC_MEMATTR_ID_MAX-1).
* Then custom attributes may be dynamically registered with
* hwloc_memattr_register(). They will get the following IDs
* (\c HWLOC_MEMATTR_ID_MAX for the first one, etc.).
*
* To iterate over all valid attributes
* (either predefined or dynamically registered custom ones),
* one may iterate over IDs starting from \c 0 until hwloc_memattr_get_name()
* or hwloc_memattr_get_flags() returns an error.
*
* The values for an existing attribute or for custom dynamically registered ones
* may be set or modified with hwloc_memattr_set_value().
*
* @{
*/
/** \brief Return the name of a memory attribute.
*
* The output pointer \p name cannot be \c NULL.
*
* \return 0 on success.
* \return -1 with errno set to \c EINVAL if the attribute does not exist.
*/
HWLOC_DECLSPEC int
hwloc_memattr_get_name(hwloc_topology_t topology,
hwloc_memattr_id_t attribute,
const char **name);
/** \brief Return the flags of the given attribute.
*
* Flags are an OR'ed set of ::hwloc_memattr_flag_e.
*
* The output pointer \p flags cannot be \c NULL.
*
* \return 0 on success.
* \return -1 with errno set to \c EINVAL if the attribute does not exist.
*/
HWLOC_DECLSPEC int
hwloc_memattr_get_flags(hwloc_topology_t topology,
hwloc_memattr_id_t attribute,
unsigned long *flags);
/** \brief Memory attribute flags.
* Given to hwloc_memattr_register() and returned by hwloc_memattr_get_flags().
*/
enum hwloc_memattr_flag_e {
/** \brief The best nodes for this memory attribute are those with the higher values.
* For instance Bandwidth.
*/
HWLOC_MEMATTR_FLAG_HIGHER_FIRST = (1UL<<0),
/** \brief The best nodes for this memory attribute are those with the lower values.
* For instance Latency.
*/
HWLOC_MEMATTR_FLAG_LOWER_FIRST = (1UL<<1),
/** \brief The value returned for this memory attribute depends on the given initiator.
* For instance Bandwidth and Latency, but not Capacity.
*/
HWLOC_MEMATTR_FLAG_NEED_INITIATOR = (1UL<<2)
};
/** \brief Register a new memory attribute.
*
* Add a new custom memory attribute.
* Flags are an OR'ed set of ::hwloc_memattr_flag_e. It must contain one of
* ::HWLOC_MEMATTR_FLAG_HIGHER_FIRST or ::HWLOC_MEMATTR_FLAG_LOWER_FIRST but not both.
*
* The new attribute \p id is immediately after the last existing attribute ID
* (which is either the ID of the last registered attribute if any,
* or the ID of the last predefined attribute in ::hwloc_memattr_id_e).
*
* \return 0 on success.
* \return -1 with errno set to \c EINVAL if an invalid set of flags is given.
* \return -1 with errno set to \c EBUSY if another attribute already uses this name.
*/
HWLOC_DECLSPEC int
hwloc_memattr_register(hwloc_topology_t topology,
const char *name,
unsigned long flags,
hwloc_memattr_id_t *id);
/** \brief Set an attribute value for a specific target NUMA node.
*
* If the attribute does not relate to a specific initiator
* (it does not have the flag ::HWLOC_MEMATTR_FLAG_NEED_INITIATOR),
* location \p initiator is ignored and may be \c NULL.
*
* The initiator will be copied into the topology,
* the caller should free anything allocated to store the initiator,
* for instance the cpuset.
*
* \p target_node cannot be \c NULL.
*
* \p attribute cannot be ::HWLOC_MEMATTR_ID_CAPACITY or
* ::HWLOC_MEMATTR_ID_LOCALITY.
*
* \p flags must be \c 0 for now.
*
* \note The initiator \p initiator should be of type ::HWLOC_LOCATION_TYPE_CPUSET
* when referring to accesses performed by CPU cores.
* ::HWLOC_LOCATION_TYPE_OBJECT is currently unused internally by hwloc,
* but users may for instance use it to provide custom information about
* host memory accesses performed by GPUs.
*
* \return 0 on success or -1 on error.
*/
HWLOC_DECLSPEC int
hwloc_memattr_set_value(hwloc_topology_t topology,
hwloc_memattr_id_t attribute,
hwloc_obj_t target_node,
struct hwloc_location *initiator,
unsigned long flags,
hwloc_uint64_t value);
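A sketch of registering a custom attribute and setting one value; "MyWearLevel" and the value 42 are made up:

hwloc_memattr_id_t id;
hwloc_obj_t node = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, 0);
if (node && hwloc_memattr_register(topology, "MyWearLevel",
                                   HWLOC_MEMATTR_FLAG_LOWER_FIRST, &id) == 0) {
    /* no initiator: the flags above lack HWLOC_MEMATTR_FLAG_NEED_INITIATOR */
    hwloc_memattr_set_value(topology, id, node, NULL, 0, 42);
}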
/** @} */
#ifdef __cplusplus

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2012-2021 Inria. All rights reserved.
* Copyright © 2012-2023 Inria. All rights reserved.
* See COPYING in top-level directory.
*/
@ -51,6 +51,9 @@ extern "C" {
*
* This function is currently only implemented in a meaningful way for
* Linux; other systems will simply get a full cpuset.
*
* \return 0 on success.
* \return -1 on error, for instance if device information could not be found.
*/
static __hwloc_inline int
hwloc_nvml_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2012-2021 Inria. All rights reserved.
* Copyright © 2012-2023 Inria. All rights reserved.
* Copyright © 2013, 2018 Université Bordeaux. All right reserved.
* See COPYING in top-level directory.
*/
@ -41,6 +41,15 @@ extern "C" {
*/
/* Copyright (c) 2008-2018 The Khronos Group Inc. */
/* needs "cl_khr_pci_bus_info" device extension, but not strictly required for clGetDeviceInfo() */
typedef struct {
cl_uint pci_domain;
cl_uint pci_bus;
cl_uint pci_device;
cl_uint pci_function;
} hwloc_cl_device_pci_bus_info_khr;
#define HWLOC_CL_DEVICE_PCI_BUS_INFO_KHR 0x410F
/* needs "cl_amd_device_attribute_query" device extension, but not strictly required for clGetDeviceInfo() */
#define HWLOC_CL_DEVICE_TOPOLOGY_AMD 0x4037
typedef union {
@ -69,15 +78,28 @@ typedef union {
/** \brief Return the domain, bus and device IDs of the OpenCL device \p device.
*
* Device \p device must match the local machine.
*
* \return 0 on success.
* \return -1 on error, for instance if device information could not be found.
*/
static __hwloc_inline int
hwloc_opencl_get_device_pci_busid(cl_device_id device,
unsigned *domain, unsigned *bus, unsigned *dev, unsigned *func)
{
hwloc_cl_device_topology_amd amdtopo;
hwloc_cl_device_pci_bus_info_khr khrbusinfo;
cl_uint nvbus, nvslot, nvdomain;
cl_int clret;
clret = clGetDeviceInfo(device, HWLOC_CL_DEVICE_PCI_BUS_INFO_KHR, sizeof(khrbusinfo), &khrbusinfo, NULL);
if (CL_SUCCESS == clret) {
*domain = (unsigned) khrbusinfo.pci_domain;
*bus = (unsigned) khrbusinfo.pci_bus;
*dev = (unsigned) khrbusinfo.pci_device;
*func = (unsigned) khrbusinfo.pci_function;
return 0;
}
clret = clGetDeviceInfo(device, HWLOC_CL_DEVICE_TOPOLOGY_AMD, sizeof(amdtopo), &amdtopo, NULL);
if (CL_SUCCESS == clret
&& HWLOC_CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD == amdtopo.raw.type) {
@ -126,6 +148,9 @@ hwloc_opencl_get_device_pci_busid(cl_device_id device,
* This function is currently only implemented in a meaningful way for
* Linux with the AMD or NVIDIA OpenCL implementation; other systems will simply
* get a full cpuset.
*
* \return 0 on success.
* \return -1 on error, for instance if the device could not be found.
*/
static __hwloc_inline int
hwloc_opencl_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
* Copyright © 2009-2021 Inria. All rights reserved.
* Copyright © 2009-2023 Inria. All rights reserved.
* Copyright © 2009-2010 Université Bordeaux
* Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
@ -57,6 +57,9 @@ extern "C" {
*
* This function is currently only implemented in a meaningful way for
* Linux; other systems will simply get a full cpuset.
*
* \return 0 on success.
* \return -1 on error, for instance if device information could not be found.
*/
static __hwloc_inline int
hwloc_ibv_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2013-2022 Inria. All rights reserved.
* Copyright © 2013-2024 Inria. All rights reserved.
* Copyright © 2016 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
*/
@ -164,7 +164,7 @@ struct hwloc_disc_status {
*/
unsigned excluded_phases;
/** \brief OR'ed set of hwloc_disc_status_flag_e */
/** \brief OR'ed set of ::hwloc_disc_status_flag_e */
unsigned long flags;
};
@ -645,6 +645,19 @@ HWLOC_DECLSPEC struct hwloc_obj * hwloc_pci_find_parent_by_busid(struct hwloc_to
*/
HWLOC_DECLSPEC struct hwloc_obj * hwloc_pci_find_by_busid(struct hwloc_topology *topology, unsigned domain, unsigned bus, unsigned dev, unsigned func);
/** @} */
/** \defgroup hwlocality_components_distances Components and Plugins: distances
*
* \note These structures and functions may change when ::HWLOC_COMPONENT_ABI is modified.
*
* @{
*/
/** \brief Handle to a new distances structure during its addition to the topology. */
typedef void * hwloc_backend_distances_add_handle_t;

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
* Copyright © 2010-2022 Inria. All rights reserved.
* Copyright © 2010-2024 Inria. All rights reserved.
* See COPYING in top-level directory.
*/
@ -176,6 +176,7 @@ extern "C" {
#define hwloc_topology_insert_misc_object HWLOC_NAME(topology_insert_misc_object)
#define hwloc_topology_alloc_group_object HWLOC_NAME(topology_alloc_group_object)
#define hwloc_topology_free_group_object HWLOC_NAME(topology_free_group_object)
#define hwloc_topology_insert_group_object HWLOC_NAME(topology_insert_group_object)
#define hwloc_obj_add_other_obj_sets HWLOC_NAME(obj_add_other_obj_sets)
#define hwloc_topology_refresh HWLOC_NAME(topology_refresh)
@ -209,6 +210,7 @@ extern "C" {
#define hwloc_obj_get_info_by_name HWLOC_NAME(obj_get_info_by_name)
#define hwloc_obj_add_info HWLOC_NAME(obj_add_info)
#define hwloc_obj_set_subtype HWLOC_NAME(obj_set_subtype)
#define HWLOC_CPUBIND_PROCESS HWLOC_NAME_CAPS(CPUBIND_PROCESS)
#define HWLOC_CPUBIND_THREAD HWLOC_NAME_CAPS(CPUBIND_THREAD)
@ -231,6 +233,7 @@ extern "C" {
#define HWLOC_MEMBIND_FIRSTTOUCH HWLOC_NAME_CAPS(MEMBIND_FIRSTTOUCH)
#define HWLOC_MEMBIND_BIND HWLOC_NAME_CAPS(MEMBIND_BIND)
#define HWLOC_MEMBIND_INTERLEAVE HWLOC_NAME_CAPS(MEMBIND_INTERLEAVE)
#define HWLOC_MEMBIND_WEIGHTED_INTERLEAVE HWLOC_NAME_CAPS(MEMBIND_WEIGHTED_INTERLEAVE)
#define HWLOC_MEMBIND_NEXTTOUCH HWLOC_NAME_CAPS(MEMBIND_NEXTTOUCH)
#define HWLOC_MEMBIND_MIXED HWLOC_NAME_CAPS(MEMBIND_MIXED)
@ -559,6 +562,7 @@ extern "C" {
/* opencl.h */
#define hwloc_cl_device_pci_bus_info_khr HWLOC_NAME(cl_device_pci_bus_info_khr)
#define hwloc_cl_device_topology_amd HWLOC_NAME(cl_device_topology_amd)
#define hwloc_opencl_get_device_pci_busid HWLOC_NAME(opencl_get_device_pci_ids)
#define hwloc_opencl_get_device_cpuset HWLOC_NAME(opencl_get_device_cpuset)
@ -714,6 +718,8 @@ extern "C" {
#define hwloc__obj_type_is_dcache HWLOC_NAME(_obj_type_is_dcache)
#define hwloc__obj_type_is_icache HWLOC_NAME(_obj_type_is_icache)
#define hwloc__pci_link_speed HWLOC_NAME(_pci_link_speed)
/* private/cpuid-x86.h */
#define hwloc_have_x86_cpuid HWLOC_NAME(have_x86_cpuid)

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2012-2021 Inria. All rights reserved.
* Copyright © 2012-2023 Inria. All rights reserved.
* Copyright (c) 2020, Advanced Micro Devices, Inc. All rights reserved.
* Written by Advanced Micro Devices,
* See COPYING in top-level directory.
@ -55,6 +55,9 @@ extern "C" {
*
* This function is currently only implemented in a meaningful way for
* Linux; other systems will simply get a full cpuset.
*
* \return 0 on success.
* \return -1 on error, for instance if device information could not be found.
*/
static __hwloc_inline int
hwloc_rsmi_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2013-2018 Inria. All rights reserved.
* Copyright © 2013-2023 Inria. All rights reserved.
* See COPYING in top-level directory.
*/
@ -48,6 +48,8 @@ extern "C" {
* This length (in bytes) must be used in hwloc_shmem_topology_write()
* and hwloc_shmem_topology_adopt() later.
*
* \return the length, or -1 on error, for instance if flags are invalid.
*
* \note Flags \p flags are currently unused, must be 0.
*/
HWLOC_DECLSPEC int hwloc_shmem_topology_get_length(hwloc_topology_t topology,
@ -74,9 +76,10 @@ HWLOC_DECLSPEC int hwloc_shmem_topology_get_length(hwloc_topology_t topology,
* is not. However the caller may also allocate it manually in shared memory
* to share it as well.
*
* \return -1 with errno set to EBUSY if the virtual memory mapping defined
* \return 0 on success.
* \return -1 with errno set to \c EBUSY if the virtual memory mapping defined
* by \p mmap_address and \p length isn't available in the process.
* \return -1 with errno set to EINVAL if \p fileoffset, \p mmap_address
* \return -1 with errno set to \c EINVAL if \p fileoffset, \p mmap_address
* or \p length aren't page-aligned.
*/
HWLOC_DECLSPEC int hwloc_shmem_topology_write(hwloc_topology_t topology,
@ -112,14 +115,16 @@ HWLOC_DECLSPEC int hwloc_shmem_topology_write(hwloc_topology_t topology,
*
* \note This function takes care of calling hwloc_topology_abi_check().
*
* \return -1 with errno set to EBUSY if the virtual memory mapping defined
* \return 0 on success.
*
* \return -1 with errno set to \c EBUSY if the virtual memory mapping defined
* by \p mmap_address and \p length isn't available in the process.
*
* \return -1 with errno set to EINVAL if \p fileoffset, \p mmap_address
* \return -1 with errno set to \c EINVAL if \p fileoffset, \p mmap_address
* or \p length aren't page-aligned, or do not match what was given to
* hwloc_shmem_topology_write() earlier.
*
* \return -1 with errno set to EINVAL if the layout of the topology structure
* \return -1 with errno set to \c EINVAL if the layout of the topology structure
* is different between the writer process and the adopter process.
*/
HWLOC_DECLSPEC int hwloc_shmem_topology_adopt(hwloc_topology_t *topologyp,
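A sketch of the measure-then-write sequence documented above; fd (an open shared file) and the fixed mapping address are made up:

size_t length;
void *addr = (void *) 0x300000000000UL;     /* page-aligned, assumed free */
if (hwloc_shmem_topology_get_length(topology, &length, 0) == 0
    && hwloc_shmem_topology_write(topology, fd, 0, addr, length, 0) == 0)
    printf("topology written (%zu bytes)\n", length);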

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009, 2011, 2012 CNRS. All rights reserved.
* Copyright © 2009-2021 Inria. All rights reserved.
* Copyright © 2009-2020 Inria. All rights reserved.
* Copyright © 2009, 2011, 2012, 2015 Université Bordeaux. All rights reserved.
* Copyright © 2009-2020 Cisco Systems, Inc. All rights reserved.
* $COPYRIGHT$
@ -17,6 +17,10 @@
#define HWLOC_HAVE_MSVC_CPUIDEX 1
/* #undef HAVE_MKSTEMP */
#define HWLOC_HAVE_X86_CPUID 1
/* Define to 1 if the system has the type `CACHE_DESCRIPTOR'. */
#define HAVE_CACHE_DESCRIPTOR 0
@ -128,8 +132,7 @@
#define HAVE_DECL__SC_PAGE_SIZE 0
/* Define to 1 if you have the <dirent.h> header file. */
/* #define HAVE_DIRENT_H 1 */
#undef HAVE_DIRENT_H
/* #undef HAVE_DIRENT_H */
/* Define to 1 if you have the <dlfcn.h> header file. */
/* #undef HAVE_DLFCN_H */
@ -282,7 +285,7 @@
#define HAVE_STRING_H 1
/* Define to 1 if you have the `strncasecmp' function. */
#define HAVE_STRNCASECMP 1
/* #undef HAVE_STRNCASECMP */
/* Define to '1' if sysctl is present and usable */
/* #undef HAVE_SYSCTL */
@ -323,8 +326,7 @@
/* #undef HAVE_UNAME */
/* Define to 1 if you have the <unistd.h> header file. */
/* #define HAVE_UNISTD_H 1 */
#undef HAVE_UNISTD_H
/* #undef HAVE_UNISTD_H */
/* Define to 1 if you have the `uselocale' function. */
/* #undef HAVE_USELOCALE */
@ -659,7 +661,7 @@
#define hwloc_pid_t HANDLE
/* Define this to either strncasecmp or strncmp */
#define hwloc_strncasecmp strncasecmp
/* #undef hwloc_strncasecmp */
/* Define this to the thread ID type */
#define hwloc_thread_t HANDLE

View file

@ -11,6 +11,22 @@
#ifndef HWLOC_PRIVATE_CPUID_X86_H
#define HWLOC_PRIVATE_CPUID_X86_H
/* A macro for annotating memory as uninitialized when building with MSAN
* (and otherwise having no effect). See below for why this is used with
* our custom assembly.
*/
#ifdef __has_feature
#define HWLOC_HAS_FEATURE(name) __has_feature(name)
#else
#define HWLOC_HAS_FEATURE(name) 0
#endif
#if HWLOC_HAS_FEATURE(memory_sanitizer) || defined(MEMORY_SANITIZER)
#include <sanitizer/msan_interface.h>
#define HWLOC_ANNOTATE_MEMORY_IS_INITIALIZED(ptr, len) __msan_unpoison(ptr, len)
#else
#define HWLOC_ANNOTATE_MEMORY_IS_INITIALIZED(ptr, len)
#endif
#if (defined HWLOC_X86_32_ARCH) && (!defined HWLOC_HAVE_MSVC_CPUIDEX)
static __hwloc_inline int hwloc_have_x86_cpuid(void)
{
@ -71,12 +87,18 @@ static __hwloc_inline void hwloc_x86_cpuid(unsigned *eax, unsigned *ebx, unsigne
"movl %k2,%1\n\t"
: "+a" (*eax), "=m" (*ebx), "=&r"(sav_rbx),
"+c" (*ecx), "=&d" (*edx));
/* MSAN does not recognize the effect of the above assembly on the memory operand
* (`"=m"(*ebx)`). This may get improved in MSAN at some point in the future, e.g.
* see https://github.com/llvm/llvm-project/pull/77393. */
HWLOC_ANNOTATE_MEMORY_IS_INITIALIZED(ebx, sizeof *ebx);
#elif defined(HWLOC_X86_32_ARCH)
__asm__(
"mov %%ebx,%1\n\t"
"cpuid\n\t"
"xchg %%ebx,%1\n\t"
: "+a" (*eax), "=&SD" (*ebx), "+c" (*ecx), "=&d" (*edx));
/* See above. */
HWLOC_ANNOTATE_MEMORY_IS_INITIALIZED(ebx, sizeof *ebx);
#else
#error unknown architecture
#endif

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
* Copyright © 2009-2019 Inria. All rights reserved.
* Copyright © 2009-2024 Inria. All rights reserved.
* Copyright © 2009-2012 Université Bordeaux
* Copyright © 2011 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
@ -573,4 +573,35 @@ typedef SSIZE_T ssize_t;
# endif
#endif
static __inline float
hwloc__pci_link_speed(unsigned generation, unsigned lanes)
{
float lanespeed;
/*
* These are single-direction bandwidths only.
*
* Gen1 used NRZ with 8/10 encoding.
* PCIe Gen1 = 2.5GT/s signal-rate per lane x 8/10 = 0.25GB/s data-rate per lane
* PCIe Gen2 = 5 GT/s signal-rate per lane x 8/10 = 0.5 GB/s data-rate per lane
* Gen3 switched to NRZ with 128/130 encoding.
* PCIe Gen3 = 8 GT/s signal-rate per lane x 128/130 = 1 GB/s data-rate per lane
* PCIe Gen4 = 16 GT/s signal-rate per lane x 128/130 = 2 GB/s data-rate per lane
* PCIe Gen5 = 32 GT/s signal-rate per lane x 128/130 = 4 GB/s data-rate per lane
* Gen6 switched to PAM4 with 242/256 FLIT (242B payload protected by 8B CRC + 6B FEC).
* PCIe Gen6 = 64 GT/s signal-rate per lane x 242/256 = 8 GB/s data-rate per lane
* PCIe Gen7 = 128GT/s signal-rate per lane x 242/256 = 16 GB/s data-rate per lane
*/
/* lanespeed in Gbit/s */
if (generation <= 2)
lanespeed = 2.5f * generation * 0.8f;
else if (generation <= 5)
lanespeed = 8.0f * (1<<(generation-3)) * 128/130;
else
lanespeed = 8.0f * (1<<(generation-3)) * 242/256; /* assume Gen8 will be 256 GT/s and so on */
/* linkspeed in GB/s */
return lanespeed * lanes / 8;
}
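Worked example of the formula above: a Gen4 x16 link gives 8.0 * 2 * 128/130, about 15.75 Gbit/s per lane, so 15.75 * 16 / 8 is roughly 31.5 GB/s single-direction:

float gbs = hwloc__pci_link_speed(4, 16);   /* ~31.5 GB/s */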
#endif /* HWLOC_PRIVATE_MISC_H */

View file

@ -1,578 +0,0 @@
/*
* Copyright © 2014 Cisco Systems, Inc. All rights reserved.
* Copyright © 2013-2014 University of Wisconsin-La Crosse.
* All rights reserved.
* Copyright © 2015-2017 Inria. All rights reserved.
*
* $COPYRIGHT$
*
* Additional copyrights may follow
* See COPYING in top-level directory.
*
* $HEADER$
*/
#ifndef _NETLOC_PRIVATE_H_
#define _NETLOC_PRIVATE_H_
#include <hwloc.h>
#include <netloc.h>
#include <netloc/uthash.h>
#include <netloc/utarray.h>
#include <private/autogen/config.h>
#define NETLOCFILE_VERSION 1
#ifdef NETLOC_SCOTCH
#include <stdint.h>
#include <scotch.h>
#define NETLOC_int SCOTCH_Num
#else
#define NETLOC_int int
#endif
/*
* "Import" a few things from hwloc
*/
#define __netloc_attribute_unused __hwloc_attribute_unused
#define __netloc_attribute_malloc __hwloc_attribute_malloc
#define __netloc_attribute_const __hwloc_attribute_const
#define __netloc_attribute_pure __hwloc_attribute_pure
#define __netloc_attribute_deprecated __hwloc_attribute_deprecated
#define __netloc_attribute_may_alias __hwloc_attribute_may_alias
#define NETLOC_DECLSPEC HWLOC_DECLSPEC
/**********************************************************************
* Types
**********************************************************************/
/**
* Definitions for Comparators
* \sa These are the return values from the following functions:
* netloc_network_compare, netloc_dt_edge_t_compare, netloc_dt_node_t_compare
*/
typedef enum {
NETLOC_CMP_SAME = 0, /**< Compared as the Same */
NETLOC_CMP_SIMILAR = -1, /**< Compared as Similar, but not the Same */
NETLOC_CMP_DIFF = -2 /**< Compared as Different */
} netloc_compare_type_t;
/**
* Enumerated type for the various types of supported networks
*/
typedef enum {
NETLOC_NETWORK_TYPE_ETHERNET = 1, /**< Ethernet network */
NETLOC_NETWORK_TYPE_INFINIBAND = 2, /**< InfiniBand network */
NETLOC_NETWORK_TYPE_INVALID = 3 /**< Invalid network */
} netloc_network_type_t;
/**
* Enumerated type for the various types of supported topologies
*/
typedef enum {
NETLOC_TOPOLOGY_TYPE_INVALID = -1, /**< Invalid */
NETLOC_TOPOLOGY_TYPE_TREE = 1, /**< Tree */
} netloc_topology_type_t;
/**
* Enumerated type for the various types of nodes
*/
typedef enum {
NETLOC_NODE_TYPE_HOST = 0, /**< Host (a.k.a., network addressable endpoint - e.g., MAC Address) node */
NETLOC_NODE_TYPE_SWITCH = 1, /**< Switch node */
NETLOC_NODE_TYPE_INVALID = 2 /**< Invalid node */
} netloc_node_type_t;
typedef enum {
NETLOC_ARCH_TREE = 0, /* Fat tree */
} netloc_arch_type_t;
/* Pre-declarations to avoid inter-dependency problems */
/** \cond IGNORE */
struct netloc_topology_t;
typedef struct netloc_topology_t netloc_topology_t;
struct netloc_node_t;
typedef struct netloc_node_t netloc_node_t;
struct netloc_edge_t;
typedef struct netloc_edge_t netloc_edge_t;
struct netloc_physical_link_t;
typedef struct netloc_physical_link_t netloc_physical_link_t;
struct netloc_path_t;
typedef struct netloc_path_t netloc_path_t;
struct netloc_arch_tree_t;
typedef struct netloc_arch_tree_t netloc_arch_tree_t;
struct netloc_arch_node_t;
typedef struct netloc_arch_node_t netloc_arch_node_t;
struct netloc_arch_node_slot_t;
typedef struct netloc_arch_node_slot_t netloc_arch_node_slot_t;
struct netloc_arch_t;
typedef struct netloc_arch_t netloc_arch_t;
/** \endcond */
/**
* \struct netloc_topology_t
* \brief Netloc Topology Context
*
* An opaque data structure used to reference a network topology.
*
* \note Must be initialized with \ref netloc_topology_construct()
*/
struct netloc_topology_t {
/** Topology path */
char *topopath;
/** Subnet ID */
char *subnet_id;
/** Node List */
netloc_node_t *nodes; /* Hash table of nodes by physical_id */
netloc_node_t *nodesByHostname; /* Hash table of nodes by hostname */
netloc_physical_link_t *physical_links; /* Hash table with physical links */
/** Partition List */
UT_array *partitions;
/** Hwloc topology List */
char *hwlocpath;
UT_array *topos;
hwloc_topology_t *hwloc_topos;
/** Type of the graph */
netloc_topology_type_t type;
};
/**
* \brief Netloc Node Type
*
* Represents the concept of a node (a.k.a., vertex, endpoint) within a network
* graph. This could be a server or a network switch. The \ref node_type parameter
* will distinguish the exact type of node this represents in the graph.
*/
struct netloc_node_t {
UT_hash_handle hh; /* makes this structure hashable with physical_id */
UT_hash_handle hh2; /* makes this structure hashable with hostname */
/** Physical ID of the node */
char physical_id[20];
/** Logical ID of the node (if any) */
int logical_id;
/** Type of the node */
netloc_node_type_t type;
/* Pointer to physical_links */
UT_array *physical_links;
/** Description information from discovery (if any) */
char *description;
/**
* Application-given private data pointer.
* Initialized to NULL, and not used by the netloc library.
*/
void * userdata;
/** Outgoing edges from this node */
netloc_edge_t *edges;
UT_array *subnodes; /* the group of nodes for the virtual nodes */
netloc_path_t *paths;
char *hostname;
UT_array *partitions; /* index in the list from the topology */
hwloc_topology_t hwlocTopo;
int hwlocTopoIdx;
};
/**
* \brief Netloc Edge Type
*
* Represents the concept of a directed edge within a network graph.
*
* \note We do not point to the netloc_node_t structure directly to
* simplify the representation, and allow the information to more easily
* be entered into the data store without circular references.
* \todo JJH Is the note above still true?
*/
struct netloc_edge_t {
UT_hash_handle hh; /* makes this structure hashable */
netloc_node_t *dest;
int id;
/** Pointers to the parent node */
netloc_node_t *node;
/* Pointer to physical_links */
UT_array *physical_links;
/** total gbits of the links */
float total_gbits;
UT_array *partitions; /* index in the list from the topology */
UT_array *subnode_edges; /* for edges going to virtual nodes */
struct netloc_edge_t *other_way;
/**
* Application-given private data pointer.
* Initialized to NULL, and not used by the netloc library.
*/
void * userdata;
};
struct netloc_physical_link_t {
UT_hash_handle hh; /* makes this structure hashable */
int id; // TODO long long
netloc_node_t *src;
netloc_node_t *dest;
int ports[2];
char *width;
char *speed;
netloc_edge_t *edge;
int other_way_id;
struct netloc_physical_link_t *other_way;
UT_array *partitions; /* index in the list from the topology */
/** gbits of the link from speed and width */
float gbits;
/** Description information from discovery (if any) */
char *description;
};
struct netloc_path_t {
UT_hash_handle hh; /* makes this structure hashable */
char dest_id[20];
UT_array *links;
};
/**********************************************************************
* Architecture structures
**********************************************************************/
struct netloc_arch_tree_t {
NETLOC_int num_levels;
NETLOC_int *degrees;
NETLOC_int *cost;
};
struct netloc_arch_node_t {
UT_hash_handle hh; /* makes this structure hashable */
char *name; /* Hash key */
netloc_node_t *node; /* Corresponding node */
int idx_in_topo; /* idx with ghost hosts to have complete topo */
int num_slots; /* it is not the real number of slots but the maximum slot idx */
int *slot_idx; /* corresponding idx in slot_tree */
int *slot_os_idx; /* corresponding os index for each leaf in tree */
netloc_arch_tree_t *slot_tree; /* Tree built from hwloc */
int num_current_slots; /* Number of PUs */
NETLOC_int *current_slots; /* indices in the complete tree */
int *slot_ranks; /* corresponding MPI rank for each leaf in tree */
};
struct netloc_arch_node_slot_t {
netloc_arch_node_t *node;
int slot;
};
struct netloc_arch_t {
netloc_topology_t *topology;
int has_slots; /* if slots are included in the architecture */
netloc_arch_type_t type;
union {
netloc_arch_tree_t *node_tree;
netloc_arch_tree_t *global_tree;
} arch;
netloc_arch_node_t *nodes_by_name;
netloc_arch_node_slot_t *node_slot_by_idx; /* node_slot by index in complete topo */
NETLOC_int num_current_hosts; /* if has_slots, host is a slot, else host is a node */
NETLOC_int *current_hosts; /* indices in the complete topology */
};
/**********************************************************************
* Topology Functions
**********************************************************************/
/**
* Allocate a topology handle.
*
* User is responsible for calling \ref netloc_detach on the topology handle.
* The network parameter information is deep copied into the topology handle, so the
* user may destruct the network handle after calling this function and/or reuse
* the network handle.
*
* \returns NETLOC_SUCCESS on success
* \returns NETLOC_ERROR upon an error.
*/
netloc_topology_t *netloc_topology_construct(char *path);
/**
* Destruct a topology handle
*
* \param topology A valid pointer to a \ref netloc_topology_t handle created
* from a prior call to \ref netloc_topology_construct.
*
* \returns NETLOC_SUCCESS on success
* \returns NETLOC_ERROR upon an error.
*/
int netloc_topology_destruct(netloc_topology_t *topology);
int netloc_topology_find_partition_idx(netloc_topology_t *topology, char *partition_name);
int netloc_topology_read_hwloc(netloc_topology_t *topology, int num_nodes,
netloc_node_t **node_list);
#define netloc_topology_iter_partitions(topology,partition) \
for ((partition) = (char **)utarray_front(topology->partitions); \
(partition) != NULL; \
(partition) = (char **)utarray_next(topology->partitions, partition))
#define netloc_topology_iter_hwloctopos(topology,hwloctopo) \
for ((hwloctopo) = (char **)utarray_front(topology->topos); \
(hwloctopo) != NULL; \
(hwloctopo) = (char **)utarray_next(topology->topos, hwloctopo))
#define netloc_topology_find_node(topology,node_id,node) \
HASH_FIND_STR(topology->nodes, node_id, node)
#define netloc_topology_iter_nodes(topology,node,_tmp) \
HASH_ITER(hh, topology->nodes, node, _tmp)
#define netloc_topology_num_nodes(topology) \
HASH_COUNT(topology->nodes)
/*************************************************/
/**
* Constructor for netloc_node_t
*
* User is responsible for calling the destructor on the handle.
*
* Returns
* A newly allocated pointer to the network information.
*/
netloc_node_t *netloc_node_construct(void);
/**
* Destructor for netloc_node_t
*
* \param node A valid node handle
*
* Returns
* NETLOC_SUCCESS on success
* NETLOC_ERROR on error
*/
int netloc_node_destruct(netloc_node_t *node);
char *netloc_node_pretty_print(netloc_node_t* node);
#define netloc_node_get_num_subnodes(node) \
utarray_len((node)->subnodes)
#define netloc_node_get_subnode(node,i) \
(*(netloc_node_t **)utarray_eltptr((node)->subnodes, (i)))
#define netloc_node_get_num_edges(node) \
utarray_len((node)->edges)
#define netloc_node_get_edge(node,i) \
(*(netloc_edge_t **)utarray_eltptr((node)->edges, (i)))
#define netloc_node_iter_edges(node,edge,_tmp) \
HASH_ITER(hh, node->edges, edge, _tmp)
#define netloc_node_iter_paths(node,path,_tmp) \
HASH_ITER(hh, node->paths, path, _tmp)
#define netloc_node_is_host(node) \
(node->type == NETLOC_NODE_TYPE_HOST)
#define netloc_node_is_switch(node) \
(node->type == NETLOC_NODE_TYPE_SWITCH)
#define netloc_node_iter_paths(node, path,_tmp) \
HASH_ITER(hh, node->paths, path, _tmp)
int netloc_node_is_in_partition(netloc_node_t *node, int partition);
/*************************************************/
/**
* Constructor for netloc_edge_t
*
* User is responsible for calling the destructor on the handle.
*
* Returns
* A newly allocated pointer to the edge information.
*/
netloc_edge_t *netloc_edge_construct(void);
/**
* Destructor for netloc_edge_t
*
* \param edge A valid edge handle
*
* Returns
* NETLOC_SUCCESS on success
* NETLOC_ERROR on error
*/
int netloc_edge_destruct(netloc_edge_t *edge);
char * netloc_edge_pretty_print(netloc_edge_t* edge);
void netloc_edge_reset_uid(void);
int netloc_edge_is_in_partition(netloc_edge_t *edge, int partition);
#define netloc_edge_get_num_links(edge) \
utarray_len((edge)->physical_links)
#define netloc_edge_get_link(edge,i) \
(*(netloc_physical_link_t **)utarray_eltptr((edge)->physical_links, (i)))
#define netloc_edge_get_num_subedges(edge) \
utarray_len((edge)->subnode_edges)
#define netloc_edge_get_subedge(edge,i) \
(*(netloc_edge_t **)utarray_eltptr((edge)->subnode_edges, (i)))
/*************************************************/
/**
* Constructor for netloc_physical_link_t
*
* User is responsible for calling the destructor on the handle.
*
* Returns
* A newly allocated pointer to the physical link information.
*/
netloc_physical_link_t * netloc_physical_link_construct(void);
/**
* Destructor for netloc_physical_link_t
*
* Returns
* NETLOC_SUCCESS on success
* NETLOC_ERROR on error
*/
int netloc_physical_link_destruct(netloc_physical_link_t *link);
char * netloc_link_pretty_print(netloc_physical_link_t* link);
/*************************************************/
netloc_path_t *netloc_path_construct(void);
int netloc_path_destruct(netloc_path_t *path);
/**********************************************************************
* Architecture functions
**********************************************************************/
netloc_arch_t * netloc_arch_construct(void);
int netloc_arch_destruct(netloc_arch_t *arch);
int netloc_arch_build(netloc_arch_t *arch, int add_slots);
int netloc_arch_set_current_resources(netloc_arch_t *arch);
int netloc_arch_set_global_resources(netloc_arch_t *arch);
int netloc_arch_node_get_hwloc_info(netloc_arch_node_t *arch);
void netloc_arch_tree_complete(netloc_arch_tree_t *tree, UT_array **down_degrees_by_level,
int num_hosts, int **parch_idx);
NETLOC_int netloc_arch_tree_num_leaves(netloc_arch_tree_t *tree);
/**********************************************************************
* Access functions of various elements of the topology
**********************************************************************/
#define netloc_get_num_partitions(object) \
utarray_len((object)->partitions)
#define netloc_get_partition(object,i) \
(*(int *)utarray_eltptr((object)->partitions, (i)))
#define netloc_path_iter_links(path,link) \
for ((link) = (netloc_physical_link_t **)utarray_front(path->links); \
(link) != NULL; \
(link) = (netloc_physical_link_t **)utarray_next(path->links, link))
/**********************************************************************
* Misc functions
**********************************************************************/
/**
* Decode the network type
*
* \param net_type A valid member of the \ref netloc_network_type_t type
*
* \returns NULL if the type is invalid
* \returns A string for that \ref netloc_network_type_t type
*/
static inline const char * netloc_network_type_decode(netloc_network_type_t net_type) {
if( NETLOC_NETWORK_TYPE_ETHERNET == net_type ) {
return "ETH";
}
else if( NETLOC_NETWORK_TYPE_INFINIBAND == net_type ) {
return "IB";
}
else {
return NULL;
}
}
/**
* Decode the node type
*
* \param node_type A valid member of the \ref netloc_node_type_t type
*
* \returns NULL if the type is invalid
* \returns A string for that \ref netloc_node_type_t type
*/
static inline const char * netloc_node_type_decode(netloc_node_type_t node_type) {
if( NETLOC_NODE_TYPE_SWITCH == node_type ) {
return "SW";
}
else if( NETLOC_NODE_TYPE_HOST == node_type ) {
return "CA";
}
else {
return NULL;
}
}
ssize_t netloc_line_get(char **lineptr, size_t *n, FILE *stream);
char *netloc_line_get_next_token(char **string, char c);
int netloc_build_comm_mat(char *filename, int *pn, double ***pmat);
#define STRDUP_IF_NOT_NULL(str) (NULL == str ? NULL : strdup(str))
#define STR_EMPTY_IF_NULL(str) (NULL == str ? "" : str)
#endif // _NETLOC_PRIVATE_H_

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
* Copyright © 2009-2022 Inria. All rights reserved.
* Copyright © 2009-2023 Inria. All rights reserved.
* Copyright © 2009-2012, 2020 Université Bordeaux
* Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
*
@ -245,6 +245,12 @@ struct hwloc_topology {
* temporary variables during discovery
*/
/* set to 1 at the beginning of load() if the filter of any cpu cache type (L1 to L3i) is not NONE,
* may be checked by backends before querying caches
* (when they don't know the level of caches they are querying).
*/
int want_some_cpu_caches;
/* machine-wide memory.
* temporarily stored there by OSes that only provide this without NUMA information,
* and actually used later by the core.
@ -420,7 +426,7 @@ extern void hwloc_internal_memattrs_need_refresh(hwloc_topology_t topology);
extern void hwloc_internal_memattrs_refresh(hwloc_topology_t topology);
extern int hwloc_internal_memattrs_dup(hwloc_topology_t new, hwloc_topology_t old);
extern int hwloc_internal_memattr_set_value(hwloc_topology_t topology, hwloc_memattr_id_t id, hwloc_obj_type_t target_type, hwloc_uint64_t target_gp_index, unsigned target_os_index, struct hwloc_internal_location_s *initiator, hwloc_uint64_t value);
extern int hwloc_internal_memattrs_guess_memory_tiers(hwloc_topology_t topology);
extern int hwloc_internal_memattrs_guess_memory_tiers(hwloc_topology_t topology, int force_subtype);
extern void hwloc_internal_cpukinds_init(hwloc_topology_t topology);
extern int hwloc_internal_cpukinds_rank(hwloc_topology_t topology);
@ -477,6 +483,7 @@ extern char * hwloc_progname(struct hwloc_topology *topology);
#define HWLOC_GROUP_KIND_INTEL_DIE 104 /* no subkind */
#define HWLOC_GROUP_KIND_S390_BOOK 110 /* subkind 0 is book, subkind 1 is drawer (group of books) */
#define HWLOC_GROUP_KIND_AMD_COMPUTE_UNIT 120 /* no subkind */
#define HWLOC_GROUP_KIND_AMD_COMPLEX 121 /* no subkind */
/* then, OS-specific groups */
#define HWLOC_GROUP_KIND_SOLARIS_PG_HW_PERF 200 /* subkind is group width */
#define HWLOC_GROUP_KIND_AIX_SDL_UNKNOWN 210 /* subkind is SDL level */

View file

@ -19,13 +19,14 @@ HWLOC_DECLSPEC int hwloc__xml_verbose(void);
typedef struct hwloc__xml_import_state_s {
struct hwloc__xml_import_state_s *parent;
/* globals shared because the entire stack of states during import */
/* globals shared between the entire stack of states during import */
struct hwloc_xml_backend_data_s *global;
/* opaque data used to store backend-specific data.
* statically allocated to allow stack-allocation by the common code without knowing actual backend needs.
* libxml is 3 ptrs. nolibxml is 3 ptr + one int.
*/
char data[32];
char data[4 * SIZEOF_VOID_P];
} * hwloc__xml_import_state_t;
struct hwloc__xml_imported_v1distances_s {
@ -74,8 +75,9 @@ typedef struct hwloc__xml_export_state_s {
/* opaque data used to store backend-specific data.
* statically allocated to allow stack-allocation by the common code without knowing actual backend needs.
* libxml is 1 ptr. nolibxml is 1 ptr + 2 size_t + 3 ints.
*/
char data[40];
char data[6 * SIZEOF_VOID_P];
} * hwloc__xml_export_state_t;
HWLOC_DECLSPEC void hwloc__xml_export_topology(hwloc__xml_export_state_t parentstate, hwloc_topology_t topology, unsigned long flags);

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
* Copyright © 2009-2020 Inria. All rights reserved.
* Copyright © 2009-2024 Inria. All rights reserved.
* Copyright © 2009-2010, 2012 Université Bordeaux
* Copyright © 2011-2015 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
@ -287,6 +287,7 @@ static __hwloc_inline int hwloc__check_membind_policy(hwloc_membind_policy_t pol
|| policy == HWLOC_MEMBIND_FIRSTTOUCH
|| policy == HWLOC_MEMBIND_BIND
|| policy == HWLOC_MEMBIND_INTERLEAVE
|| policy == HWLOC_MEMBIND_WEIGHTED_INTERLEAVE
|| policy == HWLOC_MEMBIND_NEXTTOUCH)
return 0;
return -1;
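HWLOC_MEMBIND_WEIGHTED_INTERLEAVE appears to map to Linux's weighted-interleave memory policy; a hedged usage sketch follows, assuming a libhwloc and kernel recent enough to support it (otherwise the call fails cleanly):

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <hwloc.h>

int main(void)
{
  hwloc_topology_t topo;
  hwloc_topology_init(&topo);
  hwloc_topology_load(topo);
  /* Ask for weighted interleaving across all NUMA nodes of the machine. */
  if (hwloc_set_membind(topo, hwloc_topology_get_topology_nodeset(topo),
                        HWLOC_MEMBIND_WEIGHTED_INTERLEAVE,
                        HWLOC_MEMBIND_BYNODESET) < 0)
    fprintf(stderr, "weighted interleave unsupported here: %s\n", strerror(errno));
  hwloc_topology_destroy(topo);
  return 0;
}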

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
* Copyright © 2009-2020 Inria. All rights reserved.
* Copyright © 2009-2024 Inria. All rights reserved.
* Copyright © 2009-2011 Université Bordeaux
* Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
@ -245,6 +245,7 @@ int hwloc_bitmap_copy(struct hwloc_bitmap_s * dst, const struct hwloc_bitmap_s *
/* Strings always use 32bit groups */
#define HWLOC_PRIxSUBBITMAP "%08lx"
#define HWLOC_BITMAP_SUBSTRING_SIZE 32
#define HWLOC_BITMAP_SUBSTRING_FULL_VALUE 0xFFFFFFFFUL
#define HWLOC_BITMAP_SUBSTRING_LENGTH (HWLOC_BITMAP_SUBSTRING_SIZE/4)
#define HWLOC_BITMAP_STRING_PER_LONG (HWLOC_BITS_PER_LONG/HWLOC_BITMAP_SUBSTRING_SIZE)
@ -261,6 +262,7 @@ int hwloc_bitmap_snprintf(char * __hwloc_restrict buf, size_t buflen, const stru
const unsigned long accum_mask = ~0UL;
#else /* HWLOC_BITS_PER_LONG != HWLOC_BITMAP_SUBSTRING_SIZE */
const unsigned long accum_mask = ((1UL << HWLOC_BITMAP_SUBSTRING_SIZE) - 1) << (HWLOC_BITS_PER_LONG - HWLOC_BITMAP_SUBSTRING_SIZE);
int merge_with_infinite_prefix = 0;
#endif /* HWLOC_BITS_PER_LONG != HWLOC_BITMAP_SUBSTRING_SIZE */
HWLOC__BITMAP_CHECK(set);
@ -279,6 +281,9 @@ int hwloc_bitmap_snprintf(char * __hwloc_restrict buf, size_t buflen, const stru
res = size>0 ? (int)size - 1 : 0;
tmp += res;
size -= res;
#if HWLOC_BITS_PER_LONG > HWLOC_BITMAP_SUBSTRING_SIZE
merge_with_infinite_prefix = 1;
#endif
}
i=(int) set->ulongs_count-1;
@ -294,16 +299,24 @@ int hwloc_bitmap_snprintf(char * __hwloc_restrict buf, size_t buflen, const stru
}
while (i>=0 || accumed) {
unsigned long value;
/* Refill accumulator */
if (!accumed) {
accum = set->ulongs[i--];
accumed = HWLOC_BITS_PER_LONG;
}
value = (accum & accum_mask) >> (HWLOC_BITS_PER_LONG - HWLOC_BITMAP_SUBSTRING_SIZE);
if (accum & accum_mask) {
#if HWLOC_BITS_PER_LONG > HWLOC_BITMAP_SUBSTRING_SIZE
if (merge_with_infinite_prefix && value == HWLOC_BITMAP_SUBSTRING_FULL_VALUE) {
/* first full subbitmap merged with infinite prefix */
res = 0;
} else
#endif
if (value) {
/* print the whole subset if not empty */
res = hwloc_snprintf(tmp, size, needcomma ? ",0x" HWLOC_PRIxSUBBITMAP : "0x" HWLOC_PRIxSUBBITMAP,
(accum & accum_mask) >> (HWLOC_BITS_PER_LONG - HWLOC_BITMAP_SUBSTRING_SIZE));
res = hwloc_snprintf(tmp, size, needcomma ? ",0x" HWLOC_PRIxSUBBITMAP : "0x" HWLOC_PRIxSUBBITMAP, value);
needcomma = 1;
} else if (i == -1 && accumed == HWLOC_BITMAP_SUBSTRING_SIZE) {
/* print a single 0 to mark the last subset */
@ -323,6 +336,7 @@ int hwloc_bitmap_snprintf(char * __hwloc_restrict buf, size_t buflen, const stru
#else
accum <<= HWLOC_BITMAP_SUBSTRING_SIZE;
accumed -= HWLOC_BITMAP_SUBSTRING_SIZE;
merge_with_infinite_prefix = 0;
#endif
if (res >= size)
@ -362,7 +376,8 @@ int hwloc_bitmap_sscanf(struct hwloc_bitmap_s *set, const char * __hwloc_restric
{
const char * current = string;
unsigned long accum = 0;
int count=0;
int count = 0;
int ulongcount;
int infinite = 0;
/* count how many substrings there are */
@ -383,9 +398,20 @@ int hwloc_bitmap_sscanf(struct hwloc_bitmap_s *set, const char * __hwloc_restric
count--;
}
if (hwloc_bitmap_reset_by_ulongs(set, (count + HWLOC_BITMAP_STRING_PER_LONG - 1) / HWLOC_BITMAP_STRING_PER_LONG) < 0)
ulongcount = (count + HWLOC_BITMAP_STRING_PER_LONG - 1) / HWLOC_BITMAP_STRING_PER_LONG;
if (hwloc_bitmap_reset_by_ulongs(set, ulongcount) < 0)
return -1;
set->infinite = 0;
set->infinite = 0; /* will be updated later */
#if HWLOC_BITS_PER_LONG != HWLOC_BITMAP_SUBSTRING_SIZE
if (infinite && (count % HWLOC_BITMAP_STRING_PER_LONG) != 0) {
/* accumulate substrings of the first ulong that are hidden in the infinite prefix */
int i;
for(i = (count % HWLOC_BITMAP_STRING_PER_LONG); i < HWLOC_BITMAP_STRING_PER_LONG; i++)
accum |= (HWLOC_BITMAP_SUBSTRING_FULL_VALUE << (i*HWLOC_BITMAP_SUBSTRING_SIZE));
}
#endif
while (*current != '\0') {
unsigned long val;
@ -544,6 +570,9 @@ int hwloc_bitmap_taskset_snprintf(char * __hwloc_restrict buf, size_t buflen, co
ssize_t size = buflen;
char *tmp = buf;
int res, ret = 0;
#if HWLOC_BITS_PER_LONG == 64
int merge_with_infinite_prefix = 0;
#endif
int started = 0;
int i;
@ -563,6 +592,9 @@ int hwloc_bitmap_taskset_snprintf(char * __hwloc_restrict buf, size_t buflen, co
res = size>0 ? (int)size - 1 : 0;
tmp += res;
size -= res;
#if HWLOC_BITS_PER_LONG == 64
merge_with_infinite_prefix = 1;
#endif
}
i=set->ulongs_count-1;
@ -582,7 +614,11 @@ int hwloc_bitmap_taskset_snprintf(char * __hwloc_restrict buf, size_t buflen, co
if (started) {
/* print the whole subset */
#if HWLOC_BITS_PER_LONG == 64
res = hwloc_snprintf(tmp, size, "%016lx", val);
if (merge_with_infinite_prefix && (val & 0xffffffff00000000UL) == 0xffffffff00000000UL) {
res = hwloc_snprintf(tmp, size, "%08lx", val & 0xffffffffUL);
} else {
res = hwloc_snprintf(tmp, size, "%016lx", val);
}
#else
res = hwloc_snprintf(tmp, size, "%08lx", val);
#endif
@ -599,6 +635,9 @@ int hwloc_bitmap_taskset_snprintf(char * __hwloc_restrict buf, size_t buflen, co
res = size>0 ? (int)size - 1 : 0;
tmp += res;
size -= res;
#if HWLOC_BITS_PER_LONG == 64
merge_with_infinite_prefix = 0;
#endif
}
/* if didn't display anything, display 0x0 */
@ -679,6 +718,10 @@ int hwloc_bitmap_taskset_sscanf(struct hwloc_bitmap_s *set, const char * __hwloc
goto failed;
set->ulongs[count-1] = val;
if (infinite && tmpchars != HWLOC_BITS_PER_LONG/4) {
/* infinite prefix with partial substring, fill remaining bits */
set->ulongs[count-1] |= (~0ULL)<<(4*tmpchars);
}
current += tmpchars;
chars -= tmpchars;
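The merge_with_infinite_prefix logic exists so that a bitmap like "everything except bits 0-3" prints as the infinite prefix followed only by the partial word, instead of emitting redundant 0xffffffff groups. A quick round-trip check using only the public bitmap API (the printed form shown in the comment is indicative, not guaranteed formatting):

#include <stdio.h>
#include <hwloc.h>

int main(void)
{
  char buf[128];
  hwloc_bitmap_t in = hwloc_bitmap_alloc_full(); /* infinitely-set bitmap */
  hwloc_bitmap_t out = hwloc_bitmap_alloc();
  hwloc_bitmap_clr_range(in, 0, 3); /* everything except bits 0-3 */
  hwloc_bitmap_snprintf(buf, sizeof(buf), in);
  printf("printed: %s\n", buf); /* e.g. "0xf...f,0xfffffff0" */
  hwloc_bitmap_sscanf(out, buf);
  printf("round-trip %s\n", hwloc_bitmap_isequal(in, out) ? "ok" : "broken");
  hwloc_bitmap_free(in);
  hwloc_bitmap_free(out);
  return 0;
}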

View file

@ -94,8 +94,7 @@ static hwloc_dlhandle hwloc_dlopenext(const char *_filename)
{
hwloc_dlhandle handle;
char *filename = NULL;
(void) asprintf(&filename, "%s.so", _filename);
if (!filename)
if (asprintf(&filename, "%s.so", _filename) < 0)
return NULL;
handle = dlopen(filename, RTLD_NOW|RTLD_LOCAL);
free(filename);
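The rationale for this hunk: on failure asprintf() returns a negative value and leaves the pointer argument in an unspecified state, so initializing it to NULL and testing the pointer was never a reliable error check. The safe pattern, as a tiny sketch (glibc needs _GNU_SOURCE for asprintf; the helper name is local to this sketch):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

/* Append ".so" to a module name; returns a malloc'ed string or NULL. */
static char *make_soname(const char *base)
{
  char *name;
  if (asprintf(&name, "%s.so", base) < 0) /* test the return value, not name */
    return NULL;
  return name; /* caller frees */
}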

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2020-2022 Inria. All rights reserved.
* Copyright © 2020-2024 Inria. All rights reserved.
* See COPYING in top-level directory.
*/
@ -50,6 +50,7 @@ hwloc_internal_cpukinds_dup(hwloc_topology_t new, hwloc_topology_t old)
return -1;
new->cpukinds = kinds;
new->nr_cpukinds = old->nr_cpukinds;
new->nr_cpukinds_allocated = old->nr_cpukinds;
memcpy(kinds, old->cpukinds, old->nr_cpukinds * sizeof(*kinds));
for(i=0;i<old->nr_cpukinds; i++) {

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2013-2022 Inria. All rights reserved.
* Copyright © 2013-2023 Inria. All rights reserved.
* See COPYING in top-level directory.
*/
@ -411,6 +411,30 @@ int hwloc_topology_diff_build(hwloc_topology_t topo1,
}
}
if (!err) {
/* cpukinds */
if (topo1->nr_cpukinds != topo2->nr_cpukinds)
goto roottoocomplex;
for(i=0; i<topo1->nr_cpukinds; i++) {
struct hwloc_internal_cpukind_s *ic1 = &topo1->cpukinds[i];
struct hwloc_internal_cpukind_s *ic2 = &topo2->cpukinds[i];
unsigned j;
if (!hwloc_bitmap_isequal(ic1->cpuset, ic2->cpuset)
|| ic1->efficiency != ic2->efficiency
|| ic1->forced_efficiency != ic2->forced_efficiency
|| ic1->ranking_value != ic2->ranking_value
|| ic1->nr_infos != ic2->nr_infos)
goto roottoocomplex;
for(j=0; j<ic1->nr_infos; j++) {
struct hwloc_info_s *info1 = &ic1->infos[j], *info2 = &ic2->infos[j];
if (strcmp(info1->name, info2->name)
|| strcmp(info1->value, info2->value)) {
goto roottoocomplex;
}
}
}
}
return err;
roottoocomplex:
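With this hunk, two topologies that differ only in their CPU kinds are reported as too complex to diff instead of being treated as identical. A hedged sketch of the caller-visible contract (return values per the hwloc diff API: 0 identical or representable, 1 too complex, -1 error):

#include <stdio.h>
#include <hwloc.h>
#include <hwloc/diff.h>

int main(void)
{
  hwloc_topology_t a, b;
  hwloc_topology_diff_t diff = NULL;
  hwloc_topology_init(&a);
  hwloc_topology_load(a);
  hwloc_topology_dup(&b, a); /* identical copy: expect ret 0 and diff == NULL */
  int ret = hwloc_topology_diff_build(a, b, 0, &diff);
  printf("diff_build returned %d, diff %p\n", ret, (void *) diff);
  if (diff)
    hwloc_topology_diff_destroy(diff);
  hwloc_topology_destroy(a);
  hwloc_topology_destroy(b);
  return 0;
}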

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2010-2022 Inria. All rights reserved.
* Copyright © 2010-2024 Inria. All rights reserved.
* Copyright © 2011-2012 Université Bordeaux
* Copyright © 2011 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
@ -624,8 +624,8 @@ void * hwloc_distances_add_create(hwloc_topology_t topology,
return NULL;
}
if ((kind & ~HWLOC_DISTANCES_KIND_ALL)
|| hwloc_weight_long(kind & HWLOC_DISTANCES_KIND_FROM_ALL) != 1
|| hwloc_weight_long(kind & HWLOC_DISTANCES_KIND_MEANS_ALL) != 1) {
|| hwloc_weight_long(kind & HWLOC_DISTANCES_KIND_FROM_ALL) > 1
|| hwloc_weight_long(kind & HWLOC_DISTANCES_KIND_MEANS_ALL) > 1) {
errno = EINVAL;
return NULL;
}
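The relaxed test accepts a kind with at most one FROM_* and at most one MEANS_* bit (including none of either), instead of requiring exactly one of each. A sketch of feeding a user-defined latency matrix through the add/commit API (names and values are illustrative):

#include <hwloc.h>
#include <hwloc/distances.h>

/* Add a symmetric 2x2 latency matrix between the first two NUMA nodes. */
static int add_numa_latencies(hwloc_topology_t topo)
{
  hwloc_obj_t nodes[2];
  hwloc_uint64_t values[4] = { 10, 20, 20, 10 };
  void *handle;
  nodes[0] = hwloc_get_obj_by_type(topo, HWLOC_OBJ_NUMANODE, 0);
  nodes[1] = hwloc_get_obj_by_type(topo, HWLOC_OBJ_NUMANODE, 1);
  if (!nodes[0] || !nodes[1])
    return -1; /* fewer than two NUMA nodes */
  handle = hwloc_distances_add_create(topo, "MyLatency",
                                      HWLOC_DISTANCES_KIND_FROM_USER
                                      | HWLOC_DISTANCES_KIND_MEANS_LATENCY, 0);
  if (!handle)
    return -1;
  if (hwloc_distances_add_values(topo, handle, 2, nodes, values, 0) < 0)
    return -1;
  return hwloc_distances_add_commit(topo, handle, 0);
}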

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2020-2022 Inria. All rights reserved.
* Copyright © 2020-2024 Inria. All rights reserved.
* See COPYING in top-level directory.
*/
@ -14,13 +14,26 @@
*/
static __hwloc_inline
hwloc_uint64_t hwloc__memattr_get_convenience_value(hwloc_memattr_id_t id,
hwloc_obj_t node)
int hwloc__memattr_get_convenience_value(hwloc_memattr_id_t id,
hwloc_obj_t node,
hwloc_uint64_t *valuep)
{
if (id == HWLOC_MEMATTR_ID_CAPACITY)
return node->attr->numanode.local_memory;
else if (id == HWLOC_MEMATTR_ID_LOCALITY)
return hwloc_bitmap_weight(node->cpuset);
if (id == HWLOC_MEMATTR_ID_CAPACITY) {
if (node->type != HWLOC_OBJ_NUMANODE) {
errno = EINVAL;
return -1;
}
*valuep = node->attr->numanode.local_memory;
return 0;
}
else if (id == HWLOC_MEMATTR_ID_LOCALITY) {
if (!node->cpuset) {
errno = EINVAL;
return -1;
}
*valuep = hwloc_bitmap_weight(node->cpuset);
return 0;
}
else
assert(0);
return 0; /* shut up the compiler */
@ -622,7 +635,7 @@ hwloc_memattr_get_targets(hwloc_topology_t topology,
if (found<max) {
targets[found] = node;
if (values)
values[found] = hwloc__memattr_get_convenience_value(id, node);
hwloc__memattr_get_convenience_value(id, node, &values[found]);
}
found++;
}
@ -748,7 +761,7 @@ hwloc_memattr_get_initiators(hwloc_topology_t topology,
struct hwloc_internal_memattr_target_s *imtg;
unsigned i, max;
if (flags) {
if (flags || !target_node) {
errno = EINVAL;
return -1;
}
@ -810,7 +823,7 @@ hwloc_memattr_get_value(hwloc_topology_t topology,
struct hwloc_internal_memattr_s *imattr;
struct hwloc_internal_memattr_target_s *imtg;
if (flags) {
if (flags || !target_node) {
errno = EINVAL;
return -1;
}
@ -823,8 +836,7 @@ hwloc_memattr_get_value(hwloc_topology_t topology,
if (imattr->iflags & HWLOC_IMATTR_FLAG_CONVENIENCE) {
/* convenience attributes */
*valuep = hwloc__memattr_get_convenience_value(id, target_node);
return 0;
return hwloc__memattr_get_convenience_value(id, target_node, valuep);
}
/* normal attributes */
@ -936,7 +948,7 @@ hwloc_memattr_set_value(hwloc_topology_t topology,
{
struct hwloc_internal_location_s iloc, *ilocp;
if (flags) {
if (flags || !target_node) {
errno = EINVAL;
return -1;
}
@ -1007,10 +1019,10 @@ hwloc_memattr_get_best_target(hwloc_topology_t topology,
/* convenience attributes */
for(j=0; ; j++) {
hwloc_obj_t node = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, j);
hwloc_uint64_t value;
hwloc_uint64_t value = 0;
if (!node)
break;
value = hwloc__memattr_get_convenience_value(id, node);
hwloc__memattr_get_convenience_value(id, node, &value);
hwloc__update_best_target(&best, &best_value, &found,
node, value,
imattr->flags & HWLOC_MEMATTR_FLAG_HIGHER_FIRST);
@ -1093,7 +1105,7 @@ hwloc_memattr_get_best_initiator(hwloc_topology_t topology,
int found;
unsigned i;
if (flags) {
if (flags || !target_node) {
errno = EINVAL;
return -1;
}
@ -1219,24 +1231,82 @@ hwloc_get_local_numanode_objs(hwloc_topology_t topology,
* Using memattrs to identify HBM/DRAM
*/
enum hwloc_memory_tier_type_e {
/* WARNING: keep higher BW types first for compare_tiers_by_bw_and_type() when BW info is missing */
HWLOC_MEMORY_TIER_HBM = 1UL<<0,
HWLOC_MEMORY_TIER_DRAM = 1UL<<1,
HWLOC_MEMORY_TIER_GPU = 1UL<<2,
HWLOC_MEMORY_TIER_SPM = 1UL<<3, /* Specific-Purpose Memory is usually HBM, we'll use BW to confirm or force */
HWLOC_MEMORY_TIER_NVM = 1UL<<4,
HWLOC_MEMORY_TIER_CXL = 1UL<<5
};
typedef unsigned long hwloc_memory_tier_type_t;
#define HWLOC_MEMORY_TIER_UNKNOWN 0UL
static const char * hwloc_memory_tier_type_snprintf(hwloc_memory_tier_type_t type)
{
switch (type) {
case HWLOC_MEMORY_TIER_DRAM: return "DRAM";
case HWLOC_MEMORY_TIER_HBM: return "HBM";
case HWLOC_MEMORY_TIER_GPU: return "GPUMemory";
case HWLOC_MEMORY_TIER_SPM: return "SPM";
case HWLOC_MEMORY_TIER_NVM: return "NVM";
case HWLOC_MEMORY_TIER_CXL:
case HWLOC_MEMORY_TIER_CXL|HWLOC_MEMORY_TIER_DRAM: return "CXL-DRAM";
case HWLOC_MEMORY_TIER_CXL|HWLOC_MEMORY_TIER_HBM: return "CXL-HBM";
case HWLOC_MEMORY_TIER_CXL|HWLOC_MEMORY_TIER_GPU: return "CXL-GPUMemory";
case HWLOC_MEMORY_TIER_CXL|HWLOC_MEMORY_TIER_SPM: return "CXL-SPM";
case HWLOC_MEMORY_TIER_CXL|HWLOC_MEMORY_TIER_NVM: return "CXL-NVM";
default: return NULL;
}
}
static hwloc_memory_tier_type_t hwloc_memory_tier_type_sscanf(const char *name)
{
if (!strcasecmp(name, "DRAM"))
return HWLOC_MEMORY_TIER_DRAM;
if (!strcasecmp(name, "HBM"))
return HWLOC_MEMORY_TIER_HBM;
if (!strcasecmp(name, "GPUMemory"))
return HWLOC_MEMORY_TIER_GPU;
if (!strcasecmp(name, "SPM"))
return HWLOC_MEMORY_TIER_SPM;
if (!strcasecmp(name, "NVM"))
return HWLOC_MEMORY_TIER_NVM;
if (!strcasecmp(name, "CXL-DRAM"))
return HWLOC_MEMORY_TIER_CXL|HWLOC_MEMORY_TIER_DRAM;
if (!strcasecmp(name, "CXL-HBM"))
return HWLOC_MEMORY_TIER_CXL|HWLOC_MEMORY_TIER_HBM;
if (!strcasecmp(name, "CXL-GPUMemory"))
return HWLOC_MEMORY_TIER_CXL|HWLOC_MEMORY_TIER_GPU;
if (!strcasecmp(name, "CXL-SPM"))
return HWLOC_MEMORY_TIER_CXL|HWLOC_MEMORY_TIER_SPM;
if (!strcasecmp(name, "CXL-NVM"))
return HWLOC_MEMORY_TIER_CXL|HWLOC_MEMORY_TIER_NVM;
return 0;
}
/* factorized tier, grouping multiple nodes */
struct hwloc_memory_tier_s {
hwloc_obj_t node;
uint64_t local_bw;
enum hwloc_memory_tier_type_e {
/* warning the order is important for guess_memory_tiers() after qsort() */
HWLOC_MEMORY_TIER_UNKNOWN,
HWLOC_MEMORY_TIER_DRAM,
HWLOC_MEMORY_TIER_HBM,
HWLOC_MEMORY_TIER_SPM, /* Specific-Purpose Memory is usually HBM, we'll use BW to confirm */
HWLOC_MEMORY_TIER_NVM,
HWLOC_MEMORY_TIER_GPU,
} type;
hwloc_nodeset_t nodeset;
uint64_t local_bw_min, local_bw_max;
uint64_t local_lat_min, local_lat_max;
hwloc_memory_tier_type_t type;
};
static int compare_tiers(const void *_a, const void *_b)
/* early tier discovery, one entry per node */
struct hwloc_memory_node_info_s {
hwloc_obj_t node;
uint64_t local_bw;
uint64_t local_lat;
hwloc_memory_tier_type_t type;
unsigned rank;
};
static int compare_node_infos_by_type_and_bw(const void *_a, const void *_b)
{
const struct hwloc_memory_tier_s *a = _a, *b = _b;
/* sort by type of tier first */
const struct hwloc_memory_node_info_s *a = _a, *b = _b;
/* sort by type of node first */
if (a->type != b->type)
return a->type - b->type;
/* then by bandwidth */
@ -1247,180 +1317,566 @@ static int compare_tiers(const void *_a, const void *_b)
return 0;
}
int
hwloc_internal_memattrs_guess_memory_tiers(hwloc_topology_t topology)
static int compare_tiers_by_bw_and_type(const void *_a, const void *_b)
{
struct hwloc_internal_memattr_s *imattr;
struct hwloc_memory_tier_s *tiers;
unsigned i, j, n;
const char *env;
int spm_is_hbm = -1; /* -1 will guess from BW, 0 no, 1 forced */
int mark_dram = 1;
unsigned first_spm, first_nvm;
hwloc_uint64_t max_unknown_bw, min_spm_bw;
env = getenv("HWLOC_MEMTIERS_GUESS");
if (env) {
if (!strcmp(env, "none")) {
return 0;
} else if (!strcmp(env, "default")) {
/* nothing */
} else if (!strcmp(env, "spm_is_hbm")) {
hwloc_debug("Assuming SPM-tier is HBM, ignore bandwidth\n");
spm_is_hbm = 1;
} else if (HWLOC_SHOW_CRITICAL_ERRORS()) {
fprintf(stderr, "hwloc: Failed to recognize HWLOC_MEMTIERS_GUESS value %s\n", env);
}
const struct hwloc_memory_tier_s *a = _a, *b = _b;
/* sort by (average) BW first */
if (a->local_bw_min && b->local_bw_min) {
if (a->local_bw_min + a->local_bw_max > b->local_bw_min + b->local_bw_max)
return -1;
else if (a->local_bw_min + a->local_bw_max < b->local_bw_min + b->local_bw_max)
return 1;
}
/* then by tier type */
if (a->type != b->type)
return a->type - b->type;
return 0;
}
imattr = &topology->memattrs[HWLOC_MEMATTR_ID_BANDWIDTH];
if (!(imattr->iflags & HWLOC_IMATTR_FLAG_CACHE_VALID))
hwloc__imattr_refresh(topology, imattr);
static struct hwloc_memory_tier_s *
hwloc__group_memory_tiers(hwloc_topology_t topology,
unsigned *nr_tiers_p)
{
struct hwloc_internal_memattr_s *imattr_bw, *imattr_lat;
struct hwloc_memory_node_info_s *nodeinfos;
struct hwloc_memory_tier_s *tiers;
unsigned nr_tiers;
float bw_threshold = 0.1;
float lat_threshold = 0.1;
const char *env;
unsigned i, j, n;
n = hwloc_get_nbobjs_by_depth(topology, HWLOC_TYPE_DEPTH_NUMANODE);
assert(n);
tiers = malloc(n * sizeof(*tiers));
if (!tiers)
return -1;
env = getenv("HWLOC_MEMTIERS_BANDWIDTH_THRESHOLD");
if (env)
bw_threshold = atof(env);
env = getenv("HWLOC_MEMTIERS_LATENCY_THRESHOLD");
if (env)
lat_threshold = atof(env);
imattr_bw = &topology->memattrs[HWLOC_MEMATTR_ID_BANDWIDTH];
imattr_lat = &topology->memattrs[HWLOC_MEMATTR_ID_LATENCY];
if (!(imattr_bw->iflags & HWLOC_IMATTR_FLAG_CACHE_VALID))
hwloc__imattr_refresh(topology, imattr_bw);
if (!(imattr_lat->iflags & HWLOC_IMATTR_FLAG_CACHE_VALID))
hwloc__imattr_refresh(topology, imattr_lat);
nodeinfos = malloc(n * sizeof(*nodeinfos));
if (!nodeinfos)
return NULL;
for(i=0; i<n; i++) {
hwloc_obj_t node;
const char *daxtype;
struct hwloc_internal_location_s iloc;
struct hwloc_internal_memattr_target_s *imtg = NULL;
struct hwloc_internal_memattr_initiator_s *imi;
struct hwloc_internal_memattr_target_s *imtg;
node = hwloc_get_obj_by_depth(topology, HWLOC_TYPE_DEPTH_NUMANODE, i);
assert(node);
tiers[i].node = node;
nodeinfos[i].node = node;
/* defaults */
tiers[i].type = HWLOC_MEMORY_TIER_UNKNOWN;
tiers[i].local_bw = 0; /* unknown */
/* defaults to unknown */
nodeinfos[i].type = HWLOC_MEMORY_TIER_UNKNOWN;
nodeinfos[i].local_bw = 0;
nodeinfos[i].local_lat = 0;
daxtype = hwloc_obj_get_info_by_name(node, "DAXType");
/* mark NVM, SPM and GPU nodes */
if (daxtype && !strcmp(daxtype, "NVM"))
tiers[i].type = HWLOC_MEMORY_TIER_NVM;
if (daxtype && !strcmp(daxtype, "SPM"))
tiers[i].type = HWLOC_MEMORY_TIER_SPM;
if (node->subtype && !strcmp(node->subtype, "GPUMemory"))
tiers[i].type = HWLOC_MEMORY_TIER_GPU;
nodeinfos[i].type = HWLOC_MEMORY_TIER_GPU;
else if (daxtype && !strcmp(daxtype, "NVM"))
nodeinfos[i].type = HWLOC_MEMORY_TIER_NVM;
else if (daxtype && !strcmp(daxtype, "SPM"))
nodeinfos[i].type = HWLOC_MEMORY_TIER_SPM;
/* add CXL flag */
if (hwloc_obj_get_info_by_name(node, "CXLDevice") != NULL) {
/* CXL is always SPM for now. HBM and DRAM not possible here yet.
* Hence remove all but NVM first.
*/
nodeinfos[i].type &= HWLOC_MEMORY_TIER_NVM;
nodeinfos[i].type |= HWLOC_MEMORY_TIER_CXL;
}
if (spm_is_hbm == -1) {
for(j=0; j<imattr->nr_targets; j++)
if (imattr->targets[j].obj == node) {
imtg = &imattr->targets[j];
break;
}
if (imtg && !hwloc_bitmap_iszero(node->cpuset)) {
iloc.type = HWLOC_LOCATION_TYPE_CPUSET;
iloc.location.cpuset = node->cpuset;
imi = hwloc__memattr_target_get_initiator(imtg, &iloc, 0);
if (imi)
tiers[i].local_bw = imi->value;
/* get local bandwidth */
imtg = NULL;
for(j=0; j<imattr_bw->nr_targets; j++)
if (imattr_bw->targets[j].obj == node) {
imtg = &imattr_bw->targets[j];
break;
}
if (imtg && !hwloc_bitmap_iszero(node->cpuset)) {
struct hwloc_internal_memattr_initiator_s *imi;
iloc.type = HWLOC_LOCATION_TYPE_CPUSET;
iloc.location.cpuset = node->cpuset;
imi = hwloc__memattr_target_get_initiator(imtg, &iloc, 0);
if (imi)
nodeinfos[i].local_bw = imi->value;
}
/* get local latency */
imtg = NULL;
for(j=0; j<imattr_lat->nr_targets; j++)
if (imattr_lat->targets[j].obj == node) {
imtg = &imattr_lat->targets[j];
break;
}
if (imtg && !hwloc_bitmap_iszero(node->cpuset)) {
struct hwloc_internal_memattr_initiator_s *imi;
iloc.type = HWLOC_LOCATION_TYPE_CPUSET;
iloc.location.cpuset = node->cpuset;
imi = hwloc__memattr_target_get_initiator(imtg, &iloc, 0);
if (imi)
nodeinfos[i].local_lat = imi->value;
}
}
/* Sort nodes.
* We could also sort by the existing subtype.
* KNL is the only case where subtypes are set in backends, but we set memattrs as well there.
* Also HWLOC_MEMTIERS_REFRESH would be a special value to ignore existing subtypes.
*/
hwloc_debug("Sorting memory node infos...\n");
qsort(nodeinfos, n, sizeof(*nodeinfos), compare_node_infos_by_type_and_bw);
#ifdef HWLOC_DEBUG
for(i=0; i<n; i++)
hwloc_debug(" node info %u = node L#%u P#%u with info type %lx and local BW %llu lat %llu\n",
i,
nodeinfos[i].node->logical_index, nodeinfos[i].node->os_index,
nodeinfos[i].type,
(unsigned long long) nodeinfos[i].local_bw,
(unsigned long long) nodeinfos[i].local_lat);
#endif
/* now we have UNKNOWN nodes (sorted by BW only), then known ones */
/* iterate among them and add a rank value.
* start from rank 0 and switch to the next rank when the type changes or when the BW or latency difference is > threshold */
hwloc_debug("Starting memory tier #0 and iterating over nodes...\n");
nodeinfos[0].rank = 0;
for(i=1; i<n; i++) {
/* reuse the same rank by default */
nodeinfos[i].rank = nodeinfos[i-1].rank;
/* comparing type */
if (nodeinfos[i].type != nodeinfos[i-1].type) {
hwloc_debug(" Switching to memory tier #%u starting with node L#%u P#%u because of type\n",
nodeinfos[i].rank, nodeinfos[i].node->logical_index, nodeinfos[i].node->os_index);
nodeinfos[i].rank++;
continue;
}
/* comparing bandwidth */
if (nodeinfos[i].local_bw && nodeinfos[i-1].local_bw) {
float bw_ratio = (float)nodeinfos[i].local_bw/(float)nodeinfos[i-1].local_bw;
if (bw_ratio < 1.)
bw_ratio = 1./bw_ratio;
if (bw_ratio > 1.0 + bw_threshold) {
nodeinfos[i].rank++;
hwloc_debug(" Switching to memory tier #%u starting with node L#%u P#%u because of bandwidth\n",
nodeinfos[i].rank, nodeinfos[i].node->logical_index, nodeinfos[i].node->os_index);
continue;
}
}
/* comparing latency */
if (nodeinfos[i].local_lat && nodeinfos[i-1].local_lat) {
float lat_ratio = (float)nodeinfos[i].local_lat/(float)nodeinfos[i-1].local_lat;
if (lat_ratio < 1.)
lat_ratio = 1./lat_ratio;
if (lat_ratio > 1.0 + lat_threshold) {
hwloc_debug(" Switching to memory tier #%u starting with node L#%u P#%u because of latency\n",
nodeinfos[i].rank, nodeinfos[i].node->logical_index, nodeinfos[i].node->os_index);
nodeinfos[i].rank++;
continue;
}
}
}
/* FIXME: if there are cpuset-intersecting nodes in same tier, split again? */
hwloc_debug(" Found %u tiers total\n", nodeinfos[n-1].rank + 1);
/* sort tiers */
qsort(tiers, n, sizeof(*tiers), compare_tiers);
hwloc_debug("Sorting memory tiers...\n");
for(i=0; i<n; i++)
hwloc_debug(" tier %u = node L#%u P#%u with tier type %d and local BW #%llu\n",
i,
tiers[i].node->logical_index, tiers[i].node->os_index,
tiers[i].type, (unsigned long long) tiers[i].local_bw);
/* now we have UNKNOWN tiers (sorted by BW), then SPM tiers (sorted by BW), then NVM, then GPU */
/* iterate over UNKNOWN tiers, and find their BW */
/* now group nodeinfos into factorized tiers */
nr_tiers = nodeinfos[n-1].rank + 1;
tiers = calloc(nr_tiers, sizeof(*tiers));
if (!tiers)
goto out_with_nodeinfos;
for(i=0; i<nr_tiers; i++) {
tiers[i].nodeset = hwloc_bitmap_alloc();
if (!tiers[i].nodeset)
goto out_with_tiers;
tiers[i].local_bw_min = tiers[i].local_bw_max = 0;
tiers[i].local_lat_min = tiers[i].local_lat_max = 0;
tiers[i].type = HWLOC_MEMORY_TIER_UNKNOWN;
}
for(i=0; i<n; i++) {
if (tiers[i].type > HWLOC_MEMORY_TIER_UNKNOWN)
break;
}
first_spm = i;
/* get max BW from first */
if (first_spm > 0)
max_unknown_bw = tiers[0].local_bw;
else
max_unknown_bw = 0;
/* there are no DRAM or HBM tiers yet */
/* iterate over SPM tiers, and find their BW */
for(i=first_spm; i<n; i++) {
if (tiers[i].type > HWLOC_MEMORY_TIER_SPM)
break;
}
first_nvm = i;
/* get min BW from last */
if (first_nvm > first_spm)
min_spm_bw = tiers[first_nvm-1].local_bw;
else
min_spm_bw = 0;
/* FIXME: if there's more than 10% between some sets of nodes inside a tier, split it? */
/* FIXME: if there are cpuset-intersecting nodes in same tier, abort? */
if (spm_is_hbm == -1) {
/* if we have BW for all SPM and UNKNOWN
* and all SPM BW are 2x superior to all UNKNOWN BW
*/
hwloc_debug("UNKNOWN-memory-tier max bandwidth %llu\n", (unsigned long long) max_unknown_bw);
hwloc_debug("SPM-memory-tier min bandwidth %llu\n", (unsigned long long) min_spm_bw);
if (max_unknown_bw > 0 && min_spm_bw > 0 && max_unknown_bw*2 < min_spm_bw) {
hwloc_debug("assuming SPM means HBM and !SPM means DRAM since bandwidths are very different\n");
spm_is_hbm = 1;
} else {
hwloc_debug("cannot assume SPM means HBM\n");
spm_is_hbm = 0;
}
unsigned rank = nodeinfos[i].rank;
assert(rank < nr_tiers);
hwloc_bitmap_set(tiers[rank].nodeset, nodeinfos[i].node->os_index);
assert(tiers[rank].type == HWLOC_MEMORY_TIER_UNKNOWN
|| tiers[rank].type == nodeinfos[i].type);
tiers[rank].type = nodeinfos[i].type;
/* nodeinfos are sorted in BW order, no need to compare */
if (!tiers[rank].local_bw_min)
tiers[rank].local_bw_min = nodeinfos[i].local_bw;
tiers[rank].local_bw_max = nodeinfos[i].local_bw;
/* compare latencies to update min/max */
if (!tiers[rank].local_lat_min || nodeinfos[i].local_lat < tiers[rank].local_lat_min)
tiers[rank].local_lat_min = nodeinfos[i].local_lat;
if (!tiers[rank].local_lat_max || nodeinfos[i].local_lat > tiers[rank].local_lat_max)
tiers[rank].local_lat_max = nodeinfos[i].local_lat;
}
if (spm_is_hbm) {
for(i=0; i<first_spm; i++)
tiers[i].type = HWLOC_MEMORY_TIER_DRAM;
for(i=first_spm; i<first_nvm; i++)
tiers[i].type = HWLOC_MEMORY_TIER_HBM;
}
if (first_spm == n)
mark_dram = 0;
/* now apply subtypes */
for(i=0; i<n; i++) {
const char *type = NULL;
if (tiers[i].node->subtype) /* don't overwrite the existing subtype */
continue;
switch (tiers[i].type) {
case HWLOC_MEMORY_TIER_DRAM:
if (mark_dram)
type = "DRAM";
break;
case HWLOC_MEMORY_TIER_HBM:
type = "HBM";
break;
case HWLOC_MEMORY_TIER_SPM:
type = "SPM";
break;
case HWLOC_MEMORY_TIER_NVM:
type = "NVM";
break;
default:
/* GPU memory is already marked with subtype="GPUMemory",
* UNKNOWN doesn't deserve any subtype
*/
break;
}
if (type) {
hwloc_debug("Marking node L#%u P#%u as %s\n", tiers[i].node->logical_index, tiers[i].node->os_index, type);
tiers[i].node->subtype = strdup(type);
}
}
free(nodeinfos);
*nr_tiers_p = nr_tiers;
return tiers;
out_with_tiers:
for(i=0; i<nr_tiers; i++)
hwloc_bitmap_free(tiers[i].nodeset);
free(tiers);
out_with_nodeinfos:
free(nodeinfos);
return NULL;
}
enum hwloc_guess_memtiers_flag {
HWLOC_GUESS_MEMTIERS_FLAG_NODE0_IS_DRAM = 1<<0,
HWLOC_GUESS_MEMTIERS_FLAG_SPM_IS_HBM = 1<<1
};
static int
hwloc__guess_dram_hbm_tiers(struct hwloc_memory_tier_s *tier1,
struct hwloc_memory_tier_s *tier2,
unsigned long flags)
{
struct hwloc_memory_tier_s *tmp;
if (!tier1->local_bw_min || !tier2->local_bw_min) {
hwloc_debug(" Missing BW info\n");
return -1;
}
/* reorder tiers by BW */
if (tier1->local_bw_min > tier2->local_bw_min) {
tmp = tier1; tier1 = tier2; tier2 = tmp;
}
/* tier1 < tier2 */
hwloc_debug(" tier1 BW %llu-%llu vs tier2 BW %llu-%llu\n",
(unsigned long long) tier1->local_bw_min,
(unsigned long long) tier1->local_bw_max,
(unsigned long long) tier2->local_bw_min,
(unsigned long long) tier2->local_bw_max);
if (tier2->local_bw_min <= tier1->local_bw_max * 2) {
/* tier2 BW isn't 2x tier1, we cannot guess HBM */
hwloc_debug(" BW difference isn't >2x\n");
return -1;
}
/* tier2 BW is >2x tier1 */
if ((flags & HWLOC_GUESS_MEMTIERS_FLAG_NODE0_IS_DRAM)
&& hwloc_bitmap_isset(tier2->nodeset, 0)) {
/* node0 is not DRAM, and we assume that's not possible */
hwloc_debug(" node0 shouldn't have HBM BW\n");
return -1;
}
/* assume tier1 == DRAM and tier2 == HBM */
tier1->type = HWLOC_MEMORY_TIER_DRAM;
tier2->type = HWLOC_MEMORY_TIER_HBM;
hwloc_debug(" Success\n");
return 0;
}
static int
hwloc__guess_memory_tiers_types(hwloc_topology_t topology __hwloc_attribute_unused,
unsigned nr_tiers,
struct hwloc_memory_tier_s *tiers)
{
unsigned long flags;
const char *env;
unsigned nr_unknown, nr_spm;
struct hwloc_memory_tier_s *unknown_tier[2], *spm_tier;
unsigned i;
flags = 0;
env = getenv("HWLOC_MEMTIERS_GUESS");
if (env) {
if (!strcmp(env, "none"))
return 0;
/* by default, we don't guess anything unsure */
if (!strcmp(env, "all"))
/* enable all typical cases */
flags = ~0UL;
if (strstr(env, "spm_is_hbm")) {
hwloc_debug("Assuming SPM-tier is HBM, ignore bandwidth\n");
flags |= HWLOC_GUESS_MEMTIERS_FLAG_SPM_IS_HBM;
}
if (strstr(env, "node0_is_dram")) {
hwloc_debug("Assuming node0 is DRAM\n");
flags |= HWLOC_GUESS_MEMTIERS_FLAG_NODE0_IS_DRAM;
}
}
if (nr_tiers == 1)
/* Likely DRAM only, but could also be HBM-only in non-SPM mode.
* We cannot be sure, but it doesn't matter since there's a single tier.
*/
return 0;
nr_unknown = nr_spm = 0;
unknown_tier[0] = unknown_tier[1] = spm_tier = NULL;
for(i=0; i<nr_tiers; i++) {
switch (tiers[i].type) {
case HWLOC_MEMORY_TIER_UNKNOWN:
if (nr_unknown < 2)
unknown_tier[nr_unknown] = &tiers[i];
nr_unknown++;
break;
case HWLOC_MEMORY_TIER_SPM:
spm_tier = &tiers[i];
nr_spm++;
break;
case HWLOC_MEMORY_TIER_DRAM:
case HWLOC_MEMORY_TIER_HBM:
/* not possible */
abort();
default:
/* ignore HBM, NVM, ... */
break;
}
}
hwloc_debug("Found %u unknown memory tiers and %u SPM\n",
nr_unknown, nr_spm);
/* Try to guess DRAM + HBM common cases.
* Other things we'd like to detect:
* single unknown => DRAM or HBM? HBM won't be SPM on HBM-only CPUs
* unknown + CXL DRAM => DRAM or HBM?
*/
if (nr_unknown == 2 && !nr_spm) {
/* 2 unknown, could be DRAM + non-SPM HBM */
hwloc_debug(" Trying to guess 2 unknown tiers using BW\n");
hwloc__guess_dram_hbm_tiers(unknown_tier[0], unknown_tier[1], flags);
} else if (nr_unknown == 1 && nr_spm == 1) {
/* 1 unknown + 1 SPM, could be DRAM + SPM HBM */
hwloc_debug(" Trying to guess 1 unknown + 1 SPM tiers using BW\n");
hwloc__guess_dram_hbm_tiers(unknown_tier[0], spm_tier, flags);
}
if (flags & HWLOC_GUESS_MEMTIERS_FLAG_SPM_IS_HBM) {
/* force mark SPM as HBM */
for(i=0; i<nr_tiers; i++)
if (tiers[i].type == HWLOC_MEMORY_TIER_SPM) {
hwloc_debug("Forcing SPM tier to HBM");
tiers[i].type = HWLOC_MEMORY_TIER_HBM;
}
}
if (flags & HWLOC_GUESS_MEMTIERS_FLAG_NODE0_IS_DRAM) {
/* force mark node0's tier as DRAM if we couldn't guess it */
for(i=0; i<nr_tiers; i++)
if (hwloc_bitmap_isset(tiers[i].nodeset, 0)
&& tiers[i].type == HWLOC_MEMORY_TIER_UNKNOWN) {
hwloc_debug("Forcing node0 tier to DRAM");
tiers[i].type = HWLOC_MEMORY_TIER_DRAM;
break;
}
}
return 0;
}
/* parses something like 0xf=HBM;0x0f=DRAM;0x00f=CXL-DRAM */
static struct hwloc_memory_tier_s *
hwloc__force_memory_tiers(hwloc_topology_t topology __hwloc_attribute_unused,
unsigned *nr_tiers_p,
const char *_env)
{
struct hwloc_memory_tier_s *tiers = NULL;
unsigned nr_tiers, i;
hwloc_bitmap_t nodeset = NULL;
char *env;
const char *tmp;
env = strdup(_env);
if (!env) {
fprintf(stderr, "[hwloc/memtiers] failed to duplicate HWLOC_MEMTIERS envvar\n");
goto out;
}
tmp = env;
nr_tiers = 1;
while (1) {
tmp = strchr(tmp, ';');
if (!tmp)
break;
tmp++;
nr_tiers++;
}
nodeset = hwloc_bitmap_alloc();
if (!nodeset) {
fprintf(stderr, "[hwloc/memtiers] failed to allocated forced tiers' nodeset\n");
goto out_with_envvar;
}
tiers = calloc(nr_tiers, sizeof(*tiers));
if (!tiers) {
fprintf(stderr, "[hwloc/memtiers] failed to allocated forced tiers\n");
goto out_with_nodeset;
}
nr_tiers = 0;
tmp = env;
while (1) {
char *end;
char *equal;
hwloc_memory_tier_type_t type;
end = strchr(tmp, ';');
if (end)
*end = '\0';
equal = strchr(tmp, '=');
if (!equal) {
fprintf(stderr, "[hwloc/memtiers] missing `=' before end of forced tier description at `%s'\n", tmp);
goto out_with_tiers;
}
*equal = '\0';
hwloc_bitmap_sscanf(nodeset, tmp);
if (hwloc_bitmap_iszero(nodeset)) {
fprintf(stderr, "[hwloc/memtiers] empty forced tier nodeset `%s', aborting\n", tmp);
goto out_with_tiers;
}
type = hwloc_memory_tier_type_sscanf(equal+1);
if (!type)
hwloc_debug("failed to recognize forced tier type `%s'\n", equal+1);
tiers[nr_tiers].nodeset = hwloc_bitmap_dup(nodeset);
tiers[nr_tiers].type = type;
tiers[nr_tiers].local_bw_min = tiers[nr_tiers].local_bw_max = 0;
tiers[nr_tiers].local_lat_min = tiers[nr_tiers].local_lat_max = 0;
nr_tiers++;
if (!end)
break;
tmp = end+1;
}
free(env);
hwloc_bitmap_free(nodeset);
hwloc_debug("Forcing %u memory tiers\n", nr_tiers);
#ifdef HWLOC_DEBUG
for(i=0; i<nr_tiers; i++) {
char *s;
hwloc_bitmap_asprintf(&s, tiers[i].nodeset);
hwloc_debug(" tier #%u type %lx nodeset %s\n", i, tiers[i].type, s);
free(s);
}
#endif
*nr_tiers_p = nr_tiers;
return tiers;
out_with_tiers:
for(i=0; i<nr_tiers; i++)
hwloc_bitmap_free(tiers[i].nodeset);
free(tiers);
out_with_nodeset:
hwloc_bitmap_free(nodeset);
out_with_envvar:
free(env);
out:
return NULL;
}
static void
hwloc__apply_memory_tiers_subtypes(hwloc_topology_t topology,
unsigned nr_tiers,
struct hwloc_memory_tier_s *tiers,
int force)
{
hwloc_obj_t node = NULL;
hwloc_debug("Marking node tiers\n");
while ((node = hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_NUMANODE, node)) != NULL) {
unsigned j;
for(j=0; j<nr_tiers; j++) {
if (hwloc_bitmap_isset(tiers[j].nodeset, node->os_index)) {
const char *subtype = hwloc_memory_tier_type_snprintf(tiers[j].type);
if (!node->subtype || force) { /* don't overwrite the existing subtype unless forced */
if (subtype) { /* don't set a subtype for unknown tiers */
hwloc_debug(" marking node L#%u P#%u as %s (was %s)\n", node->logical_index, node->os_index, subtype, node->subtype);
free(node->subtype);
node->subtype = strdup(subtype);
}
} else
hwloc_debug(" node L#%u P#%u already marked as %s, not setting %s\n",
node->logical_index, node->os_index, node->subtype, subtype);
if (nr_tiers > 1) {
char tmp[20];
snprintf(tmp, sizeof(tmp), "%u", j);
hwloc__add_info_nodup(&node->infos, &node->infos_count, "MemoryTier", tmp, 1);
}
break; /* each node is in a single tier */
}
}
}
if (nr_tiers > 1) {
hwloc_obj_t root = hwloc_get_root_obj(topology);
char tmp[20];
snprintf(tmp, sizeof(tmp), "%u", nr_tiers);
hwloc__add_info_nodup(&root->infos, &root->infos_count, "MemoryTiersNr", tmp, 1);
}
}
int
hwloc_internal_memattrs_guess_memory_tiers(hwloc_topology_t topology, int force_subtype)
{
struct hwloc_memory_tier_s *tiers;
unsigned nr_tiers;
unsigned i;
const char *env;
env = getenv("HWLOC_MEMTIERS");
if (env) {
if (!strcmp(env, "none"))
goto out;
tiers = hwloc__force_memory_tiers(topology, &nr_tiers, env);
if (tiers) {
assert(nr_tiers > 0);
force_subtype = 1;
goto ready;
}
}
tiers = hwloc__group_memory_tiers(topology, &nr_tiers);
if (!tiers)
goto out;
hwloc__guess_memory_tiers_types(topology, nr_tiers, tiers);
/* sort tiers by BW first, then by type */
hwloc_debug("Sorting memory tiers...\n");
qsort(tiers, nr_tiers, sizeof(*tiers), compare_tiers_by_bw_and_type);
ready:
#ifdef HWLOC_DEBUG
for(i=0; i<nr_tiers; i++) {
char *s;
hwloc_bitmap_asprintf(&s, tiers[i].nodeset);
hwloc_debug(" tier %u = nodes %s with type %lx and local BW %llu-%llu lat %llu-%llu\n",
i,
s, tiers[i].type,
(unsigned long long) tiers[i].local_bw_min,
(unsigned long long) tiers[i].local_bw_max,
(unsigned long long) tiers[i].local_lat_min,
(unsigned long long) tiers[i].local_lat_max);
free(s);
}
#endif
hwloc__apply_memory_tiers_subtypes(topology, nr_tiers, tiers, force_subtype);
for(i=0; i<nr_tiers; i++)
hwloc_bitmap_free(tiers[i].nodeset);
free(tiers);
out:
return 0;
}
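Two caller-visible effects of this rewrite are worth noting: convenience attributes now validate their target (EINVAL on a NULL or non-NUMA node), and tiers can be forced via the HWLOC_MEMTIERS envvar parsed above. A hedged sketch of observing both (the envvar value is only an example, and the MemoryTier info is set only when more than one tier exists):

#include <stdio.h>
#include <hwloc.h>
#include <hwloc/memattrs.h>

int main(void)
{
  hwloc_topology_t topo;
  hwloc_obj_t node;
  hwloc_uint64_t capacity;
  const char *tier;
  /* e.g. run with HWLOC_MEMTIERS="0x1=DRAM;0x2=HBM" to force tiers */
  hwloc_topology_init(&topo);
  hwloc_topology_load(topo);
  node = hwloc_get_obj_by_type(topo, HWLOC_OBJ_NUMANODE, 0);
  if (hwloc_memattr_get_value(topo, HWLOC_MEMATTR_ID_CAPACITY,
                              node, NULL, 0, &capacity) == 0) {
    tier = hwloc_obj_get_info_by_name(node, "MemoryTier");
    printf("node0: %llu bytes, subtype %s, tier %s\n",
           (unsigned long long) capacity,
           node->subtype ? node->subtype : "(none)",
           tier ? tier : "(none)");
  }
  hwloc_topology_destroy(topo);
  return 0;
}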

View file

@ -1,5 +1,5 @@
/*
* Copyright © 2009-2022 Inria. All rights reserved.
* Copyright © 2009-2024 Inria. All rights reserved.
* See COPYING in top-level directory.
*/
@ -886,36 +886,12 @@ hwloc_pcidisc_find_linkspeed(const unsigned char *config,
unsigned offset, float *linkspeed)
{
unsigned linksta, speed, width;
float lanespeed;
memcpy(&linksta, &config[offset + HWLOC_PCI_EXP_LNKSTA], 4);
speed = linksta & HWLOC_PCI_EXP_LNKSTA_SPEED; /* PCIe generation */
width = (linksta & HWLOC_PCI_EXP_LNKSTA_WIDTH) >> 4; /* how many lanes */
/*
* These are single-direction bandwidths only.
*
* Gen1 used NRZ with 8/10 encoding.
* PCIe Gen1 = 2.5GT/s signal-rate per lane x 8/10 = 0.25GB/s data-rate per lane
* PCIe Gen2 = 5 GT/s signal-rate per lane x 8/10 = 0.5 GB/s data-rate per lane
* Gen3 switched to NRZ with 128/130 encoding.
* PCIe Gen3 = 8 GT/s signal-rate per lane x 128/130 = 1 GB/s data-rate per lane
* PCIe Gen4 = 16 GT/s signal-rate per lane x 128/130 = 2 GB/s data-rate per lane
* PCIe Gen5 = 32 GT/s signal-rate per lane x 128/130 = 4 GB/s data-rate per lane
* Gen6 switched to PAM with 242/256 FLIT (242B payload protected by 8B CRC + 6B FEC).
* PCIe Gen6 = 64 GT/s signal-rate per lane x 242/256 = 8 GB/s data-rate per lane
* PCIe Gen7 = 128GT/s signal-rate per lane x 242/256 = 16 GB/s data-rate per lane
*/
/* lanespeed in Gbit/s */
if (speed <= 2)
lanespeed = 2.5f * speed * 0.8f;
else if (speed <= 5)
lanespeed = 8.0f * (1<<(speed-3)) * 128/130;
else
lanespeed = 8.0f * (1<<(speed-3)) * 242/256; /* assume Gen8 will be 256 GT/s and so on */
/* linkspeed in GB/s */
*linkspeed = lanespeed * width / 8;
*linkspeed = hwloc__pci_link_speed(speed, width);
return 0;
}

View file

@ -23,6 +23,7 @@ struct hwloc_shmem_header {
uint32_t header_length; /* where the actual topology starts in the file/mapping */
uint64_t mmap_address; /* virtual address to pass to mmap */
uint64_t mmap_length; /* length to pass to mmap (includes the header) */
/* we will pad the end to a multiple of pointer size so that the topology is well aligned */
};
#define HWLOC_SHMEM_MALLOC_ALIGN 8UL
@ -85,6 +86,7 @@ hwloc_shmem_topology_write(hwloc_topology_t topology,
hwloc_topology_t new;
struct hwloc_tma tma;
struct hwloc_shmem_header header;
uint32_t header_length = (sizeof(header) + sizeof(void*) - 1) & ~(sizeof(void*) - 1); /* pad to a multiple of pointer size */
void *mmap_res;
int err;
@ -100,7 +102,7 @@ hwloc_shmem_topology_write(hwloc_topology_t topology,
hwloc_internal_memattrs_refresh(topology);
header.header_version = HWLOC_SHMEM_HEADER_VERSION;
header.header_length = sizeof(header);
header.header_length = header_length;
header.mmap_address = (uintptr_t) mmap_address;
header.mmap_length = length;
@ -127,7 +129,7 @@ hwloc_shmem_topology_write(hwloc_topology_t topology,
tma.malloc = tma_shmem_malloc;
tma.dontfree = 1;
tma.data = (char *)mmap_res + sizeof(header);
tma.data = (char *)mmap_res + header_length;
err = hwloc__topology_dup(&new, topology, &tma);
if (err < 0)
return err;
@ -154,6 +156,7 @@ hwloc_shmem_topology_adopt(hwloc_topology_t *topologyp,
{
hwloc_topology_t new, old;
struct hwloc_shmem_header header;
uint32_t header_length = (sizeof(header) + sizeof(void*) - 1) & ~(sizeof(void*) - 1); /* pad to a multiple of pointer size */
void *mmap_res;
int err;
@ -171,7 +174,7 @@ hwloc_shmem_topology_adopt(hwloc_topology_t *topologyp,
return -1;
if (header.header_version != HWLOC_SHMEM_HEADER_VERSION
|| header.header_length != sizeof(header)
|| header.header_length != header_length
|| header.mmap_address != (uintptr_t) mmap_address
|| header.mmap_length != length) {
errno = EINVAL;
@ -186,7 +189,7 @@ hwloc_shmem_topology_adopt(hwloc_topology_t *topologyp,
goto out_with_mmap;
}
old = (hwloc_topology_t)((char*)mmap_address + sizeof(header));
old = (hwloc_topology_t)((char*)mmap_address + header_length);
if (hwloc_topology_abi_check(old) < 0) {
errno = EINVAL;
goto out_with_mmap;
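The padding expression used in both functions is a plain round-up to pointer alignment; an equivalent standalone form (the helper name is local to this sketch):

#include <stdio.h>
#include <stdint.h>

/* Round sz up to a multiple of the pointer size, as the shmem header code does. */
static uint32_t pad_to_ptr(uint32_t sz)
{
  return (uint32_t) ((sz + sizeof(void *) - 1) & ~(sizeof(void *) - 1));
}

int main(void)
{
  /* A 28-byte header becomes 32 on LP64 and stays 28 on ILP32. */
  printf("28 -> %u\n", pad_to_ptr(28u));
  return 0;
}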

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
* Copyright © 2009-2022 Inria. All rights reserved.
* Copyright © 2009-2023 Inria. All rights reserved.
* Copyright © 2009-2010 Université Bordeaux
* Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
@ -23,6 +23,7 @@ struct hwloc_synthetic_attr_s {
unsigned depth; /* For caches/groups */
hwloc_obj_cache_type_t cachetype; /* For caches */
hwloc_uint64_t memorysize; /* For caches/memory */
hwloc_uint64_t memorysidecachesize; /* Single level of memory-side cache in front of a NUMA node */
};
struct hwloc_synthetic_indexes_s {
@ -380,6 +381,9 @@ hwloc_synthetic_parse_attrs(const char *attrs, const char **next_posp,
} else if (!iscache && !strncmp("memory=", attrs, 7)) {
memorysize = hwloc_synthetic_parse_memory_attr(attrs+7, &attrs);
} else if (!strncmp("memorysidecachesize=", attrs, 20)) {
sattr->memorysidecachesize = hwloc_synthetic_parse_memory_attr(attrs+20, &attrs);
} else if (!strncmp("indexes=", attrs, 8)) {
index_string = attrs+8;
attrs += 8;
@ -387,10 +391,9 @@ hwloc_synthetic_parse_attrs(const char *attrs, const char **next_posp,
attrs += index_string_length;
} else {
if (verbose)
fprintf(stderr, "Unknown attribute at '%s'\n", attrs);
errno = EINVAL;
return -1;
size_t length = strcspn(attrs, " )");
fprintf(stderr, "hwloc/synthetic: Ignoring unknown attribute at '%s'\n", attrs);
attrs += length;
}
if (' ' == *attrs)
@ -416,6 +419,32 @@ hwloc_synthetic_parse_attrs(const char *attrs, const char **next_posp,
return 0;
}
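
For context, a hedged usage sketch of the new attribute from the application side (the description string below is an assumption based on the parser above; hwloc_topology_set_synthetic() is the real entry point):

#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    /* 2 packages, each with one 16GB NUMA node fronted by a 1GB
     * memory-side cache, then 4 cores of 2 PUs */
    hwloc_topology_set_synthetic(topo,
        "pack:2 numa:1(memory=16GB memorysidecachesize=1GB) core:4 pu:2");
    hwloc_topology_load(topo);
    hwloc_topology_destroy(topo);
    return 0;
}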
static void
hwloc_synthetic_set_default_attrs(struct hwloc_synthetic_attr_s *sattr,
int *type_count)
{
hwloc_obj_type_t type = sattr->type;
if (type == HWLOC_OBJ_GROUP) {
if (sattr->depth == (unsigned)-1)
sattr->depth = type_count[HWLOC_OBJ_GROUP]--;
} else if (hwloc__obj_type_is_cache(type)) {
if (!sattr->memorysize) {
if (1 == sattr->depth)
/* 32KiB in L1 */
sattr->memorysize = 32*1024;
else
/* *4 at each level, starting from 1MiB for L2, unified */
sattr->memorysize = 256ULL*1024 << (2*sattr->depth);
}
} else if (type == HWLOC_OBJ_NUMANODE && !sattr->memorysize) {
/* 1GiB in memory nodes. */
sattr->memorysize = 1024*1024*1024;
}
}
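
A standalone evaluation of the default-size formula above (just arithmetic, not hwloc code):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* depth 1 is special-cased to 32KiB; deeper caches use 256KiB << (2*depth) */
    for (unsigned depth = 2; depth <= 4; depth++) {
        uint64_t size = 256ULL * 1024 << (2 * depth);
        printf("L%u default: %llu MiB\n", depth, (unsigned long long)(size >> 20));
    }
    return 0; /* prints 4, 16, 64 MiB for L2, L3, L4 */
}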
/* frees level until arity = 0 */
static void
hwloc_synthetic_free_levels(struct hwloc_synthetic_backend_data_s *data)
@ -465,6 +494,7 @@ hwloc_backend_synthetic_init(struct hwloc_synthetic_backend_data_s *data,
data->level[0].indexes.string = NULL;
data->level[0].indexes.array = NULL;
data->level[0].attr.memorysize = 0;
data->level[0].attr.memorysidecachesize = 0;
data->level[0].attached = NULL;
type_count[HWLOC_OBJ_MACHINE] = 1;
if (*description == '(') {
@ -514,6 +544,7 @@ hwloc_backend_synthetic_init(struct hwloc_synthetic_backend_data_s *data,
if (attached) {
attached->attr.type = type;
attached->attr.memorysize = 0;
attached->attr.memorysidecachesize = 0;
/* attached->attr.depth and .cachetype unused */
attached->next = NULL;
pprev = &data->level[count-1].attached;
@ -601,7 +632,7 @@ hwloc_backend_synthetic_init(struct hwloc_synthetic_backend_data_s *data,
}
if (!item) {
if (verbose)
fprintf(stderr,"Synthetic string with disallow 0 number of objects at '%s'\n", pos);
fprintf(stderr,"Synthetic string with disallowed 0 number of objects at '%s'\n", pos);
errno = EINVAL;
goto error;
}
@ -611,6 +642,7 @@ hwloc_backend_synthetic_init(struct hwloc_synthetic_backend_data_s *data,
data->level[count].indexes.string = NULL;
data->level[count].indexes.array = NULL;
data->level[count].attr.memorysize = 0;
data->level[count].attr.memorysidecachesize = 0;
if (*next_pos == '(') {
err = hwloc_synthetic_parse_attrs(next_pos+1, &next_pos, &data->level[count].attr, &data->level[count].indexes, verbose);
if (err < 0)
@ -796,6 +828,7 @@ hwloc_backend_synthetic_init(struct hwloc_synthetic_backend_data_s *data,
data->level[1].indexes.string = NULL;
data->level[1].indexes.array = NULL;
data->level[1].attr.memorysize = 0;
data->level[1].attr.memorysidecachesize = 0;
data->level[1].totalwidth = data->level[0].totalwidth;
/* update arity to insert a single NUMA node per parent */
data->level[1].arity = data->level[0].arity;
@ -803,30 +836,14 @@ hwloc_backend_synthetic_init(struct hwloc_synthetic_backend_data_s *data,
count++;
}
/* set default attributes that depend on the depth/hierarchy of levels */
for (i=0; i<count; i++) {
struct hwloc_synthetic_attached_s *attached;
struct hwloc_synthetic_level_data_s *curlevel = &data->level[i];
hwloc_obj_type_t type = curlevel->attr.type;
if (type == HWLOC_OBJ_GROUP) {
if (curlevel->attr.depth == (unsigned)-1)
curlevel->attr.depth = type_count[HWLOC_OBJ_GROUP]--;
} else if (hwloc__obj_type_is_cache(type)) {
if (!curlevel->attr.memorysize) {
if (1 == curlevel->attr.depth)
/* 32KiB in L1 */
curlevel->attr.memorysize = 32*1024;
else
/* *4 at each level, starting from 1MiB for L2, unified */
curlevel->attr.memorysize = 256ULL*1024 << (2*curlevel->attr.depth);
}
} else if (type == HWLOC_OBJ_NUMANODE && !curlevel->attr.memorysize) {
/* 1GiB in memory nodes. */
curlevel->attr.memorysize = 1024*1024*1024;
}
hwloc_synthetic_process_indexes(data, &data->level[i].indexes, data->level[i].totalwidth, verbose);
hwloc_synthetic_set_default_attrs(&curlevel->attr, type_count);
for(attached = curlevel->attached; attached != NULL; attached = attached->next)
hwloc_synthetic_set_default_attrs(&attached->attr, type_count);
hwloc_synthetic_process_indexes(data, &curlevel->indexes, curlevel->totalwidth, verbose);
}
hwloc_synthetic_process_indexes(data, &data->numa_attached_indexes, data->numa_attached_nr, verbose);
@ -859,6 +876,12 @@ hwloc_synthetic_set_attr(struct hwloc_synthetic_attr_s *sattr,
obj->attr->numanode.page_types[0].size = 4096;
obj->attr->numanode.page_types[0].count = sattr->memorysize / 4096;
break;
case HWLOC_OBJ_MEMCACHE:
obj->attr->cache.depth = 1;
obj->attr->cache.linesize = 64;
obj->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED;
obj->attr->cache.size = sattr->memorysidecachesize;
break;
case HWLOC_OBJ_PACKAGE:
case HWLOC_OBJ_DIE:
break;
@ -926,6 +949,14 @@ hwloc_synthetic_insert_attached(struct hwloc_topology *topology,
hwloc__insert_object_by_cpuset(topology, NULL, child, "synthetic:attached");
if (attached->attr.memorysidecachesize) {
hwloc_obj_t mscachechild = hwloc_alloc_setup_object(topology, HWLOC_OBJ_MEMCACHE, HWLOC_UNKNOWN_INDEX);
mscachechild->cpuset = hwloc_bitmap_dup(set);
mscachechild->nodeset = hwloc_bitmap_dup(child->nodeset);
hwloc_synthetic_set_attr(&attached->attr, mscachechild);
hwloc__insert_object_by_cpuset(topology, NULL, mscachechild, "synthetic:attached:mscache");
}
hwloc_synthetic_insert_attached(topology, data, attached->next, set);
}
@ -977,6 +1008,14 @@ hwloc__look_synthetic(struct hwloc_topology *topology,
hwloc_synthetic_set_attr(&curlevel->attr, obj);
hwloc__insert_object_by_cpuset(topology, NULL, obj, "synthetic");
if (type == HWLOC_OBJ_NUMANODE && curlevel->attr.memorysidecachesize) {
hwloc_obj_t mscachechild = hwloc_alloc_setup_object(topology, HWLOC_OBJ_MEMCACHE, HWLOC_UNKNOWN_INDEX);
mscachechild->cpuset = hwloc_bitmap_dup(set);
mscachechild->nodeset = hwloc_bitmap_dup(obj->nodeset);
hwloc_synthetic_set_attr(&curlevel->attr, mscachechild);
hwloc__insert_object_by_cpuset(topology, NULL, mscachechild, "synthetic:mscache");
}
}
hwloc_synthetic_insert_attached(topology, data, curlevel->attached, set);
@ -1217,6 +1256,7 @@ hwloc__export_synthetic_indexes(hwloc_obj_t *level, unsigned total,
static int
hwloc__export_synthetic_obj_attr(struct hwloc_topology * topology,
unsigned long flags,
hwloc_obj_t obj,
char *buffer, size_t buflen)
{
@ -1224,6 +1264,7 @@ hwloc__export_synthetic_obj_attr(struct hwloc_topology * topology,
const char * prefix = "(";
char cachesize[64] = "";
char memsize[64] = "";
char memorysidecachesize[64] = "";
int needindexes = 0;
if (hwloc__obj_type_is_cache(obj->type) && obj->attr->cache.size) {
@ -1236,6 +1277,19 @@ hwloc__export_synthetic_obj_attr(struct hwloc_topology * topology,
prefix, (unsigned long long) obj->attr->numanode.local_memory);
prefix = separator;
}
if (obj->type == HWLOC_OBJ_NUMANODE && !(flags & HWLOC_TOPOLOGY_EXPORT_SYNTHETIC_FLAG_V1)) {
hwloc_obj_t memorysidecache = obj->parent;
hwloc_uint64_t size = 0;
while (memorysidecache && memorysidecache->type == HWLOC_OBJ_MEMCACHE) {
size += memorysidecache->attr->cache.size;
memorysidecache = memorysidecache->parent;
}
if (size) {
snprintf(memorysidecachesize, sizeof(memorysidecachesize), "%smemorysidecachesize=%llu",
prefix, (unsigned long long) size);
prefix = separator;
}
}
if (!obj->logical_index /* only display indexes once per level (not for non-first NUMA children, etc.) */
&& (obj->type == HWLOC_OBJ_PU || obj->type == HWLOC_OBJ_NUMANODE)) {
hwloc_obj_t cur = obj;
@ -1247,12 +1301,12 @@ hwloc__export_synthetic_obj_attr(struct hwloc_topology * topology,
cur = cur->next_cousin;
}
}
if (*cachesize || *memsize || needindexes) {
if (*cachesize || *memsize || *memorysidecachesize || needindexes) {
ssize_t tmplen = buflen;
char *tmp = buffer;
int res, ret = 0;
res = hwloc_snprintf(tmp, tmplen, "%s%s%s", cachesize, memsize, needindexes ? "" : ")");
res = hwloc_snprintf(tmp, tmplen, "%s%s%s%s", cachesize, memsize, memorysidecachesize, needindexes ? "" : ")");
if (hwloc__export_synthetic_update_status(&ret, &tmp, &tmplen, res) < 0)
return -1;
@ -1326,7 +1380,7 @@ hwloc__export_synthetic_obj(struct hwloc_topology * topology, unsigned long flag
if (!(flags & HWLOC_TOPOLOGY_EXPORT_SYNTHETIC_FLAG_NO_ATTRS)) {
/* obj attributes */
res = hwloc__export_synthetic_obj_attr(topology, obj, tmp, tmplen);
res = hwloc__export_synthetic_obj_attr(topology, flags, obj, tmp, tmplen);
if (hwloc__export_synthetic_update_status(&ret, &tmp, &tmplen, res) < 0)
return -1;
}
@ -1351,7 +1405,7 @@ hwloc__export_synthetic_memory_children(struct hwloc_topology * topology, unsign
if (flags & HWLOC_TOPOLOGY_EXPORT_SYNTHETIC_FLAG_V1) {
/* v1: export a single NUMA child */
if (parent->memory_arity > 1 || mchild->type != HWLOC_OBJ_NUMANODE) {
if (parent->memory_arity > 1) {
/* not supported */
if (verbose)
fprintf(stderr, "Cannot export to synthetic v1 if multiple memory children are attached to the same location.\n");
@ -1362,6 +1416,9 @@ hwloc__export_synthetic_memory_children(struct hwloc_topology * topology, unsign
if (needprefix)
hwloc__export_synthetic_add_char(&ret, &tmp, &tmplen, ' ');
/* ignore memcaches and export the NUMA node */
while (mchild->type != HWLOC_OBJ_NUMANODE)
mchild = mchild->memory_first_child;
res = hwloc__export_synthetic_obj(topology, flags, mchild, 1, tmp, tmplen);
if (hwloc__export_synthetic_update_status(&ret, &tmp, &tmplen, res) < 0)
return -1;
@ -1369,16 +1426,25 @@ hwloc__export_synthetic_memory_children(struct hwloc_topology * topology, unsign
}
while (mchild) {
/* FIXME: really recurse to export memcaches and numanode,
/* The core doesn't support shared memcache for now (because ACPI and Linux don't).
* So, for each mchild here, recurse only in the first children at each level.
*
* FIXME: whenever supported by the core, really recurse to export memcaches and numanode,
* but it requires clever parsing of [ memcache [numa] [numa] ] during import,
* better attaching of things to describe the hierarchy.
*/
hwloc_obj_t numanode = mchild;
/* only export the first NUMA node leaf of each memory child
* FIXME: This assumes mscache aren't shared between nodes, that's true in current platforms
/* Only export the first NUMA node leaf of each memory child.
* Memcaches are ignored here, they will be summed and exported as a single attribute
* of the NUMA node in hwloc__export_synthetic_obj().
*/
while (numanode && numanode->type != HWLOC_OBJ_NUMANODE) {
assert(numanode->arity == 1);
if (verbose && numanode->memory_arity > 1) {
static int warned = 0;
if (!warned)
fprintf(stderr, "Ignoring non-first memory children at non-first level of memory hierarchy.\n");
warned = 1;
}
numanode = numanode->memory_first_child;
}
assert(numanode); /* there's always a numanode at the bottom of the memory tree */
@ -1511,17 +1577,21 @@ hwloc_topology_export_synthetic(struct hwloc_topology * topology,
if (flags & HWLOC_TOPOLOGY_EXPORT_SYNTHETIC_FLAG_V1) {
/* v1 requires all NUMA at the same level */
hwloc_obj_t node;
hwloc_obj_t node, parent;
signed pdepth;
node = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, 0);
assert(node);
assert(hwloc__obj_type_is_normal(node->parent->type)); /* only depth-1 memory children for now */
pdepth = node->parent->depth;
parent = node->parent;
while (!hwloc__obj_type_is_normal(parent->type))
parent = parent->parent;
pdepth = parent->depth;
while ((node = node->next_cousin) != NULL) {
assert(hwloc__obj_type_is_normal(node->parent->type)); /* only depth-1 memory children for now */
if (node->parent->depth != pdepth) {
parent = node->parent;
while (!hwloc__obj_type_is_normal(parent->type))
parent = parent->parent;
if (parent->depth != pdepth) {
if (verbose)
fprintf(stderr, "Cannot export to synthetic v1 if memory is attached to parents at different depths.\n");
errno = EINVAL;
@ -1534,7 +1604,7 @@ hwloc_topology_export_synthetic(struct hwloc_topology * topology,
if (!(flags & HWLOC_TOPOLOGY_EXPORT_SYNTHETIC_FLAG_NO_ATTRS)) {
/* obj attributes */
res = hwloc__export_synthetic_obj_attr(topology, obj, tmp, tmplen);
res = hwloc__export_synthetic_obj_attr(topology, flags, obj, tmp, tmplen);
if (res > 0)
needprefix = 1;
if (hwloc__export_synthetic_update_status(&ret, &tmp, &tmplen, res) < 0)

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
* Copyright © 2009-2022 Inria. All rights reserved.
* Copyright © 2009-2024 Inria. All rights reserved.
* Copyright © 2009-2012, 2020 Université Bordeaux
* Copyright © 2011 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
@ -220,7 +220,7 @@ static void hwloc_win_get_function_ptrs(void)
#pragma GCC diagnostic ignored "-Wcast-function-type"
#endif
kernel32 = LoadLibrary("kernel32.dll");
kernel32 = LoadLibrary(TEXT("kernel32.dll"));
if (kernel32) {
GetActiveProcessorGroupCountProc =
(PFN_GETACTIVEPROCESSORGROUPCOUNT) GetProcAddress(kernel32, "GetActiveProcessorGroupCount");
@ -249,12 +249,12 @@ static void hwloc_win_get_function_ptrs(void)
}
if (!QueryWorkingSetExProc) {
HMODULE psapi = LoadLibrary("psapi.dll");
HMODULE psapi = LoadLibrary(TEXT("psapi.dll"));
if (psapi)
QueryWorkingSetExProc = (PFN_QUERYWORKINGSETEX) GetProcAddress(psapi, "QueryWorkingSetEx");
}
ntdll = GetModuleHandle("ntdll");
ntdll = GetModuleHandle(TEXT("ntdll"));
RtlGetVersionProc = (PFN_RTLGETVERSION) GetProcAddress(ntdll, "RtlGetVersion");
#if HWLOC_HAVE_GCC_W_CAST_FUNCTION_TYPE
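
The TEXT() wrappers matter because LoadLibrary is a macro that resolves to LoadLibraryA or LoadLibraryW depending on whether UNICODE is defined; TEXT() makes the string literal match either way. A minimal sketch (illustrative, not hwloc code):

#include <windows.h>

static HMODULE load_psapi(void)
{
    /* TEXT("...") expands to L"..." in UNICODE builds and to a plain
     * narrow literal otherwise, so this compiles under both. */
    return LoadLibrary(TEXT("psapi.dll"));
}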
@ -367,7 +367,7 @@ hwloc_win_get_processor_groups(void)
if (nr_processor_groups > 1 && SIZEOF_VOID_P == 4) {
if (HWLOC_SHOW_ALL_ERRORS())
fprintf(stderr, "hwloc: multiple processor groups found on 32bits Windows, topology may be invalid/incomplete.\n");
fprintf(stderr, "hwloc/windows: multiple processor groups found on 32bits Windows, topology may be invalid/incomplete.\n");
}
length = 0;
@ -987,7 +987,11 @@ hwloc_look_windows(struct hwloc_backend *backend, struct hwloc_disc_status *dsta
OSVERSIONINFOEX osvi;
char versionstr[20];
char hostname[122] = "";
unsigned hostname_size = sizeof(hostname);
#if !defined(__CYGWIN__)
DWORD hostname_size = sizeof(hostname);
#else
size_t hostname_size = sizeof(hostname);
#endif
int has_efficiencyclass = 0;
struct hwloc_win_efficiency_classes eclasses;
char *env = getenv("HWLOC_WINDOWS_PROCESSOR_GROUP_OBJS");
@ -1051,12 +1055,16 @@ hwloc_look_windows(struct hwloc_backend *backend, struct hwloc_disc_status *dsta
unsigned efficiency_class = 0;
GROUP_AFFINITY *GroupMask;
/* Ignore unknown caches */
if (procInfo->Relationship == RelationCache
&& procInfo->Cache.Type != CacheUnified
&& procInfo->Cache.Type != CacheData
&& procInfo->Cache.Type != CacheInstruction)
continue;
if (procInfo->Relationship == RelationCache) {
if (!topology->want_some_cpu_caches)
/* TODO: check if RelationAll&~RelationCache works? */
continue;
if (procInfo->Cache.Type != CacheUnified
&& procInfo->Cache.Type != CacheData
&& procInfo->Cache.Type != CacheInstruction)
/* Ignore unknown caches */
continue;
}
id = HWLOC_UNKNOWN_INDEX;
switch (procInfo->Relationship) {

View file

@ -1,11 +1,11 @@
/*
* Copyright © 2010-2022 Inria. All rights reserved.
* Copyright © 2010-2024 Inria. All rights reserved.
* Copyright © 2010-2013 Université Bordeaux
* Copyright © 2010-2011 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
*
*
* This backend is only used when the operating system does not export
* This backend is mostly used when the operating system does not export
* the necessary hardware topology information to user-space applications.
* Currently, FreeBSD and NetBSD only add PUs and then fallback to this
* backend for CPU/Cache discovery.
@ -15,6 +15,7 @@
* on various architectures, without having to use this x86-specific code.
* But this backend is still used after them to annotate some objects with
* additional details (CPU info in Package, Inclusiveness in Caches).
 * It may also be enabled manually to work around bugs in native OS discovery.
*/
#include "private/autogen/config.h"
@ -38,6 +39,12 @@ struct hwloc_x86_backend_data_s {
int apicid_unique;
char *src_cpuiddump_path;
int is_knl;
int is_hybrid;
int found_die_ids;
int found_complex_ids;
int found_unit_ids;
int found_module_ids;
int found_tile_ids;
};
/************************************
@ -80,7 +87,7 @@ cpuiddump_read(const char *dirpath, unsigned idx)
cpuiddump = malloc(sizeof(*cpuiddump));
if (!cpuiddump) {
fprintf(stderr, "Failed to allocate cpuiddump for PU #%u, ignoring cpuiddump.\n", idx);
fprintf(stderr, "hwloc/x86: Failed to allocate cpuiddump for PU #%u, ignoring cpuiddump.\n", idx);
goto out;
}
@ -91,7 +98,7 @@ cpuiddump_read(const char *dirpath, unsigned idx)
snprintf(filename, filenamelen, "%s/pu%u", dirpath, idx);
file = fopen(filename, "r");
if (!file) {
fprintf(stderr, "Could not read dumped cpuid file %s, ignoring cpuiddump.\n", filename);
fprintf(stderr, "hwloc/x86: Could not read dumped cpuid file %s, ignoring cpuiddump.\n", filename);
goto out_with_filename;
}
@ -100,7 +107,7 @@ cpuiddump_read(const char *dirpath, unsigned idx)
nr++;
cpuiddump->entries = malloc(nr * sizeof(struct cpuiddump_entry));
if (!cpuiddump->entries) {
fprintf(stderr, "Failed to allocate %u cpuiddump entries for PU #%u, ignoring cpuiddump.\n", nr, idx);
fprintf(stderr, "hwloc/x86: Failed to allocate %u cpuiddump entries for PU #%u, ignoring cpuiddump.\n", nr, idx);
goto out_with_file;
}
@ -156,7 +163,7 @@ cpuiddump_find_by_input(unsigned *eax, unsigned *ebx, unsigned *ecx, unsigned *e
return;
}
fprintf(stderr, "Couldn't find %x,%x,%x,%x in dumped cpuid, returning 0s.\n",
fprintf(stderr, "hwloc/x86: Couldn't find %x,%x,%x,%x in dumped cpuid, returning 0s.\n",
*eax, *ebx, *ecx, *edx);
*eax = 0;
*ebx = 0;
@ -210,7 +217,8 @@ struct procinfo {
#define TILE 4
#define MODULE 5
#define DIE 6
#define HWLOC_X86_PROCINFO_ID_NR 7
#define COMPLEX 7
#define HWLOC_X86_PROCINFO_ID_NR 8
unsigned ids[HWLOC_X86_PROCINFO_ID_NR];
unsigned *otherids;
unsigned levels;
@ -314,7 +322,7 @@ static void read_amd_caches_topoext(struct procinfo *infos, struct cpuiddump *sr
/* the code below doesn't want any other cache yet */
assert(!infos->numcaches);
for (cachenum = 0; ; cachenum++) {
for (cachenum = 0; cachenum<16 /* guard */; cachenum++) {
eax = 0x8000001d;
ecx = cachenum;
cpuid_or_from_dump(&eax, &ebx, &ecx, &edx, src_cpuiddump);
@ -325,7 +333,7 @@ static void read_amd_caches_topoext(struct procinfo *infos, struct cpuiddump *sr
cache = infos->cache = malloc(infos->numcaches * sizeof(*infos->cache));
if (cache) {
for (cachenum = 0; ; cachenum++) {
for (cachenum = 0; cachenum<16 /* guard */; cachenum++) {
unsigned long linesize, linepart, ways, sets;
eax = 0x8000001d;
ecx = cachenum;
@ -378,7 +386,7 @@ static void read_intel_caches(struct hwloc_x86_backend_data_s *data, struct proc
unsigned cachenum;
struct cacheinfo *cache;
for (cachenum = 0; ; cachenum++) {
for (cachenum = 0; cachenum<16 /* guard */; cachenum++) {
eax = 0x04;
ecx = cachenum;
cpuid_or_from_dump(&eax, &ebx, &ecx, &edx, src_cpuiddump);
@ -400,7 +408,7 @@ static void read_intel_caches(struct hwloc_x86_backend_data_s *data, struct proc
infos->cache = tmpcaches;
cache = &infos->cache[oldnumcaches];
for (cachenum = 0; ; cachenum++) {
for (cachenum = 0; cachenum<16 /* guard */; cachenum++) {
unsigned long linesize, linepart, ways, sets;
eax = 0x04;
ecx = cachenum;
@ -480,7 +488,7 @@ static void read_amd_cores_legacy(struct procinfo *infos, struct cpuiddump *src_
}
/* AMD unit/node from CPUID 0x8000001e leaf (topoext) */
static void read_amd_cores_topoext(struct procinfo *infos, unsigned long flags, struct cpuiddump *src_cpuiddump)
static void read_amd_cores_topoext(struct hwloc_x86_backend_data_s *data, struct procinfo *infos, unsigned long flags __hwloc_attribute_unused, struct cpuiddump *src_cpuiddump)
{
unsigned apic_id, nodes_per_proc = 0;
unsigned eax, ebx, ecx, edx;
@ -489,7 +497,6 @@ static void read_amd_cores_topoext(struct procinfo *infos, unsigned long flags,
cpuid_or_from_dump(&eax, &ebx, &ecx, &edx, src_cpuiddump);
infos->apicid = apic_id = eax;
if (flags & HWLOC_X86_DISC_FLAG_TOPOEXT_NUMANODES) {
if (infos->cpufamilynumber == 0x16) {
/* ecx is reserved */
infos->ids[NODE] = 0;
@ -504,12 +511,12 @@ static void read_amd_cores_topoext(struct procinfo *infos, unsigned long flags,
|| (infos->cpufamilynumber == 0x19 && nodes_per_proc > 1)) {
hwloc_debug("warning: undefined nodes_per_proc value %u, assuming it means %u\n", nodes_per_proc, nodes_per_proc);
}
}
if (infos->cpufamilynumber <= 0x16) { /* topoext appeared in 0x15 and compute-units were only used in 0x15 and 0x16 */
unsigned cores_per_unit;
/* coreid was obtained from read_amd_cores_legacy() earlier */
infos->ids[UNIT] = ebx & 0xff;
data->found_unit_ids = 1;
cores_per_unit = ((ebx >> 8) & 0xff) + 1;
hwloc_debug("topoext %08x, %u nodes, node %u, %u cores in unit %u\n", apic_id, nodes_per_proc, infos->ids[NODE], cores_per_unit, infos->ids[UNIT]);
/* coreid and unitid are package-wide (core 0-15 and unit 0-7 on 16-core 2-NUMAnode processor).
@ -524,19 +531,29 @@ static void read_amd_cores_topoext(struct procinfo *infos, unsigned long flags,
}
}
/* Intel core/thread or even die/module/tile from CPUID 0x0b or 0x1f leaves (v1 and v2 extended topology enumeration) */
static void read_intel_cores_exttopoenum(struct procinfo *infos, unsigned leaf, struct cpuiddump *src_cpuiddump)
/* Intel core/thread or even die/module/tile from CPUID 0x0b or 0x1f leaves (v1 and v2 extended topology enumeration)
* or AMD core/thread or even complex/ccd from CPUID 0x0b or 0x80000026 (extended CPU topology)
*/
static void read_extended_topo(struct hwloc_x86_backend_data_s *data, struct procinfo *infos, unsigned leaf, enum cpuid_type cpuid_type __hwloc_attribute_unused, struct cpuiddump *src_cpuiddump)
{
unsigned level, apic_nextshift, apic_number, apic_type, apic_id = 0, apic_shift = 0, id;
unsigned level, apic_nextshift, apic_type, apic_id = 0, apic_shift = 0, id;
unsigned threadid __hwloc_attribute_unused = 0; /* shut-up compiler */
unsigned eax, ebx, ecx = 0, edx;
int apic_packageshift = 0;
for (level = 0; ; level++) {
for (level = 0; level<32 /* guard */; level++) {
ecx = level;
eax = leaf;
cpuid_or_from_dump(&eax, &ebx, &ecx, &edx, src_cpuiddump);
if (!eax && !ebx)
/* Intel specifies that the 0x0b/0x1f loop should stop when we get "invalid domain" (0 in ecx[8:15])
* (if so, we also get 0 in eax/ebx for invalid subleaves). Zhaoxin implements this too.
* However AMD rather says that the 0x80000026/0x0b loop should stop when we get "no thread at this level" (0 in ebx[0:15]).
*
* Linux kernel <= 6.8 used "invalid domain" for both Intel and AMD (in detect_extended_topology())
* but x86 discovery revamp in 6.9 now properly checks both Intel and AMD conditions (in topo_subleaf()).
* So let's assume we are allowed to break-out once one of the Intel+AMD conditions is met.
*/
if (!(ebx & 0xffff) || !(ecx & 0xff00))
break;
apic_packageshift = eax & 0x1f;
}
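
Restated as a standalone predicate (an assumption for illustration, not hwloc code), the combined exit rule described in the comment above is:

/* Stop iterating CPUID extended-topology subleaves when either vendor's
 * "no more levels" condition holds. */
static int subleaf_is_invalid(unsigned ebx, unsigned ecx)
{
    return !(ebx & 0xffff)    /* AMD: zero threads at this level */
        || !(ecx & 0xff00);   /* Intel/Zhaoxin: domain type 0 = invalid */
}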
@ -545,47 +562,68 @@ static void read_intel_cores_exttopoenum(struct procinfo *infos, unsigned leaf,
infos->otherids = malloc(level * sizeof(*infos->otherids));
if (infos->otherids) {
infos->levels = level;
for (level = 0; ; level++) {
for (level = 0; level<32 /* guard */; level++) {
ecx = level;
eax = leaf;
cpuid_or_from_dump(&eax, &ebx, &ecx, &edx, src_cpuiddump);
if (!eax && !ebx)
break;
if (!(ebx & 0xffff) || !(ecx & 0xff00))
break;
apic_nextshift = eax & 0x1f;
apic_number = ebx & 0xffff;
apic_type = (ecx & 0xff00) >> 8;
apic_id = edx;
id = (apic_id >> apic_shift) & ((1 << (apic_packageshift - apic_shift)) - 1);
hwloc_debug("x2APIC %08x %u: nextshift %u num %2u type %u id %2u\n", apic_id, level, apic_nextshift, apic_number, apic_type, id);
hwloc_debug("x2APIC %08x %u: nextshift %u nextnumber %2u type %u id %2u\n",
apic_id,
level,
apic_nextshift,
ebx & 0xffff /* number of threads in next level */,
apic_type,
id);
infos->apicid = apic_id;
infos->otherids[level] = UINT_MAX;
switch (apic_type) {
case 1:
threadid = id;
/* apic_number is the actual number of threads per core */
break;
case 2:
infos->ids[CORE] = id;
/* apic_number is the actual number of threads per die */
break;
case 3:
infos->ids[MODULE] = id;
/* apic_number is the actual number of threads per tile */
break;
case 4:
infos->ids[TILE] = id;
/* apic_number is the actual number of threads per die */
break;
case 5:
infos->ids[DIE] = id;
/* apic_number is the actual number of threads per package */
break;
default:
hwloc_debug("x2APIC %u: unknown type %u\n", level, apic_type);
infos->otherids[level] = apic_id >> apic_shift;
break;
}
apic_shift = apic_nextshift;
switch (apic_type) {
case 1:
threadid = id;
break;
case 2:
infos->ids[CORE] = id;
break;
case 3:
if (leaf == 0x80000026) {
data->found_complex_ids = 1;
infos->ids[COMPLEX] = id;
} else {
data->found_module_ids = 1;
infos->ids[MODULE] = id;
}
break;
case 4:
if (leaf == 0x80000026) {
data->found_die_ids = 1;
infos->ids[DIE] = id;
} else {
data->found_tile_ids = 1;
infos->ids[TILE] = id;
}
break;
case 5:
if (leaf == 0x80000026) {
goto unknown_type;
} else {
data->found_die_ids = 1;
infos->ids[DIE] = id;
}
break;
case 6:
/* TODO: "DieGrp" on Intel */
/* fallthrough */
default:
unknown_type:
hwloc_debug("x2APIC %u: unknown type %u\n", level, apic_type);
infos->otherids[level] = apic_id >> apic_shift;
break;
}
apic_shift = apic_nextshift;
}
infos->apicid = apic_id;
infos->ids[PKG] = apic_id >> apic_shift;
@ -704,12 +742,13 @@ static void look_proc(struct hwloc_backend *backend, struct procinfo *infos, uns
}
if (highest_cpuid >= 0x1a && has_hybrid(features)) {
/* Get hybrid cpu information from cpuid 0x1a */
/* Get hybrid cpu information from cpuid 0x1a on Intel */
eax = 0x1a;
ecx = 0;
cpuid_or_from_dump(&eax, &ebx, &ecx, &edx, src_cpuiddump);
infos->hybridcoretype = eax >> 24;
infos->hybridnativemodel = eax & 0xffffff;
data->is_hybrid = 1;
}
/*********************************************************************************
@ -731,23 +770,30 @@ static void look_proc(struct hwloc_backend *backend, struct procinfo *infos, uns
*
 * Only needed when x2apic is supported and NUMA nodes are requested.
*/
read_amd_cores_topoext(infos, flags, src_cpuiddump);
read_amd_cores_topoext(data, infos, flags, src_cpuiddump);
}
if ((cpuid_type == intel) && highest_cpuid >= 0x1f) {
if ((cpuid_type == amd) && highest_ext_cpuid >= 0x80000026) {
/* Get socket/die/complex/core/thread information from cpuid 0x80000026
* (AMD Extended CPU Topology)
*/
read_extended_topo(data, infos, 0x80000026, cpuid_type, src_cpuiddump);
} else if ((cpuid_type == intel || cpuid_type == zhaoxin) && highest_cpuid >= 0x1f) {
/* Get package/die/module/tile/core/thread information from cpuid 0x1f
* (Intel v2 Extended Topology Enumeration)
*/
read_intel_cores_exttopoenum(infos, 0x1f, src_cpuiddump);
read_extended_topo(data, infos, 0x1f, cpuid_type, src_cpuiddump);
} else if ((cpuid_type == intel || cpuid_type == amd || cpuid_type == zhaoxin)
&& highest_cpuid >= 0x0b && has_x2apic(features)) {
/* Get package/core/thread information from cpuid 0x0b
* (Intel v1 Extended Topology Enumeration)
*/
read_intel_cores_exttopoenum(infos, 0x0b, src_cpuiddump);
read_extended_topo(data, infos, 0x0b, cpuid_type, src_cpuiddump);
}
if (backend->topology->want_some_cpu_caches) {
/**************************************
* Get caches from CPU-specific leaves
*/
@ -845,6 +891,7 @@ static void look_proc(struct hwloc_backend *backend, struct procinfo *infos, uns
}
}
}
}
if (hwloc_bitmap_isset(data->apicid_set, infos->apicid))
data->apicid_unique = 0;
@ -1046,21 +1093,34 @@ static void summarize(struct hwloc_backend *backend, struct procinfo *infos, uns
if (hwloc_filter_check_keep_object_type(topology, HWLOC_OBJ_GROUP)) {
if (fulldiscovery) {
/* Look for AMD Compute units inside packages */
hwloc_bitmap_copy(remaining_cpuset, complete_cpuset);
hwloc_x86_add_groups(topology, infos, nbprocs, remaining_cpuset,
UNIT, "Compute Unit",
HWLOC_GROUP_KIND_AMD_COMPUTE_UNIT, 0);
/* Look for Intel Modules inside packages */
hwloc_bitmap_copy(remaining_cpuset, complete_cpuset);
hwloc_x86_add_groups(topology, infos, nbprocs, remaining_cpuset,
MODULE, "Module",
HWLOC_GROUP_KIND_INTEL_MODULE, 0);
/* Look for Intel Tiles inside packages */
hwloc_bitmap_copy(remaining_cpuset, complete_cpuset);
hwloc_x86_add_groups(topology, infos, nbprocs, remaining_cpuset,
TILE, "Tile",
HWLOC_GROUP_KIND_INTEL_TILE, 0);
if (data->found_unit_ids) {
/* Look for AMD Complex inside packages */
hwloc_bitmap_copy(remaining_cpuset, complete_cpuset);
hwloc_x86_add_groups(topology, infos, nbprocs, remaining_cpuset,
COMPLEX, "Complex",
HWLOC_GROUP_KIND_AMD_COMPLEX, 0);
}
if (data->found_unit_ids) {
/* Look for AMD Compute units inside packages */
hwloc_bitmap_copy(remaining_cpuset, complete_cpuset);
hwloc_x86_add_groups(topology, infos, nbprocs, remaining_cpuset,
UNIT, "Compute Unit",
HWLOC_GROUP_KIND_AMD_COMPUTE_UNIT, 0);
}
if (data->found_module_ids) {
/* Look for Intel Modules inside packages */
hwloc_bitmap_copy(remaining_cpuset, complete_cpuset);
hwloc_x86_add_groups(topology, infos, nbprocs, remaining_cpuset,
MODULE, "Module",
HWLOC_GROUP_KIND_INTEL_MODULE, 0);
}
if (data->found_tile_ids) {
/* Look for Intel Tiles inside packages */
hwloc_bitmap_copy(remaining_cpuset, complete_cpuset);
hwloc_x86_add_groups(topology, infos, nbprocs, remaining_cpuset,
TILE, "Tile",
HWLOC_GROUP_KIND_INTEL_TILE, 0);
}
/* Look for unknown objects */
if (infos[one].otherids) {
@ -1094,7 +1154,8 @@ static void summarize(struct hwloc_backend *backend, struct procinfo *infos, uns
}
}
if (hwloc_filter_check_keep_object_type(topology, HWLOC_OBJ_DIE)) {
if (data->found_die_ids
&& hwloc_filter_check_keep_object_type(topology, HWLOC_OBJ_DIE)) {
/* Look for Intel Dies inside packages */
if (fulldiscovery) {
hwloc_bitmap_t die_cpuset;
@ -1349,40 +1410,45 @@ look_procs(struct hwloc_backend *backend, struct procinfo *infos, unsigned long
if (data->apicid_unique) {
summarize(backend, infos, flags);
if (has_hybrid(features) && !(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_CPUKINDS)) {
if (data->is_hybrid
&& !(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_CPUKINDS)) {
/* use hybrid info for cpukinds */
hwloc_bitmap_t atomset = hwloc_bitmap_alloc();
hwloc_bitmap_t coreset = hwloc_bitmap_alloc();
for(i=0; i<nbprocs; i++) {
if (infos[i].hybridcoretype == 0x20)
hwloc_bitmap_set(atomset, i);
else if (infos[i].hybridcoretype == 0x40)
hwloc_bitmap_set(coreset, i);
}
/* register IntelAtom set if any */
if (!hwloc_bitmap_iszero(atomset)) {
struct hwloc_info_s infoattr;
infoattr.name = (char *) "CoreType";
infoattr.value = (char *) "IntelAtom";
hwloc_internal_cpukinds_register(topology, atomset, HWLOC_CPUKIND_EFFICIENCY_UNKNOWN, &infoattr, 1, 0);
/* the cpuset is given to the callee */
} else {
hwloc_bitmap_free(atomset);
}
/* register IntelCore set if any */
if (!hwloc_bitmap_iszero(coreset)) {
struct hwloc_info_s infoattr;
infoattr.name = (char *) "CoreType";
infoattr.value = (char *) "IntelCore";
hwloc_internal_cpukinds_register(topology, coreset, HWLOC_CPUKIND_EFFICIENCY_UNKNOWN, &infoattr, 1, 0);
/* the cpuset is given to the callee */
} else {
hwloc_bitmap_free(coreset);
if (cpuid_type == intel) {
/* Hybrid Intel */
hwloc_bitmap_t atomset = hwloc_bitmap_alloc();
hwloc_bitmap_t coreset = hwloc_bitmap_alloc();
for(i=0; i<nbprocs; i++) {
if (infos[i].hybridcoretype == 0x20)
hwloc_bitmap_set(atomset, i);
else if (infos[i].hybridcoretype == 0x40)
hwloc_bitmap_set(coreset, i);
}
/* register IntelAtom set if any */
if (!hwloc_bitmap_iszero(atomset)) {
struct hwloc_info_s infoattr;
infoattr.name = (char *) "CoreType";
infoattr.value = (char *) "IntelAtom";
hwloc_internal_cpukinds_register(topology, atomset, HWLOC_CPUKIND_EFFICIENCY_UNKNOWN, &infoattr, 1, 0);
/* the cpuset is given to the callee */
} else {
hwloc_bitmap_free(atomset);
}
/* register IntelCore set if any */
if (!hwloc_bitmap_iszero(coreset)) {
struct hwloc_info_s infoattr;
infoattr.name = (char *) "CoreType";
infoattr.value = (char *) "IntelCore";
hwloc_internal_cpukinds_register(topology, coreset, HWLOC_CPUKIND_EFFICIENCY_UNKNOWN, &infoattr, 1, 0);
/* the cpuset is given to the callee */
} else {
hwloc_bitmap_free(coreset);
}
}
}
} else {
hwloc_debug("x86 APIC IDs aren't unique, x86 discovery ignored.\n");
/* do nothing and return success, so that the caller does nothing either */
}
/* if !data->apicid_unique, do nothing and return success, so that the caller does nothing either */
return 0;
}
@ -1459,7 +1525,15 @@ int hwloc_look_x86(struct hwloc_backend *backend, unsigned long flags)
unsigned i;
unsigned highest_cpuid;
unsigned highest_ext_cpuid;
/* This stores cpuid features with the same indexing as Linux */
/* This stores cpuid features with the same indexing as Linux:
* [0] = 0x1 edx
* [1] = 0x80000001 edx
* [4] = 0x1 ecx
* [6] = 0x80000001 ecx
* [9] = 0x7/0 ebx
* [16] = 0x7/0 ecx
* [18] = 0x7/0 edx
*/
unsigned features[19] = { 0 };
struct procinfo *infos = NULL;
enum cpuid_type cpuid_type = unknown;
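
For example, with this indexing the x2APIC flag (CPUID leaf 0x1, ecx bit 21) can be tested as below (an illustrative helper, mirroring what macros like has_x2apic() do):

static int features_have_x2apic(const unsigned features[19])
{
    return (features[4] >> 21) & 1; /* [4] = leaf 0x1 ecx, bit 21 = x2APIC */
}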
@ -1579,6 +1653,7 @@ int hwloc_look_x86(struct hwloc_backend *backend, unsigned long flags)
ecx = 0;
cpuid_or_from_dump(&eax, &ebx, &ecx, &edx, src_cpuiddump);
features[9] = ebx;
features[16] = ecx;
features[18] = edx;
}
@ -1730,17 +1805,17 @@ hwloc_x86_check_cpuiddump_input(const char *src_cpuiddump_path, hwloc_bitmap_t s
sprintf(path, "%s/hwloc-cpuid-info", src_cpuiddump_path);
file = fopen(path, "r");
if (!file) {
fprintf(stderr, "Couldn't open dumped cpuid summary %s\n", path);
fprintf(stderr, "hwloc/x86: Couldn't open dumped cpuid summary %s\n", path);
goto out_with_path;
}
if (!fgets(line, sizeof(line), file)) {
fprintf(stderr, "Found read dumped cpuid summary in %s\n", path);
fprintf(stderr, "hwloc/x86: Found read dumped cpuid summary in %s\n", path);
fclose(file);
goto out_with_path;
}
fclose(file);
if (strcmp(line, "Architecture: x86\n")) {
fprintf(stderr, "Found non-x86 dumped cpuid summary in %s: %s\n", path, line);
if (strncmp(line, "Architecture: x86", 17)) {
fprintf(stderr, "hwloc/x86: Found non-x86 dumped cpuid summary in %s: %s\n", path, line);
goto out_with_path;
}
free(path);
@ -1752,19 +1827,19 @@ hwloc_x86_check_cpuiddump_input(const char *src_cpuiddump_path, hwloc_bitmap_t s
if (!*end)
hwloc_bitmap_set(set, idx);
else
fprintf(stderr, "Ignoring invalid dirent `%s' in dumped cpuid directory `%s'\n",
fprintf(stderr, "hwloc/x86: Ignoring invalid dirent `%s' in dumped cpuid directory `%s'\n",
dirent->d_name, src_cpuiddump_path);
}
}
closedir(dir);
if (hwloc_bitmap_iszero(set)) {
fprintf(stderr, "Did not find any valid pu%%u entry in dumped cpuid directory `%s'\n",
fprintf(stderr, "hwloc/x86: Did not find any valid pu%%u entry in dumped cpuid directory `%s'\n",
src_cpuiddump_path);
return -1;
} else if (hwloc_bitmap_last(set) != hwloc_bitmap_weight(set) - 1) {
/* The x86 backend enforces a contiguous set of PUs starting at 0 so far */
fprintf(stderr, "Found non-contigous pu%%u range in dumped cpuid directory `%s'\n",
fprintf(stderr, "hwloc/x86: Found non-contigous pu%%u range in dumped cpuid directory `%s'\n",
src_cpuiddump_path);
return -1;
}
@ -1816,9 +1891,15 @@ hwloc_x86_component_instantiate(struct hwloc_topology *topology,
/* default values */
data->is_knl = 0;
data->is_hybrid = 0;
data->apicid_set = hwloc_bitmap_alloc();
data->apicid_unique = 1;
data->src_cpuiddump_path = NULL;
data->found_die_ids = 0;
data->found_complex_ids = 0;
data->found_unit_ids = 0;
data->found_module_ids = 0;
data->found_tile_ids = 0;
src_cpuiddump_path = getenv("HWLOC_CPUID_PATH");
if (src_cpuiddump_path) {
@ -1829,7 +1910,7 @@ hwloc_x86_component_instantiate(struct hwloc_topology *topology,
assert(!hwloc_bitmap_iszero(set)); /* enforced by hwloc_x86_check_cpuiddump_input() */
data->nbprocs = hwloc_bitmap_weight(set);
} else {
fprintf(stderr, "Ignoring dumped cpuid directory.\n");
fprintf(stderr, "hwloc/x86: Ignoring dumped cpuid directory.\n");
}
hwloc_bitmap_free(set);
}

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
* Copyright © 2009-2020 Inria. All rights reserved.
* Copyright © 2009-2024 Inria. All rights reserved.
* Copyright © 2009-2011 Université Bordeaux
* Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
@ -41,7 +41,7 @@ typedef struct hwloc__nolibxml_import_state_data_s {
static char *
hwloc__nolibxml_import_ignore_spaces(char *buffer)
{
return buffer + strspn(buffer, " \t\n");
return buffer + strspn(buffer, " \t\n\r");
}
static int
@ -411,12 +411,12 @@ hwloc_nolibxml_backend_init(struct hwloc_xml_backend_data_s *bdata,
bdata->data = nbdata;
if (xmlbuffer) {
nbdata->buffer = malloc(xmlbuflen+1);
nbdata->buffer = malloc(xmlbuflen);
if (!nbdata->buffer)
goto out_with_nbdata;
nbdata->buflen = xmlbuflen+1;
nbdata->buflen = xmlbuflen;
memcpy(nbdata->buffer, xmlbuffer, xmlbuflen);
nbdata->buffer[xmlbuflen] = '\0';
nbdata->buffer[xmlbuflen-1] = '\0'; /* make sure it's there as requested in the API */
} else {
int err = hwloc_nolibxml_read_file(xmlpath, &nbdata->buffer, &nbdata->buflen);
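
The buflen fix above matches the documented contract of hwloc_topology_set_xmlbuffer(): the declared size must include the terminating NUL. A usage sketch (error handling omitted for brevity):

#include <hwloc.h>
#include <string.h>

static int load_from_xml_string(hwloc_topology_t topo, const char *xml)
{
    /* +1 so the '\0' is counted inside the buffer, as the API requires */
    return hwloc_topology_set_xmlbuffer(topo, xml, (int)strlen(xml) + 1);
}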
@ -453,8 +453,9 @@ hwloc_nolibxml_import_diff(struct hwloc__xml_import_state_s *state,
buffer = malloc(xmlbuflen);
if (!buffer)
goto out;
memcpy(buffer, xmlbuffer, xmlbuflen);
buflen = xmlbuflen;
memcpy(buffer, xmlbuffer, xmlbuflen);
buffer[xmlbuflen-1] = '\0'; /* make sure it's there as requested in the API */
} else {
ret = hwloc_nolibxml_read_file(xmlpath, &buffer, &buflen);

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
* Copyright © 2009-2022 Inria. All rights reserved.
* Copyright © 2009-2024 Inria. All rights reserved.
* Copyright © 2009-2011, 2020 Université Bordeaux
* Copyright © 2009-2018 Cisco Systems, Inc. All rights reserved.
* See COPYING in top-level directory.
@ -562,7 +562,13 @@ hwloc__xml_import_pagetype(hwloc_topology_t topology __hwloc_attribute_unused, s
char *attrname, *attrvalue;
if (state->global->next_attr(state, &attrname, &attrvalue) < 0)
break;
if (!strcmp(attrname, "size"))
if (!strcmp(attrname, "info")) {
char *infoname, *infovalue;
int ret = hwloc___xml_import_info(&infoname, &infovalue, state);
if (ret < 0)
return -1;
/* ignored */
} else if (!strcmp(attrname, "size"))
size = strtoull(attrvalue, NULL, 10);
else if (!strcmp(attrname, "count"))
count = strtoull(attrvalue, NULL, 10);
@ -866,6 +872,10 @@ hwloc__xml_import_object(hwloc_topology_t topology,
/* deal with possible future type */
obj->type = HWLOC_OBJ_GROUP;
obj->attr->group.kind = HWLOC_GROUP_KIND_INTEL_MODULE;
} else if (!strcasecmp(attrvalue, "Cluster")) {
/* deal with possible future type */
obj->type = HWLOC_OBJ_GROUP;
obj->attr->group.kind = HWLOC_GROUP_KIND_LINUX_CLUSTER;
} else if (!strcasecmp(attrvalue, "MemCache")) {
/* ignore possible future type */
obj->type = _HWLOC_OBJ_FUTURE;
@ -1160,6 +1170,48 @@ hwloc__xml_import_object(hwloc_topology_t topology,
data->last_numanode = obj;
}
/* 3.0 forward compatibility */
if (data->version_major >= 3 && obj->type == HWLOC_OBJ_OS_DEVICE) {
/* osdev.type changed into a bitmask in 3.0 */
if (obj->attr->osdev.type & 3 /* STORAGE|MEMORY for BLOCK */) {
obj->attr->osdev.type = HWLOC_OBJ_OSDEV_BLOCK;
} else if (obj->attr->osdev.type & 8 /* COPROC for COPROC and rsmi/nvml GPUs */) {
if (obj->subtype && (!strcmp(obj->subtype, "RSMI") || !strcmp(obj->subtype, "NVML")))
obj->attr->osdev.type = HWLOC_OBJ_OSDEV_GPU;
else
obj->attr->osdev.type = HWLOC_OBJ_OSDEV_COPROC;
} else if (obj->attr->osdev.type & 4 /* GPU for non-COPROC GPUs */) {
obj->attr->osdev.type = HWLOC_OBJ_OSDEV_GPU;
} else if (obj->attr->osdev.type & 32 /* OFED */) {
obj->attr->osdev.type = HWLOC_OBJ_OSDEV_OPENFABRICS;
} else if (obj->attr->osdev.type & 16 /* NET for NET and BXI v2-fake-OFED */) {
if (obj->subtype && !strcmp(obj->subtype, "BXI"))
obj->attr->osdev.type = HWLOC_OBJ_OSDEV_OPENFABRICS;
else
obj->attr->osdev.type = HWLOC_OBJ_OSDEV_NETWORK;
} else if (obj->attr->osdev.type & 64 /* DMA */) {
obj->attr->osdev.type = HWLOC_OBJ_OSDEV_DMA;
} else { /* none or unknown */
obj->attr->osdev.type = (hwloc_obj_osdev_type_t) -1;
}
/* Backend info only in root */
if (obj->subtype && !hwloc_obj_get_info_by_name(obj, "Backend")) {
if (!strcmp(obj->subtype, "CUDA")) {
hwloc_obj_add_info(obj, "Backend", "CUDA");
} else if (!strcmp(obj->subtype, "NVML")) {
hwloc_obj_add_info(obj, "Backend", "NVML");
} else if (!strcmp(obj->subtype, "OpenCL")) {
hwloc_obj_add_info(obj, "Backend", "OpenCL");
} else if (!strcmp(obj->subtype, "RSMI")) {
hwloc_obj_add_info(obj, "Backend", "RSMI");
} else if (!strcmp(obj->subtype, "LevelZero")) {
hwloc_obj_add_info(obj, "Backend", "LevelZero");
} else if (!strcmp(obj->subtype, "Display")) {
hwloc_obj_add_info(obj, "Backend", "GL");
}
}
}
if (!hwloc_filter_check_keep_object(topology, obj)) {
/* Ignore this object instead of inserting it.
*
@ -1296,7 +1348,7 @@ hwloc__xml_v2import_support(hwloc_topology_t topology,
HWLOC_BUILD_ASSERT(sizeof(struct hwloc_topology_support) == 4*sizeof(void*));
HWLOC_BUILD_ASSERT(sizeof(struct hwloc_topology_discovery_support) == 6);
HWLOC_BUILD_ASSERT(sizeof(struct hwloc_topology_cpubind_support) == 11);
HWLOC_BUILD_ASSERT(sizeof(struct hwloc_topology_membind_support) == 15);
HWLOC_BUILD_ASSERT(sizeof(struct hwloc_topology_membind_support) == 16);
HWLOC_BUILD_ASSERT(sizeof(struct hwloc_topology_misc_support) == 1);
#endif
@ -1330,6 +1382,7 @@ hwloc__xml_v2import_support(hwloc_topology_t topology,
else DO(membind,firsttouch_membind);
else DO(membind,bind_membind);
else DO(membind,interleave_membind);
else DO(membind,weighted_interleave_membind);
else DO(membind,nexttouch_membind);
else DO(membind,migrate_membind);
else DO(membind,get_area_memlocation);
@ -1388,6 +1441,10 @@ hwloc__xml_v2import_distances(hwloc_topology_t topology,
}
else if (!strcmp(attrname, "kind")) {
kind = strtoul(attrvalue, NULL, 10);
/* forward compat with "HOPS" kind in v3 */
if (kind & (1UL<<5))
/* hops becomes latency */
kind = (kind & ~(1UL<<5)) | HWLOC_DISTANCES_KIND_MEANS_LATENCY;
}
else if (!strcmp(attrname, "name")) {
name = attrvalue;
@ -1433,7 +1490,14 @@ hwloc__xml_v2import_distances(hwloc_topology_t topology,
if (ret <= 0)
break;
if (!strcmp(tag, "indexes"))
if (!strcmp(tag, "info")) {
char *infoname, *infovalue;
ret = hwloc___xml_import_info(&infoname, &infovalue, state);
if (ret < 0)
goto out_with_arrays;
/* ignored */
continue;
} else if (!strcmp(tag, "indexes"))
is_index = 1;
else if (!strcmp(tag, "u64values"))
is_u64values = 1;
@ -1766,6 +1830,10 @@ hwloc__xml_import_memattr(hwloc_topology_t topology,
if (!strcmp(tag, "memattr_value")) {
ret = hwloc__xml_import_memattr_value(topology, id, flags, &childstate);
} else if (!strcmp(tag, "info")) {
char *infoname, *infovalue;
ret = hwloc___xml_import_info(&infoname, &infovalue, &childstate);
/* ignored */
} else {
if (hwloc__xml_verbose())
fprintf(stderr, "%s: memattr with unrecognized child %s\n",
@ -2094,9 +2162,10 @@ hwloc_look_xml(struct hwloc_backend *backend, struct hwloc_disc_status *dstatus)
if (ret < 0)
goto failed;
if (data->version_major > 2) {
if (data->version_major > 3
|| (data->version_major == 3 && data->version_minor > 0)) {
if (hwloc__xml_verbose())
fprintf(stderr, "%s: cannot import XML version %u.%u > 2\n",
fprintf(stderr, "%s: cannot import XML version %u.%u > 3.0\n",
data->msgprefix, data->version_major, data->version_minor);
goto err;
}
@ -2144,6 +2213,13 @@ hwloc_look_xml(struct hwloc_backend *backend, struct hwloc_disc_status *dstatus)
ret = hwloc__xml_import_cpukind(topology, &childstate);
if (ret < 0)
goto failed;
} else if (!strcmp(tag, "info")) {
char *infoname, *infovalue;
ret = hwloc___xml_import_info(&infoname, &infovalue, &childstate);
if (ret < 0)
goto failed;
/* move 3.x topology info back to the root object */
hwloc_obj_add_info(topology->levels[0][0], infoname, infovalue);
} else {
if (hwloc__xml_verbose())
fprintf(stderr, "%s: ignoring unknown tag `%s' after root object.\n",
@ -3020,7 +3096,7 @@ hwloc__xml_v2export_support(hwloc__xml_export_state_t parentstate, hwloc_topolog
HWLOC_BUILD_ASSERT(sizeof(struct hwloc_topology_support) == 4*sizeof(void*));
HWLOC_BUILD_ASSERT(sizeof(struct hwloc_topology_discovery_support) == 6);
HWLOC_BUILD_ASSERT(sizeof(struct hwloc_topology_cpubind_support) == 11);
HWLOC_BUILD_ASSERT(sizeof(struct hwloc_topology_membind_support) == 15);
HWLOC_BUILD_ASSERT(sizeof(struct hwloc_topology_membind_support) == 16);
HWLOC_BUILD_ASSERT(sizeof(struct hwloc_topology_misc_support) == 1);
#endif
@ -3065,6 +3141,7 @@ hwloc__xml_v2export_support(hwloc__xml_export_state_t parentstate, hwloc_topolog
DO(membind,firsttouch_membind);
DO(membind,bind_membind);
DO(membind,interleave_membind);
DO(membind,weighted_interleave_membind);
DO(membind,nexttouch_membind);
DO(membind,migrate_membind);
DO(membind,get_area_memlocation);

View file

@ -1,6 +1,6 @@
/*
* Copyright © 2009 CNRS
* Copyright © 2009-2022 Inria. All rights reserved.
* Copyright © 2009-2023 Inria. All rights reserved.
* Copyright © 2009-2012, 2020 Université Bordeaux
* Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
* Copyright © 2022 IBM Corporation. All rights reserved.
@ -146,21 +146,24 @@ report_insert_error_format_obj(char *buf, size_t buflen, hwloc_obj_t obj)
char typestr[64];
char *cpusetstr;
char *nodesetstr = NULL;
char indexstr[64] = "";
char groupstr[64] = "";
hwloc_obj_type_snprintf(typestr, sizeof(typestr), obj, 0);
hwloc_bitmap_asprintf(&cpusetstr, obj->cpuset);
if (obj->os_index != HWLOC_UNKNOWN_INDEX)
snprintf(indexstr, sizeof(indexstr), "P#%u ", obj->os_index);
if (obj->type == HWLOC_OBJ_GROUP)
snprintf(groupstr, sizeof(groupstr), "groupkind %u-%u ", obj->attr->group.kind, obj->attr->group.subkind);
if (obj->nodeset) /* may be missing during insert */
hwloc_bitmap_asprintf(&nodesetstr, obj->nodeset);
if (obj->os_index != HWLOC_UNKNOWN_INDEX)
snprintf(buf, buflen, "%s (P#%u cpuset %s%s%s)",
typestr, obj->os_index, cpusetstr,
nodesetstr ? " nodeset " : "",
nodesetstr ? nodesetstr : "");
else
snprintf(buf, buflen, "%s (cpuset %s%s%s)",
typestr, cpusetstr,
nodesetstr ? " nodeset " : "",
nodesetstr ? nodesetstr : "");
snprintf(buf, buflen, "%s (%s%s%s%s%scpuset %s%s%s)",
typestr,
indexstr,
obj->subtype ? "subtype " : "", obj->subtype ? obj->subtype : "", obj->subtype ? " " : "",
groupstr,
cpusetstr,
nodesetstr ? " nodeset " : "", nodesetstr ? nodesetstr : "");
free(cpusetstr);
free(nodesetstr);
}
@ -178,8 +181,9 @@ static void report_insert_error(hwloc_obj_t new, hwloc_obj_t old, const char *ms
fprintf(stderr, "****************************************************************************\n");
fprintf(stderr, "* hwloc %s received invalid information from the operating system.\n", HWLOC_VERSION);
fprintf(stderr, "*\n");
fprintf(stderr, "* Failed with: %s\n", msg);
fprintf(stderr, "* while inserting %s at %s\n", newstr, oldstr);
fprintf(stderr, "* Failed with error: %s\n", msg);
fprintf(stderr, "* while inserting %s\n", newstr);
fprintf(stderr, "* at %s\n", oldstr);
fprintf(stderr, "* coming from: %s\n", reason);
fprintf(stderr, "*\n");
fprintf(stderr, "* The following FAQ entry in the hwloc documentation may help:\n");
@ -461,6 +465,20 @@ hwloc_debug_print_objects(int indent __hwloc_attribute_unused, hwloc_obj_t obj)
#define hwloc_debug_print_objects(indent, obj) do { /* nothing */ } while (0)
#endif /* !HWLOC_DEBUG */
int hwloc_obj_set_subtype(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj, const char *subtype)
{
char *new = NULL;
if (subtype) {
new = strdup(subtype);
if (!new)
return -1;
}
if (obj->subtype)
free(obj->subtype);
obj->subtype = new;
return 0;
}
void hwloc__free_infos(struct hwloc_info_s *infos, unsigned count)
{
unsigned i;
@ -679,7 +697,8 @@ unlink_and_free_object_and_children(hwloc_obj_t *pobj)
void
hwloc_free_object_and_children(hwloc_obj_t obj)
{
unlink_and_free_object_and_children(&obj);
if (obj)
unlink_and_free_object_and_children(&obj);
}
/* Free an object, its next siblings and their children without unlinking from parent.
@ -1925,6 +1944,22 @@ hwloc_topology_alloc_group_object(struct hwloc_topology *topology)
return hwloc_alloc_setup_object(topology, HWLOC_OBJ_GROUP, HWLOC_UNKNOWN_INDEX);
}
int
hwloc_topology_free_group_object(struct hwloc_topology *topology, hwloc_obj_t obj)
{
if (!topology->is_loaded) {
/* this could actually work, see insert() below */
errno = EINVAL;
return -1;
}
if (topology->adopted_shmem_addr) {
errno = EPERM;
return -1;
}
hwloc_free_unlinked_object(obj);
return 0;
}
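
A hedged usage sketch of the new API (an assumption about the typical alloc-then-discard pattern it is meant for):

#include <hwloc.h>

static void alloc_then_discard(hwloc_topology_t topology)
{
    hwloc_obj_t group = hwloc_topology_alloc_group_object(topology);
    if (!group)
        return;
    /* ... caller decides not to insert the group after all ... */
    hwloc_topology_free_group_object(topology, group);
}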
static void hwloc_propagate_symmetric_subtree(hwloc_topology_t topology, hwloc_obj_t root);
static void propagate_total_memory(hwloc_obj_t obj);
static void hwloc_set_group_depth(hwloc_topology_t topology);
@ -1935,7 +1970,7 @@ static int hwloc_connect_special_levels(hwloc_topology_t topology);
hwloc_obj_t
hwloc_topology_insert_group_object(struct hwloc_topology *topology, hwloc_obj_t obj)
{
hwloc_obj_t res, root;
hwloc_obj_t res, root, child;
int cmp;
if (!topology->is_loaded) {
@ -1945,6 +1980,7 @@ hwloc_topology_insert_group_object(struct hwloc_topology *topology, hwloc_obj_t
return NULL;
}
if (topology->adopted_shmem_addr) {
hwloc_free_unlinked_object(obj);
errno = EPERM;
return NULL;
}
@ -1998,6 +2034,7 @@ hwloc_topology_insert_group_object(struct hwloc_topology *topology, hwloc_obj_t
res = hwloc__insert_object_by_cpuset(topology, NULL, obj, NULL /* do not show errors on stdout */);
} else {
/* just merge root */
hwloc_free_unlinked_object(obj);
res = root;
}
@ -2024,6 +2061,13 @@ hwloc_topology_insert_group_object(struct hwloc_topology *topology, hwloc_obj_t
if (hwloc_topology_reconnect(topology, 0) < 0)
return NULL;
/* Compute group total_memory. */
res->total_memory = 0;
for_each_child(child, res)
res->total_memory += child->total_memory;
for_each_memory_child(child, res)
res->total_memory += child->total_memory;
hwloc_propagate_symmetric_subtree(topology, topology->levels[0][0]);
hwloc_set_group_depth(topology);
@ -2254,11 +2298,13 @@ fixup_sets(hwloc_obj_t obj)
int
hwloc_obj_add_other_obj_sets(hwloc_obj_t dst, hwloc_obj_t src)
{
#define ADD_OTHER_OBJ_SET(_dst, _src, _set) \
if ((_src)->_set) { \
if (!(_dst)->_set) \
(_dst)->_set = hwloc_bitmap_alloc(); \
hwloc_bitmap_or((_dst)->_set, (_dst)->_set, (_src)->_set); \
#define ADD_OTHER_OBJ_SET(_dst, _src, _set) \
if ((_src)->_set) { \
if (!(_dst)->_set) \
(_dst)->_set = hwloc_bitmap_alloc(); \
if (!(_dst)->_set \
|| hwloc_bitmap_or((_dst)->_set, (_dst)->_set, (_src)->_set) < 0) \
return -1; \
}
ADD_OTHER_OBJ_SET(dst, src, cpuset);
ADD_OTHER_OBJ_SET(dst, src, complete_cpuset);
@ -3730,6 +3776,7 @@ hwloc__topology_init (struct hwloc_topology **topologyp,
hwloc__topology_filter_init(topology);
/* always initialize since we don't know flags to disable those yet */
hwloc_internal_distances_init(topology);
hwloc_internal_memattrs_init(topology);
hwloc_internal_cpukinds_init(topology);
@ -3942,8 +3989,12 @@ int
hwloc_topology_set_cache_types_filter(hwloc_topology_t topology, enum hwloc_type_filter_e filter)
{
unsigned i;
for(i=HWLOC_OBJ_L1CACHE; i<HWLOC_OBJ_L3ICACHE; i++)
hwloc_topology_set_type_filter(topology, (hwloc_obj_type_t) i, filter);
if (topology->is_loaded) {
errno = EBUSY;
return -1;
}
for(i=HWLOC_OBJ_L1CACHE; i<=HWLOC_OBJ_L3ICACHE; i++)
hwloc__topology_set_type_filter(topology, (hwloc_obj_type_t) i, filter);
return 0;
}
@ -3951,17 +4002,25 @@ int
hwloc_topology_set_icache_types_filter(hwloc_topology_t topology, enum hwloc_type_filter_e filter)
{
unsigned i;
for(i=HWLOC_OBJ_L1ICACHE; i<HWLOC_OBJ_L3ICACHE; i++)
hwloc_topology_set_type_filter(topology, (hwloc_obj_type_t) i, filter);
if (topology->is_loaded) {
errno = EBUSY;
return -1;
}
for(i=HWLOC_OBJ_L1ICACHE; i<=HWLOC_OBJ_L3ICACHE; i++)
hwloc__topology_set_type_filter(topology, (hwloc_obj_type_t) i, filter);
return 0;
}
int
hwloc_topology_set_io_types_filter(hwloc_topology_t topology, enum hwloc_type_filter_e filter)
{
hwloc_topology_set_type_filter(topology, HWLOC_OBJ_BRIDGE, filter);
hwloc_topology_set_type_filter(topology, HWLOC_OBJ_PCI_DEVICE, filter);
hwloc_topology_set_type_filter(topology, HWLOC_OBJ_OS_DEVICE, filter);
if (topology->is_loaded) {
errno = EBUSY;
return -1;
}
hwloc__topology_set_type_filter(topology, HWLOC_OBJ_BRIDGE, filter);
hwloc__topology_set_type_filter(topology, HWLOC_OBJ_PCI_DEVICE, filter);
hwloc__topology_set_type_filter(topology, HWLOC_OBJ_OS_DEVICE, filter);
return 0;
}
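
With the EBUSY guards above, these setters must run before hwloc_topology_load(). A usage sketch (illustrative only):

#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    /* must happen pre-load, otherwise the setters now fail with EBUSY */
    hwloc_topology_set_cache_types_filter(topo, HWLOC_TYPE_FILTER_KEEP_ALL);
    hwloc_topology_set_io_types_filter(topo, HWLOC_TYPE_FILTER_KEEP_NONE);
    hwloc_topology_load(topo);
    hwloc_topology_destroy(topo);
    return 0;
}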
@ -3982,9 +4041,12 @@ hwloc_topology_clear (struct hwloc_topology *topology)
{
/* no need to set to NULL after free() since callers will call setup_defaults() or just destroy the rest of the topology */
unsigned l;
/* always destroy cpukinds/distances/memattrs since they are always initialized during init() */
hwloc_internal_cpukinds_destroy(topology);
hwloc_internal_distances_destroy(topology);
hwloc_internal_memattrs_destroy(topology);
hwloc_free_object_and_children(topology->levels[0][0]);
hwloc_bitmap_free(topology->allowed_cpuset);
hwloc_bitmap_free(topology->allowed_nodeset);
@ -4024,6 +4086,7 @@ hwloc_topology_load (struct hwloc_topology *topology)
{
struct hwloc_disc_status dstatus;
const char *env;
unsigned i;
int err;
if (topology->is_loaded) {
@ -4032,8 +4095,18 @@ hwloc_topology_load (struct hwloc_topology *topology)
}
/* initialize envvar-related things */
hwloc_internal_distances_prepare(topology);
hwloc_internal_memattrs_prepare(topology);
if (!(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_DISTANCES))
hwloc_internal_distances_prepare(topology);
if (!(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_MEMATTRS))
hwloc_internal_memattrs_prepare(topology);
/* check if any cpu cache filter is not NONE */
topology->want_some_cpu_caches = 0;
for(i=HWLOC_OBJ_L1CACHE; i<=HWLOC_OBJ_L3ICACHE; i++)
if (topology->type_filter[i] != HWLOC_TYPE_FILTER_KEEP_NONE) {
topology->want_some_cpu_caches = 1;
break;
}
if (getenv("HWLOC_XML_USERDATA_NOT_DECODED"))
topology->userdata_not_decoded = 1;
@ -4110,23 +4183,32 @@ hwloc_topology_load (struct hwloc_topology *topology)
#endif
hwloc_topology_check(topology);
/* Rank cpukinds */
hwloc_internal_cpukinds_rank(topology);
if (!(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_CPUKINDS)) {
/* Rank cpukinds */
hwloc_internal_cpukinds_rank(topology);
}
/* Mark distances objs arrays as invalid since we may have removed objects
* from the topology after adding the distances (remove_empty, etc).
* It would be hard to actually verify whether it's needed.
*/
hwloc_internal_distances_invalidate_cached_objs(topology);
/* And refresh distances so that multithreaded concurrent distances_get()
* don't refresh() concurrently (disallowed).
*/
hwloc_internal_distances_refresh(topology);
if (!(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_DISTANCES)) {
/* Mark distances objs arrays as invalid since we may have removed objects
* from the topology after adding the distances (remove_empty, etc).
* It would be hard to actually verify whether it's needed.
*/
hwloc_internal_distances_invalidate_cached_objs(topology);
/* And refresh distances so that multithreaded concurrent distances_get()
* don't refresh() concurrently (disallowed).
*/
hwloc_internal_distances_refresh(topology);
}
/* Same for memattrs */
hwloc_internal_memattrs_need_refresh(topology);
hwloc_internal_memattrs_refresh(topology);
hwloc_internal_memattrs_guess_memory_tiers(topology);
if (!(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_MEMATTRS)) {
int force_memtiers = (getenv("HWLOC_MEMTIERS_REFRESH") != NULL);
/* Same for memattrs */
hwloc_internal_memattrs_need_refresh(topology);
hwloc_internal_memattrs_refresh(topology);
/* update memtiers unless XML */
if (force_memtiers || strcmp(topology->backends->component->name, "xml"))
hwloc_internal_memattrs_guess_memory_tiers(topology, force_memtiers);
}
topology->is_loaded = 1;
@ -4185,20 +4267,11 @@ restrict_object_by_cpuset(hwloc_topology_t topology, unsigned long flags, hwloc_
hwloc_bitmap_andnot(obj->cpuset, obj->cpuset, droppedcpuset);
hwloc_bitmap_andnot(obj->complete_cpuset, obj->complete_cpuset, droppedcpuset);
modified = 1;
} else {
if ((flags & HWLOC_RESTRICT_FLAG_REMOVE_CPULESS)
&& hwloc_bitmap_iszero(obj->complete_cpuset)) {
/* we're empty, there's a NUMAnode below us, it'll be removed this time */
modified = 1;
}
/* nodeset cannot intersect unless cpuset intersects or is empty */
if (droppednodeset)
assert(!hwloc_bitmap_intersects(obj->complete_nodeset, droppednodeset)
|| hwloc_bitmap_iszero(obj->complete_cpuset));
}
if (droppednodeset) {
if (droppednodeset && hwloc_bitmap_intersects(obj->complete_nodeset, droppednodeset)) {
hwloc_bitmap_andnot(obj->nodeset, obj->nodeset, droppednodeset);
hwloc_bitmap_andnot(obj->complete_nodeset, obj->complete_nodeset, droppednodeset);
modified = 1;
}
if (modified) {
@ -4251,20 +4324,11 @@ restrict_object_by_nodeset(hwloc_topology_t topology, unsigned long flags, hwloc
hwloc_bitmap_andnot(obj->nodeset, obj->nodeset, droppednodeset);
hwloc_bitmap_andnot(obj->complete_nodeset, obj->complete_nodeset, droppednodeset);
modified = 1;
} else {
if ((flags & HWLOC_RESTRICT_FLAG_REMOVE_MEMLESS)
&& hwloc_bitmap_iszero(obj->complete_nodeset)) {
/* we're empty, there's a PU below us, it'll be removed this time */
modified = 1;
}
/* cpuset cannot intersect unless nodeset intersects or is empty */
if (droppedcpuset)
assert(!hwloc_bitmap_intersects(obj->complete_cpuset, droppedcpuset)
|| hwloc_bitmap_iszero(obj->complete_nodeset));
}
if (droppedcpuset) {
if (droppedcpuset && hwloc_bitmap_intersects(obj->complete_cpuset, droppedcpuset)) {
hwloc_bitmap_andnot(obj->cpuset, obj->cpuset, droppedcpuset);
hwloc_bitmap_andnot(obj->complete_cpuset, obj->complete_cpuset, droppedcpuset);
modified = 1;
}
if (modified) {
@ -4433,13 +4497,18 @@ hwloc_topology_restrict(struct hwloc_topology *topology, hwloc_const_bitmap_t se
if (hwloc_filter_levels_keep_structure(topology) < 0) /* takes care of reconnecting internally */
goto out;
/* some objects may have disappeared, we need to update distances objs arrays */
hwloc_internal_distances_invalidate_cached_objs(topology);
hwloc_internal_memattrs_need_refresh(topology);
/* some objects may have disappeared and sets were modified,
* we need to update distances, etc */
if (!(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_DISTANCES))
hwloc_internal_distances_invalidate_cached_objs(topology);
if (!(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_MEMATTRS))
hwloc_internal_memattrs_need_refresh(topology);
if (!(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_CPUKINDS))
hwloc_internal_cpukinds_restrict(topology);
hwloc_propagate_symmetric_subtree(topology, topology->levels[0][0]);
propagate_total_memory(topology->levels[0][0]);
hwloc_internal_cpukinds_restrict(topology);
#ifndef HWLOC_DEBUG
if (getenv("HWLOC_DEBUG_CHECK"))
@ -4527,9 +4596,12 @@ hwloc_topology_allow(struct hwloc_topology *topology,
int
hwloc_topology_refresh(struct hwloc_topology *topology)
{
hwloc_internal_cpukinds_rank(topology);
hwloc_internal_distances_refresh(topology);
hwloc_internal_memattrs_refresh(topology);
if (!(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_CPUKINDS))
hwloc_internal_cpukinds_rank(topology);
if (!(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_DISTANCES))
hwloc_internal_distances_refresh(topology);
if (!(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_MEMATTRS))
hwloc_internal_memattrs_refresh(topology);
return 0;
}
@ -5081,6 +5153,9 @@ hwloc_topology_check(struct hwloc_topology *topology)
for(i=HWLOC_OBJ_TYPE_MIN; i<HWLOC_OBJ_TYPE_MAX; i++)
assert(obj_type_order[obj_order_type[i]] == i);
if (!topology->is_loaded)
return;
depth = hwloc_topology_get_depth(topology);
assert(!topology->modified);


@ -1,4 +1,4 @@
cmake_minimum_required(VERSION 3.1)
cmake_minimum_required(VERSION 3.5)
project (ethash C)
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} -Os")


@ -6,8 +6,8 @@
* Copyright 2016 Jay D Dee <jayddee246@gmail.com>
* Copyright 2017-2018 XMR-Stak <https://github.com/fireice-uk>, <https://github.com/psychocrypt>
* Copyright 2018 Lee Clagett <https://github.com/vtnerd>
* Copyright 2018-2020 SChernykh <https://github.com/SChernykh>
* Copyright 2016-2020 XMRig <https://github.com/xmrig>, <support@xmrig.com>
* Copyright 2018-2024 SChernykh <https://github.com/SChernykh>
* Copyright 2016-2024 XMRig <https://github.com/xmrig>, <support@xmrig.com>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@ -23,7 +23,6 @@
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include <cstdlib>
#include <uv.h>
@ -61,13 +60,13 @@ int xmrig::App::exec()
return 2;
}
m_signals = std::make_shared<Signals>(this);
int rc = 0;
if (background(rc)) {
return rc;
}
m_signals = std::make_shared<Signals>(this);
rc = m_controller->init();
if (rc != 0) {
return rc;


@ -5,8 +5,8 @@
* Copyright 2014-2016 Wolf9466 <https://github.com/OhGodAPet>
* Copyright 2016 Jay D Dee <jayddee246@gmail.com>
* Copyright 2017-2018 XMR-Stak <https://github.com/fireice-uk>, <https://github.com/psychocrypt>
* Copyright 2018-2020 SChernykh <https://github.com/SChernykh>
* Copyright 2016-2020 XMRig <https://github.com/xmrig>, <support@xmrig.com>
* Copyright 2018-2024 SChernykh <https://github.com/SChernykh>
* Copyright 2016-2024 XMRig <https://github.com/xmrig>, <support@xmrig.com>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@ -22,7 +22,6 @@
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include <cstdlib>
#include <csignal>
#include <cerrno>
@ -53,16 +52,9 @@ bool xmrig::App::background(int &rc)
return true;
}
i = setsid();
if (i < 0) {
if (setsid() < 0) {
LOG_ERR("setsid() failed (errno = %d)", errno);
}
i = chdir("/");
if (i < 0) {
LOG_ERR("chdir() failed (errno = %d)", errno);
}
return false;
}


@ -30,10 +30,10 @@
#include "base/tools/Handle.h"
inline static const char *format(double h, char *buf, size_t size)
inline static const char *format(std::pair<bool, double> h, char *buf, size_t size)
{
if (std::isnormal(h)) {
snprintf(buf, size, (h < 100.0) ? "%04.2f" : "%03.1f", h);
if (h.first) {
snprintf(buf, size, (h.second < 100.0) ? "%04.2f" : "%03.1f", h.second);
return buf;
}
@ -80,15 +80,16 @@ double xmrig::Hashrate::average() const
}
const char *xmrig::Hashrate::format(double h, char *buf, size_t size)
const char *xmrig::Hashrate::format(std::pair<bool, double> h, char *buf, size_t size)
{
return ::format(h, buf, size);
}
rapidjson::Value xmrig::Hashrate::normalize(double d)
rapidjson::Value xmrig::Hashrate::normalize(std::pair<bool, double> d)
{
return Json::normalize(d, false);
using namespace rapidjson;
return d.first ? Value(floor(d.second * 100.0) / 100.0) : Value(kNullType);
}
@ -122,11 +123,11 @@ rapidjson::Value xmrig::Hashrate::toJSON(size_t threadId, rapidjson::Document &d
#endif
double xmrig::Hashrate::hashrate(size_t index, size_t ms) const
std::pair<bool, double> xmrig::Hashrate::hashrate(size_t index, size_t ms) const
{
assert(index < m_threads);
if (index >= m_threads) {
return nan("");
return { false, 0.0 };
}
uint64_t earliestHashCount = 0;
@ -157,17 +158,27 @@ double xmrig::Hashrate::hashrate(size_t index, size_t ms) const
} while (idx != idx_start);
if (!haveFullSet || earliestStamp == 0 || lastestStamp == 0) {
return nan("");
return { false, 0.0 };
}
if (lastestStamp - earliestStamp == 0) {
return nan("");
if (lastestHashCnt == earliestHashCount) {
return { true, 0.0 };
}
if (lastestStamp == earliestStamp) {
return { false, 0.0 };
}
const auto hashes = static_cast<double>(lastestHashCnt - earliestHashCount);
const auto time = static_cast<double>(lastestStamp - earliestStamp) / 1000.0;
const auto time = static_cast<double>(lastestStamp - earliestStamp);
return hashes / time;
const auto hr = hashes * 1000.0 / time;
if (!std::isnormal(hr)) {
return { false, 0.0 };
}
return { true, hr };
}
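
With validity made explicit, callers branch on the pair's flag instead of probing for NaN. A minimal consumption sketch (assumes an existing Hashrate instance `hr`; names outside the diff are hypothetical):

#include <cstdio>

void printShort(const xmrig::Hashrate &hr)
{
    char buf[16];
    const auto h = hr.calc(xmrig::Hashrate::ShortInterval); // {valid, H/s}
    if (h.first) {
        printf("%s H/s\n", xmrig::Hashrate::format(h, buf, sizeof buf));
    }
    else {
        printf("n/a\n"); // not enough samples collected yet
    }
}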


@ -47,16 +47,16 @@ public:
Hashrate(size_t threads);
~Hashrate();
inline double calc(size_t ms) const { const double data = hashrate(0U, ms); return std::isnormal(data) ? data : 0.0; }
inline double calc(size_t threadId, size_t ms) const { return hashrate(threadId + 1, ms); }
inline std::pair<bool, double> calc(size_t ms) const { return hashrate(0U, ms); }
inline std::pair<bool, double> calc(size_t threadId, size_t ms) const { return hashrate(threadId + 1, ms); }
inline size_t threads() const { return m_threads > 0U ? m_threads - 1U : 0U; }
inline void add(size_t threadId, uint64_t count, uint64_t timestamp) { addData(threadId + 1U, count, timestamp); }
inline void add(uint64_t count, uint64_t timestamp) { addData(0U, count, timestamp); }
double average() const;
static const char *format(double h, char *buf, size_t size);
static rapidjson::Value normalize(double d);
static const char *format(std::pair<bool, double> h, char *buf, size_t size);
static rapidjson::Value normalize(std::pair<bool, double> d);
# ifdef XMRIG_FEATURE_API
rapidjson::Value toJSON(rapidjson::Document &doc) const;
@ -64,7 +64,7 @@ public:
# endif
private:
double hashrate(size_t index, size_t ms) const;
std::pair<bool, double> hashrate(size_t index, size_t ms) const;
void addData(size_t index, uint64_t count, uint64_t timestamp);
constexpr static size_t kBucketSize = 2 << 11;
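
Sizing note: `2 << 11` evaluates to 4096, so each counter presumably keeps a 4096-slot ring of (count, timestamp) samples, which is what the idx/idx_start loop in hashrate() above walks.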


@ -1,6 +1,6 @@
/* XMRig
* Copyright (c) 2018-2021 SChernykh <https://github.com/SChernykh>
* Copyright (c) 2016-2021 XMRig <https://github.com/xmrig>, <support@xmrig.com>
* Copyright (c) 2018-2024 SChernykh <https://github.com/SChernykh>
* Copyright (c) 2016-2024 XMRig <https://github.com/xmrig>, <support@xmrig.com>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@ -122,17 +122,6 @@ size_t inline generate<Algorithm::RANDOM_X>(Threads<CpuThreads> &threads, uint32
}
}
if (!threads.isExist(Algorithm::RX_KEVA)) {
auto keva = cpuInfo->threads(Algorithm::RX_KEVA, limit);
if (keva == wow) {
threads.setAlias(Algorithm::RX_KEVA, Algorithm::kRX_WOW);
++count;
}
else {
count += threads.move(Algorithm::kRX_KEVA, std::move(keva));
}
}
if (!threads.isExist(Algorithm::RX_WOW)) {
count += threads.move(Algorithm::kRX_WOW, std::move(wow));
}


@ -359,7 +359,9 @@ void xmrig::CpuWorker<N>::start()
}
}
consumeJob();
if (!Nonce::isPaused()) {
consumeJob();
}
}
}


@ -52,7 +52,8 @@ public:
ARCH_ZEN_PLUS,
ARCH_ZEN2,
ARCH_ZEN3,
ARCH_ZEN4
ARCH_ZEN4,
ARCH_ZEN5
};
enum MsrMod : uint32_t {
@ -60,12 +61,13 @@ public:
MSR_MOD_RYZEN_17H,
MSR_MOD_RYZEN_19H,
MSR_MOD_RYZEN_19H_ZEN4,
MSR_MOD_RYZEN_1AH_ZEN5,
MSR_MOD_INTEL,
MSR_MOD_CUSTOM,
MSR_MOD_MAX
};
# define MSR_NAMES_LIST "none", "ryzen_17h", "ryzen_19h", "ryzen_19h_zen4", "intel", "custom"
# define MSR_NAMES_LIST "none", "ryzen_17h", "ryzen_19h", "ryzen_19h_zen4", "ryzen_1Ah_zen5", "intel", "custom"
enum Flag : uint32_t {
FLAG_AES,


@ -64,7 +64,7 @@ static_assert(kCpuFlagsSize == ICpuInfo::FLAG_MAX, "kCpuFlagsSize and FLAG_MAX m
#ifdef XMRIG_FEATURE_MSR
constexpr size_t kMsrArraySize = 6;
constexpr size_t kMsrArraySize = 7;
static const std::array<const char *, kMsrArraySize> msrNames = { MSR_NAMES_LIST };
static_assert(kMsrArraySize == ICpuInfo::MSR_MOD_MAX, "kMsrArraySize and MSR_MOD_MAX mismatch");
#endif
@ -260,6 +260,11 @@ xmrig::BasicCpuInfo::BasicCpuInfo() :
}
break;
case 0x1a:
m_arch = ARCH_ZEN5;
m_msrMod = MSR_MOD_RYZEN_1AH_ZEN5;
break;
default:
m_msrMod = MSR_MOD_NONE;
break;
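
For context, `0x1a` is the AMD display family reported by Zen5 (base family 0xF plus extended family 0xB). A hedged sketch of the standard CPUID decoding, not XMRig's exact code:

#include <cstdint>

// eax from CPUID leaf 1: base family in bits 8..11, extended family in bits 20..27.
static uint32_t displayFamily(uint32_t eax)
{
    const uint32_t base = (eax >> 8) & 0xF;
    const uint32_t ext  = (eax >> 20) & 0xFF;
    return (base == 0xF) ? base + ext : base; // 0xF + 0xB = 0x1A on Zen5
}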


@ -326,7 +326,8 @@ void xmrig::HwlocCpuInfo::processTopLevelCache(hwloc_obj_t cache, const Algorith
}
}
if (scratchpad == 2 * oneMiB) {
// This code is supposed to run only on Intel CPUs
if ((vendor() == VENDOR_INTEL) && (scratchpad == 2 * oneMiB)) {
if (L2 && (cores.size() * oneMiB) == L2 && L2_associativity == 16 && L3 >= L2) {
L3 = L2;
extra = L2;
@ -341,7 +342,7 @@ void xmrig::HwlocCpuInfo::processTopLevelCache(hwloc_obj_t cache, const Algorith
}
# ifdef XMRIG_ALGO_RANDOMX
if ((algorithm.family() == Algorithm::RANDOM_X) && L3_exclusive && (PUs > cores.size()) && (PUs < cores.size() * 2)) {
if ((vendor() == VENDOR_INTEL) && (algorithm.family() == Algorithm::RANDOM_X) && L3_exclusive && (PUs < cores.size() * 2)) {
// Use all L3+L2 on latest Intel CPUs with P-cores, E-cores and exclusive L3 cache
cacheHashes = (L3 + L2) / scratchpad;
}
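
Worked example with hypothetical numbers: an Intel part with an exclusive 36 MB L3 and 14 MB of total L2 would get cacheHashes = (36 + 14) / 2 = 25 for RandomX's 2 MB scratchpad.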


@ -372,15 +372,20 @@ void xmrig::CudaBackend::printHashrate(bool details)
char num[16 * 3] = { 0 };
const double hashrate_short = hashrate()->calc(Hashrate::ShortInterval);
const double hashrate_medium = hashrate()->calc(Hashrate::MediumInterval);
const double hashrate_large = hashrate()->calc(Hashrate::LargeInterval);
auto hashrate_short = hashrate()->calc(Hashrate::ShortInterval);
auto hashrate_medium = hashrate()->calc(Hashrate::MediumInterval);
auto hashrate_large = hashrate()->calc(Hashrate::LargeInterval);
double scale = 1.0;
const char* h = " H/s";
if ((hashrate_short >= 1e6) || (hashrate_medium >= 1e6) || (hashrate_large >= 1e6)) {
if ((hashrate_short.second >= 1e6) || (hashrate_medium.second >= 1e6) || (hashrate_large.second >= 1e6)) {
scale = 1e-6;
hashrate_short.second *= scale;
hashrate_medium.second *= scale;
hashrate_large.second *= scale;
h = "MH/s";
}
@ -388,12 +393,20 @@ void xmrig::CudaBackend::printHashrate(bool details)
size_t i = 0;
for (const auto& data : d_ptr->threads) {
Log::print("| %8zu | %8" PRId64 " | %8s | %8s | %8s |" CYAN_BOLD(" #%u") YELLOW(" %s") GREEN(" %s"),
auto h0 = hashrate()->calc(i, Hashrate::ShortInterval);
auto h1 = hashrate()->calc(i, Hashrate::MediumInterval);
auto h2 = hashrate()->calc(i, Hashrate::LargeInterval);
h0.second *= scale;
h1.second *= scale;
h2.second *= scale;
Log::print("| %8zu | %8" PRId64 " | %8s | %8s | %8s |" CYAN_BOLD(" #%u") YELLOW(" %s") GREEN(" %s"),
i,
data.thread.affinity(),
Hashrate::format(hashrate()->calc(i, Hashrate::ShortInterval) * scale, num, sizeof num / 3),
Hashrate::format(hashrate()->calc(i, Hashrate::MediumInterval) * scale, num + 16, sizeof num / 3),
Hashrate::format(hashrate()->calc(i, Hashrate::LargeInterval) * scale, num + 16 * 2, sizeof num / 3),
Hashrate::format(h0, num, sizeof num / 3),
Hashrate::format(h1, num + 16, sizeof num / 3),
Hashrate::format(h2, num + 16 * 2, sizeof num / 3),
data.device.index(),
data.device.topology().toString().data(),
data.device.name().data()
@ -403,9 +416,9 @@ void xmrig::CudaBackend::printHashrate(bool details)
}
Log::print(WHITE_BOLD_S "| - | - | %8s | %8s | %8s |",
Hashrate::format(hashrate_short * scale, num, sizeof num / 3),
Hashrate::format(hashrate_medium * scale, num + 16, sizeof num / 3),
Hashrate::format(hashrate_large * scale, num + 16 * 2, sizeof num / 3)
Hashrate::format(hashrate_short , num, sizeof num / 3),
Hashrate::format(hashrate_medium, num + 16, sizeof num / 3),
Hashrate::format(hashrate_large , num + 16 * 2, sizeof num / 3)
);
}


@ -114,7 +114,6 @@ size_t inline generate<Algorithm::RANDOM_X>(Threads<CudaThreads> &threads, const
auto rx = CudaThreads(devices, Algorithm::RX_0);
auto wow = CudaThreads(devices, Algorithm::RX_WOW);
auto arq = CudaThreads(devices, Algorithm::RX_ARQ);
auto kva = CudaThreads(devices, Algorithm::RX_KEVA);
if (!threads.isExist(Algorithm::RX_WOW) && wow != rx) {
count += threads.move(Algorithm::kRX_WOW, std::move(wow));
@ -124,10 +123,6 @@ size_t inline generate<Algorithm::RANDOM_X>(Threads<CudaThreads> &threads, const
count += threads.move(Algorithm::kRX_ARQ, std::move(arq));
}
if (!threads.isExist(Algorithm::RX_KEVA) && kva != rx) {
count += threads.move(Algorithm::kRX_KEVA, std::move(kva));
}
count += threads.move(Algorithm::kRX, std::move(rx));
return count;


@ -158,7 +158,7 @@ void xmrig::CudaWorker::start()
std::this_thread::yield();
}
if (!consumeJob()) {
if (isReady() && !consumeJob()) {
return;
}
}


@ -352,15 +352,20 @@ void xmrig::OclBackend::printHashrate(bool details)
char num[16 * 3] = { 0 };
const double hashrate_short = hashrate()->calc(Hashrate::ShortInterval);
const double hashrate_medium = hashrate()->calc(Hashrate::MediumInterval);
const double hashrate_large = hashrate()->calc(Hashrate::LargeInterval);
auto hashrate_short = hashrate()->calc(Hashrate::ShortInterval);
auto hashrate_medium = hashrate()->calc(Hashrate::MediumInterval);
auto hashrate_large = hashrate()->calc(Hashrate::LargeInterval);
double scale = 1.0;
const char* h = " H/s";
if ((hashrate_short >= 1e6) || (hashrate_medium >= 1e6) || (hashrate_large >= 1e6)) {
if ((hashrate_short.second >= 1e6) || (hashrate_medium.second >= 1e6) || (hashrate_large.second >= 1e6)) {
scale = 1e-6;
hashrate_short.second *= scale;
hashrate_medium.second *= scale;
hashrate_large.second *= scale;
h = "MH/s";
}
@ -368,12 +373,16 @@ void xmrig::OclBackend::printHashrate(bool details)
size_t i = 0;
for (const auto& data : d_ptr->threads) {
Log::print("| %8zu | %8" PRId64 " | %8s | %8s | %8s |" CYAN_BOLD(" #%u") YELLOW(" %s") " %s",
auto h0 = hashrate()->calc(i, Hashrate::ShortInterval);
auto h1 = hashrate()->calc(i, Hashrate::MediumInterval);
auto h2 = hashrate()->calc(i, Hashrate::LargeInterval);
Log::print("| %8zu | %8" PRId64 " | %8s | %8s | %8s |" CYAN_BOLD(" #%u") YELLOW(" %s") " %s",
i,
data.affinity,
Hashrate::format(hashrate()->calc(i, Hashrate::ShortInterval) * scale, num, sizeof num / 3),
Hashrate::format(hashrate()->calc(i, Hashrate::MediumInterval) * scale, num + 16, sizeof num / 3),
Hashrate::format(hashrate()->calc(i, Hashrate::LargeInterval) * scale, num + 16 * 2, sizeof num / 3),
Hashrate::format(h0, num, sizeof num / 3),
Hashrate::format(h1, num + 16, sizeof num / 3),
Hashrate::format(h2, num + 16 * 2, sizeof num / 3),
data.device.index(),
data.device.topology().toString().data(),
data.device.printableName().data()
@ -383,9 +392,9 @@ void xmrig::OclBackend::printHashrate(bool details)
}
Log::print(WHITE_BOLD_S "| - | - | %8s | %8s | %8s |",
Hashrate::format(hashrate_short * scale, num, sizeof num / 3),
Hashrate::format(hashrate_medium * scale, num + 16, sizeof num / 3),
Hashrate::format(hashrate_large * scale, num + 16 * 2, sizeof num / 3)
Hashrate::format(hashrate_short , num, sizeof num / 3),
Hashrate::format(hashrate_medium, num + 16, sizeof num / 3),
Hashrate::format(hashrate_large , num + 16 * 2, sizeof num / 3)
);
}


@ -170,7 +170,7 @@ void xmrig::OclWorker::start()
const uint64_t t = Chrono::steadyMSecs();
try {
m_runner->run(readUnaligned(m_job.nonce()), results);
m_runner->run(readUnaligned(m_job.nonce()), m_job.nonceOffset(), results);
}
catch (std::exception &ex) {
printError(id(), ex.what());
@ -190,7 +190,7 @@ void xmrig::OclWorker::start()
std::this_thread::yield();
}
if (!consumeJob()) {
if (isReady() && !consumeJob()) {
return;
}
}


@ -22,8 +22,8 @@
#define ALGO_RX_WOW 0x72141177
#define ALGO_RX_ARQMA 0x72121061
#define ALGO_RX_SFX 0x72151273
#define ALGO_RX_KEVA 0x7214116b
#define ALGO_RX_GRAFT 0x72151267
#define ALGO_RX_YADA 0x72151279
#define ALGO_AR2_CHUKWA 0x61130000
#define ALGO_AR2_CHUKWA_V2 0x61140000
#define ALGO_AR2_WRKZ 0x61120000


@ -34,9 +34,9 @@ static const char cryptonight_cl[61447] = {
0x31,0x35,0x31,0x32,0x30,0x30,0x0a,0x23,0x64,0x65,0x66,0x69,0x6e,0x65,0x20,0x41,0x4c,0x47,0x4f,0x5f,0x52,0x58,0x5f,0x57,0x4f,0x57,0x20,0x30,0x78,0x37,0x32,0x31,
0x34,0x31,0x31,0x37,0x37,0x0a,0x23,0x64,0x65,0x66,0x69,0x6e,0x65,0x20,0x41,0x4c,0x47,0x4f,0x5f,0x52,0x58,0x5f,0x41,0x52,0x51,0x4d,0x41,0x20,0x30,0x78,0x37,0x32,
0x31,0x32,0x31,0x30,0x36,0x31,0x0a,0x23,0x64,0x65,0x66,0x69,0x6e,0x65,0x20,0x41,0x4c,0x47,0x4f,0x5f,0x52,0x58,0x5f,0x53,0x46,0x58,0x20,0x30,0x78,0x37,0x32,0x31,
0x35,0x31,0x32,0x37,0x33,0x0a,0x23,0x64,0x65,0x66,0x69,0x6e,0x65,0x20,0x41,0x4c,0x47,0x4f,0x5f,0x52,0x58,0x5f,0x4b,0x45,0x56,0x41,0x20,0x30,0x78,0x37,0x32,0x31,
0x34,0x31,0x31,0x36,0x62,0x0a,0x23,0x64,0x65,0x66,0x69,0x6e,0x65,0x20,0x41,0x4c,0x47,0x4f,0x5f,0x52,0x58,0x5f,0x47,0x52,0x41,0x46,0x54,0x20,0x30,0x78,0x37,0x32,
0x31,0x35,0x31,0x32,0x36,0x37,0x0a,0x23,0x64,0x65,0x66,0x69,0x6e,0x65,0x20,0x41,0x4c,0x47,0x4f,0x5f,0x41,0x52,0x32,0x5f,0x43,0x48,0x55,0x4b,0x57,0x41,0x20,0x30,
0x35,0x31,0x32,0x37,0x33,0x0a,0x23,0x64,0x65,0x66,0x69,0x6e,0x65,0x20,0x41,0x4c,0x47,0x4f,0x5f,0x52,0x58,0x5f,0x47,0x52,0x41,0x46,0x54,0x20,0x30,0x78,0x37,0x32,
0x31,0x35,0x31,0x32,0x36,0x37,0x0a,0x23,0x64,0x65,0x66,0x69,0x6e,0x65,0x20,0x41,0x4c,0x47,0x4f,0x5f,0x52,0x58,0x5f,0x59,0x41,0x44,0x41,0x20,0x30,0x78,0x37,0x32,
0x31,0x35,0x31,0x32,0x37,0x39,0x0a,0x23,0x64,0x65,0x66,0x69,0x6e,0x65,0x20,0x41,0x4c,0x47,0x4f,0x5f,0x41,0x52,0x32,0x5f,0x43,0x48,0x55,0x4b,0x57,0x41,0x20,0x30,
0x78,0x36,0x31,0x31,0x33,0x30,0x30,0x30,0x30,0x0a,0x23,0x64,0x65,0x66,0x69,0x6e,0x65,0x20,0x41,0x4c,0x47,0x4f,0x5f,0x41,0x52,0x32,0x5f,0x43,0x48,0x55,0x4b,0x57,
0x41,0x5f,0x56,0x32,0x20,0x30,0x78,0x36,0x31,0x31,0x34,0x30,0x30,0x30,0x30,0x0a,0x23,0x64,0x65,0x66,0x69,0x6e,0x65,0x20,0x41,0x4c,0x47,0x4f,0x5f,0x41,0x52,0x32,
0x5f,0x57,0x52,0x4b,0x5a,0x20,0x30,0x78,0x36,0x31,0x31,0x32,0x30,0x30,0x30,0x30,0x0a,0x23,0x64,0x65,0x66,0x69,0x6e,0x65,0x20,0x41,0x4c,0x47,0x4f,0x5f,0x4b,0x41,


@ -138,6 +138,197 @@ __kernel void blake2b_initial_hash(__global void *out, __global const void* bloc
t[7] = hash[7];
}
void blake2b_512_process_double_block_variable(ulong *out, ulong* m, __global const ulong* in, uint in_len, uint out_len)
{
ulong v[16] =
{
iv0 ^ (0x01010000u | out_len), iv1, iv2, iv3, iv4 , iv5, iv6, iv7,
iv0 , iv1, iv2, iv3, iv4 ^ 128, iv5, iv6, iv7,
};
BLAKE2B_ROUNDS();
ulong h[8];
v[0] = h[0] = v[0] ^ v[8] ^ iv0 ^ (0x01010000u | out_len);
v[1] = h[1] = v[1] ^ v[9] ^ iv1;
v[2] = h[2] = v[2] ^ v[10] ^ iv2;
v[3] = h[3] = v[3] ^ v[11] ^ iv3;
v[4] = h[4] = v[4] ^ v[12] ^ iv4;
v[5] = h[5] = v[5] ^ v[13] ^ iv5;
v[6] = h[6] = v[6] ^ v[14] ^ iv6;
v[7] = h[7] = v[7] ^ v[15] ^ iv7;
v[8] = iv0;
v[9] = iv1;
v[10] = iv2;
v[11] = iv3;
v[12] = iv4 ^ in_len;
v[13] = iv5;
v[14] = ~iv6;
v[15] = iv7;
m[ 0] = (in_len > 128) ? in[16] : 0;
m[ 1] = (in_len > 136) ? in[17] : 0;
m[ 2] = (in_len > 144) ? in[18] : 0;
m[ 3] = (in_len > 152) ? in[19] : 0;
m[ 4] = (in_len > 160) ? in[20] : 0;
m[ 5] = (in_len > 168) ? in[21] : 0;
m[ 6] = (in_len > 176) ? in[22] : 0;
m[ 7] = (in_len > 184) ? in[23] : 0;
m[ 8] = (in_len > 192) ? in[24] : 0;
m[ 9] = (in_len > 200) ? in[25] : 0;
m[10] = (in_len > 208) ? in[26] : 0;
m[11] = (in_len > 216) ? in[27] : 0;
m[12] = (in_len > 224) ? in[28] : 0;
m[13] = (in_len > 232) ? in[29] : 0;
m[14] = (in_len > 240) ? in[30] : 0;
m[15] = (in_len > 248) ? in[31] : 0;
if (in_len % sizeof(ulong))
m[(in_len - 128) / sizeof(ulong)] &= (ulong)(-1) >> (64 - (in_len % sizeof(ulong)) * 8);
BLAKE2B_ROUNDS();
if (out_len > 0) out[0] = h[0] ^ v[0] ^ v[8];
if (out_len > 8) out[1] = h[1] ^ v[1] ^ v[9];
if (out_len > 16) out[2] = h[2] ^ v[2] ^ v[10];
if (out_len > 24) out[3] = h[3] ^ v[3] ^ v[11];
if (out_len > 32) out[4] = h[4] ^ v[4] ^ v[12];
if (out_len > 40) out[5] = h[5] ^ v[5] ^ v[13];
if (out_len > 48) out[6] = h[6] ^ v[6] ^ v[14];
if (out_len > 56) out[7] = h[7] ^ v[7] ^ v[15];
}
__attribute__((reqd_work_group_size(64, 1, 1)))
__kernel void blake2b_initial_hash_double(__global void *out, __global const void* blockTemplate, uint blockTemplateSize, uint start_nonce)
{
const uint global_index = get_global_id(0);
__global const ulong* p = (__global const ulong*) blockTemplate;
ulong m[16] = { p[0], p[1], p[2], p[3], p[4], p[5], p[6], p[7], p[8], p[9], p[10], p[11], p[12], p[13], p[14], p[15] };
const ulong nonce = start_nonce + global_index;
m[4] = (m[4] & ((ulong)(-1) >> 8)) | (nonce << 56);
m[5] = (m[5] & ((ulong)(-1) << 24)) | (nonce >> 8);
ulong hash[8];
blake2b_512_process_double_block_variable(hash, m, p, blockTemplateSize, 64);
__global ulong* t = ((__global ulong*) out) + global_index * 8;
t[0] = hash[0];
t[1] = hash[1];
t[2] = hash[2];
t[3] = hash[3];
t[4] = hash[4];
t[5] = hash[5];
t[6] = hash[6];
t[7] = hash[7];
}
void blake2b_512_process_big_block(ulong *out, __global const ulong* in, uint in_len, uint out_len, uint nonce, uint nonce_offset)
{
ulong h[8] = { iv0 ^ (0x01010000u | out_len), iv1, iv2, iv3, iv4, iv5, iv6, iv7 };
for (uint t = 128; t < in_len; t += 128, in += 16) {
ulong m[16] = { in[0], in[1], in[2], in[3], in[4], in[5], in[6], in[7], in[8], in[9], in[10], in[11], in[12], in[13], in[14], in[15] };
const uint k0 = (nonce_offset + 0) - (t - 128);
const uint k1 = (nonce_offset + 1) - (t - 128);
const uint k2 = (nonce_offset + 2) - (t - 128);
const uint k3 = (nonce_offset + 3) - (t - 128);
if (k0 < 128) m[k0 / 8] |= (ulong)((nonce >> 0) & 255) << ((k0 % 8) * 8);
if (k1 < 128) m[k1 / 8] |= (ulong)((nonce >> 8) & 255) << ((k1 % 8) * 8);
if (k2 < 128) m[k2 / 8] |= (ulong)((nonce >> 16) & 255) << ((k2 % 8) * 8);
if (k3 < 128) m[k3 / 8] |= (ulong)((nonce >> 24) & 255) << ((k3 % 8) * 8);
ulong v[16] = { h[0], h[1], h[2], h[3], h[4], h[5], h[6], h[7], iv0, iv1, iv2, iv3, iv4 ^ t, iv5, iv6, iv7 };
BLAKE2B_ROUNDS();
h[0] ^= v[0] ^ v[ 8];
h[1] ^= v[1] ^ v[ 9];
h[2] ^= v[2] ^ v[10];
h[3] ^= v[3] ^ v[11];
h[4] ^= v[4] ^ v[12];
h[5] ^= v[5] ^ v[13];
h[6] ^= v[6] ^ v[14];
h[7] ^= v[7] ^ v[15];
}
uint k = in_len & 127;
if (k == 0) k = 128;
ulong m[16] = {
(k > 0) ? in[ 0] : 0,
(k > 8) ? in[ 1] : 0,
(k > 16) ? in[ 2] : 0,
(k > 24) ? in[ 3] : 0,
(k > 32) ? in[ 4] : 0,
(k > 40) ? in[ 5] : 0,
(k > 48) ? in[ 6] : 0,
(k > 56) ? in[ 7] : 0,
(k > 64) ? in[ 8] : 0,
(k > 72) ? in[ 9] : 0,
(k > 80) ? in[10] : 0,
(k > 88) ? in[11] : 0,
(k > 96) ? in[12] : 0,
(k > 104) ? in[13] : 0,
(k > 112) ? in[14] : 0,
(k > 120) ? in[15] : 0
};
const uint t = in_len - k;
const uint k0 = nonce_offset + 0 - t;
const uint k1 = nonce_offset + 1 - t;
const uint k2 = nonce_offset + 2 - t;
const uint k3 = nonce_offset + 3 - t;
if (k0 < k) m[k0 / 8] |= (ulong)((nonce >> 0) & 255) << ((k0 % 8) * 8);
if (k1 < k) m[k1 / 8] |= (ulong)((nonce >> 8) & 255) << ((k1 % 8) * 8);
if (k2 < k) m[k2 / 8] |= (ulong)((nonce >> 16) & 255) << ((k2 % 8) * 8);
if (k3 < k) m[k3 / 8] |= (ulong)((nonce >> 24) & 255) << ((k3 % 8) * 8);
if (k % 8) {
m[k / 8] &= (ulong)(-1) >> (64 - (k % 8) * 8);
}
ulong v[16] = { h[0], h[1], h[2], h[3], h[4], h[5], h[6], h[7], iv0, iv1, iv2, iv3, iv4 ^ in_len, iv5, ~iv6, iv7 };
BLAKE2B_ROUNDS();
if (out_len > 0) out[0] = h[0] ^ v[0] ^ v[8];
if (out_len > 8) out[1] = h[1] ^ v[1] ^ v[9];
if (out_len > 16) out[2] = h[2] ^ v[2] ^ v[10];
if (out_len > 24) out[3] = h[3] ^ v[3] ^ v[11];
if (out_len > 32) out[4] = h[4] ^ v[4] ^ v[12];
if (out_len > 40) out[5] = h[5] ^ v[5] ^ v[13];
if (out_len > 48) out[6] = h[6] ^ v[6] ^ v[14];
if (out_len > 56) out[7] = h[7] ^ v[7] ^ v[15];
}
__attribute__((reqd_work_group_size(64, 1, 1)))
__kernel void blake2b_initial_hash_big(__global void *out, __global const void* blockTemplate, uint blockTemplateSize, uint start_nonce, uint nonce_offset)
{
const uint global_index = get_global_id(0);
__global const ulong* p = (__global const ulong*) blockTemplate;
ulong hash[8];
blake2b_512_process_big_block(hash, p, blockTemplateSize, 64, start_nonce + global_index, nonce_offset);
__global ulong* t = ((__global ulong*) out) + global_index * 8;
t[0] = hash[0];
t[1] = hash[1];
t[2] = hash[2];
t[3] = hash[3];
t[4] = hash[4];
t[5] = hash[5];
t[6] = hash[6];
t[7] = hash[7];
}
#define in_len 256
#define out_len 32
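
The nonce_offset parameter exists because a template larger than 256 bytes spans several 128-byte BLAKE2b blocks, so the four nonce bytes may sit in any block, or even straddle a block boundary. A host-side restatement of the kernel's k0..k3 index math (hypothetical C++ helper, mirrors the code above):

#include <cstdint>

// Patch the nonce bytes into the 16-word message of one 128-byte block.
// 't' is the block's byte offset inside the template; for other blocks the
// unsigned subtraction wraps past block_len and the byte is simply skipped.
static void patchNonce(uint64_t m[16], uint32_t nonce,
                       uint32_t nonce_offset, uint32_t t, uint32_t block_len)
{
    for (uint32_t i = 0; i < 4; ++i) {
        const uint32_t k = nonce_offset + i - t;
        if (k < block_len) {
            m[k / 8] |= static_cast<uint64_t>((nonce >> (i * 8)) & 255) << ((k % 8) * 8);
        }
    }
}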


@ -1,13 +1,11 @@
#include "../cn/algorithm.cl"
#if (ALGO == ALGO_RX_0)
#if ((ALGO == ALGO_RX_0) || (ALGO == ALGO_RX_YADA))
#include "randomx_constants_monero.h"
#elif (ALGO == ALGO_RX_WOW)
#include "randomx_constants_wow.h"
#elif (ALGO == ALGO_RX_ARQMA)
#include "randomx_constants_arqma.h"
#elif (ALGO == ALGO_RX_KEVA)
#include "randomx_constants_keva.h"
#elif (ALGO == ALGO_RX_GRAFT)
#include "randomx_constants_graft.h"
#endif

File diff suppressed because it is too large.


@ -1,96 +0,0 @@
/*
Copyright (c) 2019 SChernykh
This file is part of RandomX OpenCL.
RandomX OpenCL is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
RandomX OpenCL is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with RandomX OpenCL. If not, see <http://www.gnu.org/licenses/>.
*/
//Dataset base size in bytes. Must be a power of 2.
#define RANDOMX_DATASET_BASE_SIZE 2147483648
//Dataset extra size. Must be divisible by 64.
#define RANDOMX_DATASET_EXTRA_SIZE 33554368
//Scratchpad L3 size in bytes. Must be a power of 2.
#define RANDOMX_SCRATCHPAD_L3 1048576
//Scratchpad L2 size in bytes. Must be a power of two and less than or equal to RANDOMX_SCRATCHPAD_L3.
#define RANDOMX_SCRATCHPAD_L2 131072
//Scratchpad L1 size in bytes. Must be a power of two (minimum 64) and less than or equal to RANDOMX_SCRATCHPAD_L2.
#define RANDOMX_SCRATCHPAD_L1 16384
//Jump condition mask size in bits.
#define RANDOMX_JUMP_BITS 8
//Jump condition mask offset in bits. The sum of RANDOMX_JUMP_BITS and RANDOMX_JUMP_OFFSET must not exceed 16.
#define RANDOMX_JUMP_OFFSET 8
//Integer instructions
#define RANDOMX_FREQ_IADD_RS 16
#define RANDOMX_FREQ_IADD_M 7
#define RANDOMX_FREQ_ISUB_R 16
#define RANDOMX_FREQ_ISUB_M 7
#define RANDOMX_FREQ_IMUL_R 16
#define RANDOMX_FREQ_IMUL_M 4
#define RANDOMX_FREQ_IMULH_R 4
#define RANDOMX_FREQ_IMULH_M 1
#define RANDOMX_FREQ_ISMULH_R 4
#define RANDOMX_FREQ_ISMULH_M 1
#define RANDOMX_FREQ_IMUL_RCP 8
#define RANDOMX_FREQ_INEG_R 2
#define RANDOMX_FREQ_IXOR_R 15
#define RANDOMX_FREQ_IXOR_M 5
#define RANDOMX_FREQ_IROR_R 8
#define RANDOMX_FREQ_IROL_R 2
#define RANDOMX_FREQ_ISWAP_R 4
//Floating point instructions
#define RANDOMX_FREQ_FSWAP_R 4
#define RANDOMX_FREQ_FADD_R 16
#define RANDOMX_FREQ_FADD_M 5
#define RANDOMX_FREQ_FSUB_R 16
#define RANDOMX_FREQ_FSUB_M 5
#define RANDOMX_FREQ_FSCAL_R 6
#define RANDOMX_FREQ_FMUL_R 32
#define RANDOMX_FREQ_FDIV_M 4
#define RANDOMX_FREQ_FSQRT_R 6
//Control instructions
#define RANDOMX_FREQ_CBRANCH 25
#define RANDOMX_FREQ_CFROUND 1
//Store instruction
#define RANDOMX_FREQ_ISTORE 16
//No-op instruction
#define RANDOMX_FREQ_NOP 0
#define RANDOMX_DATASET_ITEM_SIZE 64
#define RANDOMX_PROGRAM_SIZE 256
#define HASH_SIZE 64
#define ENTROPY_SIZE (128 + RANDOMX_PROGRAM_SIZE * 8)
#define REGISTERS_SIZE 256
#define IMM_BUF_SIZE (RANDOMX_PROGRAM_SIZE * 4 - REGISTERS_SIZE)
#define IMM_INDEX_COUNT ((IMM_BUF_SIZE / 4) - 2)
#define VM_STATE_SIZE (REGISTERS_SIZE + IMM_BUF_SIZE + RANDOMX_PROGRAM_SIZE * 4)
#define ROUNDING_MODE (RANDOMX_FREQ_CFROUND ? -1 : 0)
// Scratchpad L1/L2/L3 bits
#define LOC_L1 (32 - 14)
#define LOC_L2 (32 - 17)
#define LOC_L3 (32 - 20)


@ -64,7 +64,7 @@ public:
virtual uint32_t deviceIndex() const = 0;
virtual void build() = 0;
virtual void init() = 0;
virtual void run(uint32_t nonce, uint32_t *hashOutput) = 0;
virtual void run(uint32_t nonce, uint32_t nonce_offset, uint32_t *hashOutput) = 0;
virtual void set(const Job &job, uint8_t *blob) = 0;
virtual void jobEarlyNotification(const Job&) = 0;


@ -0,0 +1,59 @@
/* XMRig
* Copyright 2010 Jeff Garzik <jgarzik@pobox.com>
* Copyright 2012-2014 pooler <pooler@litecoinpool.org>
* Copyright 2014 Lucas Jones <https://github.com/lucasjones>
* Copyright 2014-2016 Wolf9466 <https://github.com/OhGodAPet>
* Copyright 2016 Jay D Dee <jayddee246@gmail.com>
* Copyright 2017-2018 XMR-Stak <https://github.com/fireice-uk>, <https://github.com/psychocrypt>
* Copyright 2018-2019 SChernykh <https://github.com/SChernykh>
* Copyright 2016-2019 XMRig <https://github.com/xmrig>, <support@xmrig.com>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include "backend/opencl/kernels/rx/Blake2bInitialHashBigKernel.h"
#include "backend/opencl/wrappers/OclLib.h"
void xmrig::Blake2bInitialHashBigKernel::enqueue(cl_command_queue queue, size_t threads)
{
const size_t gthreads = threads;
static const size_t lthreads = 64;
enqueueNDRange(queue, 1, nullptr, &gthreads, &lthreads);
}
// __kernel void blake2b_initial_hash_big(__global void *out, __global const void* blockTemplate, uint blockTemplateSize, uint start_nonce, uint nonce_offset)
void xmrig::Blake2bInitialHashBigKernel::setArgs(cl_mem out, cl_mem blockTemplate)
{
setArg(0, sizeof(cl_mem), &out);
setArg(1, sizeof(cl_mem), &blockTemplate);
}
void xmrig::Blake2bInitialHashBigKernel::setBlobSize(size_t size)
{
const uint32_t s = size;
setArg(2, sizeof(uint32_t), &s);
}
void xmrig::Blake2bInitialHashBigKernel::setNonce(uint32_t nonce, uint32_t nonce_offset)
{
setArg(3, sizeof(uint32_t), &nonce);
setArg(4, sizeof(uint32_t), &nonce_offset);
}
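
How the wrapper is driven, as a sketch (this mirrors the OclRxBaseRunner changes below; `program`, `queue`, `hashes`, `input` and `intensity` stand in for the runner's members):

xmrig::Blake2bInitialHashBigKernel kernel(program);
kernel.setArgs(hashes, input);             // output buffer + block template (cl_mem)
kernel.setBlobSize(job.size());            // template size in bytes
kernel.setNonce(nonce, job.nonceOffset()); // nonce value and its byte offset
kernel.enqueue(queue, intensity);          // global size = intensity, local size = 64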


@ -0,0 +1,50 @@
/* XMRig
* Copyright 2010 Jeff Garzik <jgarzik@pobox.com>
* Copyright 2012-2014 pooler <pooler@litecoinpool.org>
* Copyright 2014 Lucas Jones <https://github.com/lucasjones>
* Copyright 2014-2016 Wolf9466 <https://github.com/OhGodAPet>
* Copyright 2016 Jay D Dee <jayddee246@gmail.com>
* Copyright 2017-2018 XMR-Stak <https://github.com/fireice-uk>, <https://github.com/psychocrypt>
* Copyright 2018-2019 SChernykh <https://github.com/SChernykh>
* Copyright 2016-2019 XMRig <https://github.com/xmrig>, <support@xmrig.com>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef XMRIG_BLAKE2BINITIALHASHBIGKERNEL_H
#define XMRIG_BLAKE2BINITIALHASHBIGKERNEL_H
#include "backend/opencl/wrappers/OclKernel.h"
namespace xmrig {
class Blake2bInitialHashBigKernel : public OclKernel
{
public:
inline Blake2bInitialHashBigKernel(cl_program program) : OclKernel(program, "blake2b_initial_hash_big") {}
void enqueue(cl_command_queue queue, size_t threads);
void setArgs(cl_mem out, cl_mem blockTemplate);
void setBlobSize(size_t size);
void setNonce(uint32_t nonce, uint32_t nonce_offset);
};
} // namespace xmrig
#endif /* XMRIG_BLAKE2BINITIALHASHBIGKERNEL_H */


@ -0,0 +1,58 @@
/* XMRig
* Copyright 2010 Jeff Garzik <jgarzik@pobox.com>
* Copyright 2012-2014 pooler <pooler@litecoinpool.org>
* Copyright 2014 Lucas Jones <https://github.com/lucasjones>
* Copyright 2014-2016 Wolf9466 <https://github.com/OhGodAPet>
* Copyright 2016 Jay D Dee <jayddee246@gmail.com>
* Copyright 2017-2018 XMR-Stak <https://github.com/fireice-uk>, <https://github.com/psychocrypt>
* Copyright 2018-2019 SChernykh <https://github.com/SChernykh>
* Copyright 2016-2019 XMRig <https://github.com/xmrig>, <support@xmrig.com>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include "backend/opencl/kernels/rx/Blake2bInitialHashDoubleKernel.h"
#include "backend/opencl/wrappers/OclLib.h"
void xmrig::Blake2bInitialHashDoubleKernel::enqueue(cl_command_queue queue, size_t threads)
{
const size_t gthreads = threads;
static const size_t lthreads = 64;
enqueueNDRange(queue, 1, nullptr, &gthreads, &lthreads);
}
// __kernel void blake2b_initial_hash_double(__global void *out, __global const void* blockTemplate, uint blockTemplateSize, uint start_nonce)
void xmrig::Blake2bInitialHashDoubleKernel::setArgs(cl_mem out, cl_mem blockTemplate)
{
setArg(0, sizeof(cl_mem), &out);
setArg(1, sizeof(cl_mem), &blockTemplate);
}
void xmrig::Blake2bInitialHashDoubleKernel::setBlobSize(size_t size)
{
const uint32_t s = size;
setArg(2, sizeof(uint32_t), &s);
}
void xmrig::Blake2bInitialHashDoubleKernel::setNonce(uint32_t nonce)
{
setArg(3, sizeof(uint32_t), &nonce);
}


@ -0,0 +1,50 @@
/* XMRig
* Copyright 2010 Jeff Garzik <jgarzik@pobox.com>
* Copyright 2012-2014 pooler <pooler@litecoinpool.org>
* Copyright 2014 Lucas Jones <https://github.com/lucasjones>
* Copyright 2014-2016 Wolf9466 <https://github.com/OhGodAPet>
* Copyright 2016 Jay D Dee <jayddee246@gmail.com>
* Copyright 2017-2018 XMR-Stak <https://github.com/fireice-uk>, <https://github.com/psychocrypt>
* Copyright 2018-2019 SChernykh <https://github.com/SChernykh>
* Copyright 2016-2019 XMRig <https://github.com/xmrig>, <support@xmrig.com>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef XMRIG_BLAKE2BINITIALHASHDOUBLEKERNEL_H
#define XMRIG_BLAKE2BINITIALHASHDOUBLEKERNEL_H
#include "backend/opencl/wrappers/OclKernel.h"
namespace xmrig {
class Blake2bInitialHashDoubleKernel : public OclKernel
{
public:
inline Blake2bInitialHashDoubleKernel(cl_program program) : OclKernel(program, "blake2b_initial_hash_double") {}
void enqueue(cl_command_queue queue, size_t threads);
void setArgs(cl_mem out, cl_mem blockTemplate);
void setBlobSize(size_t size);
void setNonce(uint32_t nonce);
};
} // namespace xmrig
#endif /* XMRIG_BLAKE2BINITIALHASHDOUBLEKERNEL_H */


@ -80,6 +80,8 @@ if (WITH_OPENCL)
if (WITH_RANDOMX)
list(APPEND HEADERS_BACKEND_OPENCL
src/backend/opencl/kernels/rx/Blake2bHashRegistersKernel.h
src/backend/opencl/kernels/rx/Blake2bInitialHashBigKernel.h
src/backend/opencl/kernels/rx/Blake2bInitialHashDoubleKernel.h
src/backend/opencl/kernels/rx/Blake2bInitialHashKernel.h
src/backend/opencl/kernels/rx/ExecuteVmKernel.h
src/backend/opencl/kernels/rx/FillAesKernel.h
@ -96,6 +98,8 @@ if (WITH_OPENCL)
list(APPEND SOURCES_BACKEND_OPENCL
src/backend/opencl/generators/ocl_generic_rx_generator.cpp
src/backend/opencl/kernels/rx/Blake2bHashRegistersKernel.cpp
src/backend/opencl/kernels/rx/Blake2bInitialHashBigKernel.cpp
src/backend/opencl/kernels/rx/Blake2bInitialHashDoubleKernel.cpp
src/backend/opencl/kernels/rx/Blake2bInitialHashKernel.cpp
src/backend/opencl/kernels/rx/ExecuteVmKernel.cpp
src/backend/opencl/kernels/rx/FillAesKernel.cpp


@ -87,7 +87,7 @@ size_t xmrig::OclCnRunner::bufferSize() const
}
void xmrig::OclCnRunner::run(uint32_t nonce, uint32_t *hashOutput)
void xmrig::OclCnRunner::run(uint32_t nonce, uint32_t /*nonce_offset*/, uint32_t *hashOutput)
{
static const cl_uint zero = 0;


@ -42,7 +42,7 @@ public:
protected:
size_t bufferSize() const override;
void run(uint32_t nonce, uint32_t *hashOutput) override;
void run(uint32_t nonce, uint32_t nonce_offset, uint32_t *hashOutput) override;
void set(const Job &job, uint8_t *blob) override;
void build() override;
void init() override;


@ -75,7 +75,7 @@ OclKawPowRunner::~OclKawPowRunner()
}
void OclKawPowRunner::run(uint32_t nonce, uint32_t *hashOutput)
void OclKawPowRunner::run(uint32_t nonce, uint32_t /*nonce_offset*/, uint32_t *hashOutput)
{
const size_t local_work_size = m_workGroupSize;
const size_t global_work_offset = nonce;


@ -40,7 +40,7 @@ public:
~OclKawPowRunner() override;
protected:
void run(uint32_t nonce, uint32_t *hashOutput) override;
void run(uint32_t nonce, uint32_t nonce_offset, uint32_t *hashOutput) override;
void set(const Job &job, uint8_t *blob) override;
void build() override;
void init() override;


@ -25,6 +25,8 @@
#include "backend/opencl/runners/OclRxBaseRunner.h"
#include "backend/opencl/kernels/rx/Blake2bHashRegistersKernel.h"
#include "backend/opencl/kernels/rx/Blake2bInitialHashKernel.h"
#include "backend/opencl/kernels/rx/Blake2bInitialHashDoubleKernel.h"
#include "backend/opencl/kernels/rx/Blake2bInitialHashBigKernel.h"
#include "backend/opencl/kernels/rx/FillAesKernel.h"
#include "backend/opencl/kernels/rx/FindSharesKernel.h"
#include "backend/opencl/kernels/rx/HashAesKernel.h"
@ -71,6 +73,8 @@ xmrig::OclRxBaseRunner::~OclRxBaseRunner()
delete m_fillAes4Rx4_entropy;
delete m_hashAes1Rx4;
delete m_blake2b_initial_hash;
delete m_blake2b_initial_hash_double;
delete m_blake2b_initial_hash_big;
delete m_blake2b_hash_registers_32;
delete m_blake2b_hash_registers_64;
delete m_find_shares;
@ -83,16 +87,34 @@ xmrig::OclRxBaseRunner::~OclRxBaseRunner()
}
void xmrig::OclRxBaseRunner::run(uint32_t nonce, uint32_t *hashOutput)
void xmrig::OclRxBaseRunner::run(uint32_t nonce, uint32_t nonce_offset, uint32_t *hashOutput)
{
static const uint32_t zero = 0;
m_blake2b_initial_hash->setNonce(nonce);
if (m_jobSize <= 128) {
m_blake2b_initial_hash->setNonce(nonce);
}
else if (m_jobSize <= 256) {
m_blake2b_initial_hash_double->setNonce(nonce);
}
else {
m_blake2b_initial_hash_big->setNonce(nonce, nonce_offset);
}
m_find_shares->setNonce(nonce);
enqueueWriteBuffer(m_output, CL_FALSE, sizeof(cl_uint) * 0xFF, sizeof(uint32_t), &zero);
m_blake2b_initial_hash->enqueue(m_queue, m_intensity);
if (m_jobSize <= 128) {
m_blake2b_initial_hash->enqueue(m_queue, m_intensity);
}
else if (m_jobSize <= 256) {
m_blake2b_initial_hash_double->enqueue(m_queue, m_intensity);
}
else {
m_blake2b_initial_hash_big->enqueue(m_queue, m_intensity);
}
m_fillAes1Rx4_scratchpad->enqueue(m_queue, m_intensity);
const uint32_t programCount = RxAlgo::programCount(m_algorithm);
@ -132,9 +154,16 @@ void xmrig::OclRxBaseRunner::set(const Job &job, uint8_t *blob)
memset(blob + job.size(), 0, Job::kMaxBlobSize - job.size());
}
memset(blob + job.nonceOffset(), 0, job.nonceSize());
enqueueWriteBuffer(m_input, CL_TRUE, 0, Job::kMaxBlobSize, blob);
m_jobSize = job.size();
m_blake2b_initial_hash->setBlobSize(job.size());
m_blake2b_initial_hash_double->setBlobSize(job.size());
m_blake2b_initial_hash_big->setBlobSize(job.size());
m_find_shares->setTarget(job.target());
}
@ -166,6 +195,12 @@ void xmrig::OclRxBaseRunner::build()
m_blake2b_initial_hash = new Blake2bInitialHashKernel(m_program);
m_blake2b_initial_hash->setArgs(m_hashes, m_input);
m_blake2b_initial_hash_double = new Blake2bInitialHashDoubleKernel(m_program);
m_blake2b_initial_hash_double->setArgs(m_hashes, m_input);
m_blake2b_initial_hash_big = new Blake2bInitialHashBigKernel(m_program);
m_blake2b_initial_hash_big->setArgs(m_hashes, m_input);
m_blake2b_hash_registers_32 = new Blake2bHashRegistersKernel(m_program, "blake2b_hash_registers_32");
m_blake2b_hash_registers_64 = new Blake2bHashRegistersKernel(m_program, "blake2b_hash_registers_64");
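
The cutoffs in run() follow from BLAKE2b's 128-byte block size: templates up to 128 bytes hash in a single block, up to 256 bytes in two (the `_double` kernel), and anything larger takes the `_big` kernel, the only one needing an explicit nonce_offset because the nonce's position is no longer fixed (presumably for the oversized block templates of the rx/yada support added elsewhere in this diff). A condensed, hedged restatement of the rule, not new logic:

static const char *initialHashKernel(size_t jobSize)
{
    if (jobSize <= 128) return "blake2b_initial_hash";        // one 128-byte block
    if (jobSize <= 256) return "blake2b_initial_hash_double"; // two blocks
    return "blake2b_initial_hash_big";                        // N blocks + nonce_offset
}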


@ -35,6 +35,8 @@ namespace xmrig {
class Blake2bHashRegistersKernel;
class Blake2bInitialHashKernel;
class Blake2bInitialHashDoubleKernel;
class Blake2bInitialHashBigKernel;
class FillAesKernel;
class FindSharesKernel;
class HashAesKernel;
@ -52,27 +54,31 @@ protected:
size_t bufferSize() const override;
void build() override;
void init() override;
void run(uint32_t nonce, uint32_t *hashOutput) override;
void run(uint32_t nonce, uint32_t nonce_offset, uint32_t *hashOutput) override;
void set(const Job &job, uint8_t *blob) override;
protected:
virtual void execute(uint32_t iteration) = 0;
Blake2bHashRegistersKernel *m_blake2b_hash_registers_32 = nullptr;
Blake2bHashRegistersKernel *m_blake2b_hash_registers_64 = nullptr;
Blake2bInitialHashKernel *m_blake2b_initial_hash = nullptr;
Blake2bHashRegistersKernel *m_blake2b_hash_registers_32 = nullptr;
Blake2bHashRegistersKernel *m_blake2b_hash_registers_64 = nullptr;
Blake2bInitialHashKernel *m_blake2b_initial_hash = nullptr;
Blake2bInitialHashDoubleKernel *m_blake2b_initial_hash_double = nullptr;
Blake2bInitialHashBigKernel* m_blake2b_initial_hash_big = nullptr;
Buffer m_seed;
cl_mem m_dataset = nullptr;
cl_mem m_entropy = nullptr;
cl_mem m_hashes = nullptr;
cl_mem m_rounding = nullptr;
cl_mem m_scratchpads = nullptr;
FillAesKernel *m_fillAes1Rx4_scratchpad = nullptr;
FillAesKernel *m_fillAes4Rx4_entropy = nullptr;
FindSharesKernel *m_find_shares = nullptr;
HashAesKernel *m_hashAes1Rx4 = nullptr;
uint32_t m_gcn_version = 12;
uint32_t m_worksize = 8;
cl_mem m_dataset = nullptr;
cl_mem m_entropy = nullptr;
cl_mem m_hashes = nullptr;
cl_mem m_rounding = nullptr;
cl_mem m_scratchpads = nullptr;
FillAesKernel *m_fillAes1Rx4_scratchpad = nullptr;
FillAesKernel *m_fillAes4Rx4_entropy = nullptr;
FindSharesKernel *m_find_shares = nullptr;
HashAesKernel *m_hashAes1Rx4 = nullptr;
uint32_t m_gcn_version = 12;
uint32_t m_worksize = 8;
size_t m_jobSize = 0;
};


@ -1,6 +1,6 @@
/* XMRig
* Copyright (c) 2018-2023 SChernykh <https://github.com/SChernykh>
* Copyright (c) 2016-2023 XMRig <https://github.com/xmrig>, <support@xmrig.com>
* Copyright (c) 2018-2024 SChernykh <https://github.com/SChernykh>
* Copyright (c) 2016-2024 XMRig <https://github.com/xmrig>, <support@xmrig.com>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@ -25,6 +25,8 @@
#include "base/crypto/keccak.h"
#include "base/io/Env.h"
#include "base/io/json/Json.h"
#include "base/io/log/Log.h"
#include "base/io/log/Tags.h"
#include "base/kernel/Base.h"
#include "base/tools/Chrono.h"
#include "base/tools/Cvt.h"
@ -91,7 +93,11 @@ xmrig::Api::Api(Base *base) :
xmrig::Api::~Api()
{
# ifdef XMRIG_FEATURE_HTTP
delete m_httpd;
if (m_httpd) {
m_httpd->stop();
delete m_httpd;
m_httpd = nullptr; // Ensure the pointer is set to nullptr after deletion
}
# endif
}
@ -109,8 +115,15 @@ void xmrig::Api::start()
genWorkerId(m_base->config()->apiWorkerId());
# ifdef XMRIG_FEATURE_HTTP
m_httpd = new Httpd(m_base);
m_httpd->start();
if (!m_httpd) {
m_httpd = new Httpd(m_base);
if (!m_httpd->start()) {
LOG_ERR("%s " RED_BOLD("HTTP API server failed to start."), Tags::network());
delete m_httpd; // Properly handle failure to start
m_httpd = nullptr;
}
}
# endif
}
@ -118,7 +131,9 @@ void xmrig::Api::start()
void xmrig::Api::stop()
{
# ifdef XMRIG_FEATURE_HTTP
m_httpd->stop();
if (m_httpd) {
m_httpd->stop();
}
# endif
}
@ -126,13 +141,15 @@ void xmrig::Api::stop()
void xmrig::Api::tick()
{
# ifdef XMRIG_FEATURE_HTTP
if (m_httpd->isBound() || !m_base->config()->http().isEnabled()) {
if (!m_httpd || !m_base->config()->http().isEnabled() || m_httpd->isBound()) {
return;
}
if (++m_ticks % 10 == 0) {
m_ticks = 0;
m_httpd->start();
if (m_httpd) {
m_httpd->start();
}
}
# endif
}


@ -1,6 +1,6 @@
/* XMRig
* Copyright (c) 2018-2023 SChernykh <https://github.com/SChernykh>
* Copyright (c) 2016-2023 XMRig <https://github.com/xmrig>, <support@xmrig.com>
* Copyright (c) 2018-2024 SChernykh <https://github.com/SChernykh>
* Copyright (c) 2016-2024 XMRig <https://github.com/xmrig>, <support@xmrig.com>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by


@ -81,7 +81,7 @@ const char *Algorithm::kRX_WOW = "rx/wow";
const char *Algorithm::kRX_ARQ = "rx/arq";
const char *Algorithm::kRX_GRAFT = "rx/graft";
const char *Algorithm::kRX_SFX = "rx/sfx";
const char *Algorithm::kRX_KEVA = "rx/keva";
const char *Algorithm::kRX_YADA = "rx/yada";
#endif
#ifdef XMRIG_ALGO_ARGON2
@ -147,7 +147,7 @@ static const std::map<uint32_t, const char *> kAlgorithmNames = {
ALGO_NAME(RX_ARQ),
ALGO_NAME(RX_GRAFT),
ALGO_NAME(RX_SFX),
ALGO_NAME(RX_KEVA),
ALGO_NAME(RX_YADA),
# endif
# ifdef XMRIG_ALGO_ARGON2
@ -261,8 +261,8 @@ static const std::map<const char *, Algorithm::Id, aliasCompare> kAlgorithmAlias
ALGO_ALIAS(RX_GRAFT, "randomgraft"),
ALGO_ALIAS_AUTO(RX_SFX), ALGO_ALIAS(RX_SFX, "randomx/sfx"),
ALGO_ALIAS(RX_SFX, "randomsfx"),
ALGO_ALIAS_AUTO(RX_KEVA), ALGO_ALIAS(RX_KEVA, "randomx/keva"),
ALGO_ALIAS(RX_KEVA, "randomkeva"),
ALGO_ALIAS_AUTO(RX_YADA), ALGO_ALIAS(RX_YADA, "randomx/yada"),
ALGO_ALIAS(RX_YADA, "randomyada"),
# endif
# ifdef XMRIG_ALGO_ARGON2
@ -350,7 +350,7 @@ std::vector<xmrig::Algorithm> xmrig::Algorithm::all(const std::function<bool(con
CN_HEAVY_0, CN_HEAVY_TUBE, CN_HEAVY_XHV,
CN_PICO_0, CN_PICO_TLO,
CN_UPX2,
RX_0, RX_WOW, RX_ARQ, RX_GRAFT, RX_SFX, RX_KEVA,
RX_0, RX_WOW, RX_ARQ, RX_GRAFT, RX_SFX, RX_YADA,
AR2_CHUKWA, AR2_CHUKWA_V2, AR2_WRKZ,
KAWPOW_RVN,
GHOSTRIDER_RTM

Some files were not shown because too many files have changed in this diff.