diff options
Diffstat (limited to 'Documentation/admin-guide')
25 files changed, 377 insertions, 166 deletions
diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst index 261b7b4cca1f..35314b63008c 100644 --- a/Documentation/admin-guide/README.rst +++ b/Documentation/admin-guide/README.rst @@ -226,10 +226,11 @@ Configuring the kernel all module options to built in (=y) options. You can also preserve modules by LMC_KEEP. - "make kvmconfig" Enable additional options for kvm guest kernel support. + "make kvm_guest.config" Enable additional options for kvm guest kernel + support. - "make xenconfig" Enable additional options for xen dom0 guest kernel - support. + "make xen.config" Enable additional options for xen dom0 guest kernel + support. "make tinyconfig" Configure the tiniest possible kernel. diff --git a/Documentation/admin-guide/auxdisplay/cfag12864b.rst b/Documentation/admin-guide/auxdisplay/cfag12864b.rst index 18c2865bd322..da385d851acc 100644 --- a/Documentation/admin-guide/auxdisplay/cfag12864b.rst +++ b/Documentation/admin-guide/auxdisplay/cfag12864b.rst @@ -3,7 +3,7 @@ cfag12864b LCD Driver Documentation =================================== :License: GPLv2 -:Author & Maintainer: Miguel Ojeda Sandonis +:Author & Maintainer: Miguel Ojeda <ojeda@kernel.org> :Date: 2006-10-27 diff --git a/Documentation/admin-guide/auxdisplay/ks0108.rst b/Documentation/admin-guide/auxdisplay/ks0108.rst index c0b7faf73136..a7d3fe509373 100644 --- a/Documentation/admin-guide/auxdisplay/ks0108.rst +++ b/Documentation/admin-guide/auxdisplay/ks0108.rst @@ -3,7 +3,7 @@ ks0108 LCD Controller Driver Documentation ========================================== :License: GPLv2 -:Author & Maintainer: Miguel Ojeda Sandonis +:Author & Maintainer: Miguel Ojeda <ojeda@kernel.org> :Date: 2006-10-27 diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst index 52688ae34461..0936412e044e 100644 --- a/Documentation/admin-guide/cgroup-v1/memory.rst +++ b/Documentation/admin-guide/cgroup-v1/memory.rst @@ -963,21 +963,21 @@ References 2. Singh, Balbir. Memory Controller (RSS Control), http://lwn.net/Articles/222762/ 3. Emelianov, Pavel. Resource controllers based on process cgroups - http://lkml.org/lkml/2007/3/6/198 + https://lore.kernel.org/r/45ED7DEC.7010403@sw.ru 4. Emelianov, Pavel. RSS controller based on process cgroups (v2) - http://lkml.org/lkml/2007/4/9/78 + https://lore.kernel.org/r/461A3010.90403@sw.ru 5. Emelianov, Pavel. RSS controller based on process cgroups (v3) - http://lkml.org/lkml/2007/5/30/244 + https://lore.kernel.org/r/465D9739.8070209@openvz.org 6. Menage, Paul. Control Groups v10, http://lwn.net/Articles/236032/ 7. Vaidyanathan, Srinivasan, Control Groups: Pagecache accounting and control subsystem (v3), http://lwn.net/Articles/235534/ 8. Singh, Balbir. RSS controller v2 test results (lmbench), - http://lkml.org/lkml/2007/5/17/232 + https://lore.kernel.org/r/464C95D4.7070806@linux.vnet.ibm.com 9. Singh, Balbir. RSS controller v2 AIM9 results - http://lkml.org/lkml/2007/5/18/1 + https://lore.kernel.org/r/464D267A.50107@linux.vnet.ibm.com 10. Singh, Balbir. Memory controller v6 test results, - http://lkml.org/lkml/2007/8/19/36 + https://lore.kernel.org/r/20070819094658.654.84837.sendpatchset@balbir-laptop 11. Singh, Balbir. Memory controller introduction (v6), - http://lkml.org/lkml/2007/8/17/69 + https://lore.kernel.org/r/20070817084228.26003.12568.sendpatchset@balbir-laptop 12. Corbet, Jonathan, Controlling memory use in cgroups, http://lwn.net/Articles/243795/ diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 1de8695c264b..64c62b979f2f 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1,3 +1,5 @@ +.. _cgroup-v2: + ================ Control Group v2 ================ @@ -172,7 +174,6 @@ disabling controllers in v1 and make them always available in v2. cgroup v2 currently supports the following mount options. nsdelegate - Consider cgroup namespaces as delegation boundaries. This option is system wide and can only be set on mount or modified through remount from the init namespace. The mount option is @@ -180,7 +181,6 @@ cgroup v2 currently supports the following mount options. Delegation section for details. memory_localevents - Only populate memory.events with data for the current cgroup, and not any subtrees. This is legacy behaviour, the default behaviour without this option is to include subtree counts. @@ -189,7 +189,6 @@ cgroup v2 currently supports the following mount options. option is ignored on non-init namespace mounts. memory_recursiveprot - Recursively apply memory.min and memory.low protection to entire subtrees, without requiring explicit downward propagation into leaf cgroups. This allows protecting entire @@ -786,7 +785,6 @@ Core Interface Files All cgroup core files are prefixed with "cgroup." cgroup.type - A read-write single value file which exists on non-root cgroups. @@ -954,6 +952,8 @@ All cgroup core files are prefixed with "cgroup." Controllers =========== +.. _cgroup-v2-cpu: + CPU --- @@ -1259,9 +1259,9 @@ PAGE_SIZE multiple when read back. can show up in the middle. Don't rely on items remaining in a fixed position; use the keys to look up specific values! - If the entry has no per-node counter(or not show in the - mempry.numa_stat). We use 'npn'(non-per-node) as the tag - to indicate that it will not show in the mempry.numa_stat. + If the entry has no per-node counter (or not show in the + memory.numa_stat). We use 'npn' (non-per-node) as the tag + to indicate that it will not show in the memory.numa_stat. anon Amount of memory used in anonymous mappings such as @@ -1277,11 +1277,11 @@ PAGE_SIZE multiple when read back. pagetables Amount of memory allocated for page tables. - percpu(npn) + percpu (npn) Amount of memory used for storing per-cpu kernel data structures. - sock(npn) + sock (npn) Amount of memory used in network transmission buffers shmem @@ -1299,6 +1299,10 @@ PAGE_SIZE multiple when read back. Amount of cached filesystem data that was modified and is currently being written back to disk + swapcached + Amount of swap cached in memory. The swapcache is accounted + against both memory and swap usage. + anon_thp Amount of memory used in anonymous mappings backed by transparent hugepages @@ -1329,7 +1333,7 @@ PAGE_SIZE multiple when read back. Part of "slab" that cannot be reclaimed on memory pressure. - slab(npn) + slab (npn) Amount of memory used for storing in-kernel data structures. @@ -1357,39 +1361,39 @@ PAGE_SIZE multiple when read back. workingset_nodereclaim Number of times a shadow node has been reclaimed - pgfault(npn) + pgfault (npn) Total number of page faults incurred - pgmajfault(npn) + pgmajfault (npn) Number of major page faults incurred - pgrefill(npn) + pgrefill (npn) Amount of scanned pages (in an active LRU list) - pgscan(npn) + pgscan (npn) Amount of scanned pages (in an inactive LRU list) - pgsteal(npn) + pgsteal (npn) Amount of reclaimed pages - pgactivate(npn) + pgactivate (npn) Amount of pages moved to the active LRU list - pgdeactivate(npn) + pgdeactivate (npn) Amount of pages moved to the inactive LRU list - pglazyfree(npn) + pglazyfree (npn) Amount of pages postponed to be freed under memory pressure - pglazyfreed(npn) + pglazyfreed (npn) Amount of reclaimed lazyfree pages - thp_fault_alloc(npn) + thp_fault_alloc (npn) Number of transparent hugepages which were allocated to satisfy a page fault. This counter is not present when CONFIG_TRANSPARENT_HUGEPAGE is not set. - thp_collapse_alloc(npn) + thp_collapse_alloc (npn) Number of transparent hugepages which were allocated to allow collapsing an existing range of pages. This counter is not present when CONFIG_TRANSPARENT_HUGEPAGE is not set. @@ -1558,7 +1562,7 @@ IO Interface Files 8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252 dbytes=50331648 dios=3021 io.cost.qos - A read-write nested-keyed file with exists only on the root + A read-write nested-keyed file which exists only on the root cgroup. This file configures the Quality of Service of the IO cost @@ -1613,7 +1617,7 @@ IO Interface Files automatic mode can be restored by setting "ctrl" to "auto". io.cost.model - A read-write nested-keyed file with exists only on the root + A read-write nested-keyed file which exists only on the root cgroup. This file configures the cost model of the IO cost model based @@ -2000,10 +2004,12 @@ Cpuset Interface Files cpuset-enabled cgroups. This flag is owned by the parent cgroup and is not delegatable. - It accepts only the following input values when written to. + It accepts only the following input values when written to. - "root" - a partition root - "member" - a non-root member of a partition + ======== ================================ + "root" a partition root + "member" a non-root member of a partition + ======== ================================ When set to be a partition root, the current cgroup is the root of a new partition or scheduling domain that comprises @@ -2044,9 +2050,11 @@ Cpuset Interface Files root to change. On read, the "cpuset.sched.partition" file can show the following values. - "member" Non-root member of a partition - "root" Partition root - "root invalid" Invalid partition root + ============== ============================== + "member" Non-root member of a partition + "root" Partition root + "root invalid" Invalid partition root + ============== ============================== It is a partition root if the first 2 partition root conditions above are true and at least one CPU from "cpuset.cpus" is @@ -2090,7 +2098,7 @@ If the program returns 0, the attempt fails with -EPERM, otherwise it succeeds. An example of BPF_CGROUP_DEVICE program may be found in the kernel -source tree in the tools/testing/selftests/bpf/dev_cgroup.c file. +source tree in the tools/testing/selftests/bpf/progs/dev_cgroup.c file. RDMA @@ -2219,7 +2227,7 @@ Without cgroup namespace, the "/proc/$PID/cgroup" file shows the complete path of the cgroup of a process. In a container setup where a set of cgroups and namespaces are intended to isolate processes the "/proc/$PID/cgroup" file may leak potential system level information -to the isolated processes. For Example:: +to the isolated processes. For example:: # cat /proc/self/cgroup 0::/batchjobs/container_id1 diff --git a/Documentation/admin-guide/cifs/authors.rst b/Documentation/admin-guide/cifs/authors.rst index b02d6dd6c070..5c1d2f0fa7d1 100644 --- a/Documentation/admin-guide/cifs/authors.rst +++ b/Documentation/admin-guide/cifs/authors.rst @@ -5,10 +5,10 @@ Authors Original Author --------------- -Steve French (sfrench@samba.org) +Steve French (smfrench@gmail.com, sfrench@samba.org) The author wishes to express his appreciation and thanks to: -Andrew Tridgell (Samba team) for his early suggestions about smb/cifs VFS +Andrew Tridgell (Samba team) for his early suggestions about SMB/CIFS VFS improvements. Thanks to IBM for allowing me time and test resources to pursue this project, to Jim McDonough from IBM (and the Samba Team) for his help, to the IBM Linux JFS team for explaining many esoteric Linux filesystem features. @@ -51,7 +51,7 @@ Patch Contributors - Ronnie Sahlberg (for SMB3 xattr work, bug fixes, and lots of great work on compounding) - Shirish Pargaonkar (for many ACL patches over the years) - Sachin Prabhu (many bug fixes, including for reconnect, copy offload and security) -- Paulo Alcantara +- Paulo Alcantara (for some excellent work in DFS, and in booting from SMB3) - Long Li (some great work on RDMA, SMB Direct) diff --git a/Documentation/admin-guide/cifs/changes.rst b/Documentation/admin-guide/cifs/changes.rst index 71f2ecb62299..3147bbae9c43 100644 --- a/Documentation/admin-guide/cifs/changes.rst +++ b/Documentation/admin-guide/cifs/changes.rst @@ -3,6 +3,7 @@ Changes ======= See https://wiki.samba.org/index.php/LinuxCIFSKernel for summary -information (that may be easier to read than parsing the output of -"git log fs/cifs") about fixes/improvements to CIFS/SMB2/SMB3 support (changes +information about fixes/improvements to CIFS/SMB2/SMB3 support (changes to cifs.ko module) by kernel version (and cifs internal module version). +This may be easier to read than parsing the output of "git log fs/cifs" +by release. diff --git a/Documentation/admin-guide/cifs/introduction.rst b/Documentation/admin-guide/cifs/introduction.rst index cc2851d93d17..53ea62906aa5 100644 --- a/Documentation/admin-guide/cifs/introduction.rst +++ b/Documentation/admin-guide/cifs/introduction.rst @@ -7,19 +7,19 @@ Introduction protocol which was the successor to the Server Message Block (SMB) protocol, the native file sharing mechanism for most early PC operating systems. New and improved versions of CIFS are now - called SMB2 and SMB3. Use of SMB3 (and later, including SMB3.1.1) - is strongly preferred over using older dialects like CIFS due to - security reasons. All modern dialects, including the most recent, - SMB3.1.1 are supported by the CIFS VFS module. The SMB3 protocol - is implemented and supported by all major file servers - such as all modern versions of Windows (including Windows 2016 - Server), as well as by Samba (which provides excellent - CIFS/SMB2/SMB3 server support and tools for Linux and many other - operating systems). Apple systems also support SMB3 well, as - do most Network Attached Storage vendors, so this network - filesystem client can mount to a wide variety of systems. - It also supports mounting to the cloud (for example - Microsoft Azure), including the necessary security features. + called SMB2 and SMB3. Use of SMB3 (and later, including SMB3.1.1 + the most current dialect) is strongly preferred over using older + dialects like CIFS due to security reasons. All modern dialects, + including the most recent, SMB3.1.1, are supported by the CIFS VFS + module. The SMB3 protocol is implemented and supported by all major + file servers such as Windows (including Windows 2019 Server), as + well as by Samba (which provides excellent CIFS/SMB2/SMB3 server + support and tools for Linux and many other operating systems). + Apple systems also support SMB3 well, as do most Network Attached + Storage vendors, so this network filesystem client can mount to a + wide variety of systems. It also supports mounting to the cloud + (for example Microsoft Azure), including the necessary security + features. The intent of this module is to provide the most advanced network file system function for SMB3 compliant servers, including advanced @@ -27,8 +27,8 @@ Introduction POSIX compliance, secure per-user session establishment, encryption, high performance safe distributed caching (leases/oplocks), optional packet signing, large files, Unicode support and other internationalization - improvements. Since both Samba server and this filesystem client support - the CIFS Unix extensions (and in the future SMB3 POSIX extensions), + improvements. Since both Samba server and this filesystem client support the + CIFS Unix extensions, and the Linux client also suppors SMB3 POSIX extensions, the combination can provide a reasonable alternative to other network and cluster file systems for fileserving in some Linux to Linux environments, not just in Linux to Windows (or Linux to Mac) environments. diff --git a/Documentation/admin-guide/cifs/todo.rst b/Documentation/admin-guide/cifs/todo.rst index 25f11576e7b9..2646ed2e2d3e 100644 --- a/Documentation/admin-guide/cifs/todo.rst +++ b/Documentation/admin-guide/cifs/todo.rst @@ -13,24 +13,26 @@ is a partial list of the known problems and missing features: a) SMB3 (and SMB3.1.1) missing optional features: - - multichannel (started), integration with RDMA - - directory leases (improved metadata caching), started (root dir only) + - multichannel (partially integrated), integration of multichannel with RDMA + - directory leases (improved metadata caching). Currently only implemented for root dir - T10 copy offload ie "ODX" (copy chunk, and "Duplicate Extents" ioctl currently the only two server side copy mechanisms supported) b) improved sparse file support (fiemap and SEEK_HOLE are implemented - but additional features would be supportable by the protocol). + but additional features would be supportable by the protocol such + as FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_INSERT_RANGE) c) Directory entry caching relies on a 1 second timer, rather than using Directory Leases, currently only the root file handle is cached longer + by leveraging Directory Leases -d) quota support (needs minor kernel change since quota calls - to make it to network filesystems or deviceless filesystems) +d) quota support (needs minor kernel change since quota calls otherwise + won't make it to network filesystems or deviceless filesystems). e) Additional use cases can be optimized to use "compounding" (e.g. open/query/close and open/setinfo/close) to reduce the number of roundtrips to the server and improve performance. Various cases - (stat, statfs, create, unlink, mkdir) already have been improved by + (stat, statfs, create, unlink, mkdir, xattrs) already have been improved by using compounding but more can be done. In addition we could significantly reduce redundant opens by using deferred close (with handle caching leases) and better using reference counters on file @@ -60,7 +62,9 @@ k) Add tools to take advantage of more smb3 specific ioctls and features metadata attributes easier from tools (e.g. extending what was done in smb-info tool). -l) encrypted file support +l) encrypted file support (currently the attribute showing the file is + encrypted on the server is reported, but changing the attribute is not + supported). m) improved stats gathering tools (perhaps integration with nfsometer?) to extend and make easier to use what is currently in /proc/fs/cifs/Stats @@ -69,14 +73,13 @@ n) Add support for claims based ACLs ("DAC") o) mount helper GUI (to simplify the various configuration options on mount) -p) Add support for witness protocol (perhaps ioctl to cifs.ko from user space - tool listening on witness protocol RPC) to allow for notification of share - move, server failover, and server adapter changes. And also improve other - failover scenarios, e.g. when client knows multiple DFS entries point to - different servers, and the server we are connected to has gone down. +p) Expand support for witness protocol to allow for notification of share + move, and server network adapter changes. Currently only notifications by + the witness protocol for server move is supported by the Linux client. q) Allow mount.cifs to be more verbose in reporting errors with dialect - or unsupported feature errors. + or unsupported feature errors. This would now be easier due to the + implementation of the new mount API. r) updating cifs documentation, and user guide. @@ -87,11 +90,10 @@ t) split cifs and smb3 support into separate modules so legacy (and less secure) CIFS dialect can be disabled in environments that don't need it and simplify the code. -v) POSIX Extensions for SMB3.1.1 (started, create and mkdir support added - so far). +v) Additional testing of POSIX Extensions for SMB3.1.1 w) Add support for additional strong encryption types, and additional spnego - authentication mechanisms (see MS-SMB2) + authentication mechanisms (see MS-SMB2). GCM-256 is now partially implemented. x) Finish support for SMB3.1.1 compression diff --git a/Documentation/admin-guide/cifs/usage.rst b/Documentation/admin-guide/cifs/usage.rst index b6d9f02bc12b..13783dc68ab7 100644 --- a/Documentation/admin-guide/cifs/usage.rst +++ b/Documentation/admin-guide/cifs/usage.rst @@ -83,7 +83,7 @@ and encrypted shares and stronger signing and authentication algorithms. There are additional mount options that may be helpful for SMB3 to get improved POSIX behavior (NB: can use vers=3.0 to force only SMB3, never 2.1): - ``mfsymlinks`` and ``cifsacl`` and ``idsfromsid`` + ``mfsymlinks`` and either ``cifsacl`` or ``modefromsid`` (usually with ``idsfromsid``) Allowing User Mounts ==================== diff --git a/Documentation/admin-guide/cpu-load.rst b/Documentation/admin-guide/cpu-load.rst index f3ada90e9ca8..21a984337080 100644 --- a/Documentation/admin-guide/cpu-load.rst +++ b/Documentation/admin-guide/cpu-load.rst @@ -107,7 +107,7 @@ will lead to quite erratic information inside ``/proc/stat``:: References ---------- -- http://lkml.org/lkml/2007/2/12/6 +- https://lore.kernel.org/r/loom.20070212T063225-663@post.gmane.org - Documentation/filesystems/proc.rst (1.8) diff --git a/Documentation/admin-guide/device-mapper/dm-crypt.rst b/Documentation/admin-guide/device-mapper/dm-crypt.rst index 1a6753b76dbb..aa2d04d95df6 100644 --- a/Documentation/admin-guide/device-mapper/dm-crypt.rst +++ b/Documentation/admin-guide/device-mapper/dm-crypt.rst @@ -67,7 +67,7 @@ Parameters:: the value passed in <key_size>. <key_type> - Either 'logon', 'user' or 'encrypted' kernel key type. + Either 'logon', 'user', 'encrypted' or 'trusted' kernel key type. <key_description> The kernel keyring key description crypt target should look for diff --git a/Documentation/admin-guide/device-mapper/dm-integrity.rst b/Documentation/admin-guide/device-mapper/dm-integrity.rst index 2cc5488acbd9..8db172efa272 100644 --- a/Documentation/admin-guide/device-mapper/dm-integrity.rst +++ b/Documentation/admin-guide/device-mapper/dm-integrity.rst @@ -143,8 +143,8 @@ recalculate journal_crypt:algorithm(:key) (the key is optional) Encrypt the journal using given algorithm to make sure that the attacker can't read the journal. You can use a block cipher here - (such as "cbc(aes)") or a stream cipher (for example "chacha20", - "salsa20" or "ctr(aes)"). + (such as "cbc(aes)") or a stream cipher (for example "chacha20" + or "ctr(aes)"). The journal contains history of last writes to the block device, an attacker reading the journal could see the last sector numbers @@ -186,6 +186,17 @@ fix_padding space-efficient. If this option is not present, large padding is used - that is for compatibility with older kernels. +fix_hmac + Improve security of internal_hash and journal_mac: + + - the section number is mixed to the mac, so that an attacker can't + copy sectors from one journal section to another journal section + - the superblock is protected by journal_mac + - a 16-byte salt stored in the superblock is mixed to the mac, so + that the attacker can't detect that two disks have the same hmac + key and also to disallow the attacker to move sectors from one + disk to another + legacy_recalculate Allow recalculating of volumes with HMAC keys. This is disabled by default for security reasons - an attacker could modify the volume, diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst index 682ab28b5c94..1132796a8d96 100644 --- a/Documentation/admin-guide/kernel-parameters.rst +++ b/Documentation/admin-guide/kernel-parameters.rst @@ -60,7 +60,7 @@ Note that for the special case of a range one can split the range into equal sized groups and for each group use some amount from the beginning of that group: - <cpu number>-cpu number>:<used size>/<group size> + <cpu number>-<cpu number>:<used size>/<group size> For example one can add to the command line following parameter: diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index a10b545c2070..04545725f187 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -373,6 +373,12 @@ arcrimi= [HW,NET] ARCnet - "RIM I" (entirely mem-mapped) cards Format: <io>,<irq>,<nodeID> + arm64.nobti [ARM64] Unconditionally disable Branch Target + Identification support + + arm64.nopauth [ARM64] Unconditionally disable Pointer Authentication + support + ataflop= [HW,M68k] atarimouse= [HW,MOUSE] Atari Mouse @@ -600,7 +606,7 @@ kernel/dma/contiguous.c cma_pernuma=nn[MG] - [ARM64,KNL] + [ARM64,KNL,CMA] Sets the size of kernel per-numa memory area for contiguous memory allocations. A value of 0 disables per-numa CMA altogether. And If this option is not @@ -802,13 +808,14 @@ insecure, please do not use on production kernels. debug_locks_verbose= - [KNL] verbose self-tests - Format=<0|1> + [KNL] verbose locking self-tests + Format: <int> Print debugging info while doing the locking API self-tests. - We default to 0 (no extra messages), setting it to - 1 will print _a lot_ more information - normally - only useful to kernel developers. + Bitmask for the various LOCKTYPE_ tests. Defaults to 0 + (no extra messages), setting it to -1 (all bits set) + will print _a_lot_ more information - normally only + useful to lockdep developers. debug_objects [KNL] Enable object debugging @@ -944,12 +951,6 @@ causing system reset or hang due to sending INIT from AP to BSP. - perf_v4_pmi= [X86,INTEL] - Format: <bool> - Disable Intel PMU counter freezing feature. - The feature only exists starting from - Arch Perfmon v4 (Skylake and newer). - disable_ddw [PPC/PSERIES] Disable Dynamic DMA Window support. Use this to workaround buggy firmware. @@ -1433,6 +1434,11 @@ to enforce probe and suspend/resume ordering. rpm -- Like "on", but also use to order runtime PM. + fw_devlink.strict=<bool> + [KNL] Treat all inferred dependencies as mandatory + dependencies. This only applies for fw_devlink=on|rpm. + Format: <bool> + gamecon.map[2|3]= [HW,JOY] Multisystem joystick and NES/SNES/PSX pad support via parallel port (up to 5 devices per port) @@ -1524,12 +1530,12 @@ hpet_mmap= [X86, HPET_MMAP] Allow userspace to mmap HPET registers. Default set by CONFIG_HPET_MMAP_DEFAULT. - hugetlb_cma= [HW] The size of a cma area used for allocation + hugetlb_cma= [HW,CMA] The size of a CMA area used for allocation of gigantic hugepages. Format: nn[KMGTPE] - Reserve a cma area of given size and allocate gigantic - hugepages using the cma allocator. If enabled, the + Reserve a CMA area of given size and allocate gigantic + hugepages using the CMA allocator. If enabled, the boot-time allocation of gigantic hugepages is skipped. hugepages= [HW] Number of HugeTLB pages to allocate at boot. @@ -1673,6 +1679,12 @@ In such case C2/C3 won't be used again. idle=nomwait: Disable mwait for CPU C-states + idxd.sva= [HW] + Format: <bool> + Allow force disabling of Shared Virtual Memory (SVA) + support for the idxd driver. By default it is set to + true (1). + ieee754= [MIPS] Select IEEE Std 754 conformance mode Format: { strict | legacy | 2008 | relaxed } Default: strict @@ -1746,7 +1758,7 @@ ima_policy= [IMA] The builtin policies to load during IMA setup. Format: "tcb | appraise_tcb | secure_boot | - fail_securely" + fail_securely | critical_data" The "tcb" policy measures all programs exec'd, files mmap'd for exec, and all files opened with the read @@ -1765,6 +1777,9 @@ filesystems with the SB_I_UNVERIFIABLE_SIGNATURE flag. + The "critical_data" policy measures kernel integrity + critical data. + ima_tcb [IMA] Deprecated. Use ima_policy= instead. Load a policy which meets the needs of the Trusted Computing Base. This means IMA will measure all @@ -2257,6 +2272,9 @@ kvm-arm.mode= [KVM,ARM] Select one of KVM/arm64's modes of operation. + nvhe: Standard nVHE-based mode, without support for + protected guests. + protected: nVHE-based mode with support for guests whose state is kept private from the host. Not valid if the kernel is running in EL2. @@ -3266,9 +3284,14 @@ parameter, xsave area per process might occupy more memory on xsaves enabled systems. - nohlt [BUGS=ARM,SH] Tells the kernel that the sleep(SH) or - wfi(ARM) instruction doesn't work correctly and not to - use it. This is also useful when using JTAG debugger. + nohlt [ARM,ARM64,MICROBLAZE,SH] Forces the kernel to busy wait + in do_idle() and not use the arch_cpu_idle() + implementation; requires CONFIG_GENERIC_IDLE_POLL_SETUP + to be effective. This is useful on platforms where the + sleep(SH) or wfi(ARM,ARM64) instructions do not work + correctly or when doing power measurements to evalute + the impact of the sleep instructions. This is also + useful when using JTAG debugger. no_file_caps Tells the kernel not to honor file capabilities. The only way then for a file to be executed with privilege @@ -3281,6 +3304,21 @@ in certain environments such as networked servers or real-time systems. + no_hash_pointers + Force pointers printed to the console or buffers to be + unhashed. By default, when a pointer is printed via %p + format string, that pointer is "hashed", i.e. obscured + by hashing the pointer value. This is a security feature + that hides actual kernel addresses from unprivileged + users, but it also makes debugging the kernel more + difficult since unequal pointers can no longer be + compared. However, if this command-line option is + specified, then all normal pointers will have their true + value printed. Pointers printed via %pK may still be + hashed. This option should only be specified when + debugging the kernel. Please do not use on production + kernels. + nohibernate [HIBERNATION] Disable hibernation and resume. nohz= [KNL] Boottime enable/disable dynamic ticks @@ -3458,20 +3496,6 @@ For example, to override I2C bus2: omap_mux=i2c2_scl.i2c2_scl=0x100,i2c2_sda.i2c2_sda=0x100 - oprofile.timer= [HW] - Use timer interrupt instead of performance counters - - oprofile.cpu_type= Force an oprofile cpu type - This might be useful if you have an older oprofile - userland or if you want common events. - Format: { arch_perfmon } - arch_perfmon: [X86] Force use of architectural - perfmon on Intel CPUs instead of the - CPU specific event set. - timer: [X86] Force use of architectural NMI - timer mode (see also oprofile.timer - for generic hr timer mode) - oops=panic Always panic on oopses. Default is to just kill the process, but there is a small probability of deadlocking the machine. @@ -3916,6 +3940,13 @@ Format: {"off"} Disable Hardware Transactional Memory + preempt= [KNL] + Select preemption mode if you have CONFIG_PREEMPT_DYNAMIC + none - Limited to cond_resched() calls + voluntary - Limited to cond_resched() and might_sleep() calls + full - Any section that isn't explicitly preempt disabled + can be preempted anytime. + print-fatal-signals= [KNL] debug: print fatal signals @@ -4092,6 +4123,10 @@ value, meaning that RCU_SOFTIRQ is used by default. Specify rcutree.use_softirq=0 to use rcuc kthreads. + But note that CONFIG_PREEMPT_RT=y kernels disable + this kernel boot parameter, forcibly setting it + to zero. + rcutree.rcu_fanout_exact= [KNL] Disable autobalancing of the rcu_node combining tree. This is used by rcutorture, and might @@ -4179,12 +4214,6 @@ Set wakeup interval for idle CPUs that have RCU callbacks (RCU_FAST_NO_HZ=y). - rcutree.rcu_idle_lazy_gp_delay= [KNL] - Set wakeup interval for idle CPUs that have - only "lazy" RCU callbacks (RCU_FAST_NO_HZ=y). - Lazy RCU callbacks are those which RCU can - prove do nothing more than free memory. - rcutree.rcu_kick_kthreads= [KNL] Cause the grace-period kthread to get an extra wake_up() if it sleeps three times longer than @@ -4338,6 +4367,14 @@ stress RCU, they don't participate in the actual test, hence the "fake". + rcutorture.nocbs_nthreads= [KNL] + Set number of RCU callback-offload togglers. + Zero (the default) disables toggling. + + rcutorture.nocbs_toggle= [KNL] + Set the delay in milliseconds between successive + callback-offload toggling attempts. + rcutorture.nreaders= [KNL] Set number of RCU readers. The value -1 selects N-1, where N is the number of CPUs. A value @@ -4470,6 +4507,13 @@ only normal grace-period primitives. No effect on CONFIG_TINY_RCU kernels. + But note that CONFIG_PREEMPT_RT=y kernels enables + this kernel boot parameter, forcibly setting + it to the value one, that is, converting any + post-boot attempt at an expedited RCU grace + period to instead use normal non-expedited + grace-period processing. + rcupdate.rcu_task_ipi_delay= [KNL] Set time in jiffies during which RCU tasks will avoid sending IPIs, starting with the beginning @@ -4557,6 +4601,12 @@ refscale.verbose= [KNL] Enable additional printk() statements. + refscale.verbose_batched= [KNL] + Batch the additional printk() statements. If zero + (the default) or negative, print everything. Otherwise, + print every Nth verbose statement, where N is the value + specified. + relax_domain_level= [KNL, SMP] Set scheduler's default relax_domain_level. See Documentation/admin-guide/cgroup-v1/cpusets.rst. @@ -4854,14 +4904,6 @@ last alloc / free. For more information see Documentation/vm/slub.rst. - slub_memcg_sysfs= [MM, SLUB] - Determines whether to enable sysfs directories for - memory cgroup sub-caches. 1 to enable, 0 to disable. - The default is determined by CONFIG_SLUB_MEMCG_SYSFS_ON. - Enabling this can lead to a very high number of debug - directories and files being created under - /sys/kernel/slub. - slub_max_order= [MM, SLUB] Determines the maximum allowed order for slabs. A high setting may cause OOMs due to memory @@ -5140,6 +5182,12 @@ growing up) the main stack are reserved for no other mapping. Default value is 256 pages. + stack_depot_disable= [KNL] + Setting this to true through kernel command line will + disable the stack depot thereby saving the static memory + consumed by the stack hash table. By default this is set + to false. + stacktrace [FTRACE] Enabled the stack tracer on boot up. @@ -5331,6 +5379,14 @@ are running concurrently, especially on systems with rotating-rust storage. + torture.verbose_sleep_frequency= [KNL] + Specifies how many verbose printk()s should be + emitted between each sleep. The default of zero + disables verbose-printk() sleeping. + + torture.verbose_sleep_duration= [KNL] + Duration of each verbose-printk() sleep in jiffies. + tp720= [HW,PS2] tpm_suspend_pcr=[HW,TPM] @@ -5932,12 +5988,6 @@ default x2apic cluster mode on platforms supporting x2apic. - x86_intel_mid_timer= [X86-32,APBT] - Choose timer option for x86 Intel MID platform. - Two valid options are apbt timer only and lapic timer - plus one apbt timer for broadcast timer. - x86_intel_mid_timer=apbt_only | lapic_and_apbt - xen_512gb_limit [KNL,X86-64,XEN] Restricts the kernel running paravirtualized under Xen to use only up to 512 GB of RAM. The reason to do so is diff --git a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst index dc36aeb65d0a..531f689311f2 100644 --- a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst +++ b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst @@ -273,7 +273,7 @@ To reduce its OS jitter, do any of the following: However, there is an RFC patch from Christoph Lameter (based on an earlier one from Gilad Ben-Yossef) that reduces or even eliminates vmstat overhead for some - workloads at https://lkml.org/lkml/2013/9/4/379. + workloads at https://lore.kernel.org/r/00000140e9dfd6bd-40db3d4f-c1be-434f-8132-7820f81bb586-000000@email.amazonses.com. e. If running on high-end powerpc servers, build with CONFIG_PPC_RTAS_DAEMON=n. This prevents the RTAS daemon from running on each CPU every second or so. diff --git a/Documentation/admin-guide/laptops/thinkpad-acpi.rst b/Documentation/admin-guide/laptops/thinkpad-acpi.rst index 5fe1ade88c17..91fd6846ce17 100644 --- a/Documentation/admin-guide/laptops/thinkpad-acpi.rst +++ b/Documentation/admin-guide/laptops/thinkpad-acpi.rst @@ -51,6 +51,7 @@ detailed description): - UWB enable and disable - LCD Shadow (PrivacyGuard) enable and disable - Lap mode sensor + - Setting keyboard language A compatibility table by model and feature is maintained on the web site, http://ibm-acpi.sf.net/. I appreciate any success or failure @@ -1466,6 +1467,30 @@ Sysfs notes rfkill controller switch "tpacpi_uwb_sw": refer to Documentation/driver-api/rfkill.rst for details. + +Setting keyboard language +------------------------- + +sysfs: keyboard_lang + +This feature is used to set keyboard language to ECFW using ASL interface. +Fewer thinkpads models like T580 , T590 , T15 Gen 1 etc.. has "=", "(', +")" numeric keys, which are not displaying correctly, when keyboard language +is other than "english". This is because the default keyboard language in ECFW +is set as "english". Hence using this sysfs, user can set the correct keyboard +language to ECFW and then these key's will work correctly. + +Example of command to set keyboard language is mentioned below:: + + echo jp > /sys/devices/platform/thinkpad_acpi/keyboard_lang + +Text corresponding to keyboard layout to be set in sysfs are: be(Belgian), +cz(Czech), da(Danish), de(German), en(English), es(Spain), et(Estonian), +fr(French), fr-ch(French(Switzerland)), hu(Hungarian), it(Italy), jp (Japan), +nl(Dutch), nn(Norway), pl(Polish), pt(portugese), sl(Slovenian), sv(Sweden), +tr(Turkey) + + Adaptive keyboard ----------------- diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst index 5c4432c96c4b..5307f90738aa 100644 --- a/Documentation/admin-guide/mm/memory-hotplug.rst +++ b/Documentation/admin-guide/mm/memory-hotplug.rst @@ -160,16 +160,16 @@ Under each memory block, you can see 5 files: "online_movable", "online", "offline" command which will be performed on all sections in the block. -``phys_device`` read-only: designed to show the name of physical memory - device. This is not well implemented now. -``removable`` read-only: contains an integer value indicating - whether the memory block is removable or not - removable. A value of 1 indicates that the memory - block is removable and a value of 0 indicates that - it is not removable. A memory block is removable only if - every section in the block is removable. -``valid_zones`` read-only: designed to show which zones this memory block - can be onlined to. +``phys_device`` read-only: legacy interface only ever used on s390x to + expose the covered storage increment. +``removable`` read-only: legacy interface that indicated whether a memory + block was likely to be offlineable or not. Newer kernel + versions return "1" if and only if the kernel supports + memory offlining. +``valid_zones`` read-only: designed to show by which zone memory provided by + a memory block is managed, and to show by which zone memory + provided by an offline memory block could be managed when + onlining. The first column shows it`s default zone. diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst index 904e4eb37f99..34aa334320ca 100644 --- a/Documentation/admin-guide/perf-security.rst +++ b/Documentation/admin-guide/perf-security.rst @@ -72,7 +72,7 @@ monitoring and observability operations, thus, bypass *scope* permissions checks in the kernel. CAP_PERFMON implements the principle of least privilege [13]_ (POSIX 1003.1e: 2.2.2.39) for performance monitoring and observability operations in the kernel and provides a secure approach to -perfomance monitoring and observability in the system. +performance monitoring and observability in the system. For backward compatibility reasons the access to perf_events monitoring and observability operations is also open for CAP_SYS_ADMIN privileged diff --git a/Documentation/admin-guide/perf/arm-cmn.rst b/Documentation/admin-guide/perf/arm-cmn.rst index 0e4809346014..796e25b7027b 100644 --- a/Documentation/admin-guide/perf/arm-cmn.rst +++ b/Documentation/admin-guide/perf/arm-cmn.rst @@ -17,7 +17,7 @@ PMU events ---------- The PMU driver registers a single PMU device for the whole interconnect, -see /sys/bus/event_source/devices/arm_cmn. Multi-chip systems may link +see /sys/bus/event_source/devices/arm_cmn_0. Multi-chip systems may link more than one CMN together via external CCIX links - in this situation, each mesh counts its own events entirely independently, and additional PMU devices will be named arm_cmn_{1..n}. diff --git a/Documentation/admin-guide/spkguide.txt b/Documentation/admin-guide/spkguide.txt index 5ff6a0fe87d1..977ab3f5a0a8 100644 --- a/Documentation/admin-guide/spkguide.txt +++ b/Documentation/admin-guide/spkguide.txt @@ -1033,7 +1033,9 @@ speakup + keypad 3, you would hear: The speakup key is depressed, so the name of the key state is speakup. This part of the message comes from the states collection. -14.2. Loading Your Own Messages +14.2. Changing language + +14.2.1. Loading Your Own Messages The files under the i18n subdirectory all follow the same format. They consist of lines, with one message per line. @@ -1066,8 +1068,50 @@ echo '1 azul' > /speakup/i18n/colors The next time that Speakup says message 1 from the colors group, it will say "azul", rather than "blue." +14.2.2. Choose a language + In the future, translations into various languages will be made available, -and most users will just load the files necessary for their language. +and most users will just load the files necessary for their language. So far, +only French language is available beyond native Canadian English language. + +French is only available after you are logged in. + +Canadian English is the default language. To toggle another language, +download the source of Speakup and untar it in your home directory. The +following command should let you do this: + +tar xvjf speakup-<version>.tar.bz2 + +where <version> is the version number of the application. + +Next, change to the newly created directory, then into the tools/ directory, and +run the script speakup_setlocale. You are asked the language that you want to +use. Type the number associated to your language (e.g. fr for French) then press +Enter. Needed files are copied in the i18n directory. + +Note: the speakupconf must be installed on your system so that settings are saved. +Otherwise, you will have an error: your language will be loaded but you will +have to run the script again every time Speakup restarts. +See section 16.1. for information about speakupconf. + +You will have to repeat these steps for any change of locale, i.e. if you wish +change the speakup's language or charset (iso-8859-15 ou UTF-8). + +If you wish store the settings, note that at your next login, you will need to +do: + +speakup load + +Alternatively, you can add the above line to your file +~/.bashrc or ~/.bash_profile. + +If your system administrator ran himself the script, all the users will be able +to change from English to the language choosed by root and do directly +speakupconf load (or add this to the ~/.bashrc or +~/.bash_profile file). If there are several languages to handle, the +administrator (or every user) will have to run the first steps until speakupconf +save, choosing the appropriate language, in every user's home directory. Every +user will then be able to do speakupconf load, Speakup will load his own settings. 14.3. No Support for Non-Western-European Languages diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst index f48277a0a850..2a501c9ddc55 100644 --- a/Documentation/admin-guide/sysctl/fs.rst +++ b/Documentation/admin-guide/sysctl/fs.rst @@ -380,5 +380,5 @@ This configuration option sets the maximum number of "watches" that are allowed for each user. Each "watch" costs roughly 90 bytes on a 32bit kernel, and roughly 160 bytes on a 64bit one. -The current default value for max_user_watches is the 1/32 of the available -low memory, divided for the "watch" cost in bytes. +The current default value for max_user_watches is the 1/25 (4%) of the +available low memory, divided for the "watch" cost in bytes. diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index e35a3f2fb006..586cd4b86428 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -983,11 +983,11 @@ that benefit from having their data cached, zone_reclaim_mode should be left disabled as the caching effect is likely to be more important than data locality. -zone_reclaim may be enabled if it's known that the workload is partitioned -such that each partition fits within a NUMA node and that accessing remote -memory would cause a measurable performance reduction. The page allocator -will then reclaim easily reusable pages (those page cache pages that are -currently not used) before allocating off node pages. +Consider enabling one or more zone_reclaim mode bits if it's known that the +workload is partitioned such that each partition fits within a NUMA node +and that accessing remote memory would cause a measurable performance +reduction. The page allocator will take additional actions before +allocating off node pages. Allowing zone reclaim to write out pages stops processes that are writing large amounts of data from dirtying pages on other nodes. Zone diff --git a/Documentation/admin-guide/thunderbolt.rst b/Documentation/admin-guide/thunderbolt.rst index 613cb24c76c7..f18e881373c4 100644 --- a/Documentation/admin-guide/thunderbolt.rst +++ b/Documentation/admin-guide/thunderbolt.rst @@ -47,6 +47,9 @@ be DMA masters and thus read contents of the host memory without CPU and OS knowing about it. There are ways to prevent this by setting up an IOMMU but it is not always available for various reasons. +Some USB4 systems have a BIOS setting to disable PCIe tunneling. This is +treated as another security level (nopcie). + The security levels are as follows: none @@ -77,6 +80,10 @@ The security levels are as follows: Display Port in a dock. All PCIe links downstream of the dock are removed. + nopcie + PCIe tunneling is disabled/forbidden from the BIOS. Available in some + USB4 systems. + The current security level can be read from ``/sys/bus/thunderbolt/devices/domainX/security`` where ``domainX`` is the Thunderbolt domain the host controller manages. There is typically @@ -153,6 +160,22 @@ If the user still wants to connect the device they can either approve the device without a key or write a new key and write 1 to the ``authorized`` file to get the new key stored on the device NVM. +De-authorizing devices +---------------------- +It is possible to de-authorize devices by writing ``0`` to their +``authorized`` attribute. This requires support from the connection +manager implementation and can be checked by reading domain +``deauthorization`` attribute. If it reads ``1`` then the feature is +supported. + +When a device is de-authorized the PCIe tunnel from the parent device +PCIe downstream (or root) port to the device PCIe upstream port is torn +down. This is essentially the same thing as PCIe hot-remove and the PCIe +toplogy in question will not be accessible anymore until the device is +authorized again. If there is storage such as NVMe or similar involved, +there is a risk for data loss if the filesystem on that storage is not +properly shut down. You have been warned! + DMA protection utilizing IOMMU ------------------------------ Recent systems from 2018 and forward with Thunderbolt ports may natively diff --git a/Documentation/admin-guide/xfs.rst b/Documentation/admin-guide/xfs.rst index 86de8a1ad91c..5422407a96d7 100644 --- a/Documentation/admin-guide/xfs.rst +++ b/Documentation/admin-guide/xfs.rst @@ -284,6 +284,9 @@ The following sysctls are available for the XFS filesystem: removes unused preallocation from clean inodes and releases the unused space back to the free pool. + fs.xfs.speculative_cow_prealloc_lifetime + This is an alias for speculative_prealloc_lifetime. + fs.xfs.error_level (Min: 0 Default: 3 Max: 11) A volume knob for error reporting when internal errors occur. This will generate detailed messages & backtraces for filesystem @@ -356,12 +359,13 @@ The following sysctls are available for the XFS filesystem: Deprecated Sysctls ================== -=========================== ================ - Name Removal Schedule -=========================== ================ -fs.xfs.irix_sgid_inherit September 2025 -fs.xfs.irix_symlink_mode September 2025 -=========================== ================ +=========================================== ================ + Name Removal Schedule +=========================================== ================ +fs.xfs.irix_sgid_inherit September 2025 +fs.xfs.irix_symlink_mode September 2025 +fs.xfs.speculative_cow_prealloc_lifetime September 2025 +=========================================== ================ Removed Sysctls @@ -495,3 +499,45 @@ the class and error context. For example, the default values for "metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults to "fail immediately" behaviour. This is done because ENODEV is a fatal, unrecoverable error no matter how many times the metadata IO is retried. + +Workqueue Concurrency +===================== + +XFS uses kernel workqueues to parallelize metadata update processes. This +enables it to take advantage of storage hardware that can service many IO +operations simultaneously. This interface exposes internal implementation +details of XFS, and as such is explicitly not part of any userspace API/ABI +guarantee the kernel may give userspace. These are undocumented features of +the generic workqueue implementation XFS uses for concurrency, and they are +provided here purely for diagnostic and tuning purposes and may change at any +time in the future. + +The control knobs for a filesystem's workqueues are organized by task at hand +and the short name of the data device. They all can be found in: + + /sys/bus/workqueue/devices/${task}!${device} + +================ =========== + Task Description +================ =========== + xfs_iwalk-$pid Inode scans of the entire filesystem. Currently limited to + mount time quotacheck. + xfs-blockgc Background garbage collection of disk space that have been + speculatively allocated beyond EOF or for staging copy on + write operations. +================ =========== + +For example, the knobs for the quotacheck workqueue for /dev/nvme0n1 would be +found in /sys/bus/workqueue/devices/xfs_iwalk-1111!nvme0n1/. + +The interesting knobs for XFS workqueues are as follows: + +============ =========== + Knob Description +============ =========== + max_active Maximum number of background threads that can be started to + run the work. + cpumask CPUs upon which the threads are allowed to run. + nice Relative priority of scheduling the threads. These are the + same nice levels that can be applied to userspace processes. +============ =========== |