
Accurately Understanding the VMM and Paging Spaces

Original post · Linux operating system · Author: fjmingyang · 2019-04-30 13:24:05

As an AIX* educator, one of the most frequently asked questions I hear is: "How much paging space do I need to configure on a modern AIX server?" Experts have made numerous attempts to quantify this value as one or two times the amount of real memory. How relevant or accurate are those predictions? Suppose a customer is purchasing a fully loaded p690 with 256 GB of real memory. Does it seem reasonable that 256 or 512 GB of paging disks should be purchased as well?

My response to the paging space question is: "I don't want to use any paging space." Why? Because, in general, if paging space is used, performance and throughput suffer. Therefore, it's best to eliminate page-outs to paging space when possible.

In this article, I'll explain the use of paging space, offer tips for correctly distinguishing genuine memory overcommitment from "fake" paging (the unnecessary use of paging space caused by inappropriate vmtune settings), and show how to configure paging space(s) on modern AIX servers if necessary. Misunderstanding this topic can lead to poorly configured, and potentially poorly performing, servers.

AIX 4.3.2+ Deferred Paging Space
Prior to AIX Version 4.3.2, paging space was allocated for a running process as soon as the memory page was requested or accessed. This required a paging-space slot for every page in real memory, in case saving the page image became necessary. Beginning with Version 4.3.2, the paging-space allocation policy changed to allow deferred paging-space allocation. This coincided with the introduction of 64-bit hardware, which can address substantial amounts of real memory and makes a fixed paging-space-to-real-memory ratio a fallacy.

Currently, the default paging-space-slot-allocation method is the Deferred Page Space Allocation (DPSA) policy. DPSA delays allocating a paging-space slot until it's necessary to page the frame out to paging space. Since only modified frames are paged out to paging space, this policy results in no wasted paging-space allocation.
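The difference can be illustrated with a toy Python model (purely illustrative, not AIX code): under DPSA, touching or even modifying a page costs no paging-space slot; a slot is charged only when a dirty frame is actually paged out.

```python
class DeferredPagingSpace:
    """Toy model of the DPSA policy: a paging-space slot is allocated
    only when a frame is actually paged out, not when it's first used."""

    def __init__(self):
        self.slots_used = 0
        self.in_memory = {}  # page_id -> dirty flag

    def touch(self, page_id, modify=False):
        # Under DPSA, accessing (even modifying) a page costs no slot.
        self.in_memory[page_id] = self.in_memory.get(page_id, False) or modify

    def page_out(self, page_id):
        # Only now, when the stealer evicts a dirty frame, is a slot used.
        # Clean frames are simply dropped; they need no backing slot.
        if self.in_memory.pop(page_id, False):
            self.slots_used += 1

ps = DeferredPagingSpace()
for i in range(100):
    ps.touch(i, modify=(i < 10))   # 100 pages touched, 10 of them dirty
# No slots consumed yet -- under the pre-4.3.2 policy, all 100 would be.
```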

The result is that on some systems, paging space may never be needed, even if all pages accessed have been modified. This is particularly true for systems with large amounts of real memory. However, because slots are no longer reserved up front, paging space itself can become overcommitted: more working storage may be touched than the configured paging space can back.

Introduction to the Virtual Memory Manager
Before thoroughly explaining paging space, some background information on the virtual memory manager (VMM) system and paging activities is required. 
Figure 1 (below) shows an overview of the VMM system. The VMM ensures that the desired piece of data is loaded into memory (to be subsequently fetched into the L1 cache(s)) when it's required by the processor(s). To meet this demand, the VMM guesses which pages are required next and pages them in, using locality of reference and working sets (see "Understanding Locality of Reference and Working Sets").

When the required page hasn't been brought into memory, the result is an initial page fault (this is usually detected at the translation lookaside buffer or a similar hardware cache). Most initial page faults are resolved without physical I/O. Generally, the desired pages are already in memory, having arrived via direct memory access (DMA) on high-speed adapters; the system simply doesn't yet know the data is there. It's the VMM's job to locate the desired page and then page it in.

What is Page Stealing?
For the memory subsystem to work effectively, a buffer of available frames must be maintained in real memory. The VMM maintains a threshold parameter called minfree, which has a default value of 120 pages and defines the minimum number of physical memory frames on the free list. When the number of free memory frames drops below minfree, the VMM runs a page-stealer routine, which uses a page-replacement algorithm to free up memory frames.

A common page-replacement algorithm is based on the Least Recently Used (LRU) method. In AIX, the algorithm is called the clock-hand algorithm because it acts like a clock hand, constantly pointing at frames in order. When the number of free memory frames drops below the minfree parameter, the page stealer begins scanning entries in the page frame table (PFT) looking for frames to "steal." Figure 2 shows an svmon view of memory-frame status.

The page stealer determines whether a page should be stolen by examining the reference flag. If the reference flag is turned on, it indicates that the page has been referenced recently. In that case, the page stealer turns off the reference flag and moves on to the next frame. If the reference flag is still turned off the next time the page stealer examines that frame, the page is scheduled to be stolen.

When the page stealer finds a PFT entry with the reference flag turned off (on the first pass), it selects the frame for stealing. If the dirty flag (modbit) is turned off, the frame is placed on the free list immediately. If the dirty flag is turned on, the page is scheduled for page-out, and the frame is placed on the free list once the page-out completes. When the number of pages on the free list reaches maxfree, which has a default value of 128 pages, page-stealing operations are suspended.
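The steps above can be sketched as a simplified second-chance (clock) loop in Python. The thresholds mirror the defaults mentioned, but this is only a model; the real lrud implementation is considerably more involved.

```python
MINFREE, MAXFREE = 120, 128  # default thresholds, in 4-KB frames

class Frame:
    """A physical memory frame as seen by the page stealer."""
    def __init__(self, page_id):
        self.page_id = page_id
        self.referenced = True   # set by hardware when the page is touched
        self.dirty = False       # modbit: set when the page is modified

def steal_pages(frames, free_count):
    """Sweep the clock hand until the free list reaches MAXFREE.
    Returns (freed_page_ids, paged_out_page_ids)."""
    freed, paged_out = [], []
    hand = 0
    while free_count + len(freed) < MAXFREE and frames:
        f = frames[hand % len(frames)]
        if f.referenced:
            f.referenced = False             # give the page a second chance
            hand += 1
        else:
            if f.dirty:
                paged_out.append(f.page_id)  # schedule a page-out first
            frames.remove(f)                 # frame joins the free list
            freed.append(f.page_id)
    return freed, paged_out
```

For example, with 118 free frames (below minfree) and 20 in-use frames of which five are unreferenced and two of those dirty, the stealer takes the five unreferenced frames first, clears the reference flags on the rest, and keeps sweeping until the free list reaches maxfree; only the two dirty victims cost paging-space I/O.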

The page replacement algorithm keeps track of both initial page faults and re-page faults by using a history buffer containing the IDs of the most recent page faults. The VMM balances file (persistent storage) page-outs with data (working storage) page-outs.

Page replacement is done within the scope of the thread if it's running on a uniprocessor. On a multiprocessor, page replacement is done via the lrud kproc, which is dispatched to a CPU when the minfree threshold has been reached. Starting in AIX 4.3.3, the lrud kproc is multi-threaded with one thread per memory pool.

AIX Memory Buckets
On large memory systems, scanning the PFT becomes inefficient because of the relative size of the table. Starting in 4.3.0, the lrubucket parameter was added. This parameter effectively breaks up memory into buckets of frames, with a default size of 131072 4-KB frames. The number of buckets is based on the number of CPUs and the amount of real memory. It is:

  • the larger of (number of CPUs)/8 and (RAM in GB)/16,
  • but not more than the number of CPUs,
  • and not less than one.
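As a sketch (the exact heuristic varies by AIX release), the rule translates to:

```python
def lru_bucket_count(n_cpus, ram_gb):
    """Approximate number of lrud buckets per the rule above:
    max(CPUs/8, RAM GB/16), capped at the CPU count, never below one."""
    n = max(n_cpus // 8, ram_gb // 16)
    n = min(n, n_cpus)   # not more than the number of CPUs
    return max(n, 1)     # not less than one
```

A fully loaded 32-way p690 with 256 GB of RAM would thus get max(4, 16) = 16 buckets, while a small 4-way, 8 GB box gets a single bucket.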

The page stealer scans frames in a bucket and then starts over on that bucket for a second pass, before moving to the next bucket.

Carving Memory in AIX
One comment that I hear frequently from students is, "Memory is not just memory in AIX." That's correct, because memory is logically segregated into various components. To understand paging-space use, you must understand memory and its various logical distinctions.

Memory is carved up into 4096-byte chunks (AIX 5.1 also allows a 16 KB paging model in the VMM scheme) called frames or pages. A 4-KB chunk of data in memory is called a frame. If the same 4-KB chunk later resides on paging space, it's called a page or slot. The terms frames, slots and pages are used interchangeably in this article.
Memory is carved into pinned pages, persistent pages, working pages and NFS client pages (see Figure 3 below).

  • Pinned memory consists of memory frames that always reside in real memory. They're never paged out to backing store and have been marked ineligible to the page stealer for replacement. Some examples include pages belonging to various kernel functions, the swapper process (PID 0), device drivers, network buffers, etc.
  • Persistent pages represent a program's code (called instruction or text pages). A program in machine code (the representation of a computer program that's actually read and interpreted by a computer) consists of a sequence of machine instructions, which are binary strings normally the size of a 32- or 64-bit word. The instructions are also referred to as program "text." The basic execution cycle consists of fetching the next instruction from main memory, decoding it and then executing it. A program's text pages can't be modified (i.e., they're read-only), and are therefore never paged out. Text pages that are no longer needed are purged. Data-file pages are also persistent pages; these can be modified, and thus paged out.
  • Working pages are transient entities that exist only during their use by a process. They don't have inodes and therefore have no permanent disk-storage location. Process stack and data regions (data are the user variables of the program and their values, plus the system information about the process) are mapped to working segments, as are the kernel text segment, kernel-extension text segments, and shared-library text and data segments. These segments are backed by paging space.
  • NFS Client pages are used to map remote files (e.g., files that are accessed via NFS), including remote executables. Remote file pages can be either text or data file pages. Remote data file pages from these segments are backed over the network to their permanent location, not on the local disk or paging space. CD-ROM page-ins and compressed pages are classified as client pages.

Working pages and persistent pages can be combined and classified as noncomputational and computational memory (see Figure 4 below). Noncomputational pages represent pages from a file (normally on DASD). Noncomputational pages are also called data file pages. They are persistent and backed to the original file. These pages can be modified. Computational pages are made up of processes' text or data pages (i.e., persistent and working storage). The size of computational memory indicates the magnitude of the memory footprint of the applications residing on a system.

Memory pages can be referred to as being in a particular state, either modified or not. A clean page is a page in memory that hasn't been modified since it was paged in. A clean page doesn't need to be paged out and is simply purged. A dirty page is a page in memory that has been modified since it was paged in from backing store. Dirty pages are only applicable to working pages and data file pages. This follows because program text is read-only, non-modifiable, and is simply purged when no longer needed.

Because paging-space usage is the topic at hand, the key point in understanding memory is the notion of computational memory. Computational memory is comprised of working storage and non-modifiable persistent storage, so it's the working pages (which are backed to paging space) that are of interest. Processes that use large amounts of working storage (e.g., RDBMSs, Java*, Lotus* Notes* databases, or binaries that have been compiled or patched into what is called the large data model format) deserve particular attention, because incorrect VMM configuration can lead to high paging rates, running out of paging space or thrashing. The normal 32-bit process image in AIX consists of sixteen 256 MB segments. The process private segment (segment two) contains the data and stack for the running process; this one segment effectively limits the heap to less than 256 MB. In large data model format, segments 3-12 and 14 can be used for the heap, potentially giving the running process 2.75 GB for data and 256 MB for stack. All of this could be working storage.
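The 2.75 GB figure follows directly from the segment arithmetic, assuming the segment numbering above:

```python
SEG_MB = 256                               # each 32-bit segment is 256 MB

# Large data model: segments 3-12 plus segment 14 hold the heap.
heap_segments = list(range(3, 13)) + [14]  # 11 segments in all
heap_mb = len(heap_segments) * SEG_MB      # 11 * 256 = 2816 MB = 2.75 GB

stack_mb = SEG_MB                          # the stack keeps one 256 MB segment
```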

The composition of the pages in memory is also controlled by the VMM. Two parameters control the ratio of computational and noncomputational pages--minperm and maxperm. The default setting for minperm is 20, and maxperm is 80. These settings would be considered good for a data-file caching server, such as an NFS server. The values mean:

  • If the amount of noncomputational memory is greater than 80 percent, the page stealer will page-out only noncomputational memory pages.
  • If the amount of noncomputational memory pages is greater than or equal to 20 percent and less than or equal to 80 percent, only noncomputational pages are stolen--unless the repage rate for noncomputational pages is higher than that for computational pages. In that case, computational pages are paged out.
  • If the amount of noncomputational pages is less than 20 percent, both computational and noncomputational pages are paged-out without regard to repage rate.
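These three cases can be captured in a small Python function. This is a simplified model of the policy described above, not the actual VMM logic; in the middle band it treats "computational pages are paged out" as meaning both classes become eligible.

```python
MINPERM, MAXPERM = 20, 80  # default vmtune percentages

def steal_targets(noncomp_pct, repage_noncomp=0, repage_comp=0):
    """Return which page classes the page stealer targets, given the
    percentage of memory holding noncomputational (file) pages and the
    repage rates for each class."""
    if noncomp_pct > MAXPERM:
        # Above maxperm: only file pages are stolen.
        return {"noncomputational"}
    if noncomp_pct >= MINPERM:
        # Between minperm and maxperm: file pages only, unless they are
        # being repaged more often than computational pages.
        if repage_noncomp > repage_comp:
            return {"computational", "noncomputational"}
        return {"noncomputational"}
    # Below minperm: both classes are stolen, regardless of repage rate.
    return {"computational", "noncomputational"}
```

For example, at 90 percent file pages only noncomputational memory is stolen, while at 10 percent everything is fair game.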

Part two of this two-part article, scheduled for the June issue, covers memory overcommitment, excessive paging-space use, determining a paging problem versus a memory problem and more.

Source: ITPUB blog. Please credit the source when reposting.
