This post demystifies the Linux /proc/<pid>/smaps file and its Shared_Clean, Shared_Dirty, Private_Clean and Private_Dirty fields.
What is smaps
The smaps file in Linux provides detailed information about each of a process’s VMAs (Virtual Memory Areas), which the kernel represents with struct vm_area_struct. A process’s smaps file is generated by iterating over its VMAs and calling show_smap for each one. The Shared_Clean, Shared_Dirty, Private_Clean and Private_Dirty fields are printed here and calculated here.
Shared vs Private
From the perspective of smaps, whether a page is considered shared or private is determined by its page_mapcount, which is essentially page->_mapcount (the field starts at -1, so page_mapcount() adds 1), NOT by the MAP_SHARED or MAP_PRIVATE flag passed to mmap. The _mapcount of a page is a counter that tracks how many page table entries (PTEs) point to this physical page across ALL processes. A page is considered shared if at least 2 PTEs point to it.
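The rule can be illustrated with a toy model in Python (not kernel code): each process’s page table is a plain dict from virtual page number to physical frame number, a frame’s mapcount is the number of entries pointing to it across all processes, and a page counts as shared when that number is at least 2. The helpers mapcounts and classify are hypothetical, and huge pages, swap entries and other kernel details are deliberately ignored.

```python
from collections import Counter


def mapcounts(page_tables):
    """Count PTEs per physical frame across ALL processes.

    page_tables: {pid: {virtual_page: physical_frame}}, a toy stand-in for real PTEs.
    """
    counts = Counter()
    for ptes in page_tables.values():
        counts.update(ptes.values())
    return counts


def classify(page_tables, pid):
    """Label each mapped page of `pid` Shared or Private using the mapcount >= 2 rule."""
    counts = mapcounts(page_tables)
    return {vpage: "Shared" if counts[frame] >= 2 else "Private"
            for vpage, frame in page_tables[pid].items()}


if __name__ == "__main__":
    page_tables = {
        1: {0x10: 100, 0x20: 100, 0x30: 200},  # pid 1: two VMAs map frame 100
        2: {0x50: 200, 0x60: 300},             # pid 2 also maps frame 200
    }
    print(classify(page_tables, 1))  # {16: 'Shared', 32: 'Shared', 48: 'Shared'}
    print(classify(page_tables, 2))  # {80: 'Shared', 96: 'Private'}
```

Note that frame 100 is shared even though both of its PTEs belong to the same process: only the count matters, not which processes own the entries or which mmap flags created them.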
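For example, when we map the same file twice with MAP_PRIVATE in a single process and only read from it, the two VMAs are backed by the same physical pages, because MAP_PRIVATE is copy-on-write (COW) and no write has triggered a copy. Hence those pages are considered shared, even though both mappings belong to one process and were created with MAP_PRIVATE.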
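One way to observe this from user space is the following Linux-only Python sketch. It writes a temporary file, maps it twice with MAP_PRIVATE and PROT_READ, reads every page of both mappings, and returns the four fields of the resulting VMAs from /proc/self/smaps. The helper names and the 16-page file size are arbitrary choices for illustration. On a regular disk filesystem, after the fsync, we would expect each VMA to report roughly the whole file under Shared_Clean; if the temporary directory lives on tmpfs, the same pages may show up as Shared_Dirty instead, since tmpfs keeps its pages marked dirty.

```python
import mmap
import os
import tempfile

FIELDS = ("Shared_Clean", "Shared_Dirty", "Private_Clean", "Private_Dirty")


def smaps_for_path(path):
    """Return the four Shared/Private fields (kB) of each VMA that maps `path`."""
    vmas, current = [], None
    with open("/proc/self/smaps") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            if not parts[0].endswith(":"):
                # VMA header: "start-end perms offset dev inode [pathname]"
                current = {} if " ".join(parts[5:]) == path else None
                if current is not None:
                    vmas.append(current)
            elif current is not None and parts[0][:-1] in FIELDS:
                current[parts[0][:-1]] = int(parts[1])
    return vmas


def map_twice_read_only(npages=16, directory=None):
    """Map one file twice with MAP_PRIVATE, read both mappings, report their smaps fields."""
    size = npages * mmap.PAGESIZE
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        path = os.path.realpath(os.path.join(tmp, "data.bin"))
        with open(path, "wb") as f:
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())  # write the page cache back so the pages are clean (no-op on tmpfs)
        with open(path, "rb") as f:
            maps = [mmap.mmap(f.fileno(), size, flags=mmap.MAP_PRIVATE, prot=mmap.PROT_READ)
                    for _ in range(2)]
            try:
                for m in maps:
                    _ = m[:]  # read every page; nothing is written, so no COW copy is made
                return smaps_for_path(path)
            finally:
                for m in maps:
                    m.close()


if __name__ == "__main__":
    for vma in map_twice_read_only():
        print(vma)
```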
Multiple MAP_PRIVATE mmaps with MAP_POPULATE in a single process
If the mmap is PROT_WRITE, MAP_PRIVATE and MAP_POPULATE, the kernel not only populates the page tables but also eagerly triggers COW, so each VMA is backed by its own private physical pages.
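The same kind of experiment can be repeated with PROT_READ | PROT_WRITE and MAP_PRIVATE | MAP_POPULATE, this time without touching the mappings at all, so any populated pages come from MAP_POPULATE alone. This is a Linux-only sketch with hypothetical helper names; it assumes mmap.MAP_POPULATE, which Python exposes from 3.10, and otherwise falls back to 0x8000, the value used on x86-64 and arm64. Given the eager COW described above, we would expect each VMA to report roughly the whole file under Private_Dirty and nothing under the Shared_* fields.

```python
import mmap
import os
import tempfile

FIELDS = ("Shared_Clean", "Shared_Dirty", "Private_Clean", "Private_Dirty")
# Assumption: fall back to the Linux x86-64/arm64 value if this Python lacks the constant
MAP_POPULATE = getattr(mmap, "MAP_POPULATE", 0x8000)


def smaps_for_path(path):
    """Return the four Shared/Private fields (kB) of each VMA that maps `path`."""
    vmas, current = [], None
    with open("/proc/self/smaps") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            if not parts[0].endswith(":"):  # VMA header line
                current = {} if " ".join(parts[5:]) == path else None
                if current is not None:
                    vmas.append(current)
            elif current is not None and parts[0][:-1] in FIELDS:
                current[parts[0][:-1]] = int(parts[1])
    return vmas


def map_twice_populate_write(npages=16, directory=None):
    """Map one file twice as writable MAP_PRIVATE with MAP_POPULATE; report smaps fields."""
    size = npages * mmap.PAGESIZE
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        path = os.path.realpath(os.path.join(tmp, "data.bin"))
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        # A read-only fd is enough: writes to a MAP_PRIVATE mapping never reach the file
        with open(path, "rb") as f:
            maps = [mmap.mmap(f.fileno(), size,
                              flags=mmap.MAP_PRIVATE | MAP_POPULATE,
                              prot=mmap.PROT_READ | mmap.PROT_WRITE)
                    for _ in range(2)]
            try:
                # Deliberately no reads or writes here
                return smaps_for_path(path)
            finally:
                for m in maps:
                    m.close()


if __name__ == "__main__":
    for vma in map_twice_populate_write():
        print(vma)
```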
Clean vs Dirty
From the perspective of smaps, whether a page is considered clean or dirty is determined by this: a page counts as dirty if either the dirty bit in its PTE or the dirty flag on the page itself is set. Conceptually, a page is dirty if it cannot simply be discarded without data loss, i.e. it holds writes that haven’t been flushed to the underlying file or swap space yet.
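Putting the two rules together, the following Python sketch models how each mapped page lands in exactly one of the four fields: shared when its mapcount is at least 2, and dirty when either the PTE dirty bit or the page’s own dirty flag is set. It is a simplification under stated assumptions: smaps_bucket and account are hypothetical names, each page is described by a (mapcount, pte_dirty, page_dirty) tuple, a 4 kB page size is assumed, and huge pages, swap and the kernel’s other special cases are ignored.

```python
from collections import Counter


def smaps_bucket(mapcount, pte_dirty, page_dirty):
    """Name of the smaps field a mapped page is added to (toy model)."""
    sharing = "Shared" if mapcount >= 2 else "Private"
    # Dirty if either the PTE's dirty bit or the page's own dirty flag is set
    state = "Dirty" if pte_dirty or page_dirty else "Clean"
    return f"{sharing}_{state}"


def account(pages, page_kb=4):
    """Sum kB per field for (mapcount, pte_dirty, page_dirty) tuples, one per mapped page."""
    totals = Counter()
    for mapcount, pte_dirty, page_dirty in pages:
        totals[smaps_bucket(mapcount, pte_dirty, page_dirty)] += page_kb
    return dict(totals)


if __name__ == "__main__":
    pages = [
        (2, False, False),  # read-only page cache page mapped twice
        (1, True, False),   # private page written through this PTE
        (3, False, True),   # shared page whose page-level dirty flag is set
    ]
    print(account(pages))
```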