You seem to be conflating two separate concepts:
- The rendering rate, usually referred to by gamers as the frame rate. This is how fast the game itself can render frames on the current hardware.
- The physical frame rate, usually referred to by gamers as the refresh rate. This is how fast the display is physically updating what it is displaying.
On some really old hardware, such as the Atari 2600, there was objectively no difference between the two because the frame was literally rendered line by line in real time (that is, the graphics were rendered directly into the video output signal).
On modern systems though, the two are much more decoupled. The rendering rate is largely just a matter of the computational requirements of the game and the capabilities of the CPU/GPU (and whatever other coprocessors and/or peripherals are involved), while the physical frame rate is a function of the monitor.
Internally, all modern GPUs have what is known as a video output buffer for every output on the GPU. This buffer stores the frame that is currently being sent over that output to the connected display. In most cases, instead of rendering directly to the video output buffer, a GPU will render to an internal buffer and then copy the data from that to the video output buffer.¹ This ensures that the intermediate state of a partially rendered frame doesn’t go out over the display output, reducing flickering, tearing, and stuttering. ‘vsync’ is just a matter of ensuring that the copy only happens in between physical frames, and can help reduce tearing further.
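To make the buffering idea concrete, here is a minimal sketch in Python. It is a toy model, not how any real GPU or driver exposes this: the `DoubleBuffer` class and its method names are made up for illustration, and it swaps an index instead of copying data. Rendering only ever touches the back buffer, so whatever is being scanned out is always a complete frame.

```python
class DoubleBuffer:
    """Toy model of a render buffer plus a video output buffer (hypothetical names)."""

    def __init__(self, width, height):
        size = width * height
        self.buffers = [bytearray(size), bytearray(size)]
        self.front = 0  # index of the buffer currently being sent to the display

    @property
    def scanout(self):
        """The buffer the display output is reading from."""
        return self.buffers[self.front]

    @property
    def back(self):
        """The buffer the renderer is allowed to draw into."""
        return self.buffers[1 - self.front]

    def render(self, value):
        """Fill the back buffer pixel by pixel; the scanout buffer is untouched."""
        back = self.back
        for i in range(len(back)):
            back[i] = value

    def flip(self):
        """Make the finished back buffer the new scanout buffer.

        With vsync, this would only be called in between physical frames.
        """
        self.front = 1 - self.front


fb = DoubleBuffer(4, 2)
fb.render(7)   # the display still sees the old (all-zero) frame here
fb.flip()      # now the display sees the fully rendered frame
```

The important property is that nothing between `render()` starting and `flip()` being called is ever visible on the output.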
This, in turn, leads to three possible situations regarding the rendering rate and physical frame rate:
- If they match exactly, then things functionally work much like the old systems that rendered in real time, just with some more steps involved.
- If the physical frame rate is higher than the rendering rate, then some frames may be duplicated (these duplicate frames are often known as ‘lag frames’ when talking about old video game consoles, because they usually represent the whole game essentially pausing to wait for the rendering process to re-synchronize with the display). Alternatively, a frame that is only partially rendered may be displayed, leading to visual artifacts.
- If the rendering rate is higher than the physical frame rate, then the rendering process can idle after each frame instead of having to run constantly. This is functionally how capping the frame rate in a game works to reduce resource usage. Alternatively, it may start on the next frame immediately, in which case either the previous frame may be dropped completely (if the new frame finishes rendering before the previous one could be displayed), or a partially rendered frame will be displayed.
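The three cases above can be sketched with a very simplified simulation. The assumptions are mine, not a description of any particular system: both rates are constant integers, frame `i` is finished at time `i / render_fps`, refresh `k` happens at time `k / refresh_hz`, the renderer never idles, and the display always shows the newest complete frame (so no partially rendered frames). The function name `simulate` is made up for this example.

```python
def simulate(render_fps, refresh_hz, seconds=1):
    """Return (duplicated_refreshes, dropped_frames) under the assumptions above."""
    shown = []
    for k in range(refresh_hz * seconds):
        # Index of the newest frame that is complete at refresh k.
        # Integer arithmetic avoids floating-point rounding at exact boundaries.
        shown.append((k * render_fps) // refresh_hz)

    # A refresh is a duplicate if it shows the same frame as the previous refresh.
    duplicated = sum(1 for prev, cur in zip(shown, shown[1:]) if prev == cur)

    # A rendered frame is dropped if no refresh in the window ever showed it.
    rendered = render_fps * seconds
    dropped = rendered - len(set(shown))
    return duplicated, dropped


for fps in (60, 30, 120):
    print(fps, simulate(fps, 60))
```

On a 60 Hz display this reports no duplicates or drops at 60 fps, 30 duplicated refreshes at 30 fps, and 60 dropped frames at 120 fps, which lines up with the three bullets.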
This gets complicated, though, because it’s rarely the case that the rendering rate is constant. Most games have at least some situations where more work needs to be done to prepare a frame than ‘normal’, and some where less work needs to be done than ‘normal’. For example, in most games that use sprite graphics, the number of sprites on screen has a direct impact on the time taken to render a frame.
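As a rough illustration of what a single slow frame does, here is a small sketch that assumes vsync with two buffers: the renderer starts a frame, and once it is done it has to wait for the next refresh before the frame is shown and the next one can be started. The function name `present_refreshes` and the frame times are invented for the example.

```python
import math
from fractions import Fraction


def present_refreshes(render_ms, refresh_hz=60):
    """For each frame's render time (ms), return the refresh index it first appears on."""
    interval = Fraction(1000, refresh_hz)  # exact time between refreshes, in ms
    start = Fraction(0)
    result = []
    for cost in render_ms:
        done = start + cost
        # First refresh at or after the moment rendering finished.
        refresh = math.ceil(done / interval)
        result.append(refresh)
        # Assumption: the renderer waits for that refresh before starting the next frame.
        start = refresh * interval
    return result


# 50 Hz display (20 ms per refresh); the third frame takes 25 ms instead of 10 ms.
print(present_refreshes([10, 10, 25, 10], refresh_hz=50))
```

With those numbers the frames land on refreshes 1, 2, 4, and 5. Nothing new is ready for refresh 3, so the display repeats the second frame there, which is exactly the kind of duplicated frame described earlier.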
And, to make matters more interesting, newer displays often allow for a variable refresh rate (branded variously as FreeSync, G-Sync, Adaptive-Sync, ProMotion, Q-Sync, or generically as VRR). This lets the display derive its physical refresh rate from the rendering rate of the game, which in theory completely eliminates tearing (whether it does so or not depends on a few other factors, not least of which is how the display implements VRR itself).
1: Depending on the GPU design itself, this may actually involve just flipping a few bits in a register instead of actually copying data. Most modern GPUs are like this, as it significantly reduces the chances of tearing due to copying taking too long. Depending on the software involved, there may also be more than one buffer that data is rendered into, which generally allows for better resource utilization on the GPU, though the way this is implemented may vary (see https://en.wikipedia.org/wiki/Multiple_buffering and https://en.wikipedia.org/wiki/Swap_chain for some of the high-level details of the two common approaches).