Video images need to be decodable rapidly.
If decoding a frame takes too long, display falls behind and the quality of the video degrades.
Encoding (i.e., recording) the video can be slower.
The major difference between JPEG and MPEG compression is that MPEG exploits interframe similarities.
Like JPEG, MPEG uses the DCT algorithm to encode a frame.
For the actual compression phase, both the spatial aspect (as in JPEG) and the temporal aspect between frames are addressed.
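The spatial side of both codecs rests on the 8x8 two-dimensional DCT. As a minimal sketch (a direct, unoptimized implementation of the standard orthonormal DCT-II, not the fast transform a real encoder would use), the transform of one block looks like this:

```python
from math import cos, pi, sqrt

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block (naive O(N^4) version)."""
    N = len(block)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            # Scale factors make the transform orthonormal.
            cu = sqrt(1.0 / N) if u == 0 else sqrt(2.0 / N)
            cv = sqrt(1.0 / N) if v == 0 else sqrt(2.0 / N)
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * cos((2 * x + 1) * u * pi / (2 * N))
                          * cos((2 * y + 1) * v * pi / (2 * N)))
            out[u][v] = cu * cv * s
    return out

# A flat (constant) 8x8 block concentrates all its energy in the
# DC coefficient out[0][0]; every AC coefficient is (numerically) zero.
coeffs = dct2([[1.0] * 8 for _ in range(8)])
```

After this transform, the encoder quantizes the coefficients; that is where the loss is introduced.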
In addition, frames with no temporal compression (intra-coded frames, or I-frames) are inserted into the stream at intervals.
This allows random access to be achieved rapidly when the viewer, say, wants to jump to a particular frame; reconstructing that frame by decoding a long chain of temporally compressed frames would be too slow.
These I-frames can be thought of as JPEG images embedded in the image stream.
To find a particular image, the decoder locates the closest preceding frame without temporal compression and decodes forward from there.
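That seek procedure can be sketched in a few lines. This is an illustrative model only: the stream is represented as a list of frame-type tags, which is an assumption for the example, not how a real MPEG bitstream is parsed.

```python
def frames_to_decode(frame_types, target):
    """Return the indices of the frames that must be decoded to show
    frame `target`: scan backwards to the nearest intra-coded ('I')
    frame, then decode forward from it through the target.

    frame_types: list of 'I' (no temporal compression) or 'P'
    (temporally compressed) tags, one per frame.
    """
    start = target
    while frame_types[start] != 'I':
        start -= 1  # walk back to the nearest I-frame
    return list(range(start, target + 1))

# Example stream: an I-frame every third frame.
stream = ['I', 'P', 'P', 'I', 'P', 'P']
```

The more frequently I-frames are inserted, the faster seeking becomes, at the cost of a larger file, since I-frames compress less well than temporally predicted frames.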
Motion prediction is done on 16x16 blocks (whereas JPEG applies its DCT to 8x8 blocks).
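Motion prediction for a 16x16 block amounts to searching the reference frame for the displacement that best matches the block. A minimal sketch, using an exhaustive search with a sum-of-absolute-differences (SAD) cost (one common matching criterion; real encoders use faster search strategies):

```python
import math

def block_sad(cur, ref, bx, by, dx, dy, B=16):
    """Sum of absolute differences between the BxB block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(B):
        for x in range(B):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total

def best_motion_vector(cur, ref, bx, by, search=4, B=16):
    """Exhaustive search over a +/-`search` window for the displacement
    (dx, dy) minimizing the SAD cost. Returns ((dx, dy), cost)."""
    H, W = len(ref), len(ref[0])
    best, best_cost = (0, 0), math.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip displacements that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + B <= H
                    and 0 <= bx + dx and bx + dx + B <= W):
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, B)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost
```

The encoder then stores the chosen vector plus the (DCT-coded) residual between the block and its displaced match, which is far smaller than the block itself when the match is good.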
Once the motion prediction is done, the algorithm proceeds much as JPEG does.
The quantization phase is more complex, however, because there are two types of frames: those that are temporally compressed and those that are not.
Loss is difficult to predict because it depends on the type of movie and the eye of the observer.
Prof Ed Cox on MPEG Interframe compression