It depends on how the parameters are represented. Most models support bfloat16, where each parameter takes 16 bits (2 bytes). For these models, every billion parameters needs roughly 2 GB of VRAM just to hold the weights.
It is possible to reduce the memory footprint by using 8 bits per parameter, which halves it to roughly 1 GB per billion parameters. Some models support this, but they start to get noticeably dumber.
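
To make the arithmetic concrete, here is a rough sketch of the estimate. The function name `weight_memory_gb` and the 7-billion-parameter example are my own, chosen for illustration. It counts only the weights (1 GB taken as 10^9 bytes) and ignores activations, KV cache, and framework overhead, so treat the result as a lower bound rather than an exact requirement.

```python
def weight_memory_gb(num_params: float, bits_per_param: int = 16) -> float:
    """Approximate memory needed for the weights alone, in GB (1 GB = 1e9 bytes).

    Hypothetical helper for illustration; it does not account for
    activations, KV cache, or any runtime overhead.
    """
    bytes_per_param = bits_per_param / 8
    return num_params * bytes_per_param / 1e9


# A 7-billion-parameter model in bfloat16 (16 bits per parameter)
print(weight_memory_gb(7e9))

# The same model with 8 bits per parameter
print(weight_memory_gb(7e9, bits_per_param=8))
```

In practice you will want some headroom on top of this figure, since inference needs memory for more than just the weights.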