In parallel computing, speedup refers to how much a parallel algorithm is faster than a corresponding sequential algorithm.
Speedup is defined by the following formula:
- p is the number of processors
- T1 is the execution time of the sequential algorithm
- Tp is the execution time of the parallel algorithm with p processors
Linear speedup or ideal speedup is obtained when . When running an algorithm with linear speedup, doubling the number of processors doubles the speed. As this is ideal, it is considered very good scalability.
The speedup is called absolute speedup when T1 is the execution time of the best sequential algorithm, and relative speedup when T1 is the execution time of the same parallel algorithm on one processor. Relative speedup is usually implied if the type of speedup is not specified, because it doesn’t require implementation of the sequential algorithm.
What is the Maximum Speedup?
Several factors will appear as overhead in the parallel version and limit the speedup, notably:
- Periods when not all the processors can be performing useful work and are simply idle.
- Extra computations in the parallel version not appearing in the sequential version; for example, to recompute constants locally.
- Communication time between processes.
It is reasonable to expect that some part of computation cannot be divided into concurrent processes and must be performed sequentially. Let us assume that during some period, perhaps an initialization period or the period before concurrent processes are set up, only one processor is doing useful work, and for the rest of the computation additional processes are operating on processes.
Assuming there will be some parts that are only executed on one processor, the ideal situation would be for all the variable processors to operate simultaneously for the other times. If the fraction of the computation that cannot be divided into concurrent tasks is f, and no overhead is incurred when the computation is divided into concurrent parts, the time to perform the computation with p procesors is given by fts + (1 – f)ts/p .
where f (0 < f < 1) is the fraction of time (before the improvement) spent in the part that was not improved. For example (see picture on right):
- If part B is made five times faster (p = 5), tA = 3, tB = 1 and f = tA / (tA + tB) = 0.75, then
- If part A is made to run twice as fast (p = 2), tB = 1, tA = 3 and f = tB / (tA + tB) = 0.25, then
Therefore, making A twice as fast is better than making B five times faster. The percentage improvement in speed can be calculated as
- Improving part A by a factor of two will increase overall program speed by a factor of 1.6, which makes it 37.5% faster than the original computation.
- However, improving part B by a factor of five, which presumably requires more effort, will only achieve an overall speedup factor of 1.25, which makes it 20% faster.
This equation is known as Amdahl’s law(Amdahl, 1967). We see that indeed a speed improvement is indicated. However, the fraction of the computation that is executed by concurrent processes needs to be a substantial fraction of the overall computation if a significant increase in speed is to be achieved.
Parallel Programming – Techniques and Applications 2th Edition, Barry Wilkinson