Benchmarking, as defined here, is a procedure in which computer predictions are compared with either measurements or the results of other computer codes.

An experimental benchmark is a comparison to measured data in order to establish the accuracy, reliability, and other characteristics of computer predictions. Benchmark data are provided with detailed descriptions of the experimental conditions and uncertainties of the results.
Where measurements are not available, a numerical benchmark may be performed instead. Its reference data are obtained from the best-available computer predictions, using codes that make few approximations in their treatment of radiation transport and the underlying nuclear data and that have previously been benchmarked against experimental data for a similar radiation transport problem.
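
The comparison at the heart of either kind of benchmark can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function name `compare_to_benchmark`, the choice of relative difference and of deviation in units of combined uncertainty as metrics, and the sample numbers are all hypothetical and are not prescribed by this document.

```python
import math


def compare_to_benchmark(predicted, reference, sigma_reference, sigma_predicted=None):
    """Compare code predictions with benchmark reference data, point by point.

    Returns a list of (relative_difference, deviation) tuples, where the
    deviation is expressed in units of the combined standard uncertainty.
    The metrics are an assumption for illustration, not a required standard.
    """
    if sigma_predicted is None:
        # Assume the prediction carries no statistical uncertainty.
        sigma_predicted = [0.0] * len(predicted)
    lengths = {len(predicted), len(reference), len(sigma_reference), len(sigma_predicted)}
    if len(lengths) != 1:
        raise ValueError("all input sequences must have the same length")

    results = []
    for p, r, s_r, s_p in zip(predicted, reference, sigma_reference, sigma_predicted):
        relative_difference = (p - r) / r      # assumes nonzero reference values
        combined_sigma = math.hypot(s_r, s_p)  # uncertainties added in quadrature
        deviation = (p - r) / combined_sigma   # assumes combined_sigma > 0
        results.append((relative_difference, deviation))
    return results


# Hypothetical values: two predicted points against two measured points.
for rel, dev in compare_to_benchmark([1.02, 0.50], [1.00, 0.52], [0.01, 0.01]):
    print(f"relative difference {rel:+.3f}, deviation {dev:+.2f} sigma")
```

For a numerical benchmark, the same sketch applies with the reference code's results and their statistical uncertainties in place of the measured data.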

Ideal Characteristics of a Medical Radiation Transport Benchmark Problem

Benchmark problems in computational radiation physics have traditionally provided a test of the accuracy of predictive radiation transport codes. Many benchmark problems are available for nuclear engineering and health physics, but benchmark problems for medical applications are extremely limited. The QUADOS code inter-comparison is perhaps the best recent example of benchmark problems that include medical applications. Here, we define the characteristics of an ideal medical physics benchmark problem.

1. The problem is relevant to practicing medical physicists engaged in radiation therapy or diagnostic imaging, or to researchers who are investigating these topics.
2. The problem occurs commonly in the practice of medical physics (not esoteric), and it may be self-contained or part of a larger set of problems.
3. The problem is posed in the simplest possible way that achieves the benchmarking goals and requirements.
4. The problem is completely described, i.e., there are no ambiguities in the statement of the radiation source, geometry, materials, and quantities of interest.
5. The problem provides for verification of traditional metrics.
i. Accuracy of physics models (source, radiation transport, tallies).
ii. Freedom from subtle geometric and material specification errors.
iii. User competence and appropriate configuration of code options.
6. The problem provides for tests of advanced metrics.
i. Execution speed and efficiency.
ii. Scalability (parallel processing).
iii. Robustness and reliability of the results, e.g., solution convergence.
7. High-quality measured and/or simulated solutions to the problems are available, preferably in the peer-reviewed literature. For problems involving absolute quantities, e.g., the source strength of isotopic sources, the measured data should be traceable to a secondary or primary standards laboratory.
8. The benchmark problem design should be driven by real problems of interest in medical physics, as described in (2). As such, it may exclude from the benchmarking process codes that are inherently incapable of addressing the physics of the problem. However, an effort should be made to design the problem to be code- and platform-neutral so as not to favor particular algorithms or computer systems. For example, in external beam radiation therapy, a variety of dose computation algorithms are of interest, including broad beam, pencil beam, forward and adjoint Monte Carlo, method of characteristics, and discrete ordinates.
9. The problems and example solutions should be freely available, should contain no proprietary information, should not require license agreements, etc.
10. The problems should promote collaboration between disciplines of medical physics, nuclear engineering, and computer science.
11. The problem should promote interaction and collaboration between clinical researchers and transport code developers. It should be broadly useful to the medical community, code developers, and other academic research communities.
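
The advanced metrics in item 6 are named above but not defined. One possible way to quantify them is sketched below, assuming the figure of merit commonly used for Monte Carlo calculations, 1/(R^2 T) with R the relative error and T the computing time, and the usual definitions of parallel speedup and efficiency. The function names and the choice of these particular metrics are assumptions for illustration only.

```python
def figure_of_merit(relative_error, computing_time):
    """Monte Carlo figure of merit, 1 / (R^2 * T).

    A higher value means the code reaches a given statistical precision in
    less computing time. Assumed here as one possible efficiency metric.
    """
    if relative_error <= 0 or computing_time <= 0:
        raise ValueError("relative error and computing time must be positive")
    return 1.0 / (relative_error ** 2 * computing_time)


def parallel_speedup(serial_time, parallel_time):
    """Ratio of single-worker run time to multi-worker run time."""
    return serial_time / parallel_time


def parallel_efficiency(serial_time, parallel_time, n_workers):
    """Speedup divided by the number of workers (1.0 is ideal scaling)."""
    return parallel_speedup(serial_time, parallel_time) / n_workers


# Hypothetical timings: 1% relative error in 100 time units on one worker,
# and the same problem finished in 25 time units on 8 workers.
print(figure_of_merit(0.01, 100.0))
print(parallel_speedup(100.0, 25.0))
print(parallel_efficiency(100.0, 25.0, 8))
```

Because the figure of merit depends on the hardware through the computing time, values are comparable only between runs on the same platform, which is one reason a benchmark would need to report the computing environment alongside such numbers.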

Benchmark Specifications

Benchmarks must follow the format below. Numerical benchmarks omit items 1 and 2, which pertain to the experiment.

1. Detailed description of the experimental benchmark
a. Overview of the experiment, including objectives
b. Experimental configuration (materials and methods), including physical dimensions
c. Description of material data
2. Experimental data
a. Numerical data and file formats
b. Experimental uncertainties
3. Benchmark Problem Definition
a. Description of the model and the physical problem
b. Dimensions and geometries
c. Material composition data
d. Environmental data
4. Results of sample calculations
5. Computer code inputs
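
The outline above can be mirrored in a small data structure, which makes it possible to check a submission for completeness. The following is a minimal sketch only: the class and field names are hypothetical, free-text strings stand in for the actual content, and the document does not prescribe any machine-readable format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExperimentDescription:
    """Item 1: detailed description of the experimental benchmark."""
    overview: str        # 1a: overview of the experiment, including objectives
    configuration: str   # 1b: materials and methods, physical dimensions
    material_data: str   # 1c: description of material data


@dataclass
class ExperimentalData:
    """Item 2: experimental data."""
    data_files: List[str]  # 2a: numerical data and file formats
    uncertainties: str     # 2b: experimental uncertainties


@dataclass
class ProblemDefinition:
    """Item 3: benchmark problem definition."""
    model: str        # 3a: description of the model and the physical problem
    geometry: str     # 3b: dimensions and geometries
    materials: str    # 3c: material composition data
    environment: str  # 3d: environmental data


@dataclass
class BenchmarkSpecification:
    problem: ProblemDefinition                          # item 3
    sample_results: List[str]                           # item 4
    code_inputs: List[str]                              # item 5
    experiment: Optional[ExperimentDescription] = None  # item 1
    data: Optional[ExperimentalData] = None             # item 2

    def is_numerical(self) -> bool:
        """Numerical benchmarks omit items 1 and 2."""
        return self.experiment is None and self.data is None

    def missing_items(self) -> List[str]:
        """List the parts of the format that are absent or inconsistent."""
        missing = []
        if (self.experiment is None) != (self.data is None):
            missing.append("items 1 and 2 must both be present or both be omitted")
        if not self.sample_results:
            missing.append("item 4: results of sample calculations")
        if not self.code_inputs:
            missing.append("item 5: computer code inputs")
        return missing


# Hypothetical numerical benchmark: no experiment, so items 1 and 2 are omitted.
spec = BenchmarkSpecification(
    problem=ProblemDefinition("point source in water", "30 cm cube", "water", "none"),
    sample_results=["dose_profile.txt"],
    code_inputs=["input_deck.txt"],
)
print(spec.is_numerical(), spec.missing_items())
```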

Benchmark Classification

The benchmarks will be classified according to their medical physics applications; some benchmarks will be applicable to more than one of the groups.

1. Radiation therapy (RT)
2. Imaging (IM)
3. Nuclear medicine (NM)
4. Health physics (HP)

Some examples: dose distribution on a heterogeneous phantom (RT), CT density phantom (IM), internal dosimetry (NM-HP), MIRD phantoms (HP), dose distributions on the phantoms acquired with MVCT (IM-RT), photo-nuclear production during radiation therapy (RT-HP)

The benchmarks will also be classified according to their nature; some will be applicable to more than one:

1. Theoretical benchmarks (THE): testing consistency of the codes
2. Clinical benchmarks (CLI): testing clinical, real-world problems
3. Experimental benchmarks (EXP): testing against measured data

Some examples: pencil beam voxel calculation (THE), electron beam backscattering (CLI), thick-target bremsstrahlung production measurements (EXP), heterogeneous phantom dose calculations (THE, or THE-EXP if supported by experiments)
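
The hyphenated tags used in the examples (IM-RT, NM-HP, THE-EXP) lend themselves to a simple check against the two code lists. The sketch below assumes that a tag is always a hyphen-separated sequence of the codes defined in this section; the function name `parse_tag` and the dictionary names are hypothetical.

```python
# Application groups and benchmark natures, keyed by the codes used in this section.
APPLICATIONS = {
    "RT": "Radiation therapy",
    "IM": "Imaging",
    "NM": "Nuclear medicine",
    "HP": "Health physics",
}
NATURES = {
    "THE": "testing consistency of the codes",
    "CLI": "clinical benchmarks",
    "EXP": "experimental benchmarks",
}


def parse_tag(tag, vocabulary):
    """Split a hyphenated classification tag and check every code.

    Returns the list of codes in the order given; raises ValueError if any
    code is not part of the supplied vocabulary.
    """
    codes = tag.strip().upper().split("-")
    unknown = [code for code in codes if code not in vocabulary]
    if unknown:
        raise ValueError(f"unknown classification code(s): {', '.join(unknown)}")
    return codes


# Tags taken from the examples in this section.
for tag in ("RT", "IM-RT", "NM-HP"):
    print(tag, [APPLICATIONS[code] for code in parse_tag(tag, APPLICATIONS)])
print("THE-EXP", [NATURES[code] for code in parse_tag("THE-EXP", NATURES)])
```

Keeping the two vocabularies separate means a benchmark carries one tag per classification, for example RT for the application and THE-EXP for the nature.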