Computer Simulation in Real Time Rail Traffic Control
Edwin Reese Kraft
December 1983
Preface
The following is excerpted from the author's 1983 Master's thesis, pp. 18-23. Although simulation models of train dispatching typically treat running times as a fixed and deterministic quantity, real-world trains don't always perform that way. This 1983 research explicitly considered the impact of errors in running time prediction on train dispatchers' ability to plan meets and passes. It demonstrated how levels of train delay rise in direct proportion to increases in running time variability.
Unfortunately, despite its significance to the performance of dispatching algorithms in real-world settings, other authors have, so far as is known, never pursued this line of research. For that reason, run time variability has been called the "forgotten parameter." Hopefully the importance of this critical variable will soon be rediscovered, allowing the railroad industry to find more effective ways to manage the problems and reduce the costs imposed by train running time unpredictability.
The following reproduces Chapter 2 from this 1983 research.
Prediction of Train Performance
2.1 Run Time Prediction Methods
The first step in planning train meets is the prediction of point-to-point train running times. Either a simulation or a statistical approach can be used. If adequate, accurate data are available, the statistical approach is preferred.
The time a train requires to traverse any given segment of track is a function of many variables. These include the train's weight, locomotives, length, and engineer, and the track's curvature, gradient, speed limit and physical condition. While many such variables exist, if a train's performance is observed as it traverses several segments, the aggregate impact of these variables may be summarized by the train's historical performance record.
Existing dispatcher assist systems utilize a "running time/speed table" to predict performance. This table contains standard run times or speeds over each section of the line. It may have separate entries for various train classes and for each direction. The values contained in the table may be updated to account for normal seasonal changes in train performance.
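The table lookup described above can be sketched as a simple keyed structure. This is only an illustration: the segment names, train classes, run time values, and the seasonal_factor parameter are hypothetical and are not taken from any actual dispatcher assist system or from the thesis data.

```python
# Hypothetical running time table: standard minutes per
# (segment, train class, direction). All entries are illustrative.
RUN_TIME_TABLE = {
    ("A-B", "loaded", "east"): 22.0,
    ("A-B", "empty", "west"): 18.0,
    ("B-C", "loaded", "east"): 31.0,
    ("B-C", "empty", "west"): 26.0,
}


def table_run_time(segment, train_class, direction, seasonal_factor=1.0):
    """Return the standard run time, scaled by an optional seasonal adjustment."""
    return RUN_TIME_TABLE[(segment, train_class, direction)] * seasonal_factor


def remaining_run_time(segments, train_class, direction):
    """Sum the table entries over the segments a train has yet to cross."""
    return sum(table_run_time(s, train_class, direction) for s in segments)
```

Note that nothing about the individual train's past performance enters the lookup, which is the source of the method's simplicity.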
The primary advantage of the speed table is its simplicity. It is not necessary to store or process past performance data. However, a more sophisticated predictor which utilizes historical information is more accurate.
Assume a train has traversed segments 1 through n, but has yet to cross segments n+1 through m. Then the run times of each segment n+1 through m may be estimated as a function of the observed run times of segments 1 through n. The functional form may be nonlinear and may include all previously observed run times. For the purposes of this study, the simple linear functional form with only one independent variable has been assumed.
2.2 Regression Predictor
Systematic differences between trains tend to lead to high correlations among run times over various segments. For example, if a train has only one working locomotive, its run time should be consistently slower than another train having two locomotives.
In figure 1, a train has arrived at B and its run time crossing segment 1 has been measured. Using this run time, the run times on segments 2 and 3 can be estimated. (In actual practice, the run time A-B might be checked against a reasonable maximum. If it exceeds this maximum, the train has presumably stopped en route, so the observation would not be used.) Segment 2 is "adjacent" to segment 1; it is the first segment downstream. Segment 3 is the second segment downstream from segment 1.

In general, regression equations can predict run time Tp on any downstream segment given the measured run time To on any observed segment. The form of the predictor is:
Tp = a + b To
where the parameters a and b have been established from data.
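A minimal sketch of applying this predictor, including the reasonable-maximum check mentioned earlier, might look as follows. The function name and parameters are hypothetical, and falling back to the speed table value when an observation is rejected is an assumption made for illustration; the text says only that such an observation would not be used.

```python
def predict_run_time(observed, a, b, max_reasonable, table_value):
    """Predict a downstream run time as Tp = a + b * To.

    If the observed time To exceeds max_reasonable, the train is presumed
    to have stopped en route, so the observation is ignored and the
    speed table value is returned instead (an assumed fallback).
    """
    if observed > max_reasonable:
        return table_value
    return a + b * observed
```

For example, with a = 10, b = 0.5 and an observed time of 20 minutes, the predicted downstream time is 20 minutes; an observed time above the maximum returns the table value unchanged.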
The first segment of the line will have an equation for every other segment of the line. The final segment, of course, needs no equations since no segments are "downstream" from it.
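To make the bookkeeping concrete, the following sketch enumerates the (observed, downstream) segment pairs that each need their own equation. It assumes segments are simply numbered 1 through m in the direction of travel; the function name is hypothetical.

```python
def predictor_pairs(m):
    """List (observed, downstream) segment pairs for a line of m segments."""
    return [(i, j) for i in range(1, m + 1) for j in range(i + 1, m + 1)]
```

Under this numbering, a line of m segments needs m(m-1)/2 equations per direction and train class: 3 for the three-segment example above, and 45 for a ten-segment line.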
An analysis of data from four Chessie rail lines is performed here. While it is not claimed the results are necessarily typical of all rail lines, the techniques used to analyze the data may be applied to any rail line.
2.3 Data Source and Analysis
The source of the data was the Chessie centralized traffic control center in Richmond, Va. Since the data used in this study was computer generated, the data base is believed to be very accurate.
First, time distance charts or stringlines of the operation were drawn. This gave a visual picture of the operations. By inspection, delays and stops to perform switching were identified and removed from the data base. The final result was a data base of free, unimpeded point to point running times.
Using this data base, a set of regressions was performed. The correlation between run times of trains on various segments was measured, and regression coefficients and root mean square errors were estimated.
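The quantities estimated for each pair of segments can be computed with ordinary least squares, as in the sketch below. This is not the thesis's actual code. It assumes paired lists of unimpeded run times for the same trains on the two segments, and it divides the squared residuals by n rather than n - 2, which is a simplifying assumption.

```python
import math


def fit_run_time_regression(observed, downstream):
    """Fit Tp = a + b * To and return (a, b, r, rmse) for paired run times."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(downstream) / n
    # Sums of squares and cross products about the means.
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in downstream)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, downstream))
    b = sxy / sxx                   # slope
    a = mean_y - b * mean_x         # intercept
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient R
    # Root mean square error of the fitted line (no degrees-of-freedom correction).
    rmse = math.sqrt(
        sum((y - (a + b * x)) ** 2 for x, y in zip(observed, downstream)) / n
    )
    return a, b, r, rmse
```

The function would be called once per (observed, downstream) segment pair, and separately for each train class and direction if those are modeled separately.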
2.4 Results
The estimated mean run times are reasonable: they are within the speed limits and performance capabilities of the trains. The variance in run times is large. Among apparently identical trains, the standard deviation is 20% to 35% of the mean run time.
Train performance is not only highly variable, but it is also inconsistent. Trains speed up or slow down unpredictably. Downstream run times are only weakly correlated with previously observed times. While the regression results are statistically significant, the use of the regression models does not result in a dramatic reduction in root mean square error. The regression's root mean square error is the conditional standard deviation in running times, S (the amount of "spread" in the data about the regression line), while the speed table error is simply the unconditional standard deviation. The average improvement in error is only 12.4% when the two segments are adjacent. The average R value for adjacent segments is 0.41.
As the number of segments downstream increases, the correlation in running times decreases. For the second segment downstream, the R value declines to 0.29; for the third segment, R is 0.26. The improvement in error resulting from use of the regression model declines to 6.6% and 5.9%, respectively.
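The link between correlation and error reduction can be illustrated with a standard identity for one-variable linear regression: ignoring degrees-of-freedom corrections, the conditional standard deviation is S = sigma * sqrt(1 - R^2), where sigma is the unconditional standard deviation. The sketch below applies this identity to the average R values quoted above. It is an illustration under that assumption, not a reproduction of how the thesis computed its figures.

```python
import math


def error_reduction(r):
    """Fractional drop in RMS error relative to the unconditional standard deviation."""
    return 1.0 - math.sqrt(1.0 - r * r)


for r in (0.41, 0.29, 0.26):
    print(f"R = {r:.2f}: reduction = {error_reduction(r):.1%}")
```

Applied to the average R values, the identity gives reductions of roughly 8.8%, 4.3%, and 3.4%, somewhat below the reported averages of 12.4%, 6.6%, and 5.9%. One possible reason is that the reported figures are averages over individual segment pairs: because the reduction is a convex function of R, the average of the per-pair reductions is at least the reduction evaluated at the average R. Either way, the conclusion is the same: correlations in this range yield only small improvements.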
The physical limitations of loaded trains result in a measurably greater correlation of run times between segments, and an improved ability to predict train performance. For loaded eastbound trains, R is 0.44, while for empty westbounds, R is 0.38.
While these results are statistically significant, they suggest that only small improvements in root mean square error can be achieved through use of the regression models. If the results of this analysis are typical, the reduction in error from use of a regression model may not justify the added effort and expense. This work justifies continued use of the speed table method (as described on page 18), and provides an estimate of the root mean square error.
For the remainder of this research, the speed table approach will be used.
Last Updated: June 29, 2001