Linear Molecules in SPFIT/SPCAT
Quick Summary
We have installed all the required software, looked into the spectrum of a simple linear molecule, and estimated its rotational constant and centrifugal distortion constant. Furthermore, we got a high-level overview of SPFIT and SPCAT. Now, we want to set up the model we derived for OCS by hand in SPFIT/SPCAT.
The *.par and *.int file
In our previous model/approximation, we determined the rotational constant $B=6081.49 \text{ MHz}$ and the centrifugal distortion constant $D = 1.29 \text{ kHz}$. This will be our starting point for the model in SPFIT/SPCAT. You can download the initial *.var file and *.int file and inspect them in your favorite text editor. Both files are adapted from the respective CDMS entry. The *.var file has the following content:
OCS
2 100 10 0 0.0000E+00 1e+37 -1.0000E+00 1.0000000000
l -1 1 0 0 0 1 1 1 0 -1
100 6081.49 1e-37 /B
200 -1.29e-3 1e-37 /-D
Compare the format you see here with the format specified in the format documentation. Let us go line by line:
- Title of the project/molecule for your own convenience
- Settings connected to the fitting of the Hamiltonian; for now only the first two parameters are important:
  - Maximum number of parameters
  - Maximum number of assignments
- Options regarding the molecule of interest
- Definition of parameter $B$
- Definition of parameter $-D$
It is important that the maximum number of parameters is greater than or equal to the actual number of parameters in your model. The maximum number of assignments becomes important once we fit the model, as it has to be greater than or equal to the actual number of assignments. If you specify a value larger than the actual number of parameters or assignments, running SPFIT will update these values to exactly the number of parameters or assignments. The parameter lines all follow the same format:
- Parameter identifier
- Parameter value
- In *.var files: the parameter uncertainty; in *.par files: how much the parameter is allowed to float (change) in the fit
- A label of up to 10 characters that is delimited by '/'
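The four fields of a parameter line can be picked apart with a few lines of Python. This is only a minimal sketch for illustration: the `Parameter` type and the `parse_parameter_line` function are hypothetical names, and the sketch assumes whitespace-separated fields followed by a label introduced by '/', as in the two lines shown above.

```python
from typing import NamedTuple


class Parameter(NamedTuple):
    identifier: int
    value: float
    uncertainty: float
    label: str


def parse_parameter_line(line: str) -> Parameter:
    """Split a parameter line into identifier, value, uncertainty and label."""
    # Everything after the first '/' is treated as the label.
    numbers, _, label = line.partition("/")
    ident, value, uncertainty = numbers.split()[:3]
    return Parameter(int(ident), float(value), float(uncertainty), label.strip())


b = parse_parameter_line("100 6081.49 1e-37 /B")
d = parse_parameter_line("200 -1.29e-3 1e-37 /-D")
print(b)
print(d)
```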
The parameter coding is explained in greater detail in the documentation. For now it is sufficient to understand that the two last digits (00) specify the vibrational state, here our default state, the ground vibrational state. The third last digit, $j$, specifies the rank of the total angular momentum operator $J^{2j}$. For our example, the parameter value in the first row is multiplied with $J^2$ and the parameter value in the second row is multiplied with $J^4$ (and therefore specifies $-D$ instead of $D$). Thus the complete rotational Hamiltonian is
$$ \mathbf{H}_\text{rot} = B \mathbf{J}^2 - D \mathbf{J}^4 $$
which results in the energy expression
$$ E_\text{rot}/h =BJ(J+1)-D\left(J(J+1)\right)^{2} $$
Advanced Details on *.par files for linear molecules
SPFIT and SPCAT can fit a wide variety of molecules and hence also molecule classes. For linear molecules, some special considerations apply.
- The CHR parameter (first parameter in the third row) is set to "l" by convention to indicate a linear molecule. However, this is purely for the reader's benefit and does not change the model in any way.
- The sign of SPIND (the second parameter in the third row) is negative. Otherwise unnecessary quantum numbers will be printed in the *.cat file.
- The values for $K_\text{min}$ and $K_\text{max}$ are set to zero (fourth and fifth parameters in the third row).
The *.int file looks as follows:
OCS
1 60503 1028.6544 0 199 -6.5 -5.7 1300.
1 0.71520 /K. Tanaka et al. JCP 82 (1985) 2835
Again, we dissect the file line-by-line:
- Title
- Settings
- Dipole moment specifier, dipole moment value, comment (delimited by '/')
For now, we will forego the different settings in the second row. The dipole moment is important for the strength of the transitions and is specified in Debye.
The *.cat file
From the *.par and *.int files we can create the *.cat file (which holds the predicted transitions) by opening a terminal in the folder containing the *.par and *.int files and running spcat OCS. The terminal output should read
OCS.int
OCS.var
OCS.out
OCS.cat
OCS.str
OCS.egy
OCS Tue Apr 29 14:39:51 2025
INITIAL Q = 1028.6544, NEW Q IS RELATIVE TO MIN.EGY.= 0.0000
NUMBER OF LINES = 99
TEMPERATURE - Q(SPIN-ROT.) - log Q(SPIN-ROT.)
300.000 1027.7543 3.0119
225.000 770.9829 2.8870
150.000 514.1557 2.7111
75.000 257.2727 2.4104
37.500 128.8104 2.1100
18.750 64.5744 1.8101
9.375 32.4559 1.5113
sorted 99 lines
This indicates that SPCAT ran successfully and that three new files have been created. The *.out file gives a summary of the SPCAT run, the *.egy file holds the calculated energy levels, and the *.cat file holds the corresponding transitions between the energy levels. The *.cat file is in a fixed-width format where the columns are
- Frequency in MHz
- Estimated uncertainty (which has no meaning yet, as we arbitrarily set the uncertainties of the parameters $B$ and $D$ to zero)
- The base 10 logarithm of the integrated intensity in units of $\text{nm}^2\text{ MHz}$
- Degrees of freedom in the rotational partition function (atoms: 0, linear: 2, non-linear: 3)
- Energy of the initial energy level in wavenumbers
- Degeneracy of the final energy level
- A tag to identify the molecule in the database
- Identifier for the quantum number format
- Remaining columns: quantum numbers (in our case only two quantum numbers are present: the final ($J'$) and initial ($J''$) values of $J$)
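The column list above can be turned into a small reader for a single *.cat line. The sketch below is not part of SPCAT. It assumes the column widths of the JPL/CDMS catalog format (13, 8, 8, 2, 10, 3, 7, 4 characters, followed by two blocks of six two-character quantum numbers), and the sample line is made up for illustration rather than copied from a real *.cat file. `parse_cat_line` is a hypothetical helper name.

```python
def parse_cat_line(line: str) -> dict:
    """Split one fixed-width *.cat line into the columns listed above."""

    def qns(start: int, stop: int) -> list:
        # Quantum numbers occupy two characters each; empty slots are skipped.
        chunks = [line[i:i + 2] for i in range(start, stop, 2)]
        return [c.strip() for c in chunks if c.strip()]

    return {
        "frequency_mhz": float(line[0:13]),
        "uncertainty_mhz": float(line[13:21]),
        "log10_intensity": float(line[21:29]),
        "degrees_of_freedom": int(line[29:31]),
        "initial_energy_cm1": float(line[31:41]),
        "final_degeneracy": int(line[41:44]),
        "tag": int(line[44:51]),
        "qn_format": int(line[51:55]),
        "final_qns": qns(55, 67),
        "initial_qns": qns(67, 79),
    }


# Made-up sample line (illustrative values only), built with the assumed widths.
sample = (
    f"{12162.9748:13.4f}{0.0:8.4f}{-6.5:8.4f}{2:2d}{0.0:10.4f}"
    f"{3:3d}{60503:7d}{101:4d}" + " 1" + " " * 10 + " 0" + " " * 10
)
record = parse_cat_line(sample)
print(record)
```

Quantum numbers are kept as strings because the catalog format can encode large values with letters.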
Take a second to verify that the values in the *.cat file agree with the values we calculated by hand (or in Python) for our model. This means, we are now exactly at the point where we were with our by-hand-model.
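For the comparison, the by-hand model can be evaluated in a few lines of Python. The sketch below uses the energy expression $E_\text{rot}/h = BJ(J+1) - D(J(J+1))^2$ with the starting values $B = 6081.49\text{ MHz}$ and $D = 1.29\text{ kHz}$; the function names are chosen here for illustration only.

```python
B = 6081.49   # rotational constant in MHz
D = 1.29e-3   # centrifugal distortion constant in MHz (1.29 kHz)


def energy(j: int) -> float:
    """Rotational energy E/h in MHz for quantum number J."""
    return B * j * (j + 1) - D * (j * (j + 1)) ** 2


def transition_frequency(j_initial: int) -> float:
    """Frequency in MHz of the transition J + 1 <- J."""
    return energy(j_initial + 1) - energy(j_initial)


for j in range(5):
    print(f"J = {j + 1} <- {j}: {transition_frequency(j):.4f} MHz")
```

The energy difference is equivalent to the closed form $2B(J+1) - 4D(J+1)^3$, which can serve as a cross-check.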
Improving the Model
Open LLWP and load the OCS spectrum and the *.cat file. In the Reference Series window, choose Transition and set the upper and lower quantum numbers to 1 and 0, respectively. The Inc checkbox beneath the quantum numbers should be checked (this tells LLWP that we want this quantum number to increase in the Loomis-Wood plot). Make sure to set the number of plots to at least 10 (Ctrl + N), the relative offset to zero (Ctrl + G), and the width to a few MHz (Ctrl + W).

Now you should see a Loomis-Wood plot of OCS. Depending on the value of plot_annotationfstring in your config, the top right of each plot shows the quantum numbers (for {qns}) or the predicted position (for {x:.2f}). To go to higher or lower transitions in the series, you can increase or decrease the quantum numbers in the Reference Series window with the Inc or Dec buttons. For higher transitions, the predictions and center positions of the experimental lines do not match perfectly, and the deviations should follow a smooth trend. This is a hint that our parameter values are a little off or that additional higher-order parameters are missing.

To refine our model, we have to assign the predictions to the experimental spectrum. Verify that you have selected a Gaussian as the lineshape (Fit > Choose Fit Function > Gauss) and assign all transitions in the spectrum by selecting the area around them with the mouse. The assignments appear in the New Assignments window. Save the new assignments by pressing the Save button in the New Assignments window, preferably into the same folder as the *.var and *.cat files. This should result in this *.lin file.
Before we can fit the Hamiltonian to the assignments, we need a model in *.par format. As the *.par and *.var formats are mostly identical, copy the content of the OCS.var file to a new OCS.par file. If we run SPFIT now (try it by running spfit OCS in the terminal), our model does not change at all. The output of the SPFIT run reads
OCS.par
OCS.lin
OCS.fit
OCS.bak
OCS.var
OCS Tue Apr 29 15:19:32 2025
LINES REQUESTED= 100 NUMBER OF PARAMETERS= 2 NUMBER OF ITERATIONS= 10
MARQUARDT PARAMETER =0.0000E+000 max (OBS-CALC)/ERROR =1.0000E+037
Converting Line 16
Finished Quantum 16
MARQUARDT PARAMETER = 0, TRUST EXPANSION = 1.00
MICROWAVE AVG = -0.016228 MHz, IR AVG = 0.00000
MICROWAVE RMS = 0.044302 MHz, IR RMS = 0.00000
END OF ITERATION 1 OLD, NEW RMS ERROR= 0.88603 0.88603
OCS Tue Apr 29 15:19:32 2025
FIT COMPLETE
You can read that a single iteration of 10 requested iterations ran through before the fit converged. With SPFIT you can remember that one (iteration) is none. So, why did nothing change?
- The number of parameters is high enough (2)
- The number of requested lines is high enough (100 requested, 16 lines in the *.lin file)
But we have to tell SPFIT that we actually want to float the $B$ and $D$ parameters. Therefore change their uncertainties to 1e+37. The correct *.par file should look like the following
OCS
2 100 10 0 0.0000E+00 1e+37 -1.0000E+00 1.0000000000
l -1 1 0 0 0 1 1 1 0 -1
100 6081.49 1e+37 /B
200 -1.29e-3 1e+37 /-D
Now, running SPFIT (via spfit OCS in the terminal) will result in the following output:
OCS.par
OCS.lin
OCS.fit
OCS.bak
OCS.var
OCS Tue Apr 29 15:28:02 2025
LINES REQUESTED= 100 NUMBER OF PARAMETERS= 2 NUMBER OF ITERATIONS= 10
MARQUARDT PARAMETER =0.0000E+000 max (OBS-CALC)/ERROR =1.0000E+037
Converting Line 16
Finished Quantum 16
MARQUARDT PARAMETER = 0, TRUST EXPANSION = 1.00
MICROWAVE AVG = -0.016228 MHz, IR AVG = 0.00000
MICROWAVE RMS = 0.044302 MHz, IR RMS = 0.00000
END OF ITERATION 1 OLD, NEW RMS ERROR= 0.88603 0.02467
Finished Quantum 16
MARQUARDT PARAMETER = 0, TRUST EXPANSION = 1.00
MICROWAVE AVG = 0.000116 MHz, IR AVG = 0.00000
MICROWAVE RMS = 0.001234 MHz, IR RMS = 0.00000
END OF ITERATION 2 OLD, NEW RMS ERROR= 0.02467 0.02467
OCS Tue Apr 29 15:28:02 2025
FIT COMPLETE
You can see that the initial iteration had a microwave root-mean-square (RMS) of $44\text{ kHz}$; the fit then improved to $1\text{ kHz}$ in the second iteration and converged. The RMS value is given by
$$ \text{RMS} = \sqrt{\frac{1}{N} \sum_i \left( \nu_i - \nu_{i,0} \right)^2} $$
with the experimental frequencies $\nu_i$, the calculated frequencies $\nu_{i,0}$, and the number of lines $N$. You can think of it as the average deviation between the experimental and predicted positions. The weighted root-mean-square (WRMS) value (stated as the OLD, NEW RMS ERROR in the SPFIT output) is defined as
$$ \text{WRMS} = \sqrt{\frac{1}{N} \sum_i \left( \frac{ \nu_i - \nu_{i,0} }{\Delta \nu_i}\right)^2} $$
where $\Delta \nu_i$ are the respective experimental uncertainties. The WRMS value indicates whether the deviations between experiment and predictions on average exceed (WRMS > 1) or fall below (WRMS < 1) the experimental uncertainties. Hence a WRMS value of 1 is perfect and indicates that (given your model has no discrepancies) your experimental uncertainties were chosen adequately. It is important to keep in mind that both the RMS and WRMS values are statistical values, meaning that very few blatant outliers can greatly influence them. Residual plots should therefore also be used to evaluate the quality of the fit.
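The two definitions translate directly into Python. The following is a minimal sketch using only the standard library. The frequencies and uncertainties in the example are hypothetical numbers chosen to make the arithmetic easy to follow, not values from the OCS fit.

```python
import math


def rms(observed, calculated):
    """Root-mean-square of the residuals (same unit as the frequencies)."""
    residuals = [o - c for o, c in zip(observed, calculated)]
    return math.sqrt(sum(r ** 2 for r in residuals) / len(residuals))


def wrms(observed, calculated, uncertainties):
    """Weighted RMS: residuals divided by their experimental uncertainties."""
    weighted = [(o - c) / u for o, c, u in zip(observed, calculated, uncertainties)]
    return math.sqrt(sum(w ** 2 for w in weighted) / len(weighted))


# Hypothetical example: two lines, each off by 1 kHz, with 50 kHz uncertainty.
observed = [10.001, 19.999]      # MHz
calculated = [10.000, 20.000]    # MHz
uncertainties = [0.050, 0.050]   # MHz

print(f"RMS  = {rms(observed, calculated):.6f} MHz")
print(f"WRMS = {wrms(observed, calculated, uncertainties):.4f}")
```

With uniform uncertainties the WRMS is simply the RMS divided by that uncertainty, which matches the relation between the MICROWAVE RMS and the RMS ERROR values in the SPFIT output above.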
Check the resulting parameter values for $B$ and $D$ in the *.par or the *.fit file, which should be $B= 6081.49\text{ MHz}$ and $D=1.302\text{ kHz}$. If we run SPFIT again (try it!), only a single iteration will run through as the model is already converged. Seeing a single iteration in the output can therefore mean either that your model is already converged or that something is configured incorrectly. To test which of the two cases applies, change a rotational parameter slightly; this should result in more than one iteration running through if everything is configured correctly.
Uncertainties of our Assignments
So far we have given our assignments standard uncertainties of $50 \text{ kHz}$ (this is the default value in LLWP; see the Default Uncertainty input in the New Assignments window). However, our model indicates an average deviation between the model and the experimental positions of less than $2 \text{ kHz}$. Unfortunately, the parameter errors in SPFIT depend on the absolute and not the relative uncertainties of the assignments. The consequences are nicely summarized on the PROSPE website. To make a long story short, you have a few alternative options to make sure that the parameter uncertainties stated by SPFIT are (approximately) standard errors:
- Multiply the parameter uncertainties by $C = \text{WRMS} \cdot \sqrt{N_\text{Lines} / (N_\text{Lines} - N_\text{Fitted Constants}) }$ (the dimensionless WRMS is used so that $C$ is a pure scaling factor)
- Multiply all assignment uncertainties by the same factor $C$
- Set the FRAC parameter to the WRMS value (or, even better, to $C$) and refit. However, this can be tedious to keep updated
- Use negative FRAC values, -1 corresponds to standard errors, -2 to $2\sigma$, ...
For large datasets this means that a WRMS value of about 1 results in the stated parameter uncertainties being standard errors. The *.par and *.var files we have used already had FRAC set to -1, meaning the uncertainties in the *.fit and *.var files are actually standard errors.
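As a rough illustration of the scaling factor $C$, the sketch below evaluates it for our fit with 16 lines and 2 fitted constants. It assumes that the weighted RMS reported by SPFIT (0.02467 after the converged fit) is the quantity entering $C$, so that $C$ is dimensionless and can scale the $50\text{ kHz}$ default uncertainty. The function name is hypothetical.

```python
import math


def scale_factor(wrms: float, n_lines: int, n_fitted: int) -> float:
    """Factor C that rescales uncertainties to (approximate) standard errors."""
    return wrms * math.sqrt(n_lines / (n_lines - n_fitted))


# Values from the converged two-parameter fit above (WRMS assumed as input).
c = scale_factor(wrms=0.02467, n_lines=16, n_fitted=2)
default_uncertainty_khz = 50.0

print(f"C = {c:.5f}")
print(f"Rescaled assignment uncertainty = {default_uncertainty_khz * c:.2f} kHz")
```

The rescaled uncertainty lands close to the microwave RMS of the fit, which is the value used later in this section when the assignments are combined with literature data.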
Adding More Parameters to the Model
The inclusion of $-D$ significantly improved our model. This raises the question, should we maybe add some higher-order parameter? For OCS, the next parameter would be $H$ which is the parameter corresponding to $J^6$. We can just add it to the model by adding the following line to the *.par file
300 1.0e-37 1e+37 /H
and increasing the number of parameters NPAR (first parameter in second row) to at least three. Then run SPFIT again to get the following output
OCS.par
OCS.lin
OCS.fit
OCS.bak
OCS.var
OCS Tue Apr 29 18:47:18 2025
LINES REQUESTED= 16 NUMBER OF PARAMETERS= 21 NUMBER OF ITERATIONS= 10
MARQUARDT PARAMETER =0.0000E+000 max (OBS-CALC)/ERROR =1.0000E+037
Converting Line 16
Finished Quantum 16
MARQUARDT PARAMETER = 0, TRUST EXPANSION = 1.00
MICROWAVE AVG = 0.000116 MHz, IR AVG = 0.00000
MICROWAVE RMS = 0.001234 MHz, IR RMS = 0.00000
END OF ITERATION 1 OLD, NEW RMS ERROR= 0.02467 0.02446
Finished Quantum 16
MARQUARDT PARAMETER = 0, TRUST EXPANSION = 1.00
MICROWAVE AVG = 0.000150 MHz, IR AVG = 0.00000
MICROWAVE RMS = 0.001223 MHz, IR RMS = 0.00000
END OF ITERATION 2 OLD, NEW RMS ERROR= 0.02446 0.02446
OCS Tue Apr 29 18:47:18 2025
FIT COMPLETE
The fit improved only marginally (compare the Microwave RMS values from the first and second iteration). However, the real problem is the uncertainties of the resulting parameters (taken from the *.fit file)
NEW PARAMETER (EST. ERROR) -- CHANGE THIS ITERATION
1 100 B 6081.492171( 75) -0.000000
2 200 -D -1.30137( 54)E-03 0.00000E-03
3 300 H -0.59(118)E-09 -0.00E-09
The $H$ parameter is clearly undetermined as its standard uncertainty is larger than the value itself. Always check your parameter values and parameter uncertainties when adding new parameters to the model. An RMS that does not improve and/or undetermined parameters are clear signs that the data might be overfitted.
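The check "is the uncertainty larger than the value?" can be automated. The sketch below parses the value(uncertainty) notation from the *.fit listing above, where the digits in parentheses refer to the last printed digits of the value. `parse_fit_value` and `is_determined` are hypothetical helper names, and the simple criterion $|\text{value}| > \text{uncertainty}$ is only the rule of thumb used in this section, not a rigorous statistical test.

```python
import re

# value, optional decimals, uncertainty digits in parentheses, optional exponent
_PATTERN = re.compile(r"^\s*(-?\d+(?:\.(\d*))?)\(\s*(\d+)\)(?:E([+-]?\d+))?\s*$")


def parse_fit_value(text: str):
    """Return (value, uncertainty) for strings such as '-0.59(118)E-09'."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"Unexpected format: {text!r}")
    mantissa, decimals, digits, exponent = match.groups()
    scale = 10.0 ** int(exponent or 0)
    value = float(mantissa) * scale
    # The uncertainty digits apply to the last printed decimals of the value.
    uncertainty = int(digits) * 10.0 ** (-len(decimals or "")) * scale
    return value, uncertainty


def is_determined(text: str) -> bool:
    value, uncertainty = parse_fit_value(text)
    return abs(value) > uncertainty


for label, entry in [("B", "6081.492171( 75)"),
                     ("-D", "-1.30137( 54)E-03"),
                     ("H", "-0.59(118)E-09")]:
    value, uncertainty = parse_fit_value(entry)
    print(f"{label:>2}: {value:.6e} +/- {uncertainty:.1e}  determined: {is_determined(entry)}")
```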
Additional Literature Data
We are a little disappointed and go back to our previous model by removing $H$ from the *.par file. But not all hope is lost! Our spectrum does not cover any higher transitions (meaning higher in $J$ and thus also in frequency), which we need to determine $H$. But we can use literature data from other scientists and combine it with our data. Go to the CDMS and find the correct tag for OCS. The first three digits of the tag are the molecular mass in atomic mass units, followed by one digit that identifies the database (5 for the CDMS); the last two digits are a running index for all molecules with that mass. Add up the atomic masses of O, C, and S to find the correct starting point (or just hit CTRL + F and search for OCS).
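The tag layout described above can be illustrated with a short Python sketch. It decodes the tag that already appears in the second line of the *.int file (60503), treating it as a zero-padded six-digit number. `decode_tag` is a hypothetical helper, and the nominal integer masses of O, C, and S are used for the cross-check.

```python
def decode_tag(tag: int):
    """Split a catalog tag into (mass, database digit, running index)."""
    text = f"{abs(tag):06d}"   # pad to six digits, e.g. 60503 -> '060503'
    return int(text[:3]), int(text[3]), int(text[4:])


mass, database, index = decode_tag(60503)
nominal_mass = 16 + 12 + 32   # nominal masses of O, C, and S

print(f"mass = {mass}, database = {database}, index = {index}")
print(f"matches O + C + S: {mass == nominal_mass}")
```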
Solution
Equipped with the molecule tag, open the VAMDC portal version of the CDMS and go to the Catalog section. Enter the tag in the Tag search field and click on doc of the catalog entry. Then go to Files and download the *.lin file. The columns in the *.lin file are $J'$, $J''$, the frequency, the frequency uncertainty, and a comment indicating the publication. You should see assignments ranging from $12\text{ GHz}$ to beyond $1\text{ THz}$ and from $J=0$ to $J=90$.
There is one last problem before we can combine our *.lin file with the literature data: our data still has the wrong uncertainties. When fitting only data of the same quality this need not be a problem (as long as the correct uncertainties are accounted for in some other way), but correct uncertainties are essential when mixing different datasets. As the RMS of our best model was $1.2 \text{ kHz}$, we will assume uncertainties of $1.2\text{ kHz}$ for our data. The *.lin file after correcting the uncertainties and adding in the literature data is available here for comparison.
Make sure to sufficiently increase the number of lines in your *.par file before running SPFIT. The RMS and WRMS will both be quite high. Create some predictions with SPCAT and load both the *.lin and the *.cat file into LLWP (remove all other *.lin and *.cat files via Files > Edit Files). Then open the Residuals via the Modules menu. Increase the size of the window and press update to see the residuals $\nu_i - \nu_{i,0}$ of your current model.

The residuals clearly hint towards missing higher-order parameters as transitions with high $J$ values are increasingly deviating from the model. Try to add $H$ to the model and see what happens.

The distribution of the residuals is much better now, but the varying quality (different uncertainties) of the dataset is quite apparent. Also check that all parameters are well determined (SPOILER: They are). Instead of the residuals, plot the weighted residuals by inserting the following expression into the y-axis field:
(x_lin - x_cat) / error_lin

The weighted residuals look really good and indicate that no higher-order parameters are required but we should test that this is true (add the parameter $L$ with the parameter code 400 to the model and check). Additionally, the WRMS value of 0.8 indicates that the agreement is already better than the experimental uncertainty. When data of different quality is fit together, the RMS value has little meaning as it is only a good indicator for the assignments with the highest uncertainties.
Solution (Adding $L$)
1 100 B 6081.4921180( 54) 0.0000001
2 200 -D -1.3014284( 49)E-03 -0.0000001E-03
3 300 H -0.08938(179)E-09 0.00005E-09
4 400 L -4(151)E-18 -4E-18
Clearly, the parameter $L$ is not at all determined and also the improvement in both RMS and WRMS is negligible.
Therefore, it does not make sense to add $L$ to the model.
Summary
We have learnt a lot in this section. The *.par, *.lin, and *.int formats are now known to us and we can set up simple models in SPFIT (even though some parameters are still a little mystery to us). We can improve these models by making assignments in LLWP, saving them to *.lin files and running SPFIT on them. We have learnt how to check if higher-order parameters are missing or superfluous.
But most importantly, we have learnt to always increase the number of lines or parameters when adding more assignments or parameters to our model, respectively!
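This last check is easy to forget, so it can help to automate it. The following sketch is a hypothetical helper, not part of SPFIT. It assumes the *.par layout used in this section: a title line, a settings line whose first two entries are the maximum numbers of parameters and assignments, one options line, and then one line per parameter.

```python
def check_limits(par_text: str, n_assignments: int) -> dict:
    """Compare the limits in a *.par file with the actual counts."""
    lines = [line for line in par_text.splitlines() if line.strip()]
    settings = lines[1].split()
    max_parameters = int(settings[0])
    max_assignments = int(settings[1])
    n_parameters = len(lines) - 3   # assumes a single options line (see lead-in)
    return {
        "parameters_ok": max_parameters >= n_parameters,
        "assignments_ok": max_assignments >= n_assignments,
    }


# The *.par content from above after adding H but before raising the limit.
par_text = """OCS
2 100 10 0 0.0000E+00 1e+37 -1.0000E+00 1.0000000000
l -1 1 0 0 0 1 1 1 0 -1
100 6081.49 1e+37 /B
200 -1.29e-3 1e+37 /-D
300 1.0e-37 1e+37 /H
"""

print(check_limits(par_text, n_assignments=16))
```

In this example the parameter check fails because three parameters are defined while the settings line still allows only two.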