Linear Molecules in SPFIT/SPCAT

Quick Summary

We have installed all the required software, looked into the spectrum of a simple linear molecule, and estimated its rotational constant and centrifugal distortion constant. Furthermore, we got a high-level overview of SPFIT and SPCAT. Now, we want to set up the model we derived for OCS by hand in SPFIT/SPCAT.

The .par and .int file

In our previous model/approximation, we determined the rotational constant $B=6081.49 \text{ MHz}$ and the centrifugal distortion constant $D = 1.29 \text{ kHz}$. This will be our starting point for the model in SPFIT/SPCAT. You can download the initial *.var file and *.int file and inspect them in your favorite text editor. Both files are adapted from the respective CDMS entry. The *.var file has the following content:

	
OCS
   2   100    10    0     0.0000E+00     1e+37    -1.0000E+00 1.0000000000
l   -1    1    0    0    0    1    1    1         0   -1
          100   6081.49    1e-37 /B
          200     -1.29e-3 1e-37 /-D

Compare the format you see here with the format specified in the format documentation. Let us go line-by-line:

Title of the project/molecule for your own convenience
Settings for fitting the Hamiltonian; for now, only the first two parameters are important:

Maximum number of parameters
Maximum number of assignments

Options regarding the molecule of interest
Definition of parameter $B$
Definition of parameter $-D$

It is important that the maximum number of parameters is greater than or equal to the actual number of parameters in your model. The maximum number of assignments matters once we fit the model, as it has to be greater than or equal to the actual number of assignments. However, if you specify a value larger than the actual number of parameters or assignments, running SPFIT will update these values to exactly the number of parameters or assignments. The parameter lines all follow the same format:

Parameter identifier
Parameter value
Parameter uncertainty (in *.var files) or how much the parameter is allowed to float (change) in the fit (in *.par files)
A label of up to 10 characters that is delimited by '/'
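The four fields of a parameter line can be read with a few lines of Python. This is only a minimal sketch: the function name `parse_parameter_line` is made up for this illustration, and it assumes that the first three fields are separated by whitespace and that the label follows the '/'.

```python
def parse_parameter_line(line):
    """Split one parameter line into identifier, value, uncertainty, label.

    Assumes whitespace-separated numeric fields followed by '/label'.
    """
    fields, _, label = line.partition("/")
    code, value, uncertainty = fields.split()
    return int(code), float(value), float(uncertainty), label.strip()


# The two parameter lines from the *.var file shown above
for line in ("          100   6081.49    1e-37 /B",
             "          200     -1.29e-3 1e-37 /-D"):
    print(parse_parameter_line(line))
```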

The parameter coding is explained in greater detail in the documentation and a short summary is given in the following.

Parameter Coding

The parameter codes in the *.par and *.var files are coded in decimal digit form with the following order:

	EX	FF	I2	I1	NS	TYP	KSQ	NSQ	V2	V1
Digits	1	2	1	1	1	2	1	1	1-3	1-3

V2 and V1 specify the (vibrational) state the parameter belongs to. Depending on the number of defined states (set by the NVIB parameter, the third parameter in the third row), they take 1-3 digits: one digit is used for fewer than 10 states and two digits for fewer than 100 states. Note that the highest possible value (9 for one digit, 99 for two digits, 999 for three digits) has a special meaning, as parameters with this value are applied to all states. This can be handy when defining, e.g., vibrational states via rotation-vibration interaction parameters. NSQ (short for N squared) is the power of $N(N+1)$ by which the parameter value is multiplied, and similarly KSQ specifies the power of $N_z^2$. Here, $N$ denotes the total rotational angular momentum quantum number (excluding any spins! This means $N + S = J$). TYP specifies the projection type, where 1-3 are reserved for the projections onto the $a$, $b$, and $c$ axes, respectively. Subsequent values specify powers of $N_+^{2n} + N_-^{2n}$, with TYP = 4 corresponding to $N_+^{2} + N_-^{2}$, TYP = 5 to $N_+^{4} + N_-^{4}$, and so on up to TYP = 10.

For now, this is fully sufficient to understand all the parameter codes we will use. With the knowledge gathered here, find the correct parameter codes for the $B$, $D$, and $H$ parameters of a linear molecule.
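To see how the fields combine into a single decimal code, here is a minimal Python sketch. The helper `parameter_code` is made up for this illustration and is not part of SPFIT. It only handles the five lowest fields and assumes fewer than 10 vibrational states, so that V1 and V2 take one digit each; NS, I1, I2, FF, and EX are left at zero.

```python
def parameter_code(nsq=0, ksq=0, typ=0, v2=0, v1=0):
    """Combine the lowest fields of a parameter code into one integer.

    Assumes one-digit V1 and V2 fields (fewer than 10 states); the fields
    NS, I1, I2, FF, and EX are not handled and stay zero.
    """
    if not (0 <= v1 <= 9 and 0 <= v2 <= 9):
        raise ValueError("V1 and V2 must be single digits in this sketch")
    if not (0 <= nsq <= 9 and 0 <= ksq <= 9 and 0 <= typ <= 99):
        raise ValueError("NSQ and KSQ take one digit, TYP takes two digits")
    # Digit positions from the right: V1, V2, NSQ, KSQ, TYP (two digits)
    return v1 + 10 * v2 + 100 * nsq + 1000 * ksq + 10000 * typ


# Linear molecule: only the power of N(N+1) increases from B to -D to H
for label, nsq in (("B", 1), ("-D", 2), ("H", 3)):
    print(label, parameter_code(nsq=nsq))
```

The printed codes for $B$ and $-D$ can be compared with the identifiers in the *.var file shown earlier.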

It is important to note that not every parameter that can be coded should also be used. Depending on the Hamiltonian you are using, only a specific subset of parameters should be used. See the information on A- and S-reduction, e.g., on the PROSPE page.

When new parameters have to be added to the model, you typically want to test the parameters that increase either NSQ, KSQ, or TYP by 1 from already included parameters. For a linear molecule, this is trivial as you would always increase the value of NSQ by 1. However, for asymmetric tops this can result in quite a few parameter candidates. Finally, after you have added a parameter to your model, always check its value and its uncertainty. If the value is unphysical (e.g., $D$ is larger than $B$) or is not properly determined, this can hint toward an effective or overfitted model.

For now it is sufficient to understand that the two last digits (00) specify the vibrational state, here our default state, the ground vibrational state. The third last digit, $j$, specifies the rank of the total angular momentum operator $J^{2j}$. For our example, the parameter value in the first row is multiplied with $J^2$ and the parameter value in the second row is multiplied with $J^4$ (and therefore specifies $-D$ instead of $D$). Thus the complete rotational Hamiltonian is

$$ \mathbf{H}_\text{rot} = B \mathbf{J}^2 - D \mathbf{J}^4 $$

which results in the energy expression

$$ E_\text{rot}/h =BJ(J+1)-D\left(J(J+1)\right)^{2} $$
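Taking the difference of this expression for the levels $J+1$ and $J$ gives the transition frequencies $\nu_{J+1 \leftarrow J} = 2B(J+1) - 4D(J+1)^3$. The short Python sketch below evaluates both forms with the starting values from above; the function names are chosen for this illustration only.

```python
B = 6081.49      # rotational constant in MHz
D = 1.29e-3      # centrifugal distortion constant in MHz (1.29 kHz)


def energy(j):
    """Rotational energy E/h in MHz for the quantum number J."""
    x = j * (j + 1)
    return B * x - D * x**2


def transition_frequency(j_lower):
    """Frequency in MHz of the transition J+1 <- J."""
    return energy(j_lower + 1) - energy(j_lower)


for j in range(3):
    closed_form = 2 * B * (j + 1) - 4 * D * (j + 1) ** 3
    print(f"J = {j + 1} <- {j}: {transition_frequency(j):.4f} MHz "
          f"(closed form: {closed_form:.4f} MHz)")
```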

Advanced Details on *.par files for linear molecules

SPFIT and SPCAT can fit a wide variety of different molecules and hence also molecule classes. For linear molecules, some special considerations apply.

The CHR parameter (first parameter in the third row) is set to "l" by convention to indicate a linear molecule. However, this is purely didactical and does not change the model in any way.
The sign of SPIND (the second parameter in the third row) is negative. Otherwise, unnecessary quantum numbers will be printed in the *.cat file.
The values for $K_\text{min}$ and $K_\text{max}$ are set to zero (fourth and fifth parameters in the third row).

The *.int file looks as follows:

	
OCS
1  60503   1028.6544   0   199   -6.5   -5.7   1300.
  1   0.71520 /K. Tanaka et al. JCP 82 (1985) 2835

Again, we dissect the file line-by-line:

Title
Settings
Dipole moment specifier, dipole moment value, comment (delimited by '/')

For now, we will forego the different settings in the second row. The dipole moment is important for the strength of the transitions and is specified in Debye.

The *.cat file

From the *.var and *.int files we can create the *.cat file (which holds the predicted transitions) by opening a terminal in the folder containing both files and running spcat OCS. The terminal output should read

	
OCS.int
OCS.var
OCS.out
OCS.cat
OCS.str
OCS.egy
OCS                                                     Tue Apr 29 14:39:51 2025
INITIAL Q =      1028.6544, NEW Q IS RELATIVE TO MIN.EGY.=         0.0000
 NUMBER OF LINES =     99
TEMPERATURE - Q(SPIN-ROT.) - log Q(SPIN-ROT.)
    300.000      1027.7543    3.0119
    225.000       770.9829    2.8870
    150.000       514.1557    2.7111
     75.000       257.2727    2.4104
     37.500       128.8104    2.1100
     18.750        64.5744    1.8101
      9.375        32.4559    1.5113
sorted       99 lines

This indicates that SPCAT ran successfully and that three new files have been created. The *.out file gives a summary of the SPCAT run, the *.egy file holds the calculated energy levels, and the *.cat file holds the corresponding transitions between the energy levels. The *.cat file uses a fixed-width format with the following columns:

Frequency in MHz
Estimated uncertainty (which has no meaning yet, as we arbitrarily set the uncertainties of the parameters $B$ and $D$ to zero)
The base 10 logarithm of the integrated intensity in units of $\text{nm}^2\text{ MHz}$
Degrees of freedom in the rotational partition function (atoms: 0, linear: 2, non-linear: 3)
Energy of the initial energy level in wavenumbers
Degeneracy of the final energy level
A tag to identify the molecule in the database
Identifier for the quantum number format
All following columns: Quantum numbers (in our case only two quantum numbers are present: the final ($J'$) and initial ($J''$) values of $J$)
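Because the format is fixed-width, a line can be split by character positions. The following Python sketch assumes the column widths given in the catalog format documentation (13, 8, 8, 2, 10, 3, 7, and 4 characters, followed by up to six two-character quantum numbers each for the final and the initial level). The function name, the field names, and the example line are made up for this illustration; quantum numbers that are not plain integers are not handled.

```python
# Column widths assumed from the catalog format documentation
FIELDS = (("freq", 13, float), ("unc", 8, float), ("log_int", 8, float),
          ("dof", 2, int), ("energy", 10, float), ("degeneracy", 3, int),
          ("tag", 7, int), ("qn_format", 4, int))


def parse_cat_line(line):
    """Split one *.cat line into named fields plus quantum numbers."""
    result, pos = {}, 0
    for name, width, convert in FIELDS:
        result[name] = convert(line[pos:pos + width])
        pos += width
    # The rest holds up to 6 final and 6 initial quantum numbers, 2 chars each
    qns = [line[i:i + 2].strip() for i in range(pos, pos + 24, 2)]
    result["qn_final"] = [int(q) for q in qns[:6] if q]
    result["qn_initial"] = [int(q) for q in qns[6:] if q]
    return result


# Made-up example line (not taken from a real SPCAT run), built with the
# same widths so that the columns are guaranteed to line up
example = (f"{12162.9748:13.4f}{0.0:8.4f}{-7.0:8.4f}{2:2d}{0.0:10.4f}"
           f"{3:3d}{60503:7d}{101:4d}" + " 1" + " " * 10 + " 0" + " " * 10)
print(parse_cat_line(example))
```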

Take a second to verify that the values in the *.cat file agree with the values we calculated by hand (or in Python) for our model. This means we are now exactly at the point where we were with our by-hand model.

Improving the Model

Open LLWP and load the OCS spectrum and the *.cat file. In the Reference Series window, choose Transition and set the upper and lower quantum numbers to 1 and 0, respectively. The Inc checkbox beneath the quantum numbers should be checked (this tells LLWP that we want this quantum number to increase in the Loomis-Wood plot). Make sure to set the number of plots to at least 10 (Ctrl + N), the relative offset to zero (Ctrl + G), and the width to a few MHz (Ctrl + W).

Set the values in your *Reference Series* window as seen here.

Now you should see a Loomis-Wood plot of OCS. Depending on the value of plot_annotationfstring in your config, the top right of each plot shows the quantum numbers (for {qns}) or the predicted position (for {x:.2f}). To go to higher or lower transitions in the series, you can increase or decrease the quantum numbers in the Reference Series window with the Inc or Dec buttons. For higher transitions, the predictions and center positions of the experimental lines do not match perfectly. They should deviate in a smooth trend. This is a hint that our parameter values are a little off or that additional higher-order parameters are missing.

The Loomis-Wood plot of our current model. The predicted and experimental positions deviate for higher transitions.

To refine our model, we have to assign the predictions to the experimental spectrum. Verify that you have selected a Gaussian as the lineshape (Fit > Choose Fit Function > Gauss) and assign all transitions in the spectrum by selecting the area around them with the mouse. The assignments appear in the New Assignments window. Save the new assignments by pressing the Save button in the New Assignments window, preferably into the same folder as the *.var and *.cat files. This should result in this *.lin file.

Before we can fit the Hamiltonian to the assignments, we need a model in *.par format. As the *.par and *.var formats are mostly identical, copy the content of the OCS.var file to a new OCS.par file. If we run SPFIT now (try it by running spfit OCS in the terminal), our model does not change at all. The output of the SPFIT run reads

	
OCS.par
OCS.lin
OCS.fit
OCS.bak
OCS.var
OCS                                                     Tue Apr 29 15:19:32 2025

LINES REQUESTED=  100 NUMBER OF PARAMETERS=  2 NUMBER OF ITERATIONS= 10
  MARQUARDT PARAMETER =0.0000E+000 max (OBS-CALC)/ERROR =1.0000E+037
Converting Line 16
Finished Quantum  16
MARQUARDT PARAMETER = 0, TRUST EXPANSION = 1.00
 MICROWAVE AVG =       -0.016228 MHz, IR AVG =        0.00000
 MICROWAVE RMS =        0.044302 MHz, IR RMS =        0.00000
 END OF ITERATION  1 OLD, NEW RMS ERROR=        0.88603         0.88603
OCS                                                     Tue Apr 29 15:19:32 2025
FIT COMPLETE

The output shows that only a single iteration of the 10 requested iterations ran before the fit was reported as complete, and that the RMS error did not change. With SPFIT you can remember that one (iteration) is none. So, why did nothing change?

The number of parameters is high enough (2)
The number of requested lines is high enough (100 requested, 16 lines in the *.lin file)

But we have to tell SPFIT that we actually want to float the $B$ and $D$ parameters. Therefore, change their uncertainties to 1e+37. The correct *.par file should look like the following

	
OCS
   2   100    10    0     0.0000E+00     1e+37    -1.0000E+00 1.0000000000
l   -1    1    0    0    0    1    1    1         0   -1
          100   6081.49    1e+37 /B
          200     -1.29e-3 1e+37 /-D

Now, running SPFIT (via spfit OCS in the terminal) will result in the following output:

	
OCS.par
OCS.lin
OCS.fit
OCS.bak
OCS.var
OCS                                                     Tue Apr 29 15:28:02 2025

LINES REQUESTED=  100 NUMBER OF PARAMETERS=  2 NUMBER OF ITERATIONS= 10
  MARQUARDT PARAMETER =0.0000E+000 max (OBS-CALC)/ERROR =1.0000E+037
Converting Line 16
Finished Quantum  16
MARQUARDT PARAMETER = 0, TRUST EXPANSION = 1.00
 MICROWAVE AVG =       -0.016228 MHz, IR AVG =        0.00000
 MICROWAVE RMS =        0.044302 MHz, IR RMS =        0.00000
 END OF ITERATION  1 OLD, NEW RMS ERROR=        0.88603         0.02467
Finished Quantum  16
MARQUARDT PARAMETER = 0, TRUST EXPANSION = 1.00
 MICROWAVE AVG =        0.000116 MHz, IR AVG =        0.00000
 MICROWAVE RMS =        0.001234 MHz, IR RMS =        0.00000
 END OF ITERATION  2 OLD, NEW RMS ERROR=        0.02467         0.02467
OCS                                                     Tue Apr 29 15:28:02 2025
FIT COMPLETE

You can see that the initial iteration had a microwave root-mean-square (RMS) of $44\text{ kHz}$, that the fit improved to $1\text{ kHz}$ in the second iteration, and that it then converged. The RMS value is given by

$$ \text{RMS} = \sqrt{\frac{1}{N} \sum_i \left( \nu_i - \nu_{i,0} \right)^2} $$

with the experimental frequencies $\nu_i$, the calculated frequencies $\nu_{i,0}$, and the number of lines $N$. You can think of it as the typical deviation between the experimental and predicted positions. The weighted root-mean-square (WRMS) value (stated as the OLD, NEW RMS ERROR in the SPFIT output) is defined as

$$ \text{WRMS} = \sqrt{\frac{1}{N} \sum_i \left( \frac{ \nu_i - \nu_{i,0} }{\Delta \nu_i}\right)^2} $$

where $\Delta \nu_i$ are the respective experimental uncertainties. The WRMS value indicates whether the deviations between experiment and predictions on average exceed (WRMS > 1) or fall below (WRMS < 1) the experimental uncertainties. Hence, a WRMS value of 1 is ideal and indicates that (given your model has no deficiencies) your experimental uncertainties were chosen adequately. Keep in mind that both the RMS and the WRMS are statistical quantities, meaning that a few blatant outliers can greatly influence them. Thus, residual plots should also be used to evaluate the quality of the fit.
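Both quantities are easy to compute by hand. The following Python sketch implements the two definitions directly; the function names and the frequencies are made up for this illustration and are not taken from the OCS data.

```python
import math


def rms(obs, calc):
    """Root-mean-square of the residuals obs - calc."""
    residuals = [o - c for o, c in zip(obs, calc)]
    return math.sqrt(sum(r**2 for r in residuals) / len(residuals))


def wrms(obs, calc, unc):
    """Root-mean-square of the residuals weighted by the uncertainties."""
    weighted = [(o - c) / u for o, c, u in zip(obs, calc, unc)]
    return math.sqrt(sum(w**2 for w in weighted) / len(weighted))


# Made-up frequencies in MHz, for illustration only
obs = [12162.979, 24325.921, 36488.812]
calc = [12162.975, 24325.928, 36488.818]
unc = [0.05, 0.05, 0.05]   # 50 kHz for every line

print(f"RMS  = {rms(obs, calc):.6f} MHz")
print(f"WRMS = {wrms(obs, calc, unc):.6f}")
```

If all lines share the same uncertainty, the WRMS is simply the RMS divided by that uncertainty. This matches the first iteration of the SPFIT output above: 0.044302 MHz divided by 0.05 MHz is about 0.886.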

Check the resulting parameter values for $B$ and $D$ in the *.par or the *.fit file, which should be $B = 6081.49\text{ MHz}$ and $D=1.302\text{ kHz}$. If we run SPFIT again (try it!), only a single iteration will run through, as the model is already converged. Seeing a single iteration in the output can therefore mean either that your model is already converged or that something is configured incorrectly. To test which of the two cases applies, change a rotational parameter slightly; this should result in more than one iteration running through if everything is configured correctly.

Uncertainties of our Assignments

So far we have given our assignments standard uncertainties of $50 \text{ kHz}$ (this value is the default value in LLWP; see the Default Uncertainty input in the New Assignments window). However, our model indicates an average deviation between the model and the experimental positions of less than $2 \text{ kHz}$. Unfortunately, the parameter errors in SPFIT depend on the absolute and not the relative uncertainties of the assignments. The consequences are nicely summarized on the PROSPE website. To make a long story short, you have several options to make sure that the parameter uncertainties stated by SPFIT are (approximately) standard errors:

Multiply the parameter uncertainties by the dimensionless factor $C = \text{WRMS} \cdot \sqrt{N_\text{Lines} / (N_\text{Lines} - N_\text{Fitted Constants}) }$
Multiply all assignment uncertainties by the same factor $C$
Set the FRAC parameter to the WRMS value (stated as RMS ERROR in the SPFIT output), or even better to $C$, and refit. However, this can be tedious to keep updated
Use negative FRAC values: -1 corresponds to standard errors, -2 to $2\sigma$, ...

For large datasets, this means that a WRMS value of about 1 results in the parameter uncertainties being standard errors. The *.par and *.var files we have used already had FRAC set to -1, meaning the uncertainties in the *.fit and *.var files are actually standard errors.

Adding More Parameters to the Model

The inclusion of $-D$ significantly improved our model. This raises the question, should we maybe add some higher-order parameter? For OCS, the next parameter would be $H$ which is the parameter corresponding to $J^6$. We can just add it to the model by adding the following line to the *.par file

	
          300      1.0e-37 1e+37 /H

and increasing the number of parameters NPAR (first parameter in second row) to at least three. Then run SPFIT again to get the following output

	
OCS.par
OCS.lin
OCS.fit
OCS.bak
OCS.var
OCS                                                     Tue Apr 29 18:47:18 2025

LINES REQUESTED=   16 NUMBER OF PARAMETERS= 21 NUMBER OF ITERATIONS= 10
  MARQUARDT PARAMETER =0.0000E+000 max (OBS-CALC)/ERROR =1.0000E+037
Converting Line 16
Finished Quantum  16
MARQUARDT PARAMETER = 0, TRUST EXPANSION = 1.00
 MICROWAVE AVG =        0.000116 MHz, IR AVG =        0.00000
 MICROWAVE RMS =        0.001234 MHz, IR RMS =        0.00000
 END OF ITERATION  1 OLD, NEW RMS ERROR=        0.02467         0.02446
Finished Quantum  16
MARQUARDT PARAMETER = 0, TRUST EXPANSION = 1.00
 MICROWAVE AVG =        0.000150 MHz, IR AVG =        0.00000
 MICROWAVE RMS =        0.001223 MHz, IR RMS =        0.00000
 END OF ITERATION  2 OLD, NEW RMS ERROR=        0.02446         0.02446
OCS                                                     Tue Apr 29 18:47:18 2025
FIT COMPLETE

The fit improved only marginally (compare the Microwave RMS values from the first and second iteration). However, the real problem is the uncertainties of the resulting parameters (taken from the *.fit file)

	
                                NEW PARAMETER (EST. ERROR) -- CHANGE THIS ITERATION
   1           100          B      6081.492171( 75)        -0.000000    
   2           200         -D         -1.30137( 54)E-03      0.00000E-03
   3           300          H            -0.59(118)E-09        -0.00E-09

The $H$ parameter is clearly undetermined, as its standard uncertainty is larger than the magnitude of its value. Always check your parameter values and parameter uncertainties when adding new parameters to the model. An RMS that does not improve and/or undetermined parameters are clear signs that the data might be overfitted.
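In the *.fit excerpt, the digits in parentheses give the uncertainty in units of the last printed digits of the value. As a minimal sketch (the function name `parse_fit_value` is made up for this illustration and only covers the notation seen in the excerpt above), this notation can be converted into a value and an uncertainty to check how well a parameter is determined:

```python
import re

PATTERN = re.compile(r"(-?\d+(?:\.(\d*))?)\(\s*(\d+)\)(?:E([+-]?\d+))?")


def parse_fit_value(text):
    """Convert e.g. '-0.59(118)E-09' into (value, uncertainty)."""
    match = PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unexpected notation: {text!r}")
    mantissa, decimals, unc_digits, exponent = match.groups()
    scale = 10.0 ** int(exponent or 0)
    n_decimals = len(decimals or "")
    value = float(mantissa) * scale
    # The uncertainty refers to the last printed digits of the mantissa
    uncertainty = int(unc_digits) * 10.0 ** (-n_decimals) * scale
    return value, uncertainty


# Values copied from the *.fit excerpt above
for label, text in (("B", "6081.492171( 75)"),
                    ("-D", "-1.30137( 54)E-03"),
                    ("H", "-0.59(118)E-09")):
    value, uncertainty = parse_fit_value(text)
    print(f"{label:>2}: |value| / uncertainty = {abs(value) / uncertainty:.3g}")
```

A ratio below 1, as for $H$ here, means the uncertainty exceeds the magnitude of the value.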

Additional Literature Data

We are a little disappointed and go back to our previous model by removing $H$ from the *.par file. But not all hope is lost! Our spectrum does not cover any higher transitions (meaning higher in $J$ and thus also in frequency), which we need to determine $H$. But we can use literature data from other scientists and combine it with our data. Go to the CDMS and find the correct tag for OCS. The first three digits of the tag are the molecular mass in atomic mass units, followed by one digit to identify the database (5 for the CDMS), and the last two digits are a running index for all molecules with that mass. Add up the atomic masses of O, C, and S to find the correct starting point (or just hit CTRL + F and search for OCS).

Solution

The mass of OCS is 16 + 12 + 32 = 60 and the identifier for the CDMS is 5. Therefore we are looking for a tag like 0605xx. The tag for the ground vibrational state of OCS is 60503.
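The structure of the tag can also be checked with integer arithmetic. The helper name `split_tag` is made up for this short Python sketch:

```python
def split_tag(tag):
    """Split a catalog tag into mass, database digit, and running index."""
    mass, rest = divmod(tag, 1000)
    database, index = divmod(rest, 100)
    return mass, database, index


print(split_tag(60503))  # (60, 5, 3)
```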

Equipped with the molecule tag, open the VAMDC portal version of the CDMS and go to the Catalog section. Enter the tag in the Tag search field and click on doc of the catalog entry. Then go to Files and download the *.lin file. The columns in the *.lin file are $J'$, $J''$, the frequency, the frequency uncertainty, and a comment indicating the publication. You should see assignments ranging from $12\text{ GHz}$ to beyond $1\text{ THz}$ and from $J=0$ to $J=90$.

There is one last problem before we can combine our *.lin file with the literature data. Our data still has the wrong uncertainties. When fitting only data of the same quality, this need not be a problem (as long as the correct uncertainties are accounted for in some other way), but it is essential when mixing different datasets. As the RMS of our best model was $1.2 \text{ kHz}$, we will assume uncertainties of $1.2\text{ kHz}$ for our data. The *.lin file after correcting the uncertainties and adding in the literature data is available here for comparison.

Make sure to sufficiently increase the number of lines in your *.par file before running SPFIT. The RMS and WRMS will both be quite high. Create some predictions with SPCAT and load both the *.lin and the *.cat file into LLWP (remove all other *.lin and *.cat files via Files > Edit Files). Then open the Residuals via the Modules menu. Increase the size of the window and press update to see the residuals $\nu_i - \nu_{i,0}$ of your current model.

The residuals show that especially transitions at high frequencies (which coincides with high $J$ values) are not reproduced well by the model. This hints towards a missing higher-order parameter.

The residuals clearly hint towards missing higher-order parameters as transitions with high $J$ values are increasingly deviating from the model. Try to add $H$ to the model and see what happens.

The residuals after adding $H$ to the Hamiltonian. No clear trend is visible in the residuals plot. However, the varying quality of the different datasets can be seen.

The distribution of the residuals is much better now, but the varying quality (different uncertainties) of the datasets is quite apparent. Also check that all parameters are well determined (SPOILER: They are). Instead of the residuals, plot the weighted residuals by inserting the following expression into the y-axis field:

	
(x_lin - x_cat) / error_lin

The weighted residuals are nicely (randomly) distributed and show no trends nor outliers.

The weighted residuals look really good and indicate that no higher-order parameters are required but we should test that this is true (add the parameter $L$ with the parameter code 400 to the model and check). Additionally, the WRMS value of 0.8 indicates that the agreement is already better than the experimental uncertainty. When data of different quality is fit together, the RMS value has little meaning as it is only a good indicator for the assignments with the highest uncertainties.

Solution (Adding $L$)

Adding $L$ to the model results in the following parameters:

		
   1           100          B     6081.4921180( 54)        0.0000001    
   2           200         -D       -1.3014284( 49)E-03   -0.0000001E-03
   3           300          H         -0.08938(179)E-09      0.00005E-09
   4           400          L               -4(151)E-18           -4E-18

Clearly, the parameter $L$ is not at all determined and also the improvement in both RMS and WRMS is negligible. Therefore, it does not make sense to add $L$ to the model.

Summary

We have learnt a lot in this section. The *.par, *.lin, and *.int formats are now known to us and we can set up simple models in SPFIT (even though some parameters are still a little mystery to us). We can improve these models by making assignments in LLWP, saving them to *.lin files and running SPFIT on them. We have learnt how to check if higher-order parameters are missing or superfluous.

But most importantly, we have learnt to always increase the number of lines or parameters when adding more assignments or parameters to our model, respectively!
