Why COBOL?

COBOL is like a dragon. Every programmer has heard of it, and that it is formidable. We’ve heard stories about people fighting the beast, but very few of us have actually seen a live COBOL program out in the wild. If that isn’t reason enough to be intrigued, maybe its historical sgnificance brought you here, wondering about how programming was like in the days of Grace Hopper - inventor of the first compiler, rear admiral, and DPMA man of the year 1969, who democratized coding, was a passionate teacher and genuinely kept her staff happy as a manager. Whatever your angle is, I assume you have a reason for being here, so I won’t jabber any longer about why COBOL is still interesting.

Why this guide?

A lot of people have written amazing and detailed COBOL guides and here I am, having solved one exercise of Advent of Code 2024 in COBOL. Why should you listen to me introducing a language I barely know? Well, the COBOL guides I found value precision and proper contextualization over speed. This is the opposite: If you want to spend no less than 30 minutes but be able to write a rudimentary COBOL program after that, I’m here for you.

Hello world

Enough of the introduction. I promised speed, so here is “Hello World” in COBOL:

000001 IDENTIFICATION DIVISION.                                         HELLO
000002 PROGRAM-ID. HelloWorld.                                          HELLO
000003 PROCEDURE DIVISION.                                              HELLO
000004     DISPLAY "Hello World!".                                      HELLO

If you want to run this, you can install GnuCOBOL and run the following:

cobc -xjO hello.cob

COBOL file format

The above hello world program is written in fixed format, which was used for COBOL programs prior to 2002. There is also a free format now, which removes the restrictions about which columns source code has to start (and end) in. However, for the purpose of this guide, we go old-school because that’s more fun.

In fixed format, the actual code resides in columns 8-11 (Area A) and 12-72 (Area B). This seems crazy from today’s coding standards, but was actually pretty reasonable back in the day. You see, each of these lines of code would be written on a punchcard—yes, just one line of code per punchcard. Why punchcards? Because they were way cheaper than any form of electronic storage back in the day. So imagine you’ve finally written your 300 LOC program. You’re on the way to the computer room to test it, but on the way you bump into a colleague, also holding a stack of 300 punchcards. The cards go flying all over the place and now what? Don’t fret! COBOL has you covered (if you followed best practices, that is). You can just stack them in any order, put them into a sorting machine, and instruct it to first sort by column 73-80 (Program Name Area) and then by column 1-6 (Sequence Number Area). Voilà, now both of your programs are in the right order again, and can be separated from each other! Today, we don’t need this anymore, and we can just leave these areas empty, but I’m sure this feature saved countless work hours.

The only things we need to remember now are these:

  • Column 7 can be used to designate a line as a comment (*) or code that’s only used for debugging (d). There are a few other symbols that appear here, but they are not relevant for this guide.
  • Actual code starts in Area A (column 8), or sometimes Area B (column 12).
  • Code must end on column 72. Everyting beyond that will be ignored.

Side note: If you’re weird like me and this makes you want to dive deeper into how punchcards worked, feel free to have some fun with this interactive editor for IBM 5081 punch cards:

Image: Douglas W. Jones, Punched Card Collection, University of Iowa, Department of Computer Science.
Character arrangement: Reference Manual: IBM 29 Card Punch, 7. ed, IBM 1970.
Code Card punch Print text

COBOL syntax

You might have already noticed that the hello world program above only uses characters that we would also use when writing text in English. This is very much by design. Grace Hopper herself heavily pushed for using English over symbols because she wanted to democratize programming and make it more accessible.

It’s important to keep that in mind, because unlike most modern programming languages, COBOL doesn’t aim to have a minimal number of syntactic elements and keywords (if, for, …). Instead, it provides a plethora of keywords, which sometimes consist of more than one word and can have alternatives, to make code sound as natural as possible to a non-technical person that just writes their first program.

Time for another example: Let’s calculate how a user-defined amount of money grows with 1% interest over 10 years:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. Interest.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 balance PICTURE s9(7)V99.
       PROCEDURE DIVISION.
       calculate-interest.
           DISPLAY "Please enter starting balance:"
           ACCEPT balance.
           PERFORM 10 TIMES
               MULTIPLY balance BY 1.01 GIVING balance
               DISPLAY "New balance: " balance
           END-PERFORM.

Hierarchy

You probably already noticed in the hello world example that COBOL code has a document-like hierarchy. There are DIVISIONs, and SECTIONs that seem to structure the code. In fact, there are the following hierarchical levels:

  • Divisions
  • Sections
  • Paragraphs
  • Sentences
  • Statements

Divisions, sections, and paragraphs have pre-defined names and structures, with the exception of sections and paragraphs in the PROCEDURE DIVISION, which can be named freely by the programmer. Sentences are groups of statements (which again have pre-defined structures) that are terminated with a dot. One effect of this hierarchy is that the definition of variables (which are called data items in COBOL) in the WORKING-STORAGE SECTION of the DATA DIVISON is separated from the code in the PROCEDURE DIVISION.

Data items

The WORKING-STORAGE SECTION looks bizarre from the standpoint of modern programming languages:

      * level             signed
      * |   name          |digit
      * |   |             || 7 times
      * |   |             || | decimal point
      * |   |             || | |
       01 balance PICTURE s9(7)V99.

The number in the beginning is called the level. A level of 01 designates an elementary data item. A higher number would be used for an aggregate data item that belongs to a group item of the next lower level above it. This allows to define and reference nested data types of arbitrary complexity. We’ll go into a bit more detail about that later.

The name of the data item is pretty self-explanatory, but the PICTURE seems weird again. COBOL actually never forces you to think in binary or any low-level data types for that matter. Instead, you define your data types by their “picture”, i.e. by the format how you would write them in a text file. The syntax for this picture part almost looks like a form of proto-regex:

  • s denotes a sign (either the character + or -).
  • 9 denotes a single decimal digit.
  • (7) repeats the character in front of it 7 times.
  • V denotes where the decimal point is placed.

With that, the picture s9(7)V99 stands for a signed 7-figure number with two decimal places - a pretty reasonable data item for the account balance of most people.

Procedures

Our example program has a single procedure defined by the named paragraph calculate-interest. Naming the paragraph is only necessary if we aim to call it as a procedure later in the code. By default, the first paragraph (named or unnamed) of the PROCEDURE DIVISION will be called as the main procedure.

Inside a procedure, you can create sentences out of an arbitrary number of statements. The statements that we use here are:

  • DISPLAY some-value some-other-value ... to display an arbitrary number of values as a concatenated string on the terminal.
  • ACCEPT data-item to read user input and store it in a data item. Note how the PICTURE definition of that item gives you input validation for free.
  • PERFORM 10 TIMES [...] END PERFORM to repeat the statements in [...] in a loop.
  • MULTIPLY x BY y GIVING z to multiply the value of x by y and store the result in z.

As already mentioned, COBOL uses many more keywords and statements than most modern programming languages, so learning to program in COBOL consists largely of searching for the right keywords and associated statements. There are also the classical named functions that we are more used to, but those will be covered in the next section.

Solving a non-trivial problem in COBOL

We could end our quick-start instructions here, but I think the step from a toy example to one that solves an actual non-trivial task still brings a lot of insights with a good cost-benefit ratio. The following COBOL program solves the first part of day 2 from Advent of Code 2024.

For this exercise, we have to read a text file with a list of numbers on each line, separated by spaces and count how many of those are “safe”. A list is safe if:

  • The numbers are sorted in ascending or descending order.
  • And the absolute difference between adjacent numbers is between 1 and 3 (inclusive).
       IDENTIFICATION DIVISION.
       PROGRAM-ID. AoC-2024-Day2.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT input-file
           ASSIGN TO "input"
           ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD input-file.
       01 input-file-line PICTURE x(1024).
       WORKING-STORAGE SECTION.
       01 FILLER PICTURE a.
           88 at-eof VALUE 'Y' FALSE 'N'.
       01 line-cursor PICTURE 999.
       01 previous-line-cursor PICTURE 999.
       01 number-pair.
           02 previous PICTURE 999.
           02 current PICTURE 999.
       01 line-length PICTURE 999.
       01 FILLER PICTURE a.
           88 all-increasing VALUE 'Y' FALSE 'N'.
       01 FILLER PICTURE a.
           88 all-decreasing VALUE 'Y' FALSE 'N'.
       01 safe-count PICTURE 999.
       01 difference-safe PICTURE a.
           88 difference-is-safe VALUE 'Y' FALSE 'N'.
       01 difference PICTURE 999.
       PROCEDURE DIVISION.
       parse-file.
           OPEN INPUT input-file.
           PERFORM UNTIL at-eof
               READ input-file INTO input-file-line
               AT END
                   SET at-eof TO TRUE
               NOT AT END
                   PERFORM process-line
               END-READ
           END-PERFORM.
           DISPLAY "Total count of safe sequences: " safe-count.
           CLOSE input-file.
           STOP RUN.
       process-line.
           MOVE FUNCTION LENGTH(FUNCTION TRIM(input-file-line))
           TO line-length.
           MOVE 1 TO line-cursor.
           SET all-increasing TO TRUE.
           SET all-decreasing TO TRUE.
           SET difference-is-safe TO TRUE.
           PERFORM UNTIL line-cursor GREATER line-length
               MOVE line-cursor TO previous-line-cursor
               PERFORM read-next-number
               PERFORM check-if-increasing
               PERFORM check-if-decreasing
               PERFORM check-difference
           END-PERFORM.
           IF (all-increasing OR all-decreasing) AND difference-is-safe
               ADD 1 to safe-count.
       read-next-number.
           MOVE current TO previous.
           UNSTRING input-file-line
           DELIMITED BY " " INTO current
           WITH POINTER line-cursor.
       check-if-increasing.
           IF previous-line-cursor IS GREATER THAN 1
           AND previous IS GREATER THAN current
               SET all-increasing TO FALSE.
       check-if-decreasing.
           IF previous-line-cursor IS GREATER THAN 1
           AND previous IS LESS THAN current
               SET all-decreasing TO FALSE.
       check-difference.
           COMPUTE difference EQUAL FUNCTION ABS(previous - current).
           IF previous-line-cursor IS GREATER THAN 1
           AND (
               difference IS LESS THAN 1
               OR difference IS GREATER THAN 3
           )
               SET difference-is-safe TO FALSE.

Reading an input file

The first challenge that we haven’t tackled yet is file access. The definition of what file is accessed is handleed in the INPUT-OUTPUT SECTION of the ENVIRONMENT DIVISION:

       FILE-CONTROL.
           SELECT input-file
           ASSIGN TO "input"
           ORGANIZATION IS LINE SEQUENTIAL.

All of this is pretty straightforward: We give our file the name input-file, provide the file name on the local file system ("input"), and tell COBOL that it should read the file line by line. If you’re wondering “how else would I read a file?”, consider that COBOL mostly operates on fixed-length records. This has the benefit of allowing random access to files, skipping to the nth record in the file in O(1) instead of having to read O(n) variable-width lines. In that regard, our file structure is the worst case scenario from an efficiency perspective.

There is one more part of the definition in the FILE SECTION of the DATA DIVISION that is a little less straightforward:

       FD input-file.
       01 input-file-line PICTURE x(1024).       

Here, we first define what type of tile input-file is. The type FD designates a normal file used for reading and writing data. Another alternative would be SD, which designates a file that is used as a working memory area for COBOL’s sorting and merging functionality. The next line, defines a data item that represents one record from the file. As our file has ORGANIZATION IS LINE SEQUENTIAL, we effectively define what data type should be used for reading a line. As we want to be agnostic about line size and content, we just pick 1024 alphanumerical characters. The downside of this is obviously that if a line is less than 1024 characters long, we will still fill 1024 bytes of memory padded with spaces.

The code for actually reading a line from the file has again an interesting property:

       READ input-file INTO input-file-line
       AT END
           SET at-eof TO TRUE
       NOT AT END
           PERFORM process-line
       END-READ

As you can see, the READ statement has an option to execute other statements depending on whether there was something to read or we already reached the end of the file. Here, we use this to set the condition at-eof, which controls the outer loop of the program. But more on conditions in the next section.

Defining a condition name

Let’s look a bit closer at the definition of at-eof:

       01 FILLER PICTURE a.
           88 at-eof VALUE 'Y' FALSE 'N'.

The first thing that’s new is the FILLER in place of the name of the top-level data item. This is just a way of saying that we will never access that data item itself, so it doesn’t need a name. We still need to give it a picture, which is just one alphabetic character.

Now it gets interesting: The second line defines a subitem with level 88. Normally, you can only use levels 01 through 49, but there are a few special levels in COBOL. Level 88 is used for defining so-called “condition names”, which can be used as booleans. If set to TRUE, a condition name will assume the value given after VALUE (or the first possible value in case a range of values is specified there). If set to FALSE, it assumes the value given after FALSE. So essentially, we have an alphabetic data item that can either be Y or N, but we can use at-eof directly in place of a condition in an IF or UNTIL, and we can assign it to TRUE or FALSE with the SET statement.

Defining a group item

We’ve seen the special level 88 for condition names, but we haven’t seen a true group item yet:

       01 number-pair.
           02 previous PICTURE 999.
           02 current PICTURE 999.

Here, number-pair is a group item and previous and current are its subitems. The level 02 for the subitems is arbitrary. The only rule is that the level has to be larger than the parent item and both subitems have to have the same level. With this, we have a number-pair consiting of two numbers. If we wanted to swap pairs around as a whole, we could access number-pair directly, but for the purpose of this task we actually don’t need that. I just created the group item for learning purposes. You can access the subitems just by their name, but in case you would have subitems with the same name in differently named group items, you could also reference them explicitly as previous OF number-pair, for example.

Calling a procedure

In the code for reading a line from our file, you might have already noticed the PERFORM process-line statement. This is how you call a procedure in COBOL. As long as it’s in the same file, there is no need to pass any arguments, as all data is shared between procedures. This makes it harder to track which procedure is responsible for changing which data items, but it allows you to divide your code into procedures at virtually zero cost: Just add one line in Area A that introduces a new paragraph and thus gives the procedure a name.

In this program, we make heavy use of this feature to make the code more readable. For example, consider the loop over the numbers in a line:

           PERFORM UNTIL line-cursor GREATER line-length
               MOVE line-cursor TO previous-line-cursor
               PERFORM read-next-number
               PERFORM check-if-increasing
               PERFORM check-if-decreasing
               PERFORM check-difference
           END-PERFORM.

If all the procedures were written out here instead of calling them, the code would become quite convoluted. There is the downside of losing track where the data item line-cursor in the loop condition is changed, but to some degree such problems can be alleviated by choosing speaking names for the procedures.

As you might have noticed, procedures are not the only way of breaking down code into smaller elements. COBOL has functions, which are called similar to how we are used to in modern languages. For example, FUNCTION TRIM(input-file-line) calls the function TRIM and passes input-file-line as an argument. You can, of course, define your own functions in COBOL, but this is beyond the scope of this quick start guide.

Overview of program logic

With all the syntactic peculiarities out of the way, let’s briefly discuss what the program actually does:

  • The main loop PERFORM UNTIL at-eof just ensures that process-line is called for each line in the input file.
  • process-line reads the numbers found in the line one by one into the number-pair data item to check for each pair whether one of the safety conditions is violated.
  • read-next-number advances the line-cursor by reading the next number in the line and storing it in current while saving the previous content of current in previous to do the comparions later on.
    • I admit that it took me a while to find out that UNSTRING is the statement I needed here.
  • check-if-increasing and check-if-decreasing update the data items all-increasing and all-decreasing if they find a pair of numbers that violates the condition that the numbers in the line are sorted in ascending or descending order respectively.
  • check-difference calculates the absolute difference between the numbers in the pair and sets difference-is-safe to false if the difference is too small or too large.
  • Based on the values of all-increasing, all-decreasing, and difference-is-safe, the counter safe-count is incremented if the line violated none of the safety conditions.
  • At the very end, we use DISPLAY to print the final value of safe-count and thus solve the Advent of Code exercise.

Learnings and outlook

So, what have we learned? By this point, you are no more of a COBOL expert than I am. You understood a very basic program that can be solved with a handful of lines in Python or Ruby, and we haven’t even touched on any advanced concepts such as copybooks, tables, or sorting. However, I hope that now that the very basic questions are out of the way, you are able to dig into the GnuCOBOL Programmer’s Guide or your COBOL manual of choice to find the statements you need for your program. And I hope that this will be much quicker for you than if you had to start reading that fine but long document from the beginning.

Alternatively (or maybe additionally), I hope you just had a bit of fun with a quirky old language and maybe learned to appreciate where it got its quirks from. Maybe COBOL doesn’t look so much like a dragon now but more like a bear. Sure, it can still kill you, and you might not want to get too close to it, but you can see how it has its place in nature. If you just let it sleep and don’t disturb it too much, it’ll probably be okay.