
So let’s assume you’ve read my preparatory posts (here, and here), and that you’re on board with the need to learn to do your own analytical work in the markets. Let’s also assume that you’ve taken a deep breath and decided that you need to learn to code. Great. What next? This post is about the “what next”, and I’ll tell you why I decided Python is the programming language for finance that makes the most sense for many of my readers.

What a program does

Particularly if you’ve never programmed, the whole idea of coding may seem very mysterious and complex. It helps to start at the beginning, and to think about what a program does on a very fundamental level. A computer program is nothing more than a set of instructions that the computer will follow exactly. Therein lies one of the problems that will frustrate you as a new programmer: the computer is so literal and precise it can be maddening. You will spend a lot of your time chasing down errors that are your own fault–imagine a child who has decided he will do exactly what you told him and nothing more (“oh, you said go to the bathroom? Ok. Done. You didn’t say put the lid up…”) The power of a computer program does not come from any magical analytical skill; rather it comes from doing simple things that you could, given enough time, do yourself with a pencil and paper, but doing them very fast and very precisely.

Programming languages are the tools through which we tell the computer what to do. Whether the program is written in Fortran, Basic, C, or Java, you are basically going to get the same end result. The differences, at least to me, lie primarily in speed and how easy it is on the programmer to get the job done. A program is basically about taking information (data) and doing something with it. For us finance-types, that something is usually some kind of number crunching to produce some results, but pretty much everything a computer does is based on that concept: take information, and do something with it. (People with formal educations in computer science might be having seizures reading my gross oversimplifications, but bear with me (and feel free to argue in the comments!))

To do that “something”, there are two main tools in programming languages. Languages have tools for making decisions based on logical conditions that are true or false, and tools for repeating an operation many times. No matter how complicated the program is (imagine some facial recognition software using “fuzzy logic” or perhaps a program that can read and understand the written or spoken word as examples of extreme complexity), when you drill down into it it’s still making all of its decisions on a list of “If A is true then do B, otherwise do C.” Much of your work of programming is based on thinking through what these logical tests should be and what they really mean. The next part of this, looping or repeating an operation, is where much of the power of a program comes in.

For a finance example, you could, theoretically, go through every bar of a 1 minute chart for 10 years and mark where a simple condition is true (e.g., “this bar’s close is higher than the two previous highs.”) It would take a while, but you could do it. I’m belaboring this point so you see there’s no magic–it might take you a day to do it, but you could. A program could likely do the same thing in a few seconds, or perhaps even a fraction of a single second, and the program will do it without error. A loop in a program would allow us to do this comparison over and over: “if this bar’s close is higher than the two previous highs, set a flag to True, otherwise do nothing.” (It might seem like magic because a computer can do an operation in seconds (certain kinds of regressions, for instance) that might take you months with a calculator. You could, theoretically, do it by hand, but practically it would make no sense at all.)
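To make the decision-plus-loop idea concrete, here is a rough sketch of that bar-by-bar test in Python. The five bars are made-up numbers, the function name flag_breakouts is my own invention for illustration, and I’m reading “higher than the two previous highs” as “higher than the high of each of the two previous bars.” Treat it as one possible way to write the test, not the only one.

```python
# Hypothetical sample data: one (high, close) pair per bar, oldest first.
bars = [
    (10.0, 9.8),
    (10.2, 10.1),
    (10.1, 10.3),
    (10.6, 10.2),
    (10.5, 10.7),
]


def flag_breakouts(bars):
    """Return one True/False flag per bar.

    A flag is True when that bar's close is higher than the highs
    of both of the two previous bars.
    """
    flags = []
    for i in range(len(bars)):          # the loop: visit every bar in order
        close = bars[i][1]
        if i < 2:
            # The first two bars have no two previous bars to compare against.
            flags.append(False)
        elif close > bars[i - 1][0] and close > bars[i - 2][0]:
            flags.append(True)          # the decision: condition is true
        else:
            flags.append(False)         # otherwise, leave the flag off
    return flags


flags = flag_breakouts(bars)
print(flags)  # [False, False, True, False, True]
```

The same function would work unchanged on a list with millions of bars; only the running time changes.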

The third piece of a program is data and getting it into and out of the program. In some cases, we may be dealing with millions of datapoints; managing data is one of the critical tasks of programming, and it’s where you will end up spending a lot of your time. All languages can do this to some extent (or they would be useless), but some make it much easier than others. Python, in particular, has excellent libraries for managing and manipulating data, and this is a key reason for choosing the language. Things just work, and work nicely, in Python when it comes to data management (with Numpy and Pandas).

Programming language for finance: choices and tradeoffs

If you want to learn to program with the goal of doing your own analysis, I think there are a few logical choices: Python, C, C++, Java, or VBA. Not on this list are specialized “niche” languages like Haskell, specialized statistical languages like MATLAB or R (bad news, friend: you’re probably going to need to know one of those too before this is all through, but walk before run), or retail-level languages like Amibroker or EasyLanguage. (Hint: someone should integrate a legitimate programming language into a brokerage platform…) I chose Python, and I think it’s probably the right answer for many of you reading this. Let me tell you why.

One of the key divisions between programming languages is that between high-level and low-level languages. In a low-level language, we “talk” to the computer using tools that are very close to the instruction set the processor uses itself. You probably know that computers “think in ones and zeros”; low-level languages do not actually use binary digits for everything, but they come pretty close. Some examples will help to make this clearer. The classic first program you learn in any language is called “Hello, World!”, and simply prints that text to the screen (or output device). It’s not a terribly useful program, but, because it’s simple, it gives us a good idea of the “flavor” and complexity of the language.

First, let’s assume that you wanted to write in machine language. Machine language is code that is very, very close to the “native language” of the computer processor, and is all but incomprehensible to a human reader. In machine language, we talk to the processor using numbers, and only numbers. For humans, this is difficult, to say the least. Here’s an example of Hello, World! in machine language: (I’m using this post as a reference, since writing machine code is far above my pay grade.)

b8    21 0a 00 00
a3    0c 10 00 06
b8    6f 72 6c 64
a3    08 10 00 06
b8    6f 2c 20 57
a3    04 10 00 06
b8    48 65 6c 6c
a3    00 10 00 06
b9    00 10 00 06
ba    10 00 00 00
bb    01 00 00 00
b8    04 00 00 00
cd    80
b8    01 00 00 00
cd    80

If you’re reading this post, it’s pretty certain that you will not be writing machine language. Except for very specialized applications, no one does this today. Most people who need to communicate with the processor on that level use a language that is one step above, called Assembly Language. (In my misguided youth, I spent many months learning to code 6502 Assembly.) In both Assembly and machine-level programming, you have to be responsible for everything; you, the programmer, manage the computer’s operations to a level that is kind of mind boggling: move this number to this location in memory, and put this command in a pile to be used later. You’re not quite saying “turn this pixel on the screen on to this color”, but you’re coming very close. (Here is one of the reasons for using a low-level language: if you had a need to tell the computer to manipulate a pixel like that, you could do so. You can access the hardware in a way that might be difficult or impossible in some high level languages.) Anyway, just for fun, here’s a Hello, World! example in 6502 Assembly:

a_cr    = $0d
bsout    = $ffd2

.code

ldx #0
printnext:
lda text,x
beq done
jsr bsout
inx
bne printnext
done:
rts

.rodata

text:
.byte    "Hello world!", a_cr, 0

If you’re new to programming, have these examples made you decide you don’t want to program?! Stick with me… basically, I’m showing you how much programming could suck; the next examples will suck a lot less. Now, pay attention, because you might–just might–find yourself writing C some day. C is not that far removed from assembly language; you still have to do a lot of the work yourself, but you can do so using structure that makes more sense to the human eye. Here’s an example of Hello, World in C:

#include <stdio.h>

int main()
{
    printf("Hello world\n");
    return 0;
}

Not quite so bad, right? There’s still some “stuff” in there, but you can basically see we’re telling the computer to print “Hello world” to the screen. Here’s what “Hello, World!” looks like in Python, which is a high-level language. This, I promise you, is not so bad:

print("Hello, World!")

Why did I go through all of this? First, I wanted to talk a bit about the tradeoffs in languages. Second, you’re going to be spending a lot of time learning your chosen language, and along the way you’re going to run into people who tell you that what you’re doing is a stupid waste of time and that real programming is only done in Ada or Cobol. It’s hard to be certain the time you’re investing is going in the right direction. Here’s how I think about it, and why I think Python is probably the right choice for most of my readers:

The difference between high and low level languages is somewhat artificial, but it basically boils down to making things easier for the computer, or making things easier for the programmer. The tradeoff is mostly in speed: low level languages are wicked fast. Assembly, for instance, might sometimes be several thousand times faster than a high level language. This matters in some applications (HFT), but you are going to be able to learn a high level language in days or weeks, while it might take you months to learn to do even rudimentary operations in a low level language. Once you’ve learned the language, you might be able to write a few hundred lines of Python code that would require a few thousand lines of C code. You’re going to be able to learn Python and to become proficient in the language much faster than you would in C.

For instance, you can read a CSV file into a perfectly formatted and timestamped Python data structure with a single command (pandas.read_csv()). I’ve never done the same work in C, but glance at this discussion to get some idea of the complexity involved. I’m willing to sacrifice some speed for the convenience because my focus is on getting stuff done–the language is simply a tool to do what I need to be done. Another thing to consider: forget the time spent writing the program, and think about what happens when you need to debug (which means finding mistakes, and there will be mistakes in your code) or change the program later. Maintaining well-commented high level code is almost a joy compared to the drudgery and complexity of tweaking low-level code.
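Here is roughly what that single command looks like in practice. This is only a sketch: the column names and prices are invented, and I’m feeding pandas a block of text through io.StringIO so the example runs without a file on disk. With a real file you would pass its path instead.

```python
import io

import pandas as pd

# Hypothetical price data standing in for a CSV file on disk.
csv_text = """date,open,high,low,close
2015-01-02,100.0,101.5,99.5,101.0
2015-01-05,101.0,102.0,100.0,100.5
2015-01-06,100.5,103.0,100.2,102.8
"""

# read_csv accepts a file path or a file-like object. index_col and
# parse_dates ask pandas to use the date column as a timestamp index.
prices = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

print(prices)                  # the whole table, indexed by date
print(prices["close"].max())   # highest close in the sample
```

One line does the reading, the type conversion, and the date handling; everything else in the block is just sample data and printing.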

This is a good place to stop for today. I realize this post might have been a little geeky, but I hope it was an interesting perspective. Next, I’ll dig a bit deeper into Python specifically, and get you started on learning the language in depth.