Wednesday, July 2, 2014

I use Fortran; it may be right for you, too (or not)

I don't know who would care to read this post. Maybe econ grad students.

Aruoba and Fernández-Villaverde have a nice paper comparing programming languages for solving a standard DSGE model. They have provided a genuine public good here, and the paper is worth a look for economists who code. The headline finding is that C++ and Fortran are still the fastest, and (somewhat surprisingly) C++ is slightly faster.

I originally used Matlab for my dissertation model. It was taking a long time to solve, and people in my department finally convinced me to switch to Fortran.* The most time-intensive parts of my solution algorithm take about one tenth of the time they took in Matlab. Other parts got even bigger speedups.

A lot of people give me a hard time about Fortran and tell me I should switch to Python or something similar. The reason I won't do that is clear enough from the paper. Python is, by all accounts, a very intuitive and versatile language. But my model can sometimes take 24 hours to solve, and even multiplying that by a factor of two or three would be very costly. To calibrate (or estimate) a model, one must solve it many times. Also, other people in my department use Fortran (it's pretty popular in macro), so there are some nice agglomeration returns. Fortran is very common in scientific computing, so there is a large library of algorithms you can take off the shelf (see, e.g., Numerical Recipes). It's a really easy language to learn--in fact, it's fairly similar to Matlab.
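
To give a flavor of the Matlab resemblance, here's a tiny made-up snippet (nothing from my actual codes, just an illustration): once you get past the declarations, whole-array arithmetic in modern Fortran looks a lot like Matlab's vectorized syntax.

    program taste
      implicit none
      integer, parameter :: n = 5
      real(8), parameter :: alpha = 0.36d0
      real(8) :: k(n), y(n)

      k = (/ 1.0d0, 2.0d0, 3.0d0, 4.0d0, 5.0d0 /)
      y = k**alpha                      ! elementwise, like Matlab's k.^alpha
      print *, 'mean output: ', sum(y) / n
    end program taste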

A common critique of Fortran (voiced by the first commenter here) is that, these days, hardware is cheap and programmers are expensive--so easier, more versatile languages are best. That's probably true in much of industry, particularly things like web design. But for tasks that require serious number crunching, and in an academic world with limited resources, hardware is still a binding constraint (and grad student labor--i.e., mine--is cheap). I've been solving my model on 180 processors. A lot of people don't have access to that kind of hardware (until a few months ago, I couldn't use more than 18). Furthermore, there are diminishing returns to parallelization: above 180, I get basically no speedup from adding workers. So I'm not even sure that better hardware could offset Fortran's speed advantage in my case. (Right now, other people in my department are probably wishing I would quit using 180 processors...).
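
For what it's worth, the piece of these models that parallelizes naturally is usually the loop over the individual state grid inside each policy-function update. My actual setup spreads that work across many processors, but just to fix ideas, here is a toy shared-memory OpenMP version (not my real code), with made-up arrays util and ev standing in for flow utility and the expected continuation value:

    ! Toy illustration only: distribute the value-function update over the
    ! asset/productivity grid. util and ev are placeholders for flow utility
    ! and the expected continuation value under the current guess.
    subroutine update_value(na, nz, beta, util, ev, v_new)
      implicit none
      integer, intent(in) :: na, nz
      real(8), intent(in) :: beta, util(na, nz), ev(na, nz)
      real(8), intent(out) :: v_new(na, nz)
      integer :: ia, iz

      !$omp parallel do private(ia)
      do iz = 1, nz
         do ia = 1, na
            v_new(ia, iz) = util(ia, iz) + beta * ev(ia, iz)
         end do
      end do
      !$omp end parallel do
    end subroutine update_value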

If you are doing representative agent models, the speed differences between languages are probably irrelevant. In that case, you probably care more about ease of use and applications other than the number crunching, like making charts. Fortran is pretty bad in this department--I dump all of my output into Matlab and make charts there, and I've been meaning to move those codes over to Python or R so I won't be so reliant on Matlab licenses. But if you plan to only do those kinds of models, Fortran is probably not the right choice. Use Dynare, which is awesome.

If you are planning to solve models with some nontrivial heterogeneity, you need to choose your language carefully. In case you don't know: in a model in which agents differ over a state space, equilibrium prices don't just fall out of a first-order condition. You have to solve for them. The usual way is to guess prices, obtain policy functions, add up everyone's choices, check market clearing, and guess again (a sketch of that loop is below). While a rep agent model only requires you to find policy functions once, a het agent model requires you to do it many times while you search for the right prices. (A nice side effect of solving models this way is that you get to see partial equilibrium results while it solves.) Computing time grows exponentially with the number of heterogeneity dimensions you have, thanks to the curse of dimensionality. Also, the more prices you have to find, the longer it will take (here's a tip: constant returns to scale technology makes factor prices move in lockstep, so knowing one implies the other). When I went from needing to find one price to needing to find two, it more than doubled my computation time.
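
Schematically, that outer loop looks like the sketch below. This is not my actual code--solve_policy and aggregate_assets are made-up placeholders for the expensive inner routines--but it shows the structure for a model with one price: bisect on the interest rate until household asset demand matches supply.

    ! Sketch of the outer price loop for a one-price heterogeneous-agent model.
    ! solve_policy and aggregate_assets stand in for the real (slow) routines.
    subroutine find_equilibrium_r(r_star)
      implicit none
      real(8), intent(out) :: r_star
      real(8) :: r_lo, r_hi, r, excess
      real(8), parameter :: tol = 1.0d-6
      integer :: it
      integer, parameter :: maxit = 50

      r_lo = 0.0d0                        ! bracket for the interest rate
      r_hi = 0.04d0

      do it = 1, maxit
         r = 0.5d0 * (r_lo + r_hi)
         call solve_policy(r)             ! policy functions given the guess
         call aggregate_assets(r, excess) ! asset demand minus asset supply
         if (abs(excess) < tol) exit      ! markets clear: done
         if (excess > 0.0d0) then         ! too much saving: lower the return
            r_hi = r
         else
            r_lo = r
         end if
      end do
      r_star = r
    end subroutine find_equilibrium_r

Nearly all of the time goes into solve_policy, and it runs once per guess--which is exactly where the language's raw speed pays off.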

This stuff matters because I think some of the most interesting work being done in macro right now is the empirical work based on micro data. To me, heterogeneity is what makes macro interesting. The theories that go along with that rich micro data are often going to require hard computational work.


*I'll save commenters some time and simply note that I've already heard the one about how you used Fortran in college in the 1970s. It is somewhat funny that this language is still in wide use in scientific computing, but it's also not a huge surprise, since doing floating-point calculations over and over again doesn't require the latest bells and whistles. We're not trying to build Instagram here. Also, modern Fortran is a pretty different language from Fortran 77 (the most recent standard is Fortran 2008).

10 comments:

  1. It's surprising that numpy (or similar) isn't the default high performance choice. Maybe these tools are not taught?

    Replies
    1. Numpy won't work for high performance because it is just too slow; see https://modelingguru.nasa.gov/docs/DOC-1762. For the reasons I mention above, for some model problems speed is the main concern.

      But I have heard that econ departments are moving to it for their computational classes (as opposed to Matlab, which I think is the most common now).

  2. There was a recent article in Ars Technica about Fortran and its enduring popularity, so it's definitely not dead: http://arstechnica.com/science/2014/05/scientific-computings-future-can-any-coding-language-top-a-1950s-behemoth/

    I use mostly MATLAB, with an occasional mex file in C (sometimes calling GSL), but I've been thinking about learning Fortran. Did you have any particular reason to choose Fortran over C or C++? Also, out of curiosity, do you use any proprietary libraries (NAG, IMSL,...) or can one get by with free code from netlib or other sources?

    Replies
    1. The main reason I chose Fortran over C was that more people in my department use Fortran. It was really about having people around who could give me pointers. Also I think Fortran is a bit easier than C, but I would have gone with C if the people around me were doing it.

      I have pulled algorithms from Numerical Recipes, but that's it.

    2. As an economist, I have been using Fortran for almost 20 years. From my recent experience with Coarray Fortran, I can say this technology is already incredibly powerful for MPMD-like parallel programming, e.g. developing my own task pool management. In my programming I also make heavy use of some IMSL/Stat routines, which are still unmatched in performance and in some of their functionality (some of the possibilities are undocumented). Since I expect those IMSL routines to make heavy use of further parallelism and vectorization (possibly through Intel MKL), I am somewhat confident that I can keep current and upcoming many-core machines (Intel MIC, Xeon Phi) fed, even with my small to medium-sized Fortran software projects. But the main challenge will be developing software that would be unthinkable on serial or even multi-core computers.

  3. Good post, Ryan. An important thing to keep in mind, though. You talk about using something like Python or R only when you don't need to care about the computation speed and would rather have nice plotting capabilities, etc. That is a good reason to want to use them, but that can hold for more computationally intensive applications as well. You're precluding the option of using Python as a wrapper language. Because it's so nice to use, you can build most of your infrastructure in Python, then write extensions in Fortran (or C/C++, which is what I do) for just the computationally intensive loops, which then return the output as Python objects. And heck, while you're at it, if you'd rather use R's graphing options than Python's matplotlib, you can wrap those in your Python framework as well with rpy2. If all you're doing is one computation routine, this might not make sense. But if you've got a much more intricate program, framing the problem in Python can save you a lot of headache and make your code more portable and reusable.

    Also, as a side note, given that with current compilers C is actually faster than Fortran, why recommend Fortran instead of C? Is it just because of legacy code? Numerical Recipes is available for C as well. Parallelization is super easy with OpenMP (and not bad with MPI). Also, if you are going to be using Python to wrap your lower-level extensions, the array tools by default use C-ordering (row-major) rather than Fortran-ordering (column-major), which simplifies writing extensions. Then, if you want to use nice tools in your extension like LAPACK, you just get the C-ordered headers, like LAPACKE. I know it takes a little bit to get used to pointers and manual memory allocation in C, but that's not bad after you get the hang of it and actually opens up a lot of programming power.

    Replies
    1. Yeah, wrapping code is a good way to go, and probably worth doing for most people.

      The finding that C is faster than Fortran is very new and depends on specific compilers. In any case, they're reporting that it's about 5% faster, which is close enough to make other considerations dominate for me. Fortran is a lot easier--it's basically Matlab with variable declarations. But when I was looking for a fast language I had a hard time deciding and almost went with C. What pushed me to Fortran was that it's slightly easier and that several professors and students in my department use Fortran. It's quite common in macro and international finance. Having people around who use the same languages makes life easier.
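
      For concreteness, here's roughly what the wrapped piece might look like on the Fortran side. This is a hypothetical toy routine--I haven't gone the Python route myself--but as I understand it, you compile it with f2py and then call it from Python like an ordinary function (the intent(out) argument comes back as the return value).

        ! fastloops.f90 -- hypothetical toy extension; build with
        !   f2py -c -m fastloops fastloops.f90
        ! then, in Python, call it as: total = fastloops.accumulate(x)
        subroutine accumulate(x, n, total)
          implicit none
          integer, intent(in) :: n
          real(8), intent(in) :: x(n)
          real(8), intent(out) :: total
          integer :: i

          total = 0.0d0
          do i = 1, n
             total = total + x(i)
          end do
        end subroutine accumulate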

  4. I find that while computers are cheap, waiting for results is expensive. I am involved with a group that reimplemented a program I wrote in Fortran and later in SAS into a modern language (Python with Numpy). The program grew substantially and is at least an order of magnitude slower. I don't find it as readable either, but that may be my training. There is still a place for compiled Algol-like languages.
