The Estimator (Work in Progress) | Frank Mitchell's Blog

Origin

When I worked in Chicago supporting trading desks at a succession of big banks, I noticed how long calculating risk analytics took. The general procedure was to take the current market price of a portfolio, adjust some parameter like interest rates or the current date, reprice the portfolio, take the difference, divide by the unit of change, and call it a derivative. (Not to be confused with traded derivatives.)
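To make that procedure concrete, here is a minimal sketch in Ruby. The `present_value` function is a made-up stand-in for a real pricing model, and `sensitivity` is a hypothetical helper; neither comes from any actual trading system.

```ruby
# Hypothetical pricing function: present value of a single cash flow
# discounted at a flat annual rate. A stand-in for a real pricing model.
def present_value(cash_flow, rate, years)
  cash_flow / (1.0 + rate)**years
end

# Bump-and-reprice sensitivity: price once as-is, price again with the
# parameter shifted by `bump`, take the difference, and divide by the
# size of the bump. The block receives the shift to apply.
def sensitivity(bump: 0.0001)
  base   = yield(0.0)
  bumped = yield(bump)
  (bumped - base) / bump
end

rate  = 0.05
delta = sensitivity { |shift| present_value(100.0, rate + shift, 10) }
puts format("dPV/drate is roughly %.2f", delta)
```

Each sensitivity computed this way costs at least one extra full repricing, so the number of pricing calls grows with the number of parameters being bumped.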

“Wouldn’t it be nice,” I told myself, “if we could estimate how many calculations it would take to analyze a portfolio, multiply that by the time to do one calculation, and warn the user it might take a while?” I called it the Estimation Framework because everything was a “framework” in the 1990s and early 2000s. Then I wrote it on a piece of paper, and eventually on other pieces of paper and in various files, for thirty-some years.

One day I was watching my duplicate-files.rb script chug through one of my larger home directories and thought, “Wouldn’t it be nice if …” But this time I resolved to give the thing a progress bar.

Development

First Attempt

As we see in the current duplicate-files.rb, the actual instrumentation isn’t that complicated:

(ll 53-137) Design a class or API that’s notified of every iteration and updates a progress bar or similar UI accordingly.
(ll 188, 198, 222, 238) Instrument the iterative loop with a callback that reports on every pass through the loop.
(ll 346, 70-82) Estimate the number of iterations based on the size and distribution of data.
(l 340) Tie it all together, and voila!
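The steps above can be sketched in a few dozen lines of Ruby. This is an illustration of the pattern, not the code from duplicate-files.rb: `ProgressReporter`, `compare_all`, and the `on_step:` parameter are hypothetical names, and the all-pairs estimate is just one possible guess.

```ruby
# Hypothetical progress reporter: told the expected number of steps up
# front, then notified once per iteration.
class ProgressReporter
  attr_reader :done, :expected

  def initialize(expected, out: $stderr)
    @expected = [expected, 1].max # avoid dividing by zero
    @done = 0
    @out = out
  end

  # Called on every pass through the instrumented loop.
  def tick
    @done += 1
    @out.print("\r#{@done}/#{@expected} (#{percent}%)")
  end

  # Percentage complete, capped at 100 in case the estimate was too low.
  def percent
    [(100.0 * @done / @expected).round, 100].min
  end

  def finish
    @out.puts
  end
end

# An instrumented loop: the comparison work is unchanged, but each pass
# reports to the callback.
def compare_all(items, on_step:)
  matches = []
  items.combination(2) do |a, b|
    matches << [a, b] if a == b
    on_step.call
  end
  matches
end

items    = %w[a b a c]
estimate = items.size * (items.size - 1) / 2 # number of pairs
progress = ProgressReporter.new(estimate)
matches  = compare_all(items, on_step: progress.method(:tick))
progress.finish
```

Passing the callback as a plain callable keeps the loop ignorant of the UI, so the same loop can drive a progress bar, a logger, or nothing at all.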

Except, well, it didn’t come together. My estimation algorithm was ridiculously off, even more so when files had multiple duplicates. I tried the choose function and even factorial, but nothing came within an order of magnitude of the actual iteration count. Was my comparison algorithm inefficient? Or do I just not estimate well?
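To show how far a choose-based estimate can drift, here is a small sketch under stated assumptions: files are bucketed by size, the estimate assumes every pair in a bucket gets compared, and the comparison loop instead checks each file against one representative per distinct content seen so far. That second strategy is an assumption for illustration, not a description of what duplicate-files.rb does, and all the names are hypothetical.

```ruby
# n choose 2: the number of distinct pairs among n items.
def choose2(n)
  n * (n - 1) / 2
end

# Upper-bound estimate: every file is compared with every other file
# of the same size. `sizes` is a list of file sizes in bytes.
def estimate_all_pairs(sizes)
  sizes.tally.values.sum { |n| choose2(n) }
end

# Count comparisons when each file is checked only against one
# representative of each distinct content seen so far in its size
# group. `files` is a list of [size, content_id] pairs; content_id
# stands in for the actual bytes.
def count_with_representatives(files)
  comparisons = 0
  files.group_by(&:first).each_value do |group|
    reps = []
    group.each do |_size, content|
      hit = reps.index do |rep|
        comparisons += 1
        rep == content
      end
      reps << content unless hit
    end
  end
  comparisons
end

identical = Array.new(10) { [4096, :same] }
distinct  = Array.new(10) { |i| [4096, i] }

puts estimate_all_pairs(identical.map(&:first)) # 45
puts count_with_representatives(identical)      # 9
puts count_with_representatives(distinct)       # 45
```

With ten same-sized files, the all-pairs estimate is 45 either way, but the representative strategy does only 9 comparisons when all ten are duplicates. So the more duplicates there are, the further an estimate based on size alone can land from the real count.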

So I added yet another API to collect actual times (ll 329, 333, 342, 346, 354) and write out some raw data (ll 302-324, 360-367) that I could analyze to figure out where I went wrong.
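A timing collector of this kind might look like the following. It is a sketch only: `TimingLog`, `measure`, and the `estimated_steps` field are hypothetical names, and the real script's log format may differ.

```ruby
require "json"

# Hypothetical recorder: collects one sample per timed step so the raw
# numbers can be analyzed later.
class TimingLog
  attr_reader :samples

  def initialize
    @samples = []
  end

  # Time a block with the monotonic clock and record the elapsed
  # seconds under `label`, along with any extra fields (for example,
  # the estimate made beforehand). Returns the block's result.
  def measure(label, **fields)
    start   = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result  = yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    @samples << { label: label, seconds: elapsed, **fields }
    result
  end

  # Serialize the collected samples as a JSON array.
  def dump
    JSON.pretty_generate(@samples)
  end
end

log   = TimingLog.new
total = log.measure("sum", estimated_steps: 1000) { (1..1000).sum }
puts log.dump
```

Recording the estimate next to the measured time keeps each sample self-describing, which makes it easier to compare guesses against reality afterward.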

And Then …

Well, there’s no answer yet. I’m going to write a quick Lua script to parse the JSON performance logs. That will let me try alternative estimation functions against the recorded data without spending time re-collecting it. But it’s obvious that finding the right estimation function is the real work, not instrumenting code and collecting progress numbers.
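The shape of that analysis is simple enough to sketch. The example below is in Ruby rather than Lua, only to match duplicate-files.rb. The log format (`group_sizes`, `actual`) is an assumption, the numbers are invented for illustration and are not measurements, and the two candidate functions are placeholders.

```ruby
require "json"

# Hypothetical log: one record per run, listing how many files landed
# in each same-size group and how many comparisons were actually made.
# These numbers are invented, not measured.
SAMPLE_LOG = <<~LOG
  [
    {"group_sizes": [2, 3], "actual": 3},
    {"group_sizes": [10],   "actual": 9},
    {"group_sizes": [4, 4], "actual": 12}
  ]
LOG

# Candidate estimators to try against the recorded data. Each maps a
# group size to an estimated number of comparisons for that group.
CANDIDATES = {
  "all pairs" => ->(n) { n * (n - 1) / 2 },
  "linear"    => ->(n) { n - 1 }
}.freeze

# For each candidate, return the ratio estimate/actual for every run.
# A ratio near 1.0 means the estimate was close.
def score(records, candidates)
  candidates.transform_values do |fn|
    records.map do |rec|
      estimate = rec["group_sizes"].sum { |n| fn.call(n) }
      estimate.to_f / rec["actual"]
    end
  end
end

records = JSON.parse(SAMPLE_LOG)
scores  = score(records, CANDIDATES)
scores.each do |name, ratios|
  puts format("%-10s %s", name, ratios.map { |r| r.round(2) }.inspect)
end
```

Because the estimators are plain lambdas in a hash, adding another candidate is a one-line change, and the recorded data never has to be collected again.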

I told you this was a work in progress …