Error Codes and the Law of Least Astonishment

Mar 08 2022

Do you know the law of least astonishment? I am not sure of its origin, but I first learned it from the exceptional “Tao of Programming.” Simply put, it is the principle that software should always respond to users in the way that least astonishes them. In other words, printing a document shouldn’t erase it from your file system.

Following the law of least astonishment, what should a program do when it hits a hard error? You might say that it should let the user know. Unfortunately, lots of systems just sweep it under the rug these days.

I think it started with Windows. Or maybe the Mac. The thinking goes that end users are too silly or too terrified of error codes or in-depth messages, so we just leave them out. Case in point: my wife’s iPhone wouldn’t publish pictures. I’m no expert, considering that I carry an Android device, but I agreed to look at it. No matter what I tried, I got the same useless message: “Can’t publish photos right now. Please try again later.” Not only is this not very informative, but it also implies the problem is in something that might fix itself later, like the network.

The real culprit? The iCloud terms of service had changed and she had not accepted the new contract. I have a feeling it might have popped up asking her to do that at some point, but for whatever reason she missed it. Until you dug into the settings and checked the box to agree to those terms, “later” was never going to happen.

But it isn’t just iPhones. Windows is full of things like that, and you can only hope there will be an entry in the Event Viewer with more details. I also see a lot more of it now on Linux, although there is normally a log file somewhere if you know how to find it. While I get that programs with errors run the risk of astonishing the user, it is even more astonishing if there’s no explanation of what’s wrong. Imagine if your bank sent you a note: “There is a problem with your account.” So you respond: “Did I overdraw?” They reply, “No.” Now what? That’s the state of lots of software errors today.

There’s really no excuse on desktop systems or websites. However, you might want to forgive tiny embedded systems. Don’t! I recently ported the 3D printer firmware Marlin to an ANET A8 board (an 8-bit processor with little memory) that had been running Repetier firmware for many years. The first time I tried to do an autolevel probe I got the message: “Probing failed.” That’s it.

I’ll grant you that you can turn on autolevel debugging to get a lot more information, but I’m already at 98% flash utilization, so that would require temporarily removing a bunch of features and rebuilding the code. But why not do what we did in the old days:

unsigned int global_error = 0;  /* 0 means success; otherwise the number of the step that failed */

void do_something(void) {
    global_error = 1;                 /* step 1 is about to run */
    if (process1() == FAIL) return;
    global_error++;                   /* step 2 is about to run */
    if (process2() == FAIL) return;
    . . .

    global_error = 0;                 /* every step succeeded */
    return;
}

This doesn’t take much space. Now you can report something like “Probing failed (8)” and I can at least go to the code and figure out which step was the eighth one that failed. I’m sure someone would even post a list of codes and what they indicate in a case like that.
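If the steps all have the same shape, the same idea can be written as a loop over a table of step functions, so the counter cannot drift out of sync when steps are added or removed. This is only a sketch under assumptions: run_steps, format_probe_error, the stand-in steps, and the values of OK and FAIL are made up for illustration and are not taken from Marlin.

```c
#include <stdio.h>

#define FAIL 0   /* assumed values; the original does not define FAIL */
#define OK   1

typedef int (*step_fn)(void);

unsigned int global_error = 0;   /* 0 = success, n = step n failed */

/* Run each step in order; stop at the first failure and remember which one. */
int run_steps(const step_fn *steps, unsigned int count) {
    for (unsigned int i = 0; i < count; i++) {
        global_error = i + 1;            /* step numbers start at 1 */
        if (steps[i]() == FAIL) return FAIL;
    }
    global_error = 0;                    /* every step succeeded */
    return OK;
}

/* Build the user-facing message, including the failing step number. */
int format_probe_error(char *buf, size_t len) {
    return snprintf(buf, len, "Probing failed (%u)", global_error);
}

/* Hypothetical stand-ins for real work such as homing or probing. */
int step_ok(void)  { return OK; }
int step_bad(void) { return FAIL; }
```

Calling run_steps with a table such as { step_ok, step_ok, step_bad, step_ok } leaves global_error at 3, and format_probe_error then produces “Probing failed (3)”. The cost is one small counter and a few extra characters in the message, which matters when flash is nearly full.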

Too much overhead? Tell me the program counter where the error happened. That used to be a pretty common practice. Granted, it requires you to have a memory map file and know how to read it, but it is still better than nothing.
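Standard C has no way to read the program counter, so here is one possible sketch that leans on a compiler extension instead: GCC and Clang provide __builtin_return_address(0), which gives the address a function will return to, and that address lies just past the call site. The names record_failure, probe_bed, format_pc_error, and last_error_pc are hypothetical, and how the printed address lines up with your map file depends on the toolchain and target.

```c
#include <inttypes.h>
#include <stdio.h>

#define FAIL 0   /* assumed values, for illustration only */
#define OK   1

volatile uintptr_t last_error_pc = 0;   /* 0 = no error recorded */

/* Remember where we were called from. noinline keeps this a real call,
   so the return address points into the function that hit the error. */
__attribute__((noinline)) void record_failure(void) {
    last_error_pc = (uintptr_t)__builtin_return_address(0);
}

/* Hypothetical routine: fails when the sensor never triggered. */
int probe_bed(int sensor_triggered) {
    if (!sensor_triggered) {
        record_failure();    /* address just after this call gets stored */
        return FAIL;
    }
    last_error_pc = 0;
    return OK;
}

/* Build a message that carries the recorded address in hex. */
int format_pc_error(char *buf, size_t len) {
    return snprintf(buf, len, "Probing failed @0x%" PRIxPTR, last_error_pc);
}
```

After a failed probe_bed(0), the message contains a hex address that can be looked up in the linker’s map file to find the function that failed. Each call to record_failure adds only a few bytes, and the message needs no descriptive text at all.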

We spend a lot of time thinking about how projects and software should work. But we need to spend time thinking, too, about what happens when they don’t work. It is fine that we can do in-circuit debugging or hook up a logic analyzer, but that won’t help our users. Even if it is just for you, why not make it a little easier on yourself?

As we have said before, “There’s no such thing as too much information.” In addition to guarding against system errors, you can also help users not to astonish themselves.

Image Credit: [Elisa Ventur] via Unsplash.com
