# Computing Tips/Truths

This is inspired by the "CS Falsehoods" lists that you can find around the internet[1]. As I do not like having to invert everything, I prefer to state truths directly. I believe it is more readable this way; at least it makes it easier for me to describe the issues.

I would love to be proved wrong or have doubts raised about any of this; thanks a lot if you do.

- This list isn't absolute truth (but I hope both you and I exercise enough doubt)
- Yourself/Academia/Internet/Myself/Chuck Norris/Any Computer/… cannot always be right
- We have yet to discover enough about how computers work to know what we're doing (reminder: knowing how to pile bricks doesn't make you a good architect)
- You can sometimes detect after parsing whether a program will or will not end (finite automata / loop with no end condition)
- You cannot detect for all programs whether they will or will not end ("The halting problem")
- Most programs can be made to crash (and under most operating systems it's all of them)
- You can render an operating system unusable (Denial-of-Service) probably more easily than you think, even with some restrictions:
  - Fixed: Ping of Death, most but not all security issues, …
  - Mitigated: Forkbombs, using up all memory, using up all of a filesystem (be careful with logs), exhausting the limit of file descriptors/PIDs/… of the current user or root, …
- Cryptography isn't some magic fairy dust that makes something secure (it can actually make it worse)
- There are no magic solutions to make something secure, but there are good practices
- You will need actual debugging tools (gdb/lldb, dtrace, ping, tcpdump/wireshark, …); learn them
- Open-Source isn't as different from Libre/Free-software as some people say
- There are only a few CS/IT fields where advanced math knowledge is actually useful (not to say that you shouldn't give it a shot anyway)
- There isn't a programming language in which a human can directly build everything they want
- Not all programming language interpreters/compilers are written in C (see Go, Haskell, Rust, …)
- One does not simply know how copyright laws work on the internet (hint: that means worldwide)
- In most of Europe, Public Domain is only valid for "Copyright Expiration" (consider licences like CC0)
- A large number of lines does not mean that the programmer was efficient
- A small program does not mean that it runs efficiently
- A version number isn't a good indicator of quality
- Data decays eternally
- You need threat models for your security
- You do not have a Turing machine: RAM is finite, memory allocation can fail
- Your programs aren't alone, try to be a good neighbor
- Always online isn't
- Serverless/Passwordless isn't (also define things by what they are)
- Working Replacements aren't (in)direct extensions of what they replace

## Unique IDs

So-called "Unique IDs" aren't always unique:

- A lot of "Unique IDs" can be spoofed or badly generated/stored (quite common for MAC addresses)
- If you assign IDs sequentially, you end up with enumeration and a lack of plausible deniability, and you can get uniqueness issues if you restore storage from a previous point in time; this should be strongly avoided in internet applications
- In the case of UUIDs, they can be reasonably trusted but be careful about how you use them:
  - "nil" UUID (entirely zero) is valid
  - version 1 should be avoided in settings where time isn't linear (can easily jump backwards, always at the same date on boot, …)
  - version 3 (MD5) and 5 (SHA-1) obviously shouldn't be used as security credentials
  - version 4 is pure random and should be avoided when you have reliable time and can get a large sample size (Birthday problem)
- In decentralized settings consider FlakeIDs, 128-bit k-ordered IDs: 64 bits of milliseconds since UNIX epoch, 48 bits for the node-ID, 16 bits of random/sequence

  Implementation:

  Note on the node-ID: Consider generating a random ID at launch or installation; using another Unique ID like a MAC address has uniqueness issues and privacy issues.

  It's also assumed that a node can reasonably assert whether a FlakeID in its own namespace was already used.

## Correctness

- Asserting correctness is hard (I do not believe that languages like F* actually solve this entirely)
- An audit only shows the problems found; it cannot find all of them
- You cannot build an entirely secure machine

## Parsing

- You cannot parse non-regular languages/formats with one regex
- You cannot validate an email address using one regex (Well, other than `.+`)
- You cannot validate an email address prior to its actual usage (e.g. UTF-8 might not be supported at destination)
- You cannot write a parser for all version numbers
- Some regex languages aren't (regular)
- You can parse non-regular languages/formats with multiple regexes (see lex/yacc, awk, perl, …)
- You cannot parse human formats reasonably well
- PEMDAS isn't enough to express Math evaluation priorities
- Client-side input pattern guards aren't a replacement for server-side validators; if you need to pick one, always pick server-side
- Consider Lua patterns instead of some regex dialect
- Prefer non-backtracking regex dialects (Go/RE2) instead of Perl's dialect

## Unix maintains bugs

- You cannot trust PIDs to point to the same program at two different times (making uptime part of the IDs would help a lot though)

## Standards

"The nice thing about standards is that you have so many to choose from." -- Andrew S. Tanenbaum; Computer Networks, 2nd ed., p. 254.

- POSIX is not followed by most Unix systems (in fact the certifications seem off/wrong)
- W3C standards are written first; implementations (maybe) later
- 2+ implementations first; description in RFCs later
- Most standards aren't enforced or properly certified (even when considering Correctness issues)

## Time

- Time sometimes goes backwards (and I believe that on most operating systems you need to be able to set an earlier date at runtime)
- You probably cannot represent time (on earth) in one format correctly (as in, ISO 8601 isn't absolute and iCal is a gigantic mess)
- In a human representation: hours aren't always between 0 and 23, minutes aren't always between 0 and 59, seconds aren't always between 0 and 59
- We do not have a universal way to count seconds, see RFC 3339's introduction and https://what-if.xkcd.com/26/

## Internet lies

- Files in lossy formats do not lose more data over time than lossless ones

## Git

- You can rewrite history (git-filter-branch, git-rebase, git-reset, ...)

## Names

- Consider using something comparable to tag URIs (RFC 4151) to avoid having a namespace that can get conflicts or be burned
- (Human) names are weird, verify any assumption you can have and then still use some doubt. I wrote for a reason.

## See Also

I would also highly recommend watching https://www.destroyallsoftware.com/talks/ideology which has:

- Functional Programming can be efficient enough
- Garbage Collection can be efficient enough
- NULL isn't the only way to represent absence
- Type systems cannot replace tests
- Tests cannot replace types

1: https://www.netmeister.org/blog/cs-falsehoods.html
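
As an illustration of the FlakeID layout described under "Unique IDs" (64 bits of milliseconds since the UNIX epoch, 48 bits of node-ID, 16 bits of sequence), here is a minimal Python sketch. It is not a reference implementation: the names `FlakeGenerator`, `next_id` and `flake_to_bytes` are made up for this example, and the way it handles a clock that stands still or goes backwards (staying on the last timestamp and bumping the sequence) is my own assumption, not something a specification requires.

```python
import secrets
import threading
import time


class FlakeGenerator:
    """Hypothetical sketch of a 128-bit FlakeID generator."""

    def __init__(self, node_id=None, clock=None):
        # Random 48-bit node-ID picked at launch, rather than reusing
        # another "Unique ID" such as a MAC address.
        if node_id is None:
            node_id = secrets.randbits(48)
        self.node_id = node_id & ((1 << 48) - 1)
        # Injectable clock returning milliseconds since the UNIX epoch
        # (assumption: makes the sketch testable without real time).
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now > self._last_ms:
                self._last_ms = now
                self._seq = 0
            else:
                # Same millisecond, or time went backwards: keep the last
                # timestamp and bump the sequence so this node never
                # hands out the same ID twice while it is running.
                self._seq += 1
                if self._seq >= 1 << 16:
                    # Sequence exhausted: borrow the next millisecond.
                    self._last_ms += 1
                    self._seq = 0
            return (self._last_ms << 64) | (self.node_id << 16) | self._seq


def flake_to_bytes(flake):
    """Big-endian encoding, so byte-wise order matches numeric order."""
    return flake.to_bytes(16, "big")
```

Calling `FlakeGenerator().next_id()` returns a plain integer. The sketch keeps no state across restarts, so it only covers the "was this FlakeID already used" assumption for a single run; picking a fresh random node-ID at each launch is what makes a clash after a reboot unlikely rather than impossible.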