                       Computing Tips/Truths
This is inspired by some "CS Falsehoods" lists that you can find on the internet[1].
As I do not like having to invert everything, I prefer to state truths directly instead. I believe it is more readable this way; at least, it makes it easier for me to describe the issues.
I would love to be proved wrong, or to hear doubts about any of this; thanks a lot if you do.
- This list isn't absolute truth (but I hope you and I exercise enough doubt)
- Yourself/Academia/Internet/Myself/Chuck Norris/Any Computer/… cannot always be right
- We have yet to discover enough about how computers work to know what we're doing (reminder: knowing how to pile bricks doesn't make you a good architect)
- You can sometimes detect, after parsing, whether a program will end (finite automata, loops with no end condition)
- You cannot detect, for all programs, whether they will end (the halting problem)
- Most programs can be made to crash (and under most operating systems, all of them can)
- You can probably render an operating system unusable (Denial of Service) more easily than you think, even with some restrictions in place:
	- Fixed: Ping of Death, most but not all security issues, …
	- Mitigated: fork bombs, using up all memory, filling up a filesystem (be careful with logs), exhausting the file descriptor/PID/… limits of the current user or root, …
- Cryptography isn't some magic fairy dust to make something secure (it can actually make it worse)
- There are no magic solutions to make something secure, but there are good practices
- You will need actual debugging tools (gdb/lldb, dtrace, ping, tcpdump/wireshark, …); learn them
- Open-Source isn't as different from Libre/Free-software as some people say
- There are only a few CS/IT fields where advanced math knowledge is actually useful (which is not to say that you shouldn't give it a shot anyway)
- There isn't a programming language where a human can directly build everything they want
- Not all programming language interpreters/compilers are written in C (see Go, Haskell, Rust, …)
- One does not simply know how copyright law works on the internet (hint: that means worldwide)
- In most of Europe, the Public Domain only arises from copyright expiration, so you cannot simply dedicate your work to it (consider licences like CC0)
- A large number of lines does not mean that the programmer was efficient
- A small program does not mean that it runs efficiently
- A version number isn't a good indicator of quality
- Data decays, constantly and forever (bit rot)
- You need threat models for your security
- You do not have a Turing machine: RAM is finite, and memory allocation can fail
- Your programs aren't alone, try to be a good neighbor
- Always online isn't
- Serverless/Passwordless isn't (also define things by what they are)
- Working Replacements aren't (in)direct extensions of what they replace
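The termination points above can be illustrated with the parse tree alone: a checker can prove non-termination for *some* programs without running them. The following is a minimal, deliberately conservative Python sketch using the standard `ast` module, and `obviously_endless` is a hypothetical helper. It only matches a `while` loop whose condition is a truthy constant and whose body does nothing. In every other case it returns False, which means "don't know" rather than "terminates": by the halting problem, no checker can be right about every program.

```python
import ast

def obviously_endless(source: str) -> bool:
    """Return True if `source` has a loop that can never exit once entered.

    Only flags `while <truthy constant>:` loops whose body is made of `pass`
    or bare constants such as `...`; everything else returns False, meaning
    "don't know", not "terminates".
    """
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.While) and isinstance(node.test, ast.Constant)):
            continue
        if not node.test.value:  # e.g. `while 0:` never runs its body
            continue
        if all(isinstance(stmt, ast.Pass)
               or (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant))
               for stmt in node.body):
            return True
    return False

print(obviously_endless("while True:\n    pass\n"))           # True
print(obviously_endless("while True:\n    break\n"))          # False
print(obviously_endless("n = 0\nwhile n < 3:\n    n += 1\n"))  # False
```

The check ignores external events (signals, `KeyboardInterrupt`, being killed), and a flagged loop might sit in code that is never reached. A True result only means that the loop cannot exit on its own once entered.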
## Unique IDs
So-called "Unique IDs" aren't always unique:
- A lot of "Unique IDs" can be spoofed or badly generated/stored (quite common for MAC addresses)
- Assigning IDs sequentially makes them enumerable, removes plausible deniability, and can lead to uniqueness issues if you restore storage from a previous point in time; this should be strongly avoided in internet applications
- UUIDs can be reasonably trusted, but be careful how you use them:
	- "nil" UUID (entirely zero) is valid
	- version 1 should be avoided in settings where time isn't linear (the clock can easily jump backwards, always starts at the same date on boot, …)
	- version 3 (MD5) and 5 (SHA-1) obviously shouldn't be used as security credentials
	- version 4 is purely random (122 random bits) and should be avoided when you have reliable time and can get a large sample size (birthday problem)
- In decentralized settings, consider FlakeIDs: 128-bit k-ordered IDs made of 64 bits of milliseconds since the UNIX epoch, 48 bits of node ID, and 16 bits of random/sequence
  Implementation: <https://git.pleroma.social/pleroma/elixir-libraries/flake_id>
  Note on the node ID: consider generating a random ID at launch or installation; reusing another "unique ID" such as a MAC address brings both uniqueness and privacy issues.
  It's also assumed that a node can reasonably assert whether a FlakeID in its own namespace was already used.
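Here is a minimal Python sketch of a FlakeID-style generator that follows the 64/48/16 layout above. `FlakeGenerator` and `split_flake` are hypothetical names. They are not the API of the Elixir library linked above, which also has its own string encoding. In this sketch the low 16 bits are a per-millisecond sequence rather than random bits, and the node ID is drawn at random at launch, as the note suggests.

```python
import secrets
import threading
import time

NODE_BITS = 48
SEQ_BITS = 16

class FlakeGenerator:
    """Hypothetical FlakeID-style generator (not the flake_id library's API).

    Layout of the 128-bit integer, most significant first:
    64 bits of milliseconds since the UNIX epoch | 48-bit node ID | 16-bit sequence.
    """

    def __init__(self, node_id=None):
        # Random node ID drawn at launch instead of reusing a MAC address.
        self.node_id = secrets.randbits(NODE_BITS) if node_id is None else node_id
        if not 0 <= self.node_id < 1 << NODE_BITS:
            raise ValueError("node_id must fit in 48 bits")
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            # Never reuse an earlier millisecond, even if the wall clock went back.
            now = max(time.time_ns() // 1_000_000, self._last_ms)
            if now == self._last_ms:
                self._seq += 1
                if self._seq >= 1 << SEQ_BITS:
                    # Sequence exhausted for this millisecond: borrow the next one.
                    now += 1
                    self._seq = 0
            else:
                self._seq = 0
            self._last_ms = now
            return (now << (NODE_BITS + SEQ_BITS)) | (self.node_id << SEQ_BITS) | self._seq

def split_flake(flake: int) -> tuple:
    """Return (milliseconds, node ID, sequence) from a FlakeID integer."""
    return (flake >> (NODE_BITS + SEQ_BITS),
            (flake >> SEQ_BITS) & ((1 << NODE_BITS) - 1),
            flake & ((1 << SEQ_BITS) - 1))

gen = FlakeGenerator()
flake = gen.next_id()
# As 16 big-endian bytes, FlakeIDs sort in the same order as the integers.
print(flake.to_bytes(16, "big").hex(), split_flake(flake))
```

Within one running generator, the IDs strictly increase even if the wall clock jumps back. Across restarts, a fresh random node ID makes collisions unlikely but not impossible, which is where a node checking its own namespace comes in.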
## Correctness
- Asserting correctness is hard (I do not believe that languages like F* entirely solve this)
- An audit only shows the problems it found; it cannot find all of them
- You cannot build an entirely secure machine
## Parsing
- You cannot parse non-regular languages/formats with one regex
- You cannot validate an email address using one regex (Well, other than `.+`)
- You cannot validate an email address prior to its actual use (e.g. UTF-8 might not be supported at the destination)
- You cannot write a parser for all version numbers
- Some regex languages aren't (regular)
- You can parse non-regular languages/formats with multiple regexes (see lex/yacc, awk, perl, …)
- You cannot parse human formats reasonably well
- PEMDAS isn't enough to express math evaluation priorities (e.g. associativity of exponents, unary minus)
- Client-side input pattern guards aren't a replacement for server-side validators; if you need to pick one, always pick server-side
- Consider Lua patterns instead of some regex dialect
- Prefer non-backtracking regex dialects (Go/RE2) over Perl's dialect
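Nested structures are the classic example of a non-regular language. Matching brackets to arbitrary depth needs unbounded memory, which a single true regular expression (a finite automaton) does not have. Below is a minimal Python sketch of a checker that uses a stack instead. `balanced` is a hypothetical name, and only `()`, `[]` and `{}` are handled.

```python
# Maps each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}

def balanced(text: str) -> bool:
    """Check that (), [] and {} are properly nested; other characters are ignored."""
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)
        elif ch in PAIRS:
            # A closer with nothing open, or the wrong opener on top, fails.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    # Anything still open at the end is unbalanced.
    return not stack

print(balanced("f(a[0], {b: c})"))  # True
print(balanced("([)]"))             # False
```

The stack is exactly the unbounded memory that a finite automaton lacks. This is also why tools like lex/yacc pair regexes for tokens with a grammar for nesting.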
## Unix maintains bugs
- You cannot trust a PID to point to the same process at two different times, as PIDs get reused (making uptime part of the IDs would help a lot, though)
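On Linux, one way to get close to the "PID plus uptime" idea is to pair the PID with the process start time from `/proc/<pid>/stat`. This is field 22, `starttime`, counted in clock ticks since boot, as documented in proc(5). Below is a minimal, Linux-only Python sketch. `process_key` and `same_process` are hypothetical helpers, and the check still races if the process exits right after it.

```python
import os

def process_key(pid: int) -> tuple:
    """Return (pid, starttime) for a process, read from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (comm) is in parentheses and may itself contain spaces or ')',
    # so split only after the last ')'. The remaining fields start at field 3.
    fields = stat[stat.rindex(")") + 2:].split()
    return pid, int(fields[22 - 3])  # field 22: starttime, in clock ticks since boot

def same_process(key: tuple) -> bool:
    """True if the PID in `key` still refers to the process it was taken from."""
    try:
        return process_key(key[0]) == key
    except (FileNotFoundError, ProcessLookupError):
        return False  # the process is gone

key = process_key(os.getpid())
print(same_process(key))  # True
```

The start time restarts from zero at each boot, so a key that must survive reboots would also need something like `/proc/sys/kernel/random/boot_id`.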
## Standards
"The nice thing about standards is that you have so many to choose from." — Andrew S. Tanenbaum; Computer Networks, 2nd ed., p. 254.
- POSIX is not followed by most Unix systems (in fact, the certifications seem off/wrong)
- W3C standards are written first; implementations (maybe) later
- 2+ implementations first; description in RFCs later
- Most standards aren't enforced or properly certified (even before considering correctness issues)
## Time
- Time sometimes goes backwards (and I believe most operating systems need to allow setting an earlier date at runtime)
- You probably cannot correctly represent time (on Earth) in a single format (ISO 8601 isn't absolute, and iCal is a gigantic mess: <https://www.jwz.org/blog/2018/02/timezones/>)
- In a human representation: hours aren't always between 0 and 23, minutes aren't always between 0 and 59, and seconds aren't always between 0 and 59 (a leap second is written 23:59:60)
- We do not have a universal way to count seconds; see RFC 3339's introduction and https://what-if.xkcd.com/26/
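Two practical consequences are sketched below in Python. First, measure durations with a monotonic clock rather than the wall clock, which can go backwards. Second, a standard library may refuse a perfectly legitimate leap second. `measure` and `datetime_accepts` are hypothetical helper names; `time.monotonic`, `time.strptime` and `datetime.strptime` are the standard library functions.

```python
import time
from datetime import datetime

def measure(fn, *args):
    """Call fn(*args) and return (result, elapsed seconds).

    time.monotonic() never goes backwards, whereas time.time() follows the
    wall clock, which can be stepped back by NTP or an administrator.
    """
    start = time.monotonic()
    result = fn(*args)
    return result, time.monotonic() - start

def datetime_accepts(hms: str) -> bool:
    """Return True if datetime can represent the given "HH:MM:SS" time."""
    try:
        datetime.strptime(hms, "%H:%M:%S")
        return True
    except ValueError:
        return False

total, seconds = measure(sum, range(1_000_000))
print(total, seconds >= 0)                            # 499999500000 True

# time.strptime accepts second 60 (a leap second); datetime rejects it.
print(time.strptime("23:59:60", "%H:%M:%S").tm_sec)   # 60
print(datetime_accepts("23:59:59"))                   # True
print(datetime_accepts("23:59:60"))                   # False
```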
## Internet lies
- Files in lossy formats do not lose more data over time than lossless ones (copies are bit-exact; re-encoding is what loses data)
## Git
- You can rewrite history (git-filter-branch, git-rebase, git-reset, ...)
## Names
- Consider using something comparable to tag URIs (RFC 4151) to avoid a namespace that can get conflicts or be burned
- (Human) names are weird: verify any assumption you may have, and then still keep some doubt. I wrote <https://hacktivis.me/articles/real%20names> for a reason.
## See Also
I would also highly recommend watching https://www.destroyallsoftware.com/talks/ideology, which covers:
- Functional Programming can be efficient enough
- Garbage Collection can be efficient enough
- NULL isn't the only way to represent absence
- Type systems cannot replace tests
- Tests cannot replace types
1: https://www.netmeister.org/blog/cs-falsehoods.html