- # checksrc: Check directory for potential non-source files
- Manpage webview: <https://hacktivis.me/git/checksrc.mdoc/>
- ## Source Code
- ### Practical angle
- In practice, a lack of source code means:
- - Code that is harder to read, which makes it easier to hide malware
- - Generated code that you can't regenerate, leading to less maintainable patches
- The combination of both is particularly bad, as it means you
- can end up forced to rewrite a lot of code, or switch to another
- implementation, just to get rid of a bug or, worse, malware.
- ### Legal angle
- GPLv3:
- > The "source code" for a work means the preferred form of the work
- > for making modifications to it.
- So, legally, you need to make sure you are actually shipping
- source code in order not to violate the GPLv3 and licences derived from it.
- The same applies to similar licences: MPL-2.0 and EUPL-1.2 both have
- similar definitions.
- ## Dependencies
- - POSIX.1-2008 make(1)
- - C99 Compiler
- - C library providing the `getdents()` function, such as musl on Linux*
- *This will change to POSIX.1-2024 once a musl release supports `posix_getdents()`
- The number of dependencies is kept low, as checksrc is designed for cases like
- [bootstrap-initrd](https://hacktivis.me/git/bootstrap-initrd/).
- This means, for example, that file/libmagic won't be used.
- ## Detections
- ### Minor
- Emits a warning but doesn't stop reading the file:
- - dump: block of 10 consecutive lines with lengths varying by less than 3 bytes (to detect hex dumps, base64, …)
- - more punctuation, symbols, and numbers than letters (within 4KB blocks)
- ### Major
- Emits an error, stops reading the file, and exits unsuccessfully:
- - minified code: average line length of more than 100 characters (within 4KB blocks)
- - non-printable character (byte under 0x20 other than `\n`, `\r`, `\t`)
- - 3 blocks of dump (see Minor)
- - (planned) string indicating generated code
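
As a rough illustration of the first two major checks, here is a minimal sketch. The function names are hypothetical and the details are assumptions rather than checksrc's actual implementation: in particular, a trailing line without a newline is counted as a line, and newline bytes are excluded from the line length.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_AVG_LINE 100 /* average line length above this is "minified" */

/* True for a byte under 0x20 other than \n, \r and \t. */
static bool
is_nonprintable(unsigned char c)
{
	return c < 0x20 && c != '\n' && c != '\r' && c != '\t';
}

/* Hypothetical helper: true if the average line length within the block
 * exceeds MAX_AVG_LINE. The caller is assumed to pass blocks of at most
 * 4KB, as described above. */
static bool
looks_minified(const char *block, size_t len)
{
	size_t newlines = 0;

	for (size_t i = 0; i < len; i++)
		if (block[i] == '\n') newlines++;

	/* Count a trailing partial line as a line of its own. */
	size_t lines = newlines;
	if (len > 0 && block[len - 1] != '\n') lines++;
	if (lines == 0) return false;

	/* Compare totals instead of dividing, to avoid integer truncation. */
	return (len - newlines) > MAX_AVG_LINE * lines;
}
```

Since either condition is a major issue, a scanner built on these helpers could stop at the first byte or block that matches, without reading the rest of the file.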
- ## Difference with deblob
- [deblob](https://hacktivis.me/projects/deblob) is designed for fast
- automated removals with very few false positives, by matching file
- signatures in the magic header of each file.
- `checksrc` meanwhile only stops reading a file when a major issue
- is found, to detect cases like `shar(1)` archives where the start
- of the file looks like regular code.
- So `checksrc` can take much longer depending on the payloads.
- ## Difference with other tools
- Debian suspicious-source: Python; only checks for files not matching
- a list of libmagic-detected MIME types and file extensions.
- <https://github.com/fosslinux/problematic-source>: Python; only checks
- for known strings of code generators and some generic ones,
- and also relies on libmagic-detected MIME types.
- ```
- SPDX-FileCopyrightText: 2017 Haelwenn (lanodan) Monnier <contact+checksrc@hacktivis.me>
- SPDX-License-Identifier: MPL-2.0
- ```