checksrc

Check directory for potential non-source files

git clone https://anongit.hacktivis.me/git/checksrc.git

README.md (2743B)


# checksrc: Check directory for potential non-source files

Manpage webview: <https://hacktivis.me/git/checksrc.mdoc/>

## Source Code

### Practical angle

On the practical angle, lack of Source Code means:

- Code that is harder to read, which makes it useful for hiding malware
- Generated code that you can't regenerate, leading to less maintainable patches

The combination of both is particularly horrible, as it means you
can end up forced to rewrite a lot of code or switch to another
implementation just to get rid of a bug or, worse, malware.

### Legal angle

GPLv3:

> The "source code" for a work means the preferred form of the work
> for making modifications to it.

So on the legal angle you need to make sure you're actually shipping
Source Code to avoid violating the GPLv3 and licences derived from it.
The same applies to similar licences: MPL-2.0 and EUPL-1.2 both have
similar definitions.

## Dependencies

- POSIX.1-2008 make(1)
- C99 Compiler
- C library with a `getdents()` function, such as musl on Linux*

*Will change to POSIX.1-2024 once musl releases with support for `posix_getdents`

The number of dependencies should stay low, as checksrc is designed for
cases like [bootstrap-initrd](https://hacktivis.me/git/bootstrap-initrd/).
This means, for example, that file/libmagic won't be used.

## Detections

### Minor

Throws a warning but doesn't stop reading the file:

- dump: block of 10 consecutive lines with lengths varying by less than 3 bytes (to detect hex dumps, base64, …)
- more punctuation, symbols, and numbers than letters (within 4KB blocks)

### Major

Throws an error, stops reading the file, exits unsuccessfully:

- minified code: average line length of more than 100 characters (within 4KB blocks)
- non-printable character (byte under 0x20 other than `\n`, `\r`, `\t`)
- 3 blocks of dump (see Minor)
- (planned) string indicating generated code
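
As a rough illustration of the first two major checks, here is a minimal C99 sketch. It is not taken from checksrc itself: the names `is_nonprintable` and `looks_minified` and both macros are hypothetical, and counting an unterminated trailing line as a line is an assumption made for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define BLOCK_SIZE 4096  /* block size used by the per-block checks */
#define MAX_AVG_LINE 100 /* average line length threshold */

/* true for bytes under 0x20 other than \n, \r and \t */
static bool
is_nonprintable(unsigned char c)
{
	return c < 0x20 && c != '\n' && c != '\r' && c != '\t';
}

/* Hypothetical helper: true if the average line length within the
 * block exceeds MAX_AVG_LINE characters.
 * Assumption: an unterminated trailing line counts as one line. */
static bool
looks_minified(const char *buf, size_t len)
{
	size_t newlines = 0;

	for (size_t i = 0; i < len; i++)
		if (buf[i] == '\n')
			newlines++;

	size_t lines = newlines;
	if (len > 0 && buf[len - 1] != '\n')
		lines++;
	if (lines == 0)
		return false;

	/* compare totals instead of dividing, to avoid truncation;
	 * newline bytes are not counted as line content */
	return len - newlines > (size_t)MAX_AVG_LINE * lines;
}
```

A caller would presumably feed the file through in `BLOCK_SIZE` chunks and stop at the first chunk for which either check fires.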

## Difference with deblob

[deblob](https://hacktivis.me/projects/deblob) is designed for fast
automated removals with very few false positives, by matching file
signatures in the magic header.

`checksrc` meanwhile only stops reading a file when a major issue
is found, so it can detect cases like `shar(1)` archives where the start
of the file looks like regular code.

As a result, `checksrc` can take much longer depending on the payloads.

## Difference with other tools

Debian suspicious-source: Python; only checks for files not matching
a list of libmagic-detected MIME types and file extensions.

<https://github.com/fosslinux/problematic-source>: Python; only checks
for known strings of code generators and some generic ones, and
also relies on libmagic-detected MIME types.

```
SPDX-FileCopyrightText: 2017 Haelwenn (lanodan) Monnier <contact+checksrc@hacktivis.me>
SPDX-License-Identifier: MPL-2.0
```