How bad can it git? Characterizing secret leakage in public GitHub repositories : the morning paper

boru 25th April 2019 at 3:47pm
Public Webnote

Despite best intentions, lots of keys do get leaked (a median of 1,793 unique keys every day).

Using the search approach outlined in the paper, the median time to discovery for a key leaked to GitHub is 20 seconds, with times ranging from half a second to over 4 minutes. In other words, in the time it takes you to go “oh s*&#! Did I just…?” and take a look, it’s probably already too late.

Many leaked secrets remain in GitHub repos for a long time (81% remain after 16 days).

We also found AWS credentials for the website of a major government agency in a Western European country…

The regular expressions can then be used to scan the candidate files from the first phase, with any matches considered “candidate secrets”. These candidate secrets are then passed through a set of filters designed to reduce false positives.
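A minimal sketch of the regex-scanning step, assuming the candidate files are already available as a mapping from path to text. The pattern names and the dictionary `SECRET_PATTERNS` are hypothetical, and the two patterns are illustrative (the well-known AWS access key ID and Google API key formats), not the paper's exact expressions.

```python
import re
from typing import Dict, List, Tuple

# Illustrative patterns only (assumed, not the paper's exact list).
# AWS access key IDs: "AKIA" followed by 16 uppercase letters/digits.
# Google API keys: "AIza" followed by 35 URL-safe characters.
SECRET_PATTERNS: Dict[str, re.Pattern] = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
}


def find_candidate_secrets(files: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Scan each candidate file's text; return (path, pattern_name, match) tuples."""
    candidates = []
    for path, text in files.items():
        for name, pattern in SECRET_PATTERNS.items():
            for match in pattern.finditer(text):
                candidates.append((path, name, match.group(0)))
    return candidates


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    files = {"config.py": 'AWS_KEY = "AKIAIOSFODNN7EXAMPLE"\n'}
    print(find_candidate_secrets(files))
    # → [('config.py', 'aws_access_key_id', 'AKIAIOSFODNN7EXAMPLE')]
```

Every match here is only a candidate; the validity filters decide whether it looks like a real secret.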

Three validity filters were used to remove false positives:

- An entropy filter, which catches secrets with very low entropy.
- A words filter, which catches secrets containing common dictionary words of length at least 5.
- A pattern filter, which looks for repeated characters (e.g. ‘AAAA’), ascending characters (‘ABCD’), and descending characters (‘DCBA’).
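The three filters can be sketched as follows. The entropy threshold (3.0 bits per character), the run length of 4, and the tiny word list are all assumptions for illustration; only the “dictionary words of length at least 5” rule comes from the text. Entropy here is the Shannon entropy of the string's character distribution.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 3.0   # bits per character (assumed threshold)
MIN_WORD_LEN = 5          # from the text: dictionary words of length >= 5
RUN_LEN = 4               # assumed run length, e.g. 'AAAA', 'ABCD', 'DCBA'
WORDS = {"password", "example", "secret", "access", "token"}  # hypothetical stand-in dictionary


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution of s, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def fails_entropy(s: str) -> bool:
    """Entropy filter: reject strings with very low entropy."""
    return shannon_entropy(s) < ENTROPY_THRESHOLD


def fails_words(s: str) -> bool:
    """Words filter: reject strings containing a dictionary word of length >= MIN_WORD_LEN."""
    lower = s.lower()
    return any(len(w) >= MIN_WORD_LEN and w in lower for w in WORDS)


def fails_pattern(s: str) -> bool:
    """Pattern filter: reject runs of RUN_LEN repeated, ascending, or descending characters."""
    for i in range(len(s) - RUN_LEN + 1):
        window = s[i:i + RUN_LEN]
        steps = {ord(b) - ord(a) for a, b in zip(window, window[1:])}
        if steps in ({0}, {1}, {-1}):  # all repeated, all +1, or all -1
            return True
    return False


def is_plausible_secret(s: str) -> bool:
    """A candidate survives only if none of the three filters rejects it."""
    return not (fails_entropy(s) or fails_words(s) or fails_pattern(s))


if __name__ == "__main__":
    print(is_plausible_secret("AKIAIOSFODNN7EXAMPLE"))  # → False (contains "example")
    print(is_plausible_secret("x7Qp2LmZ9vR4tKw8"))      # → True
```

Running the filters in sequence with short-circuiting means cheap checks can reject obvious placeholders early; a real deployment would use a full dictionary and tuned thresholds.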
