Before leaving, I wanted to publicly release some of our internal repositories (hosted on our private GitLab instance <gitlab>) so that their commit history is not lost and they can better survive my departure.
To get rid of any sensitive information scattered through that commit history, the best tool I’m aware of is BFG.

Below are the notes I took while “exposing” the sources of the ULHPC Technical Documentation, initially hosted in the www/ulhpc-docs repository, on GitHub under ULHPC/ulhpc-docs.

Create a Mirrored clone

# Clone AFTER committing the file removal in the original repository (see below)
$ git clone --mirror ssh://git@<gitlab>:<port>/www/ulhpc-docs.git ulhpc-docs.bare # --mirror implies --bare
$ cd ulhpc-docs.bare
$ git remote set-url origin git@github.com:ULHPC/ulhpc-docs.git    # change remote to target github
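
Before rewriting anything, it can be worth confirming that the clone really is a bare mirror, since BFG is meant to run on such a copy. The sketch below is only an illustration: the helper name `is_mirror_clone` is hypothetical, and it assumes the remote is still called `origin`, as in the commands above.

```shell
# Succeed only if the directory is a bare repository whose "origin"
# remote was set up as a mirror (which is what "git clone --mirror" does).
# The helper name is hypothetical; "origin" is assumed as the remote name.
is_mirror_clone() {
    repo=$1
    [ "$(git -C "$repo" rev-parse --is-bare-repository 2>/dev/null)" = "true" ] &&
    [ "$(git -C "$repo" config --get remote.origin.mirror 2>/dev/null)" = "true" ]
}
```

For instance, `is_mirror_clone ulhpc-docs.bare && echo "ok, bare mirror"`. Changing the remote URL with `git remote set-url` does not touch the mirror setting, so the check should still pass afterwards.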

File removal

To remove the deployment instructions and targets (formerly defined in .Makefile.local), first delete the file in a regular commit, then run:

$ bfg --delete-files .Makefile.local ulhpc-docs.bare    # After a commit to delete the file
Using repo : /Users/svarrette/git/<gitlab>/www/ulhpc-docs.bare
Found 604 objects to protect
Found 20 commit-pointing refs : HEAD, refs/heads/master, refs/heads/production, ...
Found 3 tag-pointing refs : refs/tags/v0.0.1-b14, refs/tags/v0.0.2-b106, refs/tags/v0.1.0-b407
Cleaning
--------
Found 505 commits
Cleaning commits:       100% (505/505)
Cleaning commits completed in 375 ms.
Updating 21 Refs
----------------
	Ref                            Before     After
	--------------------------------------------------
	refs/heads/master            | b08bea0a | abc367fd
	refs/heads/production        | fdd68cd2 | 8e0609a8
	refs/merge-requests/11/head  | ed61f9b1 | caafc1fc
	refs/merge-requests/11/merge | 9b5c932e | 4a7ae119
	refs/merge-requests/24/head  | 364023f6 | b2e5abf2
	refs/merge-requests/24/merge | be0ae048 | aa5f79f2
	refs/merge-requests/25/head  | bdf3a65a | 542077d9
	refs/merge-requests/25/merge | 53cbb77f | 4bc1dc89
	refs/merge-requests/26/head  | 60404b7c | 8cdf3234
	refs/merge-requests/26/merge | 27c024c0 | d7ed877c
	refs/merge-requests/27/head  | d4ee2275 | 10a07b86
	refs/merge-requests/27/merge | 0a0b7184 | e5bb3ba9
	refs/merge-requests/28/head  | 767dffb7 | 6a2273f4
	refs/merge-requests/28/merge | 5289a9a7 | 7941a7b4
	refs/merge-requests/29/head  | 50ae58dd | 7202d70f
	...
Commit Tree-Dirt History
------------------------
	Earliest                                              Latest
	|                                                          |
	.DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD

	D = dirty commits (file tree fixed)
	m = modified commits (commit message or parents changed)
	. = clean commits (no changes to file tree)
	                        Before     After
	-------------------------------------------
	First modified commit | 3d71b000 | 6772e0d5
	Last dirty commit     | 6c4ad1cc | 4b5c593d
Deleted files
-------------
	Filename          Git id
	------------------------------------------------------------
	.Makefile.local | 0b2c4745 (2.0 KB), d8dea6ea (1019 B ), ...
In total, 957 object ids were changed. Full details are logged here:
	/Users/svarrette/git/<gitlab>/www/ulhpc-docs.bare.bfg-report/2022-08-19/13-13-26
BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

Delete swap/backup files:

$ bfg --delete-files '*.swp'   ulhpc-docs.bare
$ bfg --delete-files '*~'      ulhpc-docs.bare
$ bfg --delete-files '*.bak'   ulhpc-docs.bare
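
The three calls above can also be wrapped in a small loop. This is a minimal sketch rather than anything BFG provides: the function name `bfg_delete_globs` and the `BFG_CMD` variable are hypothetical, the latter being there only so that the loop can be dry-run with `echo` or pointed at a `java -jar` invocation.

```shell
# Run "bfg --delete-files <glob>" once per glob against one bare repository.
# BFG_CMD is a hypothetical override (e.g. BFG_CMD=echo for a dry run);
# it defaults to the "bfg" launcher used in the commands above.
bfg_delete_globs() {
    repo=$1
    shift
    for glob in "$@"; do
        # Keep the glob quoted so the shell passes it to BFG unexpanded
        ${BFG_CMD:-bfg} --delete-files "$glob" "$repo" || return 1
    done
}
```

Usage would look like `bfg_delete_globs ulhpc-docs.bare '*.swp' '*~' '*.bak'`. Each pass rewrites history separately, exactly as the three manual calls do.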

Remove unnecessary files

$ bfg --delete-files test.yml ulhpc-docs.bare

Literal Pattern removal

  • Remove specific text occurrences from commits
$ vim pattern_ulhpc-docs_to_filter.txt   # 1 pattern / literal per line
$ bfg --replace-text pattern_ulhpc-docs_to_filter.txt ulhpc-docs.bare
Using repo : /Users/svarrette/git/<gitlab>/www/ulhpc-docs.bare
Found 604 objects to protect
Found 20 commit-pointing refs : HEAD, refs/heads/master, refs/heads/production, ...
Found 3 tag-pointing refs : refs/tags/v0.0.1-b14, refs/tags/v0.0.2-b106, refs/tags/v0.1.0-b407
Protected commits
-----------------
These are your protected commits, and so their contents will NOT be altered:
 * commit abc367fd (protected by 'HEAD') - contains 1 dirty file :
	- docs/accounts/index.md (6.5 KB)
WARNING: The dirty content above may be removed from other commits, but as
the *protected* commits still use it, it will STILL exist in your repository.
Details of protected dirty content have been recorded here :
/Users/svarrette/git/<gitlab>/www/ulhpc-docs.bare.bfg-report/2022-08-19/13-34-25/protected-dirt/
If you *really* want this content gone, make a manual commit that removes it,
and then run the BFG on a fresh copy of your repo.
Cleaning
--------
Found 505 commits
Cleaning commits:       100% (505/505)
Cleaning commits completed in 523 ms.
Updating 20 Refs
----------------
	Ref                            Before     After
	--------------------------------------------------
	refs/heads/master            | abc367fd | d4120bb9
	refs/heads/production        | 8e0609a8 | a10e7e12
	refs/merge-requests/11/head  | caafc1fc | 5b0ef797
	refs/merge-requests/11/merge | 4a7ae119 | d683ee13
	refs/merge-requests/24/head  | b2e5abf2 | af111fda
	refs/merge-requests/24/merge | aa5f79f2 | 9fd2b961
	refs/merge-requests/25/head  | 542077d9 | fcdfb860
	refs/merge-requests/25/merge | 4bc1dc89 | 1ad9a266
	refs/merge-requests/26/head  | 8cdf3234 | 5bd53370
	refs/merge-requests/26/merge | d7ed877c | a1cae457
	refs/merge-requests/27/head  | 10a07b86 | 997823db
	refs/merge-requests/27/merge | e5bb3ba9 | b16c50bb
	refs/merge-requests/28/head  | 6a2273f4 | 13e76163
	refs/merge-requests/28/merge | 7941a7b4 | b6c3a502
	refs/merge-requests/29/head  | 7202d70f | da5f2f77
	...
Updating references:    100% (20/20)
...Ref update completed in 48 ms.
Commit Tree-Dirt History
------------------------
	Earliest                                              Latest
	|                                                          |
	..DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
	D = dirty commits (file tree fixed)
	m = modified commits (commit message or parents changed)
	. = clean commits (no changes to file tree)
	                        Before     After
	-------------------------------------------
	First modified commit | 5ddb21b8 | 3e44aa5c
	Last dirty commit     | 97a608f6 | 6d8ea242
Changed files
-------------
	Filename       Before & After
	------------------------------------------------------------
	index.md     | a0804fef ⇒ e541a13a, 118976d4 ⇒ 6895f223, ...
	ipa.md       | 54a0db40 ⇒ b4584394
	mkdocs.yml   | 7f866a55 ⇒ 0b825df3, 78cf5492 ⇒ cb00d91e, ...
	passwords.md | cbaaf304 ⇒ 5e93b26b, 2d31bcae ⇒ 3c234b31
In total, 1305 object ids were changed. Full details are logged here:
	/Users/svarrette/git/<gitlab>/www/ulhpc-docs.bare.bfg-report/2022-08-19/13-34-25
BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive
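
For reference, here is what such an expression file might look like. According to the BFG usage text as I understand it, each line is a literal by default and is replaced with `***REMOVED***`; `==>` sets a different replacement, and a `regex:` prefix switches to a regular expression. Every value below is a made-up placeholder, not one of the patterns actually filtered here, and the file name is hypothetical too.

```shell
# Write a sample expression file for "bfg --replace-text".
# All entries are illustrative placeholders (assumptions), one per line:
#   line 1: plain literal, replaced by the default ***REMOVED***
#   line 2: literal with an explicit replacement after ==>
#   line 3: regular expression with an explicit replacement
cat > patterns.example.txt <<'EOF'
s3cr3t-token
internal.example.org==>docs.example.org
regex:password=\w+==>password=
EOF
```

The file is then passed as `bfg --replace-text patterns.example.txt ulhpc-docs.bare`. Keep it outside the repository being published, since it lists exactly what is meant to stay private.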

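One occurrence of the sensitive information I wanted to remove still remained (you can check with git log -S<pattern>, which searches the commit history for the pattern). As the warning in the output above explains, BFG protects the contents of the HEAD commit by default, and docs/accounts/index.md was still dirty there. So I ran the command again with --no-blob-protection: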

$ bfg --replace-text pattern_ulhpc-docs_to_filter.txt --no-blob-protection ulhpc-docs.bare
Using repo : /Users/svarrette/git/<gitlab>/www/ulhpc-docs.bare
Found 0 objects to protect
Found 20 commit-pointing refs : HEAD, refs/heads/master, refs/heads/production, ...
Found 3 tag-pointing refs : refs/tags/v0.0.1-b14, refs/tags/v0.0.2-b106, refs/tags/v0.1.0-b407
Protected commits
-----------------
You're not protecting any commits, which means the BFG will modify the contents of even *current* commits.
This isn't recommended - ideally, if your current commits are dirty, you should fix up your working copy and commit that, check that your build still works, and only then run the BFG to clean up your history.
Cleaning
--------
Found 505 commits
Cleaning commits:       100% (505/505)
Cleaning commits completed in 322 ms.
Updating 1 Ref
--------------
	Ref                 Before     After
	---------------------------------------
	refs/heads/master | d4120bb9 | dc17a918
Updating references:    100% (1/1)
...Ref update completed in 40 ms.
Commit Tree-Dirt History
------------------------
	Earliest                                              Latest
	|                                                          |
	...........................................................D
	D = dirty commits (file tree fixed)
	m = modified commits (commit message or parents changed)
	. = clean commits (no changes to file tree)
	                        Before     After
	-------------------------------------------
	First modified commit | adff298b | 55ddf9df
	Last dirty commit     | d4120bb9 | dc17a918
Changed files
-------------
	Filename   Before & After
	------------------------------
	index.md | 24b6cb9c ⇒ 6fde6054
In total, 5 object ids were changed. Full details are logged here:
	/Users/svarrette/git/<gitlab>/www/ulhpc-docs.bare.bfg-report/2022-08-19/13-47-31
BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

Checks

cd ulhpc-docs.bare
# Search for file
❯ git ls-tree --full-tree -r HEAD | grep Makefile
160000 commit 42333c7f1e58e8205c94a40ce4aa1074eb91e89f	.submodules/Makefiles
100644 blob 9c13714c359aca59c48746beec4ddc5b64131f6c	Makefile
❯ git ls-tree --full-tree -r HEAD~1 | grep Makefile
160000 commit 42333c7f1e58e8205c94a40ce4aa1074eb91e89f	.submodules/Makefiles
100644 blob 9c13714c359aca59c48746beec4ddc5b64131f6c	Makefile
# You can check the difference in the **original** repo
# UNDER www/ulhpc-docs:
# ❯ git ls-tree --full-tree -r HEAD | grep Makefile
# 160000 commit 42333c7f1e58e8205c94a40ce4aa1074eb91e89f	.submodules/Makefiles
# 100644 blob 9c13714c359aca59c48746beec4ddc5b64131f6c	Makefile
# ❯ git ls-tree --full-tree -r HEAD~1 | grep Makefile
# 100644 blob 57ceeb98e652caadf0c909fcfd87210434fb67ff	.Makefile.local
# 160000 commit 42333c7f1e58e8205c94a40ce4aa1074eb91e89f	.submodules/Makefiles
# 100644 blob 9c13714c359aca59c48746beec4ddc5b64131f6c	Makefile
#
# Check for '<pattern>' across all commits; this should return nothing
$ tig -S<pattern>    # or git log -S<pattern>
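
The pattern search can be turned into a pass/fail check, which helps when several patterns have to be verified. The following is a rough sketch with a hypothetical helper name (`history_is_clean`); it uses `git log -S` with `--all` so that every ref is covered, not only the history reachable from HEAD. Note that `-S` inspects file content changes only, not commit messages.

```shell
# List every commit, on any ref, whose diff adds or removes the literal
# string; succeed only when no such commit exists.
# The helper name is hypothetical.
history_is_clean() {
    repo=$1
    pattern=$2
    hits=$(git -C "$repo" log --all --oneline -S"$pattern") || return 2
    if [ -n "$hits" ]; then
        printf '%s\n' "$hits"
        return 1
    fi
}
```

A call such as `history_is_clean ulhpc-docs.bare 'some literal' || echo "still present"` prints the offending commits, if any.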

Final cleanup before push

Carefully recheck the commits

tig -S{<pattern1>,<pattern2>,...}

Then, as per the BFG documentation:

The BFG will update your commits and all branches and tags so they are clean, but it doesn’t physically delete the unwanted stuff. Examine the repo to make sure your history has been updated, and then use the standard git gc command to strip out the unwanted dirty data, which Git will now recognise as surplus to requirements

cd ulhpc-docs.bare
git reflog expire --expire=now --all && git gc --prune=now --aggressive
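
To confirm that the cleanup really dropped the unwanted data, one option is to ask Git whether an object id listed in the BFG report (for example under "Deleted files") still exists. This is a small sketch under that assumption; the helper name `object_is_gone` is hypothetical.

```shell
# Succeed when the given object id can no longer be found in the repository.
# "git cat-file -e" exits non-zero for a missing or unresolvable id.
# The helper name is hypothetical.
object_is_gone() {
    repo=$1
    id=$2
    ! git -C "$repo" cat-file -e "$id" 2>/dev/null
}
```

With the report above, `object_is_gone ulhpc-docs.bare 0b2c4745` would be expected to succeed once the reflog has been expired and `git gc --prune=now` has run, and to fail before that.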

Re-check the remote URL

$ git remote -v
origin	git@github.com:ULHPC/ulhpc-docs.git (fetch)
origin	git@github.com:ULHPC/ulhpc-docs.git (push)

And push