Bulk updating multiple repos with all-repos
I like the open source communities and all the developments that they enable. As a developer, I like to have the ability to solve my own problems, but when someone else already solved the problem, it’s even better. However, when nobody did, I like to share my solution with others, if possible and practical. I’ve done it enough times that I’m beginning to have a decent setup with a project template that gives me the boilerplate I need to get started quickly, and not waste too much time packaging and publishing my solution.
However, after a while, maintaining all these projects can become overwhelming. And when external things change, like adding support for a new version of a language/framework, dropping an old one, or adapting to how a tool works, it can be tedious to do bulk updates. Of course, in this case, I’m not the first one to hit this problem, and there are a number of solutions out there.
I’ve been using all-repos for about a year and a half, and I’m quite happy with it. It’s a tool that allows you to run a command on all or some of your GitHub repositories. It’s a command line tool written in Python, that’s overall quite easy to use, with an API that has a bit of a learning curve, but is very powerful. In this post, I’ll show you how I use it to update my projects.
I like to keep all my code in git and save it in GitHub, so I’ve created a repo for using it. So, here we are, one more repo… to manage all my repos. One step back to move forward, I guess.
Splitting the config
The recommended usage is to create a config file,
all-repos.json, that contains some config controlling behaviour, as well as your credentials.
However, often switch machine, and I wanted to store the config in source control to be able to share it more easily and only keep the credentials outside. This isn’t supported by all-repos, and it seems to be an intentional design decision, and the author showed no intention to support it, so I created a package to do it: all-repos-envvar. That’s now 2 more repos to maintain… I hope that’s worth it!
Learning the library
The library ships several separate CLI tools, but I must be missing their point, and actually I don’t even use them.
What I really find useful is the autofixer, which is documented at the bottom of the README. The documentation describes the API, and then finally gives an example autofixer, which has the boilerplate you need. The documentation then links to the built-in autofixers which are great examples on what you can do.
So in my project, I have:
pyproject.tomlfile with the dependencies I need, managed by Poetry.
all-repos.jsonfile that contains the config for all-repos, in git. It controls how I fetch the repos, how changes are submitted (pull request), the organisations I want to include, and some repos I want to exclude.
- My GitHub credentials, in a
.envfile that I keep outside of source control.
- Finally, I copied the given boilerplate and wrote a
run_fix.pyfile which I keep changing for my need of the day.
Python 3.7 recently reached EOL, so that was a good opportunity to use the tool to update all my projects in bulk. Here is how the script looked like:
import argparse from pathlib import Path from all_repos import autofix_lib from all_repos.grep import repos_matching # Find repos that have this file... FILE_NAME = "pyproject.toml" # ... and which content contains this string. FILE_CONTAINS = "tool.poetry" # Git stuff GIT_COMMIT_MSG = "feat: drop support for Python 3.7" GIT_BRANCH_NAME = "feat/drop-python-3.7" def apply_fix(): """Apply fix to a matching repo.""" pyproject_toml = Path("pyproject.toml") content = pyproject_toml.read_text() # Be idempotent if 'python = "^3.8"' in content: return new_content = content.replace( 'python = "^3.7"', 'python = "^3.8"', ) pyproject_toml.write_text(new_content) ci_yml = Path(".github/workflows/ci.yml") if ci_yml.exists(): content = ci_yml.read_text() content = content.replace( 'python-version:\n - "3.7"\n', "python-version:\n", ) ci_yml.write_text(content) pc_yml = Path(".pre-commit-config.yaml") if pc_yml.exists(): content = pc_yml.read_text() content = content.replace( '--py37-plus', '--py38-plus', ) pc_yml.write_text(content) # You shouldn't need to change anything below this line def find_repos(config) -> set[str]: """Find matching repos using git grep.""" repos = repos_matching( config, (FILE_CONTAINS, "--", FILE_NAME), ) return repos def main(): """Entry point.""" # Needed to add all-repos command line options # and ability to parse config parser = argparse.ArgumentParser() autofix_lib.add_fixer_args(parser) args = parser.parse_args(None) repos, cfg, commit, stg = autofix_lib.from_cli( args, find_repos=find_repos, msg=GIT_COMMIT_MSG, branch_name=GIT_BRANCH_NAME, ) autofix_lib.fix( repos, apply_fix=apply_fix, config=cfg, commit=commit, autofix_settings=stg, ) return 0 if __name__ == "__main__": raise SystemExit(main())
Ok that’s a lot of code, so let’s break it down.
Fist off, I defined a few constants at the top of the file, which I always need to change:
FILE_...ones are used to detect whether the repository is suitable for this change: in this example, a node project would be filtered out with this mechanism, in the
GIT_...ones are for the git commit message and branch. If anything changes, the tool opens a pull request which I can review before merging.
This is where the actual code change happens. This function runs in the context of a matching repo, and the current directory is the top of the repo. You can read and write file as you please or run commands with
autofix_lib.run(...). It’s a good idea to start with a check to verify that the fix hasn’t run yet, just in case.
This is mainly boilerplate, it’s essentially a wrapper around
git grep and uses 2 of the constants I explained earlier. If you want to learn more, I suggest to go read the source code of
main() function is the entry point. It’s called when running
python run_fix.py, and it mostly calls the autofix API with our parameters. This is a slightly simplified version from the boilerplate.
Lifecycle of the script
I keep this script around and just change it in place, committing each time I change it to make a bulk change. It’s usually some one-off operations, but it’s quite useful to have some history in git in case I need to do a similar task again, like when I’ll need to drop Python 3.8. You can check its history on GitHub to see how it evolved.
On the downside, one might argue that the code is a bit verbose, but in my opinion it’s a good thing. I’m comfortable with Python, it does the job and let me use all of its power to do the changes I need, and step into a debugger when I need to.
I think that’s it! I hope this was useful, and will help make the all-repos tool more accessible. It’s quite powerful, although I found that it had lots of things I didn’t need.