You are preparing for an interview for a job (build/release engineer or DevOps engineer) where a requirement is "knowledge of branching strategies." There are different patterns, models, paradigms, workflows and even philosophies associated with branches in repositories. What is knowledge of branching strategies in the context of code versioning systems and the CI/CD pipeline?
***Updated in January of 2022.***
Open source projects may have different branching strategies from enterprises developing in-house proprietary software. Code versioning systems hold the repositories that are subject to branching. Code versioning systems can be referred to as SCMs (Source Code Managers or Source Control Managers), VCSes (Version Control Systems), or, when they are distributed, DVCSes (Distributed Version Control Systems). Distributed version control systems contribute to self-organizing teams (according to page 394 of Continuous Delivery), which is desirable from an Agile perspective.
Some companies have big development teams whereas others have small teams. The tolerance for unstable repositories varies from organization to organization. Some companies allow for unstable repositories where many developers have the ability to write changes. Other enterprises give read-only access to the majority of the employees to ensure the repositories are stable; this would normally require a dedicated release manager (in the form of an individual such as a lead developer or a merge team, Continuous Delivery page 407). The internal dynamics may dictate whether you will create a branch of a source code repository and how that branch will exist. Professionally, we advise you to be the flexible type of person who can work in whatever way the company prescribes.
If your professional position lets you determine the branching strategy, there are, as of 2022, two major categories of branching strategies: trunk-based development (which includes mainline development and branching by abstraction) and regular branching (by feature, team, release, etc.).
Trunk-based Development / Mainline Development without Branches
"It only takes two branches performing a refactoring in a tightly coupled codebase to bring the entire team to a halt when one of them merges. It bears repeating that branching is fundamentally antithetical to continuous integration" (Continuous Delivery, page 410).
One option is "trunk-based development." A quick, but technically inaccurate, description of this strategy is to not have other branches. Code can be committed directly to the mainline (or main branch of the repository). Trunk-based development involves all branches ultimately being merged to the main branch (Continuous Delivery, page 405). (Technically you can still have branches with trunk-based development, but the branches you would have would be ephemeral.)
The DevOps Handbook cites Gary Gruver (on pages 144 and 145) and his success with not having another branch. In its simplest form, trunk-based development is the lack of a second long-lived branch. Forking a repository is creating a second repository; it is like cloning one, but "fork" connotes the publication of an independent copy of a repository. A branch is an independent line of development within a repository, created for the purpose of possibly merging it with the trunk. "Mainline development" is another phrase that describes trunk-based development. It can be worthwhile to forgo a branch for a single source of truth. Committing directly to the trunk can deter developers from committing code if it means that they will need to do a significant amount of work; they may do more testing before they commit their code. This strategy may be accepted by the business if there is little to no need for a multi-approval process; some development projects lend themselves to this. Code versioning systems by their very nature preserve archived versions; thus trunk-based development may work for numerous businesses.
Trunk-based development can be ideal for small teams because each person must have knowledge of the implications. Additionally, it can be ideal for developers who are responsible for managing the production environment or those developers who work on low-risk and iterative changes. The DevOps Handbook also recommends it for large teams because, as the number of developers increases, the amount of work for merging branches goes up exponentially (page 147). The DevOps Handbook says that trunk-based development is associated with greater productivity (page 151).
Some CI/CD pipelines reject code commits if any build or deployment tests fail. Individual code commits or PRs are undone in what are called "gated commits" (page 148 of The DevOps Handbook). Gated commits are implemented to ensure that the main branch is always releasable. They are a feature of some trunk-based development environments.
The DevOps Handbook (page 186) cites Paul Hammant, who gives trunk-based development mixed treatment. Depending on the circumstances, Paul Hammant can recommend "branching by abstraction" (a term he coined). As branching by abstraction is a strategy that does not have a repository branch, we mention it here in the section on trunk-based development. It is a strategy that is ideal for major refactoring of an application (according to page 350 of Continuous Delivery).
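The gated-commit idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name gated_commit, the check names, and the string-matching "checks" are all hypothetical stand-ins for real build and test stages, and the mainline is modeled as a plain list.

```python
from typing import Callable, Dict, List

# Hypothetical check type: takes the candidate change, returns True if it passes.
Check = Callable[[str], bool]


def gated_commit(mainline: List[str], change: str, checks: Dict[str, Check]) -> List[str]:
    """Append `change` to `mainline` only if every check passes.

    Returns the names of the failed checks (an empty list means accepted).
    """
    failed = [name for name, check in checks.items() if not check(change)]
    if not failed:
        mainline.append(change)
    return failed


# Stand-in checks; a real pipeline would run builds and test suites here.
checks = {
    "build": lambda change: "syntax error" not in change,
    "unit-tests": lambda change: "breaks tests" not in change,
}

mainline = ["initial commit"]
print(gated_commit(mainline, "add login page", checks))              # no failures, accepted
print(gated_commit(mainline, "refactor that breaks tests", checks))  # rejected by unit-tests
print(mainline)
```

The rejected change never reaches the mainline list, which is the property that keeps the main branch releasable.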
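As a rough sketch of what branching by abstraction can look like in code (rather than in the repository), the example below puts an interface in front of an old implementation and a new one so that both can live on the mainline at once. The class names, the make_calculator switch, and the summing example are hypothetical and chosen only for illustration.

```python
import math
from abc import ABC, abstractmethod
from typing import List


class TotalCalculator(ABC):
    """The abstraction layer: callers depend only on this interface."""

    @abstractmethod
    def total(self, amounts: List[float]) -> float:
        """Return the sum of the given amounts."""


class LegacyTotalCalculator(TotalCalculator):
    """Existing implementation, still the default on the mainline."""

    def total(self, amounts: List[float]) -> float:
        result = 0.0
        for amount in amounts:
            result += amount
        return result


class NewTotalCalculator(TotalCalculator):
    """Replacement implementation, committed to the mainline incrementally."""

    def total(self, amounts: List[float]) -> float:
        return math.fsum(amounts)


def make_calculator(use_new_implementation: bool = False) -> TotalCalculator:
    """Hypothetical switch; a real project might read a config flag instead."""
    if use_new_implementation:
        return NewTotalCalculator()
    return LegacyTotalCalculator()


calculator = make_calculator(use_new_implementation=False)
print(calculator.total([1.0, 2.0, 3.5]))
```

Once the new implementation is trusted, the switch is flipped and the legacy class is deleted, all without a long-lived repository branch.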
If you are skeptical about trunk-based development (and you think it would hinder using a repository from a pragmatic perspective), see the section of the book Continuous Delivery (by Humble and Farley) "Keeping Your Application Releasable" on page 346.
Trunk-based development is ideal for Terraform development and operations. An O'Reilly publication on Terraform says "…for any shared environment (e.g., Stage, Prod), always deploy from a single branch." (Taken from page 304 of Terraform: Up & Running, 2nd Edition by Yevgeniy Brikman (O'Reilly), Copyright 2019, 978-1-492-04690-5.) The DevOps Handbook says (on page 117) that failures in code migrations are more commonly attributed to differences between the source and destination environments than to problems with the code itself (e.g., a lack of robustness or sufficient exception handling).
A big advantage of trunk-based development is that all code is continually integrated (according to page 405 of Continuous Delivery). To learn more about trunk-based development, you may want to read these links:
Other Branching Strategies (with Long-Lived Branches)
A second option (of branching strategies) is to have branches. A branch in a repository can be ideal in a large team with a new developer. A new developer can thoroughly refactor code, making major changes, and commit the changes to a branch other than the main branch. Experimentation can be boundless with a branch that will be ignored. If the developer's workstation is lost, the changes can be safe in the repository.
Before the parameter files and source code are merged into (and thus overwrite) the existing files in the main branch, the professional can create a pull request. This is a request that will usually require another person (e.g., a more tenured employee who knows the business's coding conventions and idiosyncrasies) to approve the merge of code manually. A branch can prevail in a merge to the main branch if the pull request is approved (e.g., by at least one other person besides the original committer, but it depends on the configuration of the repository). Pull request approval requirements vary from group to group.
Companies with high turnover and large numbers of developers may prefer to have branches. The branches do not have to be deployable during the development process. Code repositories have archived versions of the code. Some companies find a formal code branch merging approval process tedious for the developers. The DevOps Handbook admits that trunk-based development is controversial (on page 151); the book Continuous Delivery concurs that trunk-based development is controversial (on its page 36). In 2022, it is still common for enterprises not to use trunk-based development. An advantage of having long-lived branches is that you are not restricted to incremental development, and some legacy code (or suboptimal code) may be tightly coupled and lend itself to non-incremental refactoring.
Branching by Feature and/or Function
A branch may be created to isolate the development of a specific feature. Once the feature has been regression tested, the feature branch can be merged with the main branch. Early and deferred branching patterns can be readily associated with feature or function branching. Early branching and deferred branching have advantages and disadvantages (as page 390 of Continuous Delivery explains).
Branching by Component
Logical subportions of code or an application can be considered a component. These modules may support a non-functional requirement. Their definition is somewhat nebulous and subjective. However certain software systems may have numerous components, and branching for these units may enhance the teamwork associated with branching. Early and deferred branching patterns can be readily associated with component branching. Early branching and deferred branching have advantages and disadvantages (as page 390 of Continuous Delivery explains).
Branching by Story
In an agile environment, a developer may be given a user story. The story's content may be in the penumbra of a bug fix or a new feature. A user story may have nothing to do with a bug fix or a new feature. Some environments produce job duties wherein a developer may want to create a branch associated with a user story (e.g., a case number in a Jira system). It can be a good way to record, for posterity, precisely what the work attempted to solve or improve. These branches would be merged with the main branch after enough testing stages to ensure that the work is not released to production without confidence that it is ready.
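One way to tie a branch to its user story is to derive the branch name from the issue key. The sketch below assumes a hypothetical convention of "story/" followed by a Jira-style key (such as PROJ-123) and a short slug; neither the prefix nor the function name comes from Jira or Git, and a team would substitute its own rules.

```python
import re


def story_branch_name(issue_key: str, summary: str, max_len: int = 60) -> str:
    """Build a branch name such as 'story/PROJ-123-add-login-page'.

    The 'story/' prefix and the slug format are assumed conventions.
    """
    if not re.fullmatch(r"[A-Z][A-Z0-9]*-\d+", issue_key):
        raise ValueError(f"unexpected issue key format: {issue_key!r}")
    # Lowercase the summary and collapse anything that is not a letter
    # or digit into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")
    name = f"story/{issue_key}-{slug}"
    # Trim overly long names and avoid ending on a hyphen.
    return name[:max_len].rstrip("-")


print(story_branch_name("PROJ-123", "Add login page!"))
```

A name produced this way could then be passed to a command such as "git switch -c" to create the branch.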
Branching by Environment or Platform Technology
A repository may have Dockerfiles, PowerShell scripts, Bash scripts, and other code that must run in certain pods or on servers with a particular operating system. Having branches for a given platform (e.g., Docker, Windows, or Linux) can allow developers to work in parallel and merge their code back into the main branch when they are ready. This branching strategy would lend itself to having different CI/CD pipelines for each branch; the pod or server doing the building/compilation would have to be appropriate for the architecture involved. Dependency fulfillment is crucial to testability.
Branching by Team (Organizational Branching)
A branch can be created for a team within a larger business. The members of the team can commit their code to this branch, and then the branch can be merged into the main branch. A team branch would likely have an owner to manage policies that govern who can check code into it (Continuous Delivery, page 414). For release cycles that are longer than two weeks, you would likely have a CI/CD pipeline for each team branch; but be warned that changes made by multiple developers can be difficult to reconcile when it is time to do a pull request and merge the team branch with the main branch.
Branching by Release (x.y Releases That Never Merge with the Main Branch)
Some code is used by various departments or customers. For non-hosted code, a development team may want to have different branches for different versions of the software. The branch names can be those of the versions (e.g., 1.1, 1.2, and 1.3). Keeping these branches unmerged is one of the few cases in which maintaining a long-lived branch is advisable. Continuous Delivery, on page 382, says that these branches are never merged with the main branch. You would likely have a CI/CD pipeline for each branch if you use this strategy.
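When branches are named after versions, tooling that lists them (for example, to pick the newest release line) needs to compare the numbers rather than the text. The following is a minimal sketch, assuming branch names consist only of dot-separated integers; the helper name version_key is hypothetical.

```python
from typing import List, Tuple


def version_key(branch: str) -> Tuple[int, ...]:
    """Turn a branch name such as '1.10' into a sortable tuple (1, 10)."""
    return tuple(int(part) for part in branch.split("."))


release_branches: List[str] = ["1.10", "1.2", "1.1", "1.3"]

# Plain string sorting would place '1.10' before '1.2'.
print(sorted(release_branches))
# Sorting on the numeric key gives the order in which the releases were cut.
print(sorted(release_branches, key=version_key))
print(max(release_branches, key=version_key))
```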
Git Flow (a Branching Strategy)
One common way to have branches is to use Git Flow. Git Flow is more commonly used than trunk-based development. Git Flow is a strategy where you have five different types of branches (according to this external site): main, develop, feature, release and hotfix. Some of the non-main branches will receive merges from the main branch with the main branch's changes prevailing. That is, the Git administrator will merge changes from the main branch to a given feature branch. This way your organization can keep working on a feature, and this non-main feature branch can receive updates such as critical hotfixes. Git Flow is considered a set of guidelines (rather than firm rules), according to this video. You can read more about the details here.
A disadvantage of branching is that code cannot go through the CI/CD process until you merge to main unless you have a pipeline for each branch. Another disadvantage is that merging code is inherently error-prone (according to page 400 of Continuous Delivery).
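The relationships among the supporting branch types, as the Git Flow model is commonly described, can be written down as a small lookup table. This is a sketch rather than an official definition: the "kind/name" prefix convention and the function name merge_targets are assumptions, and teams that treat Git Flow as guidelines may deviate from these rules.

```python
from typing import Dict, List

# Where each supporting branch type starts from and where it is merged back,
# as the Git Flow model is commonly described.
BRANCHES_FROM: Dict[str, str] = {
    "feature": "develop",
    "release": "develop",
    "hotfix": "main",
}
MERGES_INTO: Dict[str, List[str]] = {
    "feature": ["develop"],
    "release": ["main", "develop"],
    "hotfix": ["main", "develop"],
}


def merge_targets(branch_name: str) -> List[str]:
    """Return the branches that a 'kind/name' branch is merged into."""
    kind = branch_name.split("/", 1)[0]
    if kind not in MERGES_INTO:
        raise ValueError(f"not a supporting branch: {branch_name!r}")
    return MERGES_INTO[kind]


print(merge_targets("hotfix/fix-login"))
print(merge_targets("feature/search"))
```

Because a hotfix lands in both main and develop, work in progress picks up critical fixes without waiting for the next release.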
Ideally you will have confidence that a repository has production readiness from automated testing in lower environments. But automated testing has its limitations and lower environments are often not exactly identical to production. Having the computing resources for a different CI/CD pipeline for each branch can be expensive. Moreover having identical lower environments to the production environment is also an expense that some companies do not want to pay for.
The benefits of having a shared single source of truth are many (and described in detail in The DevOps Handbook on pages 290 through 292). To generalize the benefits of an accessible repository that numerous professionals can work on, we can see how "trunk-based development" has its place (without extra branches which can be confusing for parallel development of code). A code commit to a repository could be the impetus for the CI/CD process. With automated testing following a triggering event, the promotion of the code to higher environments can happen safely. You can have more relevant testing, in theory, with trunk-based development. Merge conflicts should be avoided with commits to the mainline.
Long-lived branches can be an efficient alternative way for programmers to collaborate. A developer can merge code changes to a number of different files with an approved pull request. The submission of a pull request will send a notification that a request for approval of the code changes has been made. The approver will see the changes via the diffs. This can be an efficient way to do team-based development with clear ownership of code before it is integrated.
Code versioning systems support parallel collaboration of different individuals. The concurrent coding efforts can be ultimately merged (unless a release branch remains a release branch or unless you commit code all to the main branch). Code versioning systems record a history of revisions also known as changesets. The revision history of a merge will display a left and right revision. A merge or potential merge can have conflicts described as text-based, semantic, or syntactic. (To learn more about these types of conflicts, see this external posting or pages 367-369 of Elements of Programming Interviews in Python.) If new code introduces a flaw, a developer should be confident that recovery is possible by retrieving a previous version. This confidence can be conducive to frequent commits or merges.
One tactic with a local copy of a Git repository is to do a "rebase." When someone does a rebase (for example, with "git pull --rebase"), changes from the remote repository that they did not make are brought down to the local copy, and their own local commits are replayed on top of those changes. This action is collaborative because it allows a programmer to keep developing in a workspace while not diverging from the remote, central Git repository. The integration of code (merging the work of two or more developers) can involve conflict resolution. Rebasing periodically enables individuals to keep conflicts out of the upstream repository by resolving them locally while working on an up-to-date code base. Rebasing could readily happen with trunk-based development or when using a branch by team strategy. The programmer can then merge her changes upstream when she is ready to contribute at a time when it would maximize productivity.
Different branching strategies have their place depending on the environment, the team size, the goals, the team's discipline and workload toward the CI/CD process, and how formal the approval process is for pull requests and releases to production. Defining a branching strategy can help you communicate with your team as well as govern team behavior. Having carefully designed methods can help you rapidly evolve a software product and handle merge conflicts near a release. Team productivity as opposed to individual productivity is usually the goal. You want to be confident that your code has production readiness at the time of the release and minimize untimely merge conflicts. Whatever paradigm you choose, consider the implications carefully.
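To make the notion of a text-based conflict concrete, here is a simplified three-way merge. It is a sketch under a strong assumption: all three versions have the same number of lines and are compared position by position, whereas real tools such as Git align lines with a diff algorithm first. The function name and the conflict marker format are hypothetical.

```python
from typing import List, Tuple


def three_way_merge(base: List[str], left: List[str], right: List[str]) -> Tuple[List[str], List[int]]:
    """Merge two edited copies of `base`, line by line.

    Returns the merged lines and the indexes where both sides changed
    the same line in different ways (text-based conflicts).
    """
    if not (len(base) == len(left) == len(right)):
        raise ValueError("this sketch only handles versions of equal length")
    merged: List[str] = []
    conflicts: List[int] = []
    for index, (b, l, r) in enumerate(zip(base, left, right)):
        if l == r:          # both sides agree (changed identically or not at all)
            merged.append(l)
        elif l == b:        # only the right side changed this line
            merged.append(r)
        elif r == b:        # only the left side changed this line
            merged.append(l)
        else:               # both changed it differently: a conflict
            merged.append(f"<<< {l} ||| {r} >>>")
            conflicts.append(index)
    return merged, conflicts


base = ["timeout = 30", "retries = 3", "debug = false"]
left = ["timeout = 60", "retries = 3", "debug = false"]
right = ["timeout = 30", "retries = 5", "debug = false"]
print(three_way_merge(base, left, right))
```

A merge that comes back with no text-based conflicts can still contain a semantic conflict: in this example the two changes combine cleanly, yet nobody has tested a 60-second timeout together with five retries.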
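The effect of a rebase on commit history can be modeled with two lists. This is a toy model under stated assumptions: commits are plain labels, the two histories share a common prefix, and conflict resolution is ignored. The function names are hypothetical, and the apostrophe marks a replayed commit because a rebase creates new commits rather than moving the old ones.

```python
from typing import List


def common_prefix_length(a: List[str], b: List[str]) -> int:
    """Count the commits that both histories share from the start."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def rebase(local: List[str], upstream: List[str]) -> List[str]:
    """Replay the local-only commits on top of the upstream history."""
    fork_point = common_prefix_length(local, upstream)
    replayed = [commit + "'" for commit in local[fork_point:]]
    return upstream + replayed


upstream = ["A", "B", "C", "D"]   # the remote gained C and D
local = ["A", "B", "X", "Y"]      # the developer added X and Y after B
print(rebase(local, upstream))
```

After the rebase, the local history contains everything from the remote plus the developer's own work, so a later push upstream becomes a simple extension of the remote history.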
For Further Reading / Miscellaneous
The DevOps Handbook and Continuous Delivery were both co-authored by Jez Humble; he seems to have had a smaller role in The DevOps Handbook, but a substantial role in Continuous Delivery.
To learn more about Git branching strategies, read these articles:
The Git command "git bisect", which binary-searches the commit history for the commit that introduced a problem, is relevant to reconciling different branches:
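The search that "git bisect" performs can be illustrated with a binary search over an ordered list of commits. This is a sketch of the idea, not of Git's implementation: it assumes a linear history in which the first commit is known to be good, the last is known to be bad, and every commit after the faulty one is also bad. The function name and the commit labels are hypothetical.

```python
from typing import Callable, List


def first_bad_commit(commits: List[str], is_bad: Callable[[str], bool]) -> str:
    """Find the earliest bad commit in `commits` (ordered oldest to newest).

    Assumes commits[0] is good and commits[-1] is bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(commits[middle]):
            bad = middle      # the fault was introduced at or before `middle`
        else:
            good = middle     # the fault was introduced after `middle`
    return commits[bad]


commits = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
tested: List[str] = []


def is_bad(commit: str) -> bool:
    """Stand-in for running the test suite; here the bug arrives in c6."""
    tested.append(commit)
    return commits.index(commit) >= commits.index("c6")


print(first_bad_commit(commits, is_bad))
print(tested)
```

Only three of the eight commits need to be tested here, which is why bisecting scales well to long histories.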
To view a list of Git books, see this page.
To learn more about branching strategies in general, see these links:
Remember that some code changes can have a problem that is not captured until run-time. Code may compile fine, but fail when it is run. So you must have a CI/CD pipeline with automated testing of the final product; static code analysis and code linting are not enough.
For resolving conflicts of merge attempts, with older non-git technologies, there was a question about optimistic and pessimistic locking techniques. In 2022, we tend to think that these locking techniques no longer come into play for resolving code merge conflicts.
To disambiguate multiway branching in coding from code versioning branching strategies, we want to assert that conditional logic in programming is completely separate. Code versioning systems can support branches of directories of files, whereas branching in coding (such as multiway branching) can refer to if/then logic or other arithmetic-based flow control for processing a program.
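For contrast, multiway branching in the programming sense looks like the following. The function and the meaning assigned to each exit code are hypothetical; the point is only that this kind of branch lives inside the program's control flow and has nothing to do with repository branches.

```python
def describe_exit_code(code: int) -> str:
    """Choose one of several paths based on a value (multiway branching)."""
    if code == 0:
        return "success"
    elif code == 1:
        return "test failure"
    elif code == 2:
        return "build failure"
    else:
        return "unknown error"


print(describe_exit_code(2))
```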