How to better organize your git commits

Git is an admirable piece of software. It might be the best one I've seen so far in my programming career. It impresses me so much because very often, when I learn something new about it, that leads me to yet another new thing. This constant exploration happens naturally in my daily workflow and during free-time programming workouts, because I use git as the version control system in all my projects. Honestly, I find it kind of sad that a lot of people limit their work with git to a very basic subset of commands. Among many programmers, younger or not, whom I have met personally or whose workflows I have read about online, I see a lack of interest in git beyond the basic commands used for the basic workflow. There's nothing wrong with that, but I think it's kind of a shame, and a wasted opportunity to learn something fun and useful.

The basic git workflow that many people limit themselves to goes as follows:

  • git clone and git init to get an existing project or create a new one

  • git branch branch_name (or git checkout -b branch_name) to create a new branch (the checkout -b form also switches to it)

  • git rebase to update branches with some other branch changes

  • git pull to... well to pull from remote

  • git push to send your changes to remote

  • git merge - to merge...

  • some helper commands like status, diff, log

  • and obviously git add; git commit -m 'message' to stage your changes and commit them

Honestly, that is usually enough to work on a completely functional project, and if nobody makes any mistakes you can go from nothing to a fully functioning git repository using only those commands. And if you and your team can start a new project and create a readable, sensible 'git tree' using only those commands, then bravo: nobody made a mistake, nobody force-pushed an old branch, nobody committed unresolved merge conflicts, nobody rebased onto the wrong branch, etc.
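The local part of that basic workflow can be sketched as below. This is only an illustration under a few assumptions: it runs in a throwaway directory, the branch name feature and the file README.md are made up, and the remote commands (clone, pull, push) are left out because they need a server to talk to.

```shell
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity for this demo only
git config user.email "example@example.com"
main=$(git symbolic-ref --short HEAD)        # default branch name differs between setups

echo "hello" > README.md
git add .                                    # stage the change
git commit -q -m "Add README"                # commit what was staged

git checkout -q -b feature                   # create a new branch and switch to it
echo "more" >> README.md
git add README.md
git commit -q -m "Extend README"

git checkout -q "$main"
git merge -q feature                         # fast-forwards, because the main branch has not moved
git log --oneline                            # two commits, newest first
```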

Anyway, I would like to encourage you to check out, and maybe use, an interesting option that can be passed to git add. I learned about it recently and became really fond of it.

WHAT DOES add . EVEN DO?

We should probably quickly go over this, right? Without getting too much into the inner workings of git, because that's not why we are here: git add . adds (duh!) the changes you point to into a magical place called the index. The “.” means everything under the directory we are currently in that is changed but not yet staged, so running it in the main project directory will add all the stuff nested in it. You can also pass a specific file path instead. The index is the space where your changes go before becoming a real commit. When you commit, you take the changes in the index and make them into a commit. You may think of the index as a sort of checkpoint before actually doing something. In reality, the index is a binary file in the hidden .git folder of your repo, but that is not very important right now.
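To see the index in action, here is a small sketch. It assumes a throwaway repository and two made-up files, a.txt and b.txt. It stages one path first, then everything under the current directory.

```shell
# Throwaway repository with two hypothetical files.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "first" > a.txt
echo "second" > b.txt

git add a.txt          # stage a single path
git status --short     # a.txt is staged, b.txt is still untracked

git add .              # stage everything under the current directory
git status --short     # both files are staged now

staged=$(git diff --cached --name-only)   # the files the next commit would contain
echo "$staged"
ls .git/index          # the index itself is a file inside the hidden .git folder
```

git diff --cached compares the index with the last commit, so it is a handy way to check what you are about to commit.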

WHY YOU SHOULD DIVIDE YOUR CHANGES INTO MANY COMMITS

So imagine you just got a new task, and you estimate it at 3-4 days of work at least. A lot of new files, lots of changes to existing ones; the task involves a lot of business changes and many logically separate components. If you work well with git, you will know that a task like this should consist of many commits. Each of those commits should introduce one new business or logical change to your project. You want to push your changes in a way that lets people type git log, look at your last few commits, and immediately know what was added from a business point of view and why it was added this way (from the commit description, for example). If you commit two days of hard, intense work on complex code that tackles a lot of problems, it is very likely that this commit won't be very explicit or helpful. It might even prove confusing for other people. Big batches of changes should usually be implemented as a series of smaller commits (or at least as one commit with a more explicit description, if the changes are really tightly connected).

You may think this is obvious but believe me, it is not. I have seen senior developers commit two weeks' worth of work with a message containing only the date they started working on the feature. And oh my god, so many "hotfix" commits with a description of "fix2" or "fix bug_4". It's a really bad practice. I think that when we type git log into the terminal and scroll down a bit, we should get a good description of what has changed in our app, why it changed that way, etc. If we write our code to be as explicit and understandable for others as possible, why shouldn't we treat our git tree the same way? Those messages are just plain English (or any other language).

SO WHAT DOES add --patch DO?

When you use git add . everything is done in the blink of an eye, but git add --patch will take a while and give you more to do (hurray, more work with git, right?). Alright, we will work on a legacy Rails app and make some mock changes to it. Let's say we are working on a new branch with a lot of different changes to make. For the client that tasked us with them, they are all connected: they come from one ticket (task) and should all be deployed to production at the same time. But from our point of view, those changes are not that closely connected. Let's see the git diff of those changes.

So there's a change to a mailer that welcomes the new user to the system, there's a change in the validation of Post model, there's a change in some method of AuditLog model that probably affects its data, and finally, there's a change to PostController delete action (keep in mind those changes are just mocked, it does not matter they don't do much, what matters is that they represent different business and logical changes and can be displayed easily for me to explain).

We can take two approaches to a situation like this. We could just git add .; git commit, then name the commit and put exactly what we changed in the description. But for commit messages to be easily readable, the subject line should conventionally be limited to about 50 characters. Describing changes to 4 different concepts in 50 characters might prove difficult, especially if we had bigger changes that touched more than 4 files.
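For reference, this is roughly what a short subject plus a longer description looks like on the command line. It is a sketch with a made-up file and made-up messages. It relies on git commit treating the first -m as the subject line and each further -m as a separate paragraph of the body.

```shell
# Throwaway repository; the file and both messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity for this demo only
git config user.email "example@example.com"

echo "validates :title, presence: true" > post.rb
git add post.rb

# First -m: short subject line. Second -m: the longer description (body).
git commit -q -m "Require a title on Post" \
  -m "Posts without a title broke the listing page, so validate presence at the model level."

subject=$(git log -1 --format=%s)            # %s prints only the subject line
echo "${#subject} characters: $subject"
git log -1 --format=%b                       # %b prints only the body
```

Even with a body available, one subject line still has to summarize the whole commit, which is why cramming four unrelated changes into a single commit gets awkward.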

Let's finally use add --patch

Okay so we only see one bit of change, the controller one. And we have a prompt. What is git asking us about, what is this whole hunk thing? Hunks are the bits of changed code you see separately when you use git diff. We will talk more about hunks and how git sees them soon. For now, let's just add it and go with the flow. Type y

Another hunk. Hmmm, let's not stage this one, okay? Type n

Again, we don’t want this hunk... We don’t want any other hunks added right now, we added one and we are happy with that. Type q to indicate that we should skip this and all other hunks that were not added yet. We can use d to only skip hunks from the current file.

Alright, we have 4 modified files, 1 of them is staged. Let's commit and see our logs.

Looks okay, right? Alright, let's stage and commit all the other changes with their own commits using git add --patch and see our final product.

Now we have a clearer view of the commits. You might say to yourself, though, "well, I will just plan my workflow correctly, do all related business and logical changes together, commit them, and move on to the next batch of connected changes". Sure, it sounds nice and all, and if you can do that, then good for you. But reality tests statements like this, just like other brave statements such as "we will refactor this later". In reality, most of us, when tasked with a big new feature and limited time, just jump right into it and keep building until we are finished: create a new branch and work, work until it's ready to be presented to others for code review. What's great about --patch is not only the ability to separate your work into logical batches (in the form of commits) but also a more thorough, step-by-step review of your own code. Once you get the hang of it, it does not take that much time and might save you a lot of work later on. It will also help others understand what is going on (reviewing code commit by commit, rather than the entire branch at once, is easier when the commits are separated nicely).
