Contributing to PostgreSQL
This post outlines my experience with becoming more involved with the PostgreSQL development process. Having even a tiny part in reporting and fixing a bug was exciting and rewarding. Turns out, though, there's quite a steep learning curve to get involved. The Postgres project isn't small and things move fast, making it easy to get lost. I have learned a lot and am looking forward to contributing more to the PostgreSQL project and community.
Getting involved
I got involved with the development of PostgreSQL after years of frustration with a particular headache related to QGIS layer styles and pg_dump (fixed!). Granted, the workaround wasn't that big of a deal but did add unnecessary complexity and a point of confusion. I dug into the source code to find why that workaround was required, and that led me to find a case where the workaround no longer worked. Now it wasn't just a hassle, it was a bug 🐛.
Fast forward a few months and I had decided to try my hand at fixing the bug... Lucky for me, the fix had already been developed, and all I needed to do was test my use case and confirm the bug was indeed fixed. I also reviewed a bunch of XML-related documentation, learning far more about XML specs than I wanted to know!
Community reflection
The Postgres community is discussing (again?) the future of the development platform and process. As best I can tell, this round of discussion started with @wbolster's thread on Twitter.
#postgresql @PostgreSQL 🐘 is among the most popular and loved database technologies. but sadly the flow for contributing is extremely archaic and unwelcome for unclear reasons. thread.
— wouter bolsterlee (@wbolster) April 12, 2019
Since this thread, I have seen discussions and polls pop up on social media. The results of the polls are summarized on the Postgres Conference blog.
Room for improvement
There is room for improvement. I say this from the perspective of 8 years of experience with Postgres and a desire to be more involved in the growth and future of the project. Addendum 2 from the above thread hits home for me.
"addendum 2: i forgot to mention that #postgresql does NOT have a bug tracker. yes. NO bug tracker except for some ‘bug’ mailing list. for such a complex and large piece of software... i can't even 😶"
This seems like an easy win. A mailing list is not a bug tracker. A mailing list with filename-versioned patches is not a modern, collaborative development environment. Most of the discussion I have seen has related to these main points.
Reason 1: Staying in the loop
As a person interested in a specific patch, I had a tough time keeping myself in the loop and involved. As mentioned in the Twitter thread, everything happens on the mailing lists and those lists have dozens (or hundreds) of emails per day. The commitfest record I was involved with had three (3) different email threads associated with it. I managed to follow one of those three threads (33%) fairly well. Though, fairly well of 33% is not encouraging. At one point Chapman Flack even wrote:
"There might be too many different email threads on this with patches, but in case it went under the radar..."
Yeah, there were too many email threads. Especially for me as a novice in this process. One of those three email threads was started by me, because I didn't know the fix was already in progress. I didn't know it was in progress because a mailing list is not a bug tracker.
Reason 2: Good tools avoid wasted time
I spent (wasted) a bunch of time trying to apply patches that had already been merged into the stable branches. I didn't know the patches had already been merged into STABLE branches because there is not an easy way to see that from the commitfest/mailing-list tool. That discussion happened somewhere in the middle of a flurry of emails on one of those three mail threads... somehow I missed absorbing every bit of information. Remember "fairly well of 33%"?
It's true, some of my wasted time was a direct result of my lack of experience with this process. I succumbed to impostor syndrome, thinking I was "missing something" or "doing something wrong." That caused me to miss the obvious clues and the true error. That said, I believe my wasted time could have been avoided or reduced with better tooling. Having a coherent view of what's happened and what is outstanding is a good thing.
Barriers to contributing do not look good in the ROI calculations. As J.D. Drake wrote, corporations interested in Postgres do care about their ROI.
"Those corporations will not contribute as much directly to PostgreSQL if the cost to benefit analysis is a net negative."
Reason 3: Knowing what is fixed
After the bug was fixed, I wanted to ensure the bug report indicated this. Too bad, it's just an item in the mailing list. There is no "status" to update. There are no links to the real discussions about how this was fixed. Those discussions are buried within the various email threads on the mailing list. Anyone who finds that bug report on the mailing list (via web searches, etc.) is left wondering what happened to that issue.
For a project of this magnitude, this seems... odd. To reuse a quote from earlier...
"...for such a complex and large piece of software... i can't even 😶"
Excited for what's next
I have barely started contributing to the PostgreSQL project and am excited to continue down this path. I intend to stay involved in the development process, hopefully in increasing amounts. I do hope to see the project move to a more modern development process, though I don't expect seismic shifts or for the change to happen at lightning speed. Obviously, there are aspects of this process that work well, otherwise Postgres wouldn't be the awesome platform it is today!
The survey results show a strong majority (75%) wanting to move to GitHub (or similar), but how many of those are current contributors? There are legitimate arguments against migrating for the sake of migrating, too.
You know what else? 99.9% of the work is still done by very few people. The hoards didn't show up. The real hard work which is to make sure that your patch doesn't break shit. I'm not convinced that making it easier to send patches in helps.
— Dave Cramer (@dave_cramer) May 13, 2019
Summary
This process has only increased my enthusiasm for the PostgreSQL project and for open source projects in general. I had a pain-point, I investigated why it existed, and that led me to report a bug. That resulted in me reviewing a patch that included the fix to my bug. The patch I reviewed also included a lot of improvements to the PostgreSQL documentation regarding its handling of XML data. Improvements to the documentation are immeasurably important. The sum of these improvements is now available to all Postgres users. Everyone wins, cool!
Can the process be better? Of course! Can you name a process that can't be improved?