
What Advent of Code Has Taught Me About Data Science


This year, I participated in Advent of Code, a series of daily programming challenges released throughout December, for the first time. Each day usually brings two puzzles that build on the same underlying problem. Even though these challenges do not resemble typical data science workflows, I have realized that many of the habits, ways of thinking, and approaches to problems that they encourage translate surprisingly well to data-focused work. In this article, I reflect on five learnings I took away from following Advent of Code this year and how they translate to data science.

For me, Advent of Code was more of a controlled practice environment for revisiting fundamentals and sharpening my programming skills. You focus on the essentials because the distractions of a day-to-day job are absent: no meetings, shifting requirements, stakeholder communication, or coordination overhead. Instead, the feedback loop is straightforward and binary: your answer is correct or it is not. There is no “almost correct”, no way of explaining away the outcome, and no way of selling your solution. At the same time, you have the freedom and flexibility to choose any approach you see fit, as long as it arrives at a correct solution.

Working in such a setting was challenging, yet very valuable, because it exposed my habits. With very little room for ambiguity and no way to hide mistakes, any flaws in my work surfaced immediately. Over time, I also realized that most of the failures I encountered had little to do with syntax, algorithm choice, or implementation, and far more to do with how I approached problems before touching any code. What follows are my key learnings from this experience.

Image created by author with ChatGPT

Learning 1: Sketch the Solution – Think Before You Code

One pattern that surfaced often during Advent of Code was my tendency to jump straight into implementation. When faced with a new problem, I was usually tempted to start coding immediately and converge on a solution as quickly as possible. Ironically, this approach often achieved exactly the opposite. For example, I wrote deeply nested code to handle edge cases, which inflated the runtime, without realizing that a much simpler solution existed.

What eventually helped was taking a step back before touching the code. I started by noting requirements, inputs, and constraints. Writing these down gave me a level of clarity and structure that I had been missing when I jumped directly into the code. Thinking about possible approaches, outlining a rough solution, or drafting some pseudocode formalized the needed logic even further. Once this was done, implementing it in code became a lot easier.
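To make this concrete, here is a minimal sketch of what this looks like for a small pair-sum puzzle in the spirit of an early Advent of Code day; the puzzle details and function names are purely illustrative. The requirements, inputs, constraints, and rough plan are written down as comments first, and only then turned into code.

```python
# Illustrative puzzle: find the two numbers in the input that sum to 2020
# and return their product.
#
# Requirements: a single integer, the product of the matching pair.
# Inputs:       one integer per line, roughly a thousand lines.
# Constraints:  exactly one valid pair exists.
#
# Rough plan (before writing any real code):
#   1. parse the lines into integers
#   2. for each value, check whether (2020 - value) has already been seen
#   3. return the product of the matching pair

def solve(lines: list[str], target: int = 2020) -> int:
    seen: set[int] = set()
    for line in lines:
        value = int(line.strip())
        complement = target - value
        if complement in seen:
            return value * complement
        seen.add(value)
    raise ValueError("no valid pair found")


print(solve(["1721", "979", "366", "299", "675", "1456"]))  # 514579
```

The implementation at the bottom is almost a transcription of the plan above it, which is exactly the point of sketching first.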

This translates directly to data science, where many problems are challenging because of unclear goals, poorly framed objectives, or constraints and requirements that are not known well enough in advance. Defining the desired outcome and reasoning about the solution before writing code can prevent wasted effort. Working backward from the intended result, instead of forward from a preferred technology, keeps the focus on the goal that actually needs to be achieved.

Learning 2: Input Validation – Know Your Data

Even after sketching solutions and defining the desired outcome upfront, another recurring obstacle surfaced: the input data. Some failures I experienced had nothing to do with faulty code but with assumptions about the data that did not hold in practice. In one case, I assumed the values stayed within certain minimum and maximum bounds, which turned out to be wrong and led to an incorrect solution. After all, code can be correct when viewed in isolation, yet fail completely when it runs on data it was never designed for.

This showed again why checking the input data is so crucial. Often, my solution did not need to be revamped entirely; smaller adjustments such as additional conditions or boundary checks were enough to obtain a correct and robust solution. Furthermore, an initial look at the data can offer signals about its scale and indicate which approaches are feasible. When facing large ranges, extreme values, or high cardinality, brute-force methods, nested loops, or combinatorial approaches will very likely hit a limit quickly.

Naturally, this is equally important in data science projects, where assumptions about data (implicit or explicit) can lead to serious issues if they remain unchecked. Investigating data early prevents problems from propagating downstream, where they become much harder to fix. The key takeaway is not to avoid assumptions about the data altogether but to make them explicit, document them, and test them early in the process.
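One lightweight way to do this, sketched below with made-up column names and bounds, is to turn the assumptions into explicit checks that run before any analysis and fail loudly when the data violates them.

```python
import pandas as pd


def validate_input(df: pd.DataFrame) -> pd.DataFrame:
    """Turn assumptions about the data into explicit, early-failing checks."""
    # Assumption 1: the columns the analysis relies on are actually present.
    required = {"sensor_id", "timestamp", "reading"}
    missing = required - set(df.columns)
    assert not missing, f"missing columns: {missing}"

    # Assumption 2: readings stay within the expected bounds.
    assert df["reading"].between(0, 1_000).all(), "reading outside the expected [0, 1000] range"

    # Assumption 3: there is at most one measurement per sensor and timestamp.
    assert not df.duplicated(subset=["sensor_id", "timestamp"]).any(), "duplicate measurements found"

    return df
```

In a real project these checks might live in a dedicated validation layer or library, but even plain assertions like these document the assumptions and catch violations before they propagate downstream.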

Learning 3: Iterate Quickly – Progress Over Perfection

The puzzles in Advent of Code are usually split into two parts. The second part often builds on the first but introduces a new constraint, challenge, or twist, such as a dramatic increase in problem size. This added complexity often invalidated my solution to the first part. That does not make the first solution useless, however: it provides a valuable baseline.

Having such a working baseline clarifies how the problem behaves, how it can be tackled, and what the solution already achieves. From there, improvements can be made in a more structured way, because one knows which assumptions no longer hold and which parts must change. Refining a concrete baseline is therefore much easier than designing an abstract “perfect” solution from the start.
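As an illustration, consider a hypothetical two-part puzzle in which timers count down, reset, and spawn new timers. A straightforward part-one baseline simulates every item; when part two keeps the same rules but runs for far more steps, the baseline is not thrown away but refined into a version that only tracks counts per timer value.

```python
from collections import Counter


def count_after(start: list[int], days: int) -> int:
    """Part-one baseline: simulate every timer individually."""
    state = list(start)
    for _ in range(days):
        spawned = 0
        for i, timer in enumerate(state):
            if timer == 0:
                state[i] = 6   # a timer at zero resets to 6 ...
                spawned += 1   # ... and spawns a new timer at 8
            else:
                state[i] -= 1
        state.extend([8] * spawned)
    return len(state)


def count_after_fast(start: list[int], days: int) -> int:
    """Refined version: same rules, but track how many timers share each value."""
    counts = Counter(start)
    for _ in range(days):
        spawning = counts.pop(0, 0)
        counts = Counter({timer - 1: n for timer, n in counts.items()})
        counts[6] += spawning
        counts[8] += spawning
    return sum(counts.values())


# The baseline still earns its keep: it is the reference the refinement is checked against.
assert count_after([3, 4, 3, 1, 2], 18) == count_after_fast([3, 4, 3, 1, 2], 18)
```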

In Advent of Code, the second part only appears after the first one is solved, which makes early attempts to find a solution that covers both parts pointless. This structure reflects a constraint commonly encountered in practice: one rarely knows all requirements upfront. Trying to anticipate every possible extension in advance is not only largely speculative but also inefficient.

Similar principles hold in data science. As requirements shift, data sources evolve, and stakeholders refine their needs, projects and solutions have to evolve as well. Starting with a simple solution and iterating based on real feedback is far more effective than attempting to build a fully general system from the outset. Such a “perfect” solution is rarely apparent at the beginning, and iteration is what allows solutions to converge toward something useful.

Learning 4: Design for Scale – Know the Limits

While iteration emphasizes starting with simple solutions, Advent of Code also repeatedly highlights the importance of understanding scale and how it affects the choice of approach. In many puzzles, the second part does not simply add logical complexity but also increases the problem size dramatically. A solution with exponential or factorial complexity may be sufficient for the first part yet become impractical when the problem size grows in the second.

Even when starting with a simple baseline, it is crucial to have a rough idea of how that solution will scale. Nested loops, brute-force enumeration, or exhaustive searches over combinations signal that the solution will stop scaling once the problem size grows. Knowing the (approximate) breaking point makes it easier to gauge if, and when, a rewrite is necessary.
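A crude back-of-envelope calculation is often enough to find that breaking point. The throughput figure below is a loose assumption rather than a measurement, but it makes the growth of a quadratic approach tangible.

```python
# Rough estimate of where an O(n^2) nested loop stops being practical,
# assuming on the order of 10**7 simple Python operations per second.
OPS_PER_SECOND = 10**7  # loose assumption, machine-dependent

for n in (10**3, 10**4, 10**5, 10**6):
    seconds = n**2 / OPS_PER_SECOND
    print(f"n = {n:>9,}: ~{seconds:,.0f} s for a quadratic approach")

# n =     1,000: ~0 s        -> fine
# n =    10,000: ~10 s       -> noticeable
# n =   100,000: ~1,000 s    -> painful
# n = 1,000,000: ~100,000 s  -> needs a different algorithm
```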

This does not contradict the idea of avoiding premature optimization. Rather, it means understanding the trade-offs a solution makes without having to implement the most efficient or scalable approach right away. Designing for scale means being aware of scalability and complexity, not optimizing blindly from the start.

The parallel to data science is clear: solutions may work well on sample data or limited datasets but are prone to fail at “production-level” sizes. Being conscious of these bottlenecks, recognizing likely limits, and keeping alternative approaches in mind makes systems more resilient. Knowing where a solution could stop working can prevent costly redesigns and rewrites later, even if the alternatives are not implemented immediately.

Learning 5: Be Consistent – Momentum Beats Motivation

One of the less obvious takeaways from participating in Advent of Code had less to do with problem solving and much more with “showing up”. Solving a puzzle every day sounds manageable in theory, but in practice it was challenging when it collided with fatigue, limited time, or a dip in motivation after a full day of work. Hoping for motivation to magically reappear was not a viable strategy.

Real progress came from working on problems daily, not from occasional bursts of inspiration. The repetition reinforced ways of thinking about and disentangling problems, which in turn created momentum. Once that momentum was built, progress began to compound, and consistency mattered more than intensity.

Skill development in data science rarely comes from one-off projects or isolated deep dives either. It results from repeated practice: reading data carefully, designing solutions, iterating on models, and debugging assumptions, done consistently over time. Relying on motivation alone is not viable; fixed routines are what make it sustainable. Advent of Code exemplified this distinction: motivation fluctuates, but consistency compounds. The daily structure helped turn solving puzzles into a habit rather than an aspiration.

Image generated by author with ChatGPT

Closing Thoughts

Looking back, the real value I derived from participating in Advent of Code was not in solving individual puzzles or picking up new coding tricks, but in making my habits visible. It highlighted where I tend to rush to solutions, where I tend to overcomplicate, and where slowing down and taking a step back would have saved me a lot of time. The puzzles were only a means to an end; the learnings I got out of them were the real value.

Advent of Code worked best for me when treated as deliberate practice rather than a competition. Showing up consistently, focusing on clarity over cleverness, and refining solutions instead of chasing perfection from the start turned out to be far more valuable than any single answer.

If you have not tried it yourself yet, I recommend giving it a shot, either during the event next year or by working through past puzzles. The process quickly surfaces habits that carry over beyond the puzzles themselves. And if you enjoy tackling challenges, you will most likely find it a genuinely fun and rewarding experience.


