Number Factoring, Part 1
I recently started working on improving some code I wrote in college, and like many software projects, hit a wall in trying to improve the thing I originally set out to do. This post talks about the struggle, and the struggle is real.
When I was starting out in college, I had to take a bunch of remedial math courses. I wound up basically re-taking all the algebra from about 10th grade forward in college. I was also getting started in computer programming at the time in C#. At some point, we started talking about factoring numbers, as part of finding the Greatest Common Factor (as I mentioned, I was taking remedial maths). I was also working full-time, had other courses, and like many college students, wanted to find a way to do my homework faster. I wasn’t content to simply make doing homework faster, though: I wanted to advance my literacy as a programmer as well. Number factoring was an easy enough challenge, and had the secondary benefit of helping me solve those homework problems faster (though I probably spent more time on writing the code than I saved with the result). I came up with a very simple algorithm, which worked quite well for all the numbers I was dealing with. Here’s a simple version written in C#:
public struct FactorPair {
public FactorPair(int low, int high) {
Low = low;
High = high;
}
public int Low { get; }
public int High { get; }
}
public static class FactoringHelper {
public List<FactorPair> GetFactorsOf(int value) {
// Return value.
var factors = new List<FactorPair>();
// 1 and self are always factors.
factors.Add(new FactorPair(1, value));
// Skip 1 and value, because we added them above.
for (var i = 2; i < value; i++) {
for (var j = value - 1; j > i; j--) {
if (value % (i * j) == 0) {
factors.Add(new FactorPair(i, j));
}
}
}
return factors;
}
}
Note: I haven’t employed many of the language features of modern C# here. There are all sorts of ways we could improve this example, starting with some error handling: the example is meant only to be illustrative.
The problem with this code is that there are lots of optimizations available to us. This algorithm is horrendously inefficient, running in [quadratic time]. It’s fairly trivial to do much, much better, by taking advantage of the fact that any factor of an integer will be either a prime number, or a multiple of a prime number. This optimization step is part of what’s called prime factoring). Once we have the list of prime factors, we can easily derive all the other factors. We can do so in far less time, in the general case, than we could ever hope to using the algorithm I came up with in college.
It is worth mentioning that the algorithm I came up with in college will work fine for numbers smaller than about 100,000. The algorithm is space-efficient. The biggest problem with the algorithm I came up with was that it wasn’t time efficient. If the algorithm employed a prime factoring strategy, I could drastically decrease the time complexity, but to maximize those gains, it would also be a good idea to cache the set of primes used for factoring numbers. There are two essential ways this could be done:
- By storing an enormous list of prime numbers at compile-time, and then using that pre-computed list at runtime
- By adding code to compute the list of prime numbers, and then caching the computed list.
I personally prefer the second option, because there are potentially further optimizations that can be made (such as computing a segmented list of prime numbers, thus never computing more than the number required). Such an approach would also allow us to lazily compute additional primes “as needed,” but take advantage of the computations later on during the program’s life cycle.
Of course, it all depends on your use-case: if your program’s only job is to factor numbers for your users, then there’s no point in wasting time on computing prime numbers when you can just store them in either your program’s data segment, or from an external file, database, or web service. It really depends on your objectives, and what trade-offs you’re willing to make.
To wrap this post up, here’s a table of factors of the number 68. I’ve only included factor pairs, but there’s enough information to get an idea of what results the algorithm should return. In a future post, I’ll elaborate on how I have refactored the original algorithm. It will be less space-efficient, but will drastically improve the running time.
1 | 68 |
2 | 34 |
4 | 17 |
I hope you enjoyed this post! I’ve enjoyed the brief trip down memory lane, and look forward to writing more about this problem.
- Brian