Most estimation procedures involve finding parameters that minimize (or maximize) some objective function. For example, with OLS, we minimize the sum of squared residuals. With Maximum Likelihood Estimation, we maximize the log-likelihood function. The difference is trivial: minimization can be converted to maximization by using the negative of the objective function.
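As a minimal sketch of that equivalence (the data `y`, the candidate list, and the helper `ssr` are made up for illustration), minimizing a sum of squared residuals over a set of candidates picks the same value as maximizing its negative:

```python
# Hypothetical data and candidate values for a single parameter (a mean).
y = [1.0, 2.0, 4.0]
candidates = [i / 10 for i in range(51)]  # 0.0, 0.1, ..., 5.0

def ssr(m):
    """Sum of squared residuals when every observation is predicted by m."""
    return sum((yi - m) ** 2 for yi in y)

# Minimizing the objective ...
best_min = min(candidates, key=ssr)
# ... selects the same candidate as maximizing its negative.
best_max = max(candidates, key=lambda m: -ssr(m))

print(best_min, best_max)
```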
Sometimes this problem can be solved algebraically, producing a closed-form solution. With OLS, you solve the system of first-order conditions and get the familiar formula (though you still probably need a computer to evaluate the answer). In other cases, this is not mathematically possible and you need to search for parameter values using a computer. In this case, the computer and the algorithm play a bigger role. Nonlinear Least Squares is one example. You don't get an explicit formula; all you get is a recipe that you need a computer to implement. The recipe might be to start with an initial guess of what the parameters might be and how they might vary. You then try various combinations of parameters and see which one gives you the lowest/highest objective function value. This is the brute force approach and takes a long time. For example, with 5 parameters with 10 possible values each, you need to try $10^5 = 100{,}000$ combinations, and that merely puts you in the neighborhood of the right answer if you're lucky. This approach is called grid search.
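A rough sketch of grid search with 5 parameters and 10 candidate values each might look like the following. The objective function and the `target` values are hypothetical stand-ins; in a real problem the objective would be something like a sum of squared residuals.

```python
import itertools

# Hypothetical "true" parameters; the objective is the squared distance to them.
target = (0.33, 0.71, 0.04, 0.58, 0.92)

def objective(theta):
    return sum((t - s) ** 2 for t, s in zip(theta, target))

# 10 candidate values per parameter: 0.0, 0.1, ..., 0.9
grid = [i / 10 for i in range(10)]

best_theta, best_value, n_evals = None, float("inf"), 0
for theta in itertools.product(grid, repeat=5):
    n_evals += 1
    value = objective(theta)
    if value < best_value:
        best_theta, best_value = theta, value

print(n_evals)     # 10**5 = 100000 evaluations
print(best_theta)  # the grid point closest to target, not target itself
```

Even after all 100,000 evaluations, the best grid point only lands near the true parameters, because the answer does not sit exactly on the grid.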
Or you might start with a guess, and refine that guess in some direction until the improvement in the objective function is less than some tolerance. These are usually called gradient methods (though there are others that do not use the gradient to pick which direction to go in, like genetic algorithms and simulated annealing). For some problems, such as quadratic objective functions, these methods are guaranteed to find the right answer quickly. Others give no such guarantee. You might worry that you've gotten stuck at a local, rather than a global, optimum, so you try a range of initial guesses. You might find that wildly different parameters give you the same value of the objective function, so you don't know which set to pick.
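To illustrate the local-optimum worry, here is a small sketch of plain gradient descent run from several starting points. The function `f`, the step size, the tolerance, and the starting values are all assumptions chosen for illustration; `f` has one local and one global minimum.

```python
def f(x):
    """Hypothetical objective with two minima (near x = -1.30 and x = 1.13)."""
    return x**4 - 3 * x**2 + x

def grad(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, step=0.01, tol=1e-12, max_iter=10_000):
    x = x0
    for _ in range(max_iter):
        x_new = x - step * grad(x)
        # Stop once the improvement in the objective falls below tol.
        if f(x) - f(x_new) < tol:
            break
        x = x_new
    return x

# Different initial guesses can end up at different optima.
starts = [-2.0, -0.5, 0.5, 2.0]
results = [gradient_descent(x0) for x0 in starts]
best = min(results, key=f)

for x0, x in zip(starts, results):
    print(f"start {x0:+.1f} -> x = {x:+.4f}, f(x) = {f(x):+.4f}")
```

Starts to the right of the hump settle at the local minimum, while starts to the left find the global one, which is why trying a range of initial guesses helps.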
Here's a nice way to get the intuition. Suppose you had a simple exponential regression model where the only regressor is the intercept:
$$E[y] = \exp(\alpha)$$
The objective function is
$$Q(\alpha) = \sum_{i=1}^{N} \left( y_i - \exp(\alpha) \right)^2$$
With this simple problem, both approaches are feasible. The closed-form solution that you get by taking the derivative is $\alpha^* = \ln \bar{y}$. You can also verify that anything else gives you a higher value of the objective function by plugging in $\ln(\bar{y} + k)$ for some $k \neq 0$ instead. If you had some regressors, the analytical solution goes out the window.
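Both approaches can be compared in a short sketch. It assumes the intercept-only model $E[y] = \exp(\alpha)$ with a least-squares objective $Q(\alpha) = \sum_i (y_i - \exp(\alpha))^2$, for which the closed-form minimizer is $\ln \bar{y}$. The data, step size, and tolerance are made up for illustration.

```python
import math

y = [1.0, 2.0, 3.0, 6.0]  # hypothetical data
n = len(y)
y_bar = sum(y) / n

def Q(alpha):
    """Sum of squared residuals for the intercept-only exponential model."""
    return sum((yi - math.exp(alpha)) ** 2 for yi in y)

def dQ(alpha):
    """Derivative of Q with respect to alpha."""
    return -2 * math.exp(alpha) * sum(yi - math.exp(alpha) for yi in y)

# Approach 1: closed form, from setting dQ/dalpha = 0.
alpha_closed = math.log(y_bar)

# Approach 2: numerical search by gradient descent from an initial guess.
alpha = 0.0
for _ in range(10_000):
    alpha_new = alpha - 0.005 * dQ(alpha)
    if Q(alpha) - Q(alpha_new) < 1e-12:  # improvement below tolerance
        break
    alpha = alpha_new
alpha_numeric = alpha

print(alpha_closed, alpha_numeric)

# Any other value, such as ln(y_bar + k) with k != 0, gives a higher Q.
for k in (-1.0, 0.5, 2.0):
    print(k, Q(math.log(y_bar + k)) > Q(alpha_closed))
```

With regressors added, the closed-form line no longer applies, but the search loop carries over with a gradient for each parameter.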