One of the most common formulas in data science and machine learning is the sum of squared errors. The sum of squared errors is nothing more than taking the difference between the actual value and the estimated value and then squaring it. This is the same basic formula that is used when calculating standard deviation. It is also the formula behind the Pythagorean theorem. In middle school math you learned this formula as the distance formula.
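Written out (using y_i for the actual values and \hat{y}_i for the estimates, notation chosen here for illustration rather than taken from any particular text), the sum of squared errors over n points is:

    SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

In words: for each point, subtract the estimate from the actual value, square the difference, and add up all the squares.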
The distance formula allows you to find the distance between any two points. It works in one dimension (along the x axis), in the standard Cartesian coordinate system (x, y), in three dimensions (x, y, z), or in any number of dimensions. This basic idea of the distance between two points is used over and over again in data science. Keep in mind that, in this context, distance is nothing more than actual minus estimated.
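As a minimal sketch of how the same formula extends to any number of dimensions (the function name and sample points below are invented for illustration):

    import math

    def distance(p, q):
        # Euclidean distance: square each coordinate difference,
        # sum the squares, then take the square root.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    print(distance((0, 0), (3, 4)))        # 2-D: the classic 3-4-5 triangle -> 5.0
    print(distance((1, 2, 3), (4, 6, 3)))  # 3-D -> 5.0

The same function works unchanged for 2, 3, or 100 coordinates, which is exactly why the idea carries over to high-dimensional data.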
A Cartesian graph showing the distance formula goes here.
The distances are squared because of the way the estimated line is derived. The estimated line is calculated from average values; in other words, on average it is the best possible line. Some points fall above the line and some fall below, but on average it is the best possible fit. Points above the line have positive distances and points below the line have negative distances, and by construction all of these distances added together sum to zero. Since there is no such thing as a negative distance, the distances are squared to get rid of all the negative signs. Squaring is done by convention; there are other ways to get rid of a negative sign, such as taking absolute values.
A graph of a fitted line with the positive and negative distances labeled goes here.
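To make the zero-sum point concrete, here is a small sketch (the data points are invented for illustration) showing that the residuals around a least-squares line cancel out while their squares do not:

    import numpy as np

    # Invented sample data for illustration.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Fit the best (least-squares) line through the points.
    slope, intercept = np.polyfit(x, y, 1)
    estimated = slope * x + intercept

    residuals = y - estimated           # actual minus estimated
    print(residuals.sum())              # approximately 0: positives and negatives cancel
    print((residuals ** 2).sum())       # sum of squared errors: always non-negative
    print(np.abs(residuals).sum())      # absolute values: another way to drop the signs

The first print is, up to floating-point noise, zero, which is why summing the raw distances tells you nothing about fit quality; the squared or absolute versions do not cancel.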