David Longstreet
  • Home Page
  • Data Science
  • Articles
    • Data Science >
      • Data Science Projects
      • Machine Learning
      • Logistic Regression
      • Regression (for the non data scientist)
      • Polynomials
      • Modeling Steroid Usage in MLB
      • Building Blocks of Data Science
    • Leadership >
      • Leadership
  • MyBookSucks
  • Art and Data
    • Art of Data
    • Voronoi Diagrams >
      • Voronoi As Art >
        • Voronoi Explained
        • Shades of Grey
  • About David
    • Short Bio
    • Client List
    • Learning Piano
    • Resume
    • Contact Me

Sum of Squares

One of the most common formulas in data science and machine learning is the sum of squared errors.  The sum of squared errors is nothing more than taking the difference between the actual value and the estimate for each point, squaring it, and adding up the results.  This is the same basic formula that is used when calculating standard deviation, where the estimate is the mean.  It also has the same form as the Pythagorean theorem.  In middle school math you learned this formula as the distance formula.
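As a minimal sketch of the calculation described above (the function name and the data are made up for illustration, not taken from any particular library), the sum of squared errors can be written in a few lines of Python. The second half shows the link to standard deviation: when the "estimate" for every point is simply the mean, the same sum, divided by the number of points and square-rooted, gives the population standard deviation.

```python
import math
import statistics


def sum_of_squared_errors(actual, estimated):
    """Add up (actual - estimated) squared over paired values."""
    if len(actual) != len(estimated):
        raise ValueError("actual and estimated must have the same length")
    return sum((a - e) ** 2 for a, e in zip(actual, estimated))


# Hypothetical values, invented for this example only.
actual = [3.0, 5.0, 7.0, 10.0]
estimated = [2.5, 5.5, 7.0, 9.0]

# Errors are 0.5, -0.5, 0.0 and 1.0, so the squares add up to 1.5.
sse = sum_of_squared_errors(actual, estimated)

# Same formula, with the mean used as the estimate for every point.
mean = sum(actual) / len(actual)
sse_about_mean = sum_of_squared_errors(actual, [mean] * len(actual))
std_dev = math.sqrt(sse_about_mean / len(actual))

# The standard library's population standard deviation, for comparison.
library_std_dev = statistics.pstdev(actual)
```

Here `std_dev` and `library_std_dev` should agree up to floating-point rounding, which is the sense in which standard deviation uses the same basic formula.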

The distance formula allows you to find the distance between any two points.  It works in one dimension (along the x axis), in the standard Cartesian coordinate system (x, y), in 3 dimensions (x, y, z), or in any number of dimensions.  This basic idea of distance between two points is used over and over again in data science.  Keep in mind that, for a single data point, distance is nothing more than actual - estimated.
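To illustrate that the same formula works in any number of dimensions, here is a small sketch. The `distance` function is a hypothetical helper written for this example; the points are made up. Python 3.8 and later also ship `math.dist`, which does the same job.

```python
import math


def distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Square each coordinate difference, add them up, take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# One dimension: two points on the x axis.
d1 = distance([2], [7])

# Two dimensions: the classic 3-4-5 right triangle.
d2 = distance([0, 0], [3, 4])

# Three dimensions: the differences are 2, 3 and 6, and 4 + 9 + 36 = 49.
d3 = distance([1, 2, 3], [3, 5, 9])
```

In one dimension the formula reduces to the plain difference between the two values with the sign dropped, which is why actual - estimated can be treated as a distance.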

A Cartesian graph showing distance formula goes here.

Distance is squared because of the way the estimated line is derived.  The estimated line is calculated based upon "average" values.  In other words, on average this is the best possible line.  There are some points above the line and some points below it, but on average this is the best possible fit.  Points above the line have positive distances and points below the line have negative distances.  For a least-squares line with an intercept, all these distances added together sum to zero, so the plain sum says nothing about how well the line fits.  Since there is no such thing as a negative distance, the distances are squared to get rid of all the negative signs.  The squaring is done by convention, and there are other ways to get rid of a negative sign (like absolute values).
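The cancelling of positive and negative distances can be checked with a short sketch. It assumes the estimated line is an ordinary least-squares line with an intercept, which is an assumption of this example; the `fit_line` helper and the data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired x and y values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, made up for this example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.5, 5.0, 8.5, 9.0]

slope, intercept = fit_line(xs, ys)

# Distance of each point from the line: actual - estimated.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

plain_sum = sum(residuals)                 # positives and negatives cancel
squared_sum = sum(r ** 2 for r in residuals)   # sum of squared errors
absolute_sum = sum(abs(r) for r in residuals)  # the absolute-value alternative
```

For this data the plain sum comes out at zero (up to floating-point rounding), while the squared sum is about 1.9 and the absolute sum is about 2.8. Both alternatives remove the negative signs; they simply weight large distances differently.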

Show a graph with a line and the positive and negative distances.

Links to some videos.

Passions and Professionalisms 

