Subscribe / Unsubscribe Enewsletters | Login | Register

Pencil Banner

Machine learning for Java developers

Gregor Roth | Sept. 18, 2017
Set up a machine learning algorithm and develop your first prediction function in Java.


// the theta vector used here was output of a train process
double[] thetaVector = new double[] { 1.004579, 5.286822 };
LinearRegressionFunction targetFunction = new LinearRegressionFunction(thetaVector);

// create the feature vector function with x0=1 (for computational reasons) and x1=house-size
Double[] featureVector = new Double[] { 1.0, 1330.0 };

// make the prediction
double predictedPrice = targetFunction.apply(featureVector);

The target function's prediction line is shown as a blue line in the chart below. The line has been computed by executing the target function for all the house-size values. The chart also includes the price-size pairs used for training.

machine learning fig2

So far the prediction graph seems to fit well enough. The graph coordinates (the intercept and slope) are defined by the theta vector { 1.004579, 5.286822 }. But how do you know that this theta vector is the best fit for your application? Would the function fit better if you changed the first or second theta parameter? To identify the best-fitting theta parameter vector, you need a utility function, which will evaluate how well the target function performs.

 

Scoring the target function

In machine learning, a cost function (J(θ)) is used to compute the mean error, or "cost" of a given target function.

f3 costl function

The cost function indicates how well the model fits with the training data. To determine the cost of the trained target function above, you would compute the squared error of each house example (i). The error is the distance between the calculated y value and the real y value of a house example i.

For instance, the real price of the house with size of 1330 is 6,500,000 €. In contrast, the predicted house price of the trained target function is 7,032,478 €: a gap (or error) of 532,478 €. You can also find this gap in the chart above. The gap (or error) is shown as a vertical dotted red line for each training price-size pair.

To compute the cost of the trained target function, you must summarize the squared error for each house in the example and calculate the mean value. The smaller the cost value of J(θ), the more precise the target function's predictions will be.

In Listing 3, the simple Java implementation of the cost function takes as input the target function, the list of training records, and their associated labels. The predicted value will be computed in a loop, and the error will be calculated by subtracting the real label value.

Afterward, the squared error will be summarized and the mean error will be calculated. The cost will be returned as a double value:

 

Previous Page  1  2  3  4  5  6  7  8  9  10  11  Next Page 

Sign up for Computerworld eNewsletters.