
Machine learning for Java developers

Gregor Roth | Sept. 18, 2017
Set up a machine learning algorithm and develop your first prediction function in Java.

[Figure 3: training cost plotted against the number of iterations]

In this case, although the cost will no longer decrease significantly after 500 to 600 iterations, the target function is still not optimal; it seems to underfit. In machine learning, the term underfitting is used to indicate that the learning algorithm does not capture the underlying trend of the data.

Based on real-world experience, it is expected that the price per square metre will decrease for larger properties. From this we conclude that the model used for the training process, the target function, does not fit the data well enough. Underfitting is often due to an excessively simple model. In this case, it's the result of our simple target function using a single house-size feature only. That data alone is not enough to accurately predict the cost of a house.

Adding features and feature scaling

If you discover that your target function doesn't fit the problem you are trying to solve, you can adjust it. A common way to correct underfitting is to add more features into the feature vector.

In the housing-price example, you could add other house characteristics such as the number of rooms or the age of the house. Rather than using the single domain-specific feature vector of { size } to describe a house instance, you could use a multi-valued feature vector such as { size, number-of-rooms, age }.
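As a minimal sketch (not the article's code), a multi-valued feature vector can be represented as a plain array, with a linear target function that weights each feature; the theta values below are hand-picked for illustration only:

```java
// Sketch of a linear target function over a multi-valued feature vector.
public class MultiFeatureHypothesis {

    // theta[0] is the intercept; theta[1..n] weight each feature.
    static double predict(double[] theta, double[] features) {
        double result = theta[0];
        for (int i = 0; i < features.length; i++) {
            result += theta[i + 1] * features[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // { size, number-of-rooms, age } for one house instance
        double[] house = { 90.0, 3.0, 12.0 };
        // illustrative, hand-picked weights (not trained values)
        double[] theta = { 25000.0, 2500.0, 10000.0, -500.0 };
        System.out.println(predict(theta, house)); // prints 274000.0
    }
}
```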

In some cases, there aren't enough features in the available training data set. In this case, you can try adding polynomial features, which are computed from existing features. For instance, you could extend the house-price target function to include a computed squared-size feature (x²):

Figure 6. Linear regression with polynomial features: hθ(x) = θ0 + θ1 · size + θ2 · size²
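A polynomial feature is derived purely from the existing data; a short sketch (assuming the size feature sits at index 0 of the vector) shows the squared size being computed and appended:

```java
// Sketch: derive a squared-size polynomial feature from the existing
// size feature and append it to the feature vector.
public class PolynomialFeatures {

    static double[] withSquaredSize(double[] features) {
        double[] extended = new double[features.length + 1];
        System.arraycopy(features, 0, extended, 0, features.length);
        extended[features.length] = features[0] * features[0]; // size²
        return extended;
    }

    public static void main(String[] args) {
        double[] house = { 90.0 }; // { size }
        double[] extended = withSquaredSize(house);
        System.out.println(java.util.Arrays.toString(extended)); // prints [90.0, 8100.0]
    }
}
```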

Using multiple features requires feature scaling, which standardizes the range of different features. For instance, the value range of the size² feature is a magnitude larger than the range of the size feature. Without feature scaling, the size² feature will dominate the cost function: the error value produced by the size² feature will be much higher than the error value produced by the size feature. A simple algorithm for feature scaling is:

Figure 7. Feature scaling (mean normalization): x' = (x − average) / (max − min)
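The scaling formula above can be sketched in a few lines, assuming the average, minimum, and maximum are computed beforehand from the training data:

```java
// Sketch of mean normalization: shift by the average, divide by the range.
public class MeanNormalization {

    static double scale(double value, double avg, double min, double max) {
        return (value - avg) / (max - min);
    }

    public static void main(String[] args) {
        // house sizes from a tiny hypothetical training set:
        // min = 50, max = 200, average = 110
        double avg = 110.0, min = 50.0, max = 200.0;
        System.out.println(scale(50.0, avg, min, max));  // -0.4
        System.out.println(scale(200.0, avg, min, max)); // 0.6
    }
}
```

Scaled values end up in a small range centered near zero, so no single feature dominates the cost function.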

This algorithm is implemented by the FeaturesScaling class in the example code below. The FeaturesScaling class provides a factory method to create a scaling function fitted to the training data. Internally, instances of the training data are used to compute the average, minimum, and maximum constants. The resulting function consumes a feature vector and produces a new one with scaled features. Feature scaling is required for the training process as well as for the prediction call, as shown below:
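The factory-and-function pattern described above might look like the following sketch. The class and method names here (FeaturesScalingSketch, createScalingFunction) are assumptions based on the description, not the article's exact code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Sketch of a factory that fits scaling constants to the training data
// and returns a reusable scaling function (hypothetical names).
public class FeaturesScalingSketch {

    static Function<double[], double[]> createScalingFunction(List<double[]> dataset) {
        int n = dataset.get(0).length;
        double[] avg = new double[n], min = new double[n], max = new double[n];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        // compute the average, minimum, and maximum per feature
        for (double[] features : dataset) {
            for (int i = 0; i < n; i++) {
                avg[i] += features[i] / dataset.size();
                min[i] = Math.min(min[i], features[i]);
                max[i] = Math.max(max[i], features[i]);
            }
        }
        // the returned function applies mean normalization to any vector
        return features -> {
            double[] scaled = new double[n];
            for (int i = 0; i < n; i++) {
                scaled[i] = (features[i] - avg[i]) / (max[i] - min[i]);
            }
            return scaled;
        };
    }

    public static void main(String[] args) {
        List<double[]> dataset = List.of(
            new double[] { 50.0 }, new double[] { 80.0 }, new double[] { 200.0 });
        Function<double[], double[]> scale = createScalingFunction(dataset);
        // the same fitted function must be used at training and prediction time
        System.out.println(scale.apply(new double[] { 80.0 })[0]);
    }
}
```

The key design point is that the constants are frozen at fit time, so training and prediction see identical scaling.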

