We begin by building a model of the product catalogue and training our algorithm on it.
Once complete, the model resembles the network of neurons in a human brain. It is a multi-layer system of interconnected "neurons".
Each neuron in the model has a set of numeric feature weights that adjust based on experience (i.e. new inputs). This makes the model capable of learning.
The features that are weighted for each neuron represent all the brands, categories and product attributes found in the catalogue metadata.
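As a rough illustration of how catalogue metadata could become numeric features, the sketch below turns each brand, category and attribute into one position in a vector. The field names, the sample catalogue and the simple 0/1 encoding are assumptions made for this example, not a description of the actual model.

```python
def build_vocabulary(catalogue):
    """Collect every distinct (field, value) pair found in the catalogue metadata."""
    pairs = sorted({(field, value)
                    for product in catalogue
                    for field, values in product.items()
                    for value in values})
    return {pair: index for index, pair in enumerate(pairs)}


def encode(product, vocabulary):
    """Return one weight per known feature: 1.0 if the product has it, else 0.0."""
    weights = [0.0] * len(vocabulary)
    for field, values in product.items():
        for value in values:
            index = vocabulary.get((field, value))
            if index is not None:
                weights[index] = 1.0
    return weights


# Hypothetical two-product catalogue.
catalogue = [
    {"brand": ["Acme"], "category": ["shoes"], "attributes": ["red", "leather"]},
    {"brand": ["Acme"], "category": ["bags"], "attributes": ["leather"]},
]
vocabulary = build_vocabulary(catalogue)
vectors = [encode(product, vocabulary) for product in catalogue]
```

Both products share the "Acme" and "leather" positions, which is what later makes their feature weights comparable.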
During training, we pass every product through the model. Each product input will match one neuron, and both the product and the neuron will remember that match permanently.
Like a neuron, each product is made up of a set of feature weights. Therefore, we locate a matching neuron by comparing its feature weights to those of the product input.
A match occurs when the product and the neuron have the same, or very similar, feature weights.
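The matching step can be sketched as a nearest-neighbour search over the neurons. The example below assumes Euclidean distance as the similarity measure and a simple "move the neuron towards the product" update to stand in for learning; both are illustrative choices in the style of a self-organising map, and the function names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equally long weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_neuron(product_weights, neurons):
    """Index of the neuron whose weights are closest to the product's."""
    return min(range(len(neurons)),
               key=lambda i: distance(product_weights, neurons[i]))


def nudge(neuron, product_weights, learning_rate=0.5):
    """Move the neuron's weights part of the way towards the product's."""
    return [w + learning_rate * (p - w) for w, p in zip(neuron, product_weights)]


neurons = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
product = [0.0, 0.9, 1.0]

match = best_matching_neuron(product, neurons)    # the second neuron is closest
neurons[match] = nudge(neurons[match], product)   # the neuron adapts to the input
```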
When the first search is made, we do not have any behavioural data yet. Instead, we rely on basic text-matching to produce a list of products that contain the search query in their metadata.
Each product in the list is given a score based on where the query term appears in the metadata and how frequently it is found (e.g. the score is higher when the term is found in the product title than when it is found only in the description).
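A minimal sketch of this kind of scoring is shown below. The field weights (a title hit counting three times as much as a description hit) and the whitespace tokenisation are assumptions chosen for illustration only.

```python
# Assumed weights: a hit in the title counts more than one in the description.
FIELD_WEIGHTS = {"title": 3.0, "description": 1.0}


def text_score(query, product):
    """Score a product by where and how often the query term appears."""
    term = query.lower()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = product.get(field, "").lower().split()
        score += weight * words.count(term)
    return score


shoe = {"title": "Red running shoes",
        "description": "Lightweight shoes for daily running"}
bag = {"title": "Sports bag",
       "description": "Fits shoes and shoes accessories"}

shoe_score = text_score("shoes", shoe)   # one title hit plus one description hit
bag_score = text_score("shoes", bag)     # two description hits, no title hit
```

With these assumed weights the shoe outranks the bag even though the bag mentions the term more often in its description.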
Before giving the visitor search results, the list of products found through text-matching is passed over to our Machine Learning algorithm for further analysis.
Our Machine Learning algorithm takes the list found through text-matching and locates the neurons that match the products in the list.
If the products in the list do not all match the same neuron, or if the matched neurons do not belong to the same cluster (a cluster being a set of neurons with similar feature weights), then our algorithm will divide the list into groups of products that match the same neuron and/or neuron cluster.
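The grouping step can be pictured as bucketing the text-matched products by the cluster of their matched neuron. In the sketch below, `neuron_of` and `cluster_of` are hypothetical lookup tables standing in for the matches stored during training.

```python
from collections import defaultdict


def group_by_cluster(product_ids, neuron_of, cluster_of):
    """Split a result list into groups whose neurons share a cluster."""
    groups = defaultdict(list)
    for product_id in product_ids:
        neuron = neuron_of[product_id]
        groups[cluster_of[neuron]].append(product_id)
    return dict(groups)


# Hypothetical stored matches: product -> neuron, neuron -> cluster.
neuron_of = {"a": 1, "b": 2, "c": 7, "d": 1}
cluster_of = {1: "footwear", 2: "footwear", 7: "bags"}

groups = group_by_cluster(["a", "b", "c", "d"], neuron_of, cluster_of)
```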
We then take a random sample of products from each of the newly created groups and present them as results to the visitor.
The sample distribution and ranking of results are set by the text-matching scores (i.e. larger samples will be taken from the highest scoring groups).
This allows our engine to present the visitor with a range of relevant direct results, without letting one specific category or product type dominate.
Initial probability distribution (using text-matching scores):
Group 1: 37%
Group 2: 22%
Group 3: 10%
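One way to picture the sampling step is to give each group a share of the result slots in proportion to its total text-matching score, then draw that many products at random from the group. The rounding rule, the group names and the scores below are assumptions for illustration and do not reproduce the percentages above.

```python
import random


def sample_results(groups, scores, total_slots, rng=None):
    """Draw a random sample from each group, sized by the group's score share."""
    rng = rng or random.Random()
    group_scores = {name: sum(scores[p] for p in products)
                    for name, products in groups.items()}
    total = sum(group_scores.values())
    if total == 0:
        return []
    results = []
    # Highest-scoring groups come first and receive the largest samples.
    for name in sorted(groups, key=group_scores.get, reverse=True):
        share = group_scores[name] / total
        size = min(len(groups[name]), round(share * total_slots))
        picked = rng.sample(groups[name], size)
        picked.sort(key=scores.get, reverse=True)
        results.extend(picked)
    return results


groups = {"shoes": ["a", "b", "c", "d"], "bags": ["e", "f"]}
scores = {"a": 4, "b": 3, "c": 2, "d": 1, "e": 3, "f": 2}
results = sample_results(groups, scores, total_slots=3, rng=random.Random(0))
```

Here the "shoes" group holds two thirds of the total score, so it supplies two of the three results and "bags" supplies one.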
As soon as a visitor interacts with the search results (e.g. clicks, add-to-cart actions, purchases), we can begin to refine results according to behavioural data.
So the next time a visitor searches with the same or similar query, past behaviour will determine which products best represent the intended meaning of the query.
Over time, we will no longer rely on text-matching for the list of products our Machine Learning algorithm will process. Instead, we will refer entirely to past visitor behaviour.
Updated probability distribution (using behavioural data):
Group 1: 68%
Group 2: 15%
Group 3: 5%
Eventually, there will be enough behavioural data that a random sampling of products will no longer be needed to establish relevance.
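The gradual shift from text-matching to behaviour can be sketched as a weighted blend in which trust in behavioural data grows with the number of recorded interactions. The blending formula and the `prior_strength` parameter are assumptions made for this example; the text does not specify how the two signals are actually combined.

```python
def blended_distribution(text_share, interactions, prior_strength=10):
    """Blend text-matching shares with behavioural shares for each group.

    The more interactions are recorded, the more the result follows
    behaviour; with none recorded it equals the text-matching shares.
    """
    total = sum(interactions.values())
    trust = total / (total + prior_strength)
    blended = {}
    for group, share in text_share.items():
        behaviour_share = interactions.get(group, 0) / total if total else 0.0
        blended[group] = (1 - trust) * share + trust * behaviour_share
    return blended


text_share = {"group_1": 0.5, "group_2": 0.5}
clicks = {"group_1": 30, "group_2": 10}   # hypothetical interaction counts

before = blended_distribution(text_share, {})
after = blended_distribution(text_share, clicks)
```

In this sketch, 40 recorded interactions give behaviour a trust of 0.8, so the share of the first group moves from 0.5 towards its behavioural share of 0.75.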
In addition to direct results, our engine generates a list of related results. These results have no connection to the actual search query, but are relevant because their feature weights resemble those of the direct results.
As stated above, when a product and a neuron have the same, or very similar, feature weights, they are considered a match.
While each product can only match one neuron, each neuron can match several products.
The list of related results is made up of products that match the same neurons as the direct results found through text-matching and/or behavioural analysis.
The products listed under related results are similar to those listed under direct results because they match the same neurons and therefore have similar feature weights.
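The lookup of related results can be sketched as follows: collect the neurons matched by the direct results, then return every other product that matched one of those neurons. The `neuron_of` table is a hypothetical stand-in for the matches stored during training.

```python
def related_results(direct_ids, neuron_of):
    """Products that share a neuron with a direct result but are not one."""
    direct = set(direct_ids)
    matched_neurons = {neuron_of[product_id] for product_id in direct}
    return sorted(product_id
                  for product_id, neuron in neuron_of.items()
                  if neuron in matched_neurons and product_id not in direct)


# Hypothetical stored matches: each product maps to exactly one neuron,
# while a neuron may be shared by several products.
neuron_of = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 2}

related = related_results(["a", "c"], neuron_of)
```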
These results often relate to the direct results in one of three ways: