Author: Michael Misiewicz, Director, Data Science
Product: Answers
Blog Date: August 2021
Building a great algorithmically driven product requires a lot of data. You can (and almost certainly must) get some of this data via human labeling, but the best way to drive large improvements is to get your algorithms out in the wild, measure where they go wrong, and retrain them!
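To make the loop concrete, here is a minimal sketch, with purely illustrative names and a deliberately trivial majority-class "model" (this is not Yext's actual pipeline): deploy, log the cases where the model was wrong, fold those errors back into the training set, and retrain.

```python
# Sketch of the deploy -> measure errors -> retrain cycle.
# All names and data here are hypothetical, for illustration only.
from collections import Counter

def train(labeled_examples):
    """Toy 'model': predict the most common label seen in training data."""
    counts = Counter(label for _, label in labeled_examples)
    majority = counts.most_common(1)[0][0]
    return lambda query: majority

def serve_and_collect(model, live_traffic):
    """Run the model 'in the wild'; log every case where it was wrong."""
    return [(query, true_label)
            for query, true_label in live_traffic
            if model(query) != true_label]

# Cycle: start from a small human-labeled seed set...
dataset = [("q1", "A"), ("q2", "A")]
model = train(dataset)

# ...deploy, and capture mistakes on live traffic...
mistakes = serve_and_collect(
    model, [("q3", "B"), ("q4", "B"), ("q5", "B"), ("q6", "A")])

# ...then fold the errors back in and retrain.
dataset += mistakes
model = train(dataset)  # the retrained model now reflects the new evidence
```

Each pass through the loop turns production errors into fresh training signal, which is exactly the virtuous cycle described above.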
The best algorithmic product companies all do this at a large scale.
The more you look, the more you see these cycles in products all around you, especially the ones with the most magical or impressive algorithmic features. Further, if you look at the algorithms underpinning many popular machine learning models, you'll see the same pattern repeated everywhere: random forests, gradient boosted trees, neural network optimizers, and reinforcement learning all use this fundamental concept.
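Gradient boosting is perhaps the cleanest instance of the pattern inside a single model: each new weak learner is trained on the residuals (errors) of the ensemble so far. A minimal pure-Python sketch on toy 1-D data, using decision stumps as the weak learners (everything here is illustrative, not a production implementation):

```python
# Gradient boosting in miniature: fit, measure errors, fit a model to
# the errors, repeat. Toy 1-D regression with decision stumps.

def fit_stump(xs, residuals):
    """Find the single split on xs that best reduces squared error."""
    best = None
    for split in xs:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x <= split else rmean

def boost(xs, ys, rounds=50, lr=0.1):
    """Each round trains a stump on the current errors of the ensemble."""
    stumps = []
    preds = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # where are we wrong?
        stump = fit_stump(xs, residuals)                # fit a model to the errors
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 5.0, 5.2, 4.8]
model = boost(xs, ys)
```

The `residuals` line is the whole idea: the ensemble's mistakes on the last round become the training target for the next round.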
We want to make Yext Answers the best it can be, so how do we go about implementing these training loops? For the purpose of fitting a single model to a dataset, there are incredibly powerful and well-modularized methods. But how do we do it at a higher level of complexity, that is, across a number of systems, functions, features, and teams? There are a few objectives we need to optimize.
It's no coincidence that the companies most successful in developing algorithmic products also make huge investments in optimizing these objectives. If you look at Google's track record of accomplishments (BigTable, Spanner, Borg, Kubernetes, TensorFlow, Tensor Processing Units, etc.), you can see the advanced features that become possible with great infrastructure.
Concretely, what are the data science, engineering, and product teams at Yext doing to speed up the virtuous cycle of machine learning?
Thanks to this continued investment, I am confident that Yext Answers will continue to expand and improve. We're doing a large amount of R&D work to improve quality, from making it easier to add content to the Knowledge Graph to understanding query intent better and improving and expanding direct answers.