Introduction to the MathOverflow Tag Recommendation Problem
Here is the first paragraph of a recent post on the front page of MathOverflow:
We take a look at the data, which comes from the quarterly Stack Exchange data dump. We explore the data to understand how it is structured, and then clean it.
Most train/valid/test split tools are not optimized for multilabel problems. The tool MultilabelStratifiedShuffleSplit from iterstrat.ml_stratifiers (see the GitHub page) implements the iterative stratification algorithm of Konstantinos Sechidis, Grigorios Tsoumakas & Ioannis Vlahavas (2011).
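To give a feel for why multilabel data needs its own splitting strategy, here is a minimal pure-Python sketch of the greedy idea behind iterative stratification: handle the rarest label first, and send each of its samples to the split that still "wants" that label the most. This is a simplified illustration and not the iterstrat implementation; the function name `iterative_stratified_split`, the toy tag data, and the deterministic tie-breaking are all assumptions made for this example.

```python
from collections import Counter


def iterative_stratified_split(label_sets, fractions):
    """Greedy multilabel split; returns one list of sample indices per fraction.

    Hypothetical helper for illustration only (not part of iterstrat).
    """
    n_splits = len(fractions)
    remaining = set(range(len(label_sets)))
    label_counts = Counter(tag for tags in label_sets for tag in tags)
    # How many samples (overall and per label) each split still wants.
    want_total = [f * len(label_sets) for f in fractions]
    want_label = [{tag: f * c for tag, c in label_counts.items()} for f in fractions]
    splits = [[] for _ in fractions]

    def assign(i, tag=None):
        # Prefer the split that most wants this label; break ties by the
        # overall demand, then by split order (an assumption of this sketch).
        def demand(s):
            return (want_label[s][tag] if tag is not None else 0, want_total[s])

        s = max(range(n_splits), key=demand)
        splits[s].append(i)
        remaining.discard(i)
        want_total[s] -= 1
        for t in label_sets[i]:
            want_label[s][t] -= 1

    while remaining:
        # Count how often each label still occurs among unassigned samples.
        left = Counter(t for i in remaining for t in label_sets[i])
        if not left:
            # Only unlabeled samples remain: distribute them by overall demand.
            for i in sorted(remaining):
                assign(i)
            break
        rare = min(left, key=lambda t: (left[t], t))
        for i in sorted(i for i in remaining if rare in label_sets[i]):
            assign(i, rare)
    return splits


# Toy data: one rare tag (two posts) and one common tag (four posts).
tag_sets = [
    {"lo.logic"}, {"lo.logic"},
    {"nt.number-theory"}, {"nt.number-theory"},
    {"nt.number-theory"}, {"nt.number-theory"},
]
train, test = iterative_stratified_split(tag_sets, [0.5, 0.5])
```

With an even 50/50 split, the two posts carrying the rare tag land in different halves, which a plain random shuffle does not guarantee.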
Andrej Karpathy makes a distinction between what he calls software 1.0 and software 2.0. Software 1.0 consists of explicit instructions for transforming inputs into desired outputs. Software 2.0 is machine learning: we provide a model with a ton of parameters and minimize a loss function. The trained model then transforms inputs into desired outputs in a way which performs well on the training data, and which (we hope!) will generalize to novel data.
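The contrast can be made concrete with a deliberately tiny example. The sketch below writes the same "double the input" behavior twice: once as an explicit rule, and once as a one-parameter model whose weight is found by gradient descent on a mean squared error loss. The names `double_v1`, `fit_weight`, and `double_v2`, the learning rate, and the step count are all illustrative assumptions, not anything taken from Karpathy's essay.

```python
def double_v1(x):
    """Software 1.0: the transformation is written out explicitly."""
    return 2 * x


def fit_weight(xs, ys, lr=0.01, steps=500):
    """Software 2.0: fit w in y = w * x by minimizing mean squared error."""
    w = 0.0
    for _ in range(steps):
        # Gradient of mean((w * x - y) ** 2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


# Training data: inputs paired with the outputs we want.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [double_v1(x) for x in xs]

w = fit_weight(xs, ys)


def double_v2(x):
    """The trained model: same behavior, but the rule was learned from data."""
    return w * x
```

Here the learned weight converges to roughly 2, so `double_v2` agrees with `double_v1` even on inputs outside the training set. Real models have far more parameters, and generalization is correspondingly less certain.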
We summarize the work done in this Colab notebook.