Machine Learning
Machine learning is the science of getting computers to act without being explicitly programmed. It is an important approach to solving data mining problems. This technology has enabled self-driving cars, better web search, and a much improved understanding of the human genome.^{[1]} Machine learning evolved from the fields of computer science, statistics, engineering, and mathematics, and applying it effectively to problem-solving requires this combination of skills. Effective problem-solving requires developing computer programs that learn from data and improve with experience.^{[2]} Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as sensor data or databases. A learner can take advantage of data to capture characteristics of interest of the data's unknown underlying probability distribution. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data.^{[3]}
Supervised learning: In this approach, the input data (also referred to as training data) has a known label or result, e.g. whether an email is spam or not. A model is trained by making predictions and being corrected when those predictions are wrong. This process is repeated until the model reaches an acceptable level of accuracy. Classification and regression problems can be solved this way. Logistic regression and neural networks are examples of supervised learning algorithms.^{[4]}
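As a minimal sketch of the predict-and-correct loop described above, the code below trains a logistic regression classifier with batch gradient descent using NumPy. The two features (number of links, number of times the word "free" appears), the six labelled emails, the learning rate and the iteration count are all hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=5000):
    """Fit weights and a bias by gradient descent on the mean log loss."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)          # current predicted probability of "spam"
        error = p - y                   # how far each prediction is from its label
        w -= lr * (X.T @ error) / n     # correct the weights
        b -= lr * error.mean()          # correct the bias
    return w, b

def predict(X, w, b):
    return (sigmoid(X @ w + b) >= 0.5).astype(int)

# Hypothetical features: [number of links, number of "free" mentions]; 1 = spam.
X = np.array([[0, 0], [1, 0], [0, 1], [5, 3], [6, 4], [4, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)

w, b = train_logistic_regression(X, y)
print(predict(X, w, b))
```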
Unsupervised learning: In this approach, the input data is not labeled and there are no known results. A model is built by deducing the structures present in the data. Clustering and dimensionality reduction are examples of problems solved this way.
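A minimal sketch of clustering, one of the unsupervised problems mentioned above, is k-means (Lloyd's algorithm): no labels are given, and the groups emerge from distances between points. The toy points are made up, and the deterministic farthest-first initialization is an assumption chosen to keep the example reproducible (library implementations commonly use k-means++ instead).

```python
import numpy as np

def farthest_first_init(X, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [X[0]]
    while len(centroids) < k:
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids)

def kmeans(X, k, iterations=100):
    """Plain k-means: alternate an assignment step and an update step."""
    centroids = farthest_first_init(X, k)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Unlabeled points that happen to form two groups.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```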
Semi-supervised learning: In this approach, the input data contains both labeled and unlabeled examples. The model has to learn the structures present in the data and also make predictions.
Reinforcement learning: In this approach, the learner is not given the correct output for each input, so there is no explicit loss function to minimize; instead, it receives a reward signal from its environment. This area of machine learning, inspired by behaviorist psychology, is concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.^{[5]}
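To make "maximize cumulative reward" concrete, here is a sketch of tabular Q-learning, one common reinforcement learning algorithm, on a hypothetical five-state corridor where only reaching the right end pays a reward. The environment, the learning rate `alpha`, the discount `gamma` and the exploration rate `epsilon` are illustrative assumptions. Note that the agent is never told the correct action, only the reward.

```python
import random

def q_learning_corridor(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                        epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    The agent starts in state 0; action 0 moves left (bounded at 0) and
    action 1 moves right. Entering the last state ends the episode with
    reward 1; every other step gives reward 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]

    def choose(state):
        # Epsilon-greedy: explore at random, otherwise exploit (ties broken randomly).
        if rng.random() < epsilon or q[state][0] == q[state][1]:
            return rng.randrange(2)
        return 0 if q[state][0] > q[state][1] else 1

    for _ in range(episodes):
        state = 0
        while state != goal:
            action = choose(state)
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # The terminal state has no future value.
            future = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
    return q

q = q_learning_corridor()
# Greedy action in each non-terminal state after learning.
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(len(q) - 1)]
print(policy)
```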
Contents
 1 Currently Hot topics
 2 Classification of Input Data Type
 3 Challenges
 4 A Few Useful Things to Know about Machine Learning
 5 Criticism
 6 Book list
 7 History
 8 Algorithms
 9 Deep Learning
 10 Companies providing machine learning software
 11 Examples of Machine Learning
 12 Top 5 Recent Tweets
 13 Reference
Currently Hot topics
(1). Deep learning seems to be getting the most press right now. It is a form of neural network with many layers of neurons. Articles on deep learning are currently being published in The New Yorker and The New York Times.
(2). Combining support vector machines (SVMs) and stochastic gradient descent (SGD) is also interesting. SVMs are useful because you can use the kernel trick to transform your data and solve a nonlinear problem using a linear model (the SVM). A consequence of this method is that the training runtime and memory consumption of a kernel SVM scale with the size of the data set (the kernel matrix has one entry for every pair of training examples). This makes it very hard to train SVMs on large data sets. SGD is an optimization method that updates the model using one randomly chosen example (or a small random batch) at a time instead of the whole data set, so each step is cheap and training can converge faster on large data. To make a long story short, combining SVMs with SGD makes it possible, at least in theory, to train SVMs on larger data sets.
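The SGD idea can be sketched for a linear SVM (no kernel) with the Pegasos update rule, which minimizes the L2-regularized hinge loss using one randomly chosen example per step. This is a minimal illustration, not a kernelized large-scale trainer; the regularization strength `lam`, the epoch count and the synthetic two-blob data set are assumptions.

```python
import numpy as np

def pegasos_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) with Pegasos-style SGD.

    X: (n, d) features; y: labels in {-1, +1}. Each update looks at a single
    randomly chosen example, so the cost per step does not grow with n.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)             # decreasing step size
            margin = y[i] * (X[i] @ w)        # computed before the update
            w *= (1.0 - eta * lam)            # gradient step for the L2 term
            if margin < 1.0:                  # hinge loss is active for this example
                w += eta * y[i] * X[i]
    return w

# Synthetic data: two Gaussian blobs, plus a constant column acting as a bias term.
rng = np.random.default_rng(1)
pos = rng.normal(loc=2.0, scale=0.5, size=(100, 2))
neg = rng.normal(loc=-2.0, scale=0.5, size=(100, 2))
X = np.hstack([np.vstack([pos, neg]), np.ones((200, 1))])
y = np.concatenate([np.ones(100), -np.ones(100)])

w = pegasos_svm(X, y)
accuracy = np.mean(np.sign(X @ w) == y)
print(accuracy)
```

Because each step touches one example, the same loop works unchanged when the data is streamed from disk, which is what makes the SVM + SGD combination attractive for large data sets.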
(3). Because computers are now fast, cheap, and plentiful, Bayesian statistics is becoming very popular again (it is definitely not "new"). For a long time it was not feasible to use Bayesian techniques, because you had to compute the required probability integrals by hand (when calculating the evidence). Today, Bayesians use Markov chain Monte Carlo (MCMC), grid approximation, Gibbs sampling, the Metropolis algorithm, etc.
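As an illustration of why sampling makes Bayesian inference practical, the sketch below estimates the posterior mean of a coin's bias after a hypothetical 7 heads and 3 tails, assuming a uniform prior. It uses a random-walk Metropolis sampler and, for comparison, a grid approximation. With this conjugate setup the exact posterior is Beta(8, 4), so both answers should be close to 2/3. Step size, burn-in and sample counts are arbitrary choices.

```python
import math
import random

def log_posterior(theta, heads, tails):
    """Unnormalized log posterior of a coin's bias under a uniform prior."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def metropolis(heads, tails, n_samples=20000, step=0.1, burn_in=2000, seed=0):
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)   # symmetric random-walk proposal
        proposed = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            theta, current = proposal, proposed
        if i >= burn_in:
            samples.append(theta)
    return samples

def grid_posterior_mean(heads, tails, points=1000):
    """Grid approximation: evaluate the posterior on a grid and normalize."""
    grid = [(i + 0.5) / points for i in range(points)]
    weights = [math.exp(log_posterior(t, heads, tails)) for t in grid]
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total

heads, tails = 7, 3
samples = metropolis(heads, tails)
print(sum(samples) / len(samples))        # Metropolis estimate
print(grid_posterior_mean(heads, tails))  # grid estimate
```

The evidence (the normalizing integral) never has to be computed: Metropolis only uses ratios of posterior values, and the grid method approximates the integral by a sum.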
(4). Any of the algorithms described in the paper "Map-Reduce for Machine Learning on Multicore". This paper describes how to take a machine learning algorithm and distribute it across multiple computers or cores. It has important implications, because all of the algorithms mentioned in the paper can be rewritten in MapReduce form and distributed across a cluster of computers. In principle, a data set would then never be too large, because you could just add more computers to the Hadoop cluster. The paper was published a while ago, but not all of its algorithms have been implemented in Mahout yet.
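A minimal sketch of the idea, assuming the paper's framing that an algorithm can be parallelized when it only needs sums over training examples: each worker computes partial sums on its own chunk (map), and the partial sums are added together (reduce). Ordinary least-squares linear regression serves as the example here; Python's built-in `map` and `functools.reduce` stand in for a real Hadoop job, and the four-way split and synthetic data are assumptions.

```python
import numpy as np
from functools import reduce

def map_partial_sums(chunk):
    """Map step: each worker computes X^T X and X^T y for its own rows."""
    X, y = chunk
    return X.T @ X, X.T @ y

def reduce_partial_sums(a, b):
    """Reduce step: partial sums from different workers simply add up."""
    return a[0] + b[0], a[1] + b[1]

def distributed_least_squares(chunks):
    xtx, xty = reduce(reduce_partial_sums, map(map_partial_sums, chunks))
    return np.linalg.solve(xtx, xty)   # normal equations: (X^T X) w = X^T y

rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(1000, 3)), np.ones((1000, 1))])
true_w = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

# Pretend each of four "machines" holds a quarter of the rows.
chunks = [(X[i:i + 250], y[i:i + 250]) for i in range(0, 1000, 250)]
w_distributed = distributed_least_squares(chunks)
w_single, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_distributed, w_single))
```

Only the small d-by-d matrix and d-vector travel between machines, so the communication cost does not depend on the number of rows.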
(5). Anomaly detection.^{[6]} Anomaly detection (also outlier detection) is the identification of items, events or observations that do not conform to an expected pattern or to the other items in a dataset. Typically, the anomalous items correspond to some kind of problem, such as bank fraud, a structural defect, a medical problem or errors in a text. Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions.
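One simple way to flag anomalies, shown here as a sketch rather than a production detector, is the modified z-score based on the median and the median absolute deviation (MAD). The scaling constant 0.6745 and the cutoff 3.5 are commonly used defaults, and the daily transaction counts are made up.

```python
from statistics import median

def modified_z_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold.

    The median and the MAD are used instead of the mean and standard
    deviation so that a single extreme value cannot inflate the estimate
    of spread and hide itself.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical; this sketch flags nothing.
        return []
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]

daily_transactions = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(modified_z_outliers(daily_transactions))  # → [9]
```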
Classification of Input Data Type
Data comes in three formats:
 Structured data is organized in a way that both computers and humans can read. The most obvious example is a relational database.
 Semi-structured data, which includes XML, email and electronic data interchange (EDI), lacks the formal structure of a relational database but nonetheless contains tags that separate semantic elements. It has no fixed schema, but its tags create structure-like hierarchies.
 Unstructured data refers to data types, such as images, audio, and video, that are not part of a database. It has no clear or consistent structure and no formal data model (FDM). Unstructured data is created by new data sources, many of which did not even exist at the dawn of the database. Every time one uses a mobile device to place a call, sends a text message, views (or posts) a video or interacts with a website, that creates data. Every transaction, in any context, creates data, as does every email. All the content that populates the Web, from the text, images and rich media built by site owners to the social networking content created by potentially any Web user at any time from anywhere around the globe, creates data. Even phone calls, if they are delivered as packets over an IP network, are now data.
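The difference between the three formats shows up in how much work it takes to get at the content. The sketch below uses only the Python standard library: a CSV table (structured) maps directly to named fields, a small XML email (semi-structured) is navigated through its tags, and free text (unstructured) has to be broken into tokens before any structure can be inferred. All of the sample data is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: a fixed schema of rows and columns, like a relational table.
structured = io.StringIO("id,name,age\n1,Ada,36\n2,Alan,41\n")
rows = list(csv.DictReader(structured))

# Semi-structured: no table schema, but tags mark the semantic elements.
semi_structured = "<email><from>ada@example.com</from><subject>Hi</subject></email>"
email = ET.fromstring(semi_structured)
sender = email.find("from").text

# Unstructured: free text with no data model; any structure must be inferred.
unstructured = "Call me tomorrow about the report, thanks!"
tokens = unstructured.lower().replace(",", "").replace("!", "").split()

print(rows[1]["name"], sender, tokens[:3])
```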
Challenges
The foremost challenge is to unlock the data and gain access to it so that it can be stored and used. This allows the information to stay in its raw format, where it can be analyzed and reported on as it streams into an analytics system in real time. For structured data, this process is fairly straightforward. For unstructured data, on the other hand, advanced algorithms and powerful engines are needed to process the incoming data.^{[7]}^{[8]}
Machine learning challenges (competitions) have proven to be an efficient and cost-effective way to quickly bring into industry solutions that might otherwise have stayed confined to research. In addition, the playful nature of challenges naturally attracts students, making them a great teaching resource. Challenge participants range from undergraduate students to retirees, joining forces in a rewarding environment that allows them to learn, perform research, and demonstrate excellence. Challenges can therefore be used to direct research, advance the state of the art, or venture into completely new domains.^{[9]}
QUESTION 1: What are the limits of deep learning?
ANSWER: Yann LeCun, Director of AI Research at Facebook and Professor at NYU
The “classical” forms of deep learning include various combinations of feedforward modules (often convolutional nets) and recurrent nets (sometimes with memory units, like LSTM or MemNN).
These models are limited in their ability to “reason”, i.e. to carry out long chains of inferences, or an optimization procedure, to arrive at an answer. The number of steps in a computation is limited by the number of layers in feedforward nets, and by the length of time a recurrent net will remember things.
To enable deep learning systems to reason, we need to modify them so that they don’t produce a single output (say the interpretation of an image, the translation of a sentence, etc.), but can produce a whole set of alternative outputs (e.g. the various ways a sentence can be translated). This is what energy-based models are designed to do: give you a score for each possible configuration of the variables to be inferred. A particular instance of energy-based models is factor graphs (non-probabilistic graphical models). Combining learning systems with factor graphs is known as “structured prediction” in machine learning. There have been many proposals to combine neural nets and structured prediction in the past, going back to the early 1990s. In fact, the check reading system my colleague and I built at Bell Labs in the early 1990s used a form of structured prediction on top of convolutional nets that we called “Graph Transformer Networks”. There have been a number of recent works on sticking graphical models on top of ConvNets and training the whole thing end to end (e.g. for human body pose estimation and such). For a review/tutorial on energy-based models and structured prediction on top of neural nets (or other models) see this paper: ^{[10]}
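A toy version of the energy-based view described above: a hypothetical chain-structured factor graph assigns an energy (a score where lower is better) to every possible labelling of four positions with three labels, using made-up unary and pairwise energy tables. Scoring every configuration by brute force and finding the minimum with dynamic programming (min-sum, as in Viterbi decoding) give the same lowest energy. In a real structured-prediction system the energy tables would be produced by a neural network; here they are fixed numbers.

```python
import itertools

def energy(labels, unary, pairwise):
    """Energy of one configuration: lower means more compatible."""
    total = sum(unary[i][y] for i, y in enumerate(labels))
    total += sum(pairwise[a][b] for a, b in zip(labels, labels[1:]))
    return total

def min_energy_chain(unary, pairwise):
    """Exact minimum-energy labelling of a chain by dynamic programming (min-sum)."""
    n, k = len(unary), len(unary[0])
    best = list(unary[0])      # best[y] = lowest energy of a prefix ending in label y
    back = []                  # back-pointers for each position after the first
    for i in range(1, n):
        new_best, pointers = [], []
        for y in range(k):
            prev = min(range(k), key=lambda p: best[p] + pairwise[p][y])
            new_best.append(best[prev] + pairwise[prev][y] + unary[i][y])
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    y = min(range(k), key=lambda s: best[s])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    return path[::-1], min(best)

# Hypothetical energies: unary[i][y] scores label y at position i,
# pairwise[a][b] penalizes switching from label a to label b.
unary = [[0.0, 2.0, 1.0], [1.5, 0.5, 2.0], [2.0, 0.4, 0.3], [0.2, 1.0, 2.0]]
pairwise = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]

path, dp_energy = min_energy_chain(unary, pairwise)
brute = min(itertools.product(range(3), repeat=4),
            key=lambda c: energy(c, unary, pairwise))
print(path, dp_energy)
print(list(brute), energy(brute, unary, pairwise))
```

Brute force scores all 3^4 = 81 alternatives; the dynamic program reaches the same minimum while doing work proportional to the chain length.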
Deep learning is certainly limited in its current form because almost all the successful applications of it use supervised learning with human-annotated data. We need to find ways to train large neural nets from “raw” non-annotated data so they capture the regularities of the real world. As I said in a previous answer, my money is on adversarial training.
QUESTION 2: What are the pros and cons of Generative Adversarial Networks vs Variational Autoencoders?
ANSWER: Yoshua Bengio, Head of Montreal Institute for Learning Algorithms, Professor @ U. Montreal
An advantage for VAEs (Variational AutoEncoders) is that there is a clear and recognized way to evaluate the quality of the model (log-likelihood, either estimated by importance sampling or lower-bounded). Right now it’s not clear how to compare two GANs (Generative Adversarial Networks) or compare a GAN and other generative models except by visualizing samples.
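To illustrate the importance-sampling estimate of log-likelihood mentioned above, the sketch below uses a hypothetical one-dimensional latent-variable model where the exact answer is known: z ~ N(0, 1) and x | z ~ N(z, 0.5), so the marginal p(x) is N(0, 1.5). The proposal distribution q plays the role of a VAE encoder and is deliberately imperfect (the exact posterior for x = 1 is N(2/3, 1/3)); the sample count and seed are arbitrary.

```python
import math
import random

def log_normal(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def importance_sampled_log_likelihood(x, q_mean, q_var, noise_var, k=10000, seed=0):
    """Estimate log p(x) = log E_q[p(x, z) / q(z)] with k samples from q.

    Toy model standing in for a VAE: z ~ N(0, 1), x | z ~ N(z, noise_var).
    """
    rng = random.Random(seed)
    log_weights = []
    for _ in range(k):
        z = rng.gauss(q_mean, math.sqrt(q_var))
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, noise_var)
        log_weights.append(log_joint - log_normal(z, q_mean, q_var))
    # Log of the mean of the weights, computed stably (log-mean-exp).
    m = max(log_weights)
    return m + math.log(sum(math.exp(w - m) for w in log_weights) / k)

x, noise_var = 1.0, 0.5
estimate = importance_sampled_log_likelihood(x, q_mean=0.6, q_var=0.5, noise_var=noise_var)
exact = log_normal(x, 0.0, 1.0 + noise_var)   # the marginal is N(0, 1 + noise_var)
print(estimate, exact)
```

The closer q is to the true posterior, the lower the variance of the estimate; with q equal to the exact posterior, every importance weight equals p(x).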
A disadvantage of VAEs is that, because of the injected noise and imperfect reconstruction, and with the standard decoder (with factorized output distribution), the generated samples are much more blurred than those coming from GANs.
The fact that VAEs basically optimize likelihood while GANs optimize something else can be viewed as either an advantage or a disadvantage for either one. Maximizing likelihood yields an estimated density that always bleeds probability mass away from the estimated data manifold. GANs can be happy with a very sharp estimated density function even if it does not perfectly coincide with the data density (i.e. some training examples may come close to the generated images but might still have nearly zero probability under the generator, which would be infinitely bad in terms of likelihood).
GANs tend to be much more finicky to train than VAEs, not to mention that we do not have a clear objective function to optimize, but they tend to yield nicer images.
A Few Useful Things to Know about Machine Learning
1. Learning = Representation + Evaluation + Optimization
Suppose you have an application that you think machine learning might be good for. The first problem facing you is the bewildering variety of learning algorithms available. Which one to use? There are literally thousands available, and hundreds more are published each year. The key to not getting lost in this huge space is to realize that it consists of combinations of just three components. The components are:
Representation. A classifier must be represented in some formal language that the computer can handle. Conversely, choosing a representation for a learner is tantamount to choosing the set of classifiers that it can possibly learn. This set is called the hypothesis space of the learner. If a classifier is not in the hypothesis space, it cannot be learned. A related question, which we will address in a later section, is how to represent the input, i.e., what features to use.
Evaluation. An evaluation function (also called objective function or scoring function) is needed to distinguish good classifiers from bad ones. The evaluation function used internally by the algorithm may differ from the external one that we want the classifier to optimize, for ease of optimization (see below) and due to the issues discussed in the next section.
Optimization. Finally, we need a method to search among the classifiers in the language for the highest-scoring one. The choice of optimization technique is key to the efficiency of the learner, and also helps determine the classifier produced if the evaluation function has more than one optimum. It is common for new learners to start out using off-the-shelf optimizers, which are later replaced by custom-designed ones.
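The three components can be made explicit in a toy learner. In this sketch (all names hypothetical) the representation is the set of one-feature threshold rules ("decision stumps"), the evaluation function is training accuracy, and the optimizer is exhaustive search over that tiny hypothesis space. Swapping any one component, say log loss for accuracy or a greedy search for exhaustive search, gives a different learner.

```python
# Representation: the hypothesis space is every decision stump
# "predict 1 if x[feature] > threshold else 0" (a deliberately tiny language).
def make_stump(feature, threshold):
    return lambda x: 1 if x[feature] > threshold else 0

# Evaluation: training accuracy scores a candidate classifier.
def accuracy(classifier, X, y):
    return sum(classifier(x) == label for x, label in zip(X, y)) / len(y)

# Optimization: exhaustive search over the hypothesis space for the top scorer.
def learn(X, y):
    candidates = [(f, t) for f in range(len(X[0])) for t in sorted({x[f] for x in X})]
    best = max(candidates, key=lambda c: accuracy(make_stump(*c), X, y))
    return best, make_stump(*best)

X = [(1.0, 5.0), (2.0, 3.0), (3.0, 8.0), (4.0, 1.0), (5.0, 7.0), (6.0, 2.0)]
y = [0, 0, 0, 1, 1, 1]
(feature, threshold), stump = learn(X, y)
print(feature, threshold, accuracy(stump, X, y))   # → 0 3.0 1.0
```

No stump on feature 1 can fit this data perfectly, which illustrates the point about hypothesis spaces: a classifier outside the space cannot be learned, however good the optimizer.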
2. Feature Engineering is the Key
At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used. If you have many independent features that each correlate well with the class, learning is easy. On the other hand, if the class is a very complex function of the features, you may not be able to learn it. Often, the raw data is not in a form that is amenable to learning, but you can construct features from it that are. This is typically where most of the effort in a machine learning project goes. It is often also one of the most interesting parts, where intuition, creativity and “black art” are as important as the technical stuff.
First-timers are often surprised by how little time in a machine learning project is spent actually doing machine learning. But it makes sense if you consider how time-consuming it is to gather data, integrate it, clean it and preprocess it, and how much trial and error can go into feature design. Also, machine learning is not a one-shot process of building a data set and running a learner, but rather an iterative process of running the learner, analyzing the results, modifying the data and/or the learner, and repeating. Learning is often the quickest part of this, but that’s because we’ve already mastered it pretty well! Feature engineering is more difficult because it’s domain-specific, while learners can be largely general-purpose. However, there is no sharp frontier between the two, and this is another reason the most useful learners are those that facilitate incorporating knowledge.
Of course, one of the holy grails of machine learning is to automate more and more of the feature engineering process. One way this is often done today is by automatically generating large numbers of candidate features and selecting the best by (say) their information gain with respect to the class. But bear in mind that features that look irrelevant in isolation may be relevant in combination. For example, if the class is an XOR of k input features, each of them by itself carries no information about the class. (If you want to annoy machine learners, bring up XOR.) On the other hand, running a learner with a very large number of features to find out which ones are useful in combination may be too time-consuming, or cause overfitting. So there is ultimately no replacement for the smarts you put into feature engineering. ^{[11]}
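The XOR point can be checked numerically. This sketch computes information gain for a hypothetical class defined as the parity (XOR) of k = 3 binary features over all eight possible inputs: each feature alone carries no information about the class, while the three features together determine it completely.

```python
import itertools
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def information_gain(rows, labels, features):
    """Class entropy minus its expected entropy after splitting on `features`."""
    groups = {}
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        groups.setdefault(key, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

k = 3
rows = list(itertools.product([0, 1], repeat=k))
labels = [sum(r) % 2 for r in rows]   # class = XOR (parity) of the k features

for f in range(k):
    print(f, information_gain(rows, labels, [f]))            # 0.0 bits each
print("all", information_gain(rows, labels, list(range(k))))  # 1.0 bit
```

A feature selector that ranks features one at a time by information gain would discard all three here, even though together they are all you need.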
Criticism
1. Criticism by Machine Learning Experts
Machine learning has some major and minor shortcomings. One common problem is debugging: the automated debugging process in machine learning can be extremely time-consuming, which can make some users uncomfortable. The lack of statistical prediction invention in machine learning can cause the learning to lack in details.^{[12]} Another difficulty is that the set of all possible behaviors, given all possible inputs, is too large to be covered by the set of observed examples. The learner must therefore generalize from the given data so as to produce a useful output in new cases.^{[13]}
Machine learning is the body of research related to automated large-scale data analysis. Historically, the field was centered around biologically inspired models, and the long-term goals of much of the community are oriented to producing models and algorithms that can process information as well as biological systems.
The field also encompasses many of the traditional areas of statistics with, however, a strong focus on mathematical models and also prediction. Machine learning is now central to many areas of interest in computer science and related large-scale information processing domains.
There are still many unsolved problems in practical machine learning. For example, in anomaly detection, state-of-the-art methods suffer from poor scalability, use-case restrictions, difficulty of use and a large number of false positives.^{[14]}
2. Research Directions of Deep Learning
Machine learning: Trends, perspectives, and prospects (Science 2015) M. I. Jordan and T. M. Mitchell
Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today’s most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including healthcare, manufacturing, education, financial modeling, policing, and marketing.
Human-level concept learning through probabilistic program induction (Science 2016.5) Brenden M. Lake, Ruslan Salakhutdinov, Joshua B. Tenenbaum
People learning new concepts can often generalize successfully from just a single example, yet machine learning algorithms typically require tens or hundreds of examples to perform with similar accuracy. People can also use learned concepts in richer ways than conventional algorithms: for action, imagination, and explanation. We present a computational model that captures these human learning abilities for a large class of simple visual concepts: handwritten characters from the world’s alphabets. The model represents concepts as simple programs that best explain observed examples under a Bayesian criterion. On a challenging one-shot classification task, the model achieves human-level performance while outperforming recent deep learning approaches. We also present several “visual Turing tests” probing the model’s creative generalization abilities, which in many cases are indistinguishable from human behavior.
One-shot Learning with Memory-Augmented Neural Networks (arXiv DeepMind 2016.6) Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, Timothy Lillicrap
Despite recent breakthroughs in the applications of deep neural networks, one setting that presents a persistent challenge is that of “one-shot learning.” Traditional gradient-based networks require a lot of data to learn, often through extensive iterative training. When new data is encountered, the models must inefficiently relearn their parameters to adequately incorporate the new information without catastrophic interference. Architectures with augmented memory capacities, such as Neural Turing Machines (NTMs), offer the ability to quickly encode and retrieve new information, and hence can potentially obviate the downsides of conventional models. Here, we demonstrate the ability of a memory-augmented neural network to rapidly assimilate new data, and leverage this data to make accurate predictions after only a few samples. We also introduce a new method for accessing an external memory that focuses on memory content, unlike previous methods that additionally use memory location-based focusing mechanisms.
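Content-based addressing, in the Neural Turing Machine style the abstract refers to, can be sketched as a softmax over cosine similarities between a query key and each memory row. This is a simplified illustration, not the specific read/write scheme of the paper; the memory contents, the key and the `sharpness` parameter are assumptions.

```python
import numpy as np

def content_addressing(memory, key, sharpness=10.0):
    """Read weights from cosine similarity between a key and each memory row.

    memory: (slots, width) array; key: (width,) query vector.
    `sharpness` is a hypothetical parameter: larger values focus the
    softmax more tightly on the best-matching slot.
    """
    similarity = memory @ key / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    scores = sharpness * similarity
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    read_vector = weights @ memory            # weighted blend of memory rows
    return weights, read_vector

memory = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.7, 0.7, 0.0]])
weights, read = content_addressing(memory, key=np.array([0.0, 0.9, 0.1]))
print(weights.round(3), read.round(3))
```

Because the weights depend only on what is stored, not where, a new example written into any slot can be retrieved immediately without retraining the network's parameters.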
Improved Techniques for Training GANs (arXiv 2016.5) Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung
We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. We focus on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic. Unlike most work on generative models, our primary goal is not to train a model that assigns high likelihood to test data, nor do we require the model to be able to learn well without using any labels. Using our new techniques, we achieve state-of-the-art results in semi-supervised classification on MNIST, CIFAR-10 and SVHN. The generated images are of high quality as confirmed by a visual Turing test: our model generates MNIST samples that humans cannot distinguish from real data and CIFAR-10 samples that yield a human error rate of 21.3%. We also present ImageNet samples with unprecedented resolution and show that our methods enable the model to learn recognizable features of ImageNet classes.
Deep Residual Learning for Image Recognition (arXiv 2015.12) Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (Microsoft Research)
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers.
The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
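The residual reformulation can be sketched with fully connected layers standing in for the convolutions (and without batch normalization): the block computes relu(F(x) + x), where F is two weight layers with a ReLU between them. With all-zero weights the residual block reduces to the identity mapping, while a plain block without the shortcut outputs zeros, which is one intuition for why very deep residual networks are easier to optimize. Layer sizes and inputs are arbitrary.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def plain_block(x, w1, w2):
    """Two stacked weight layers without a shortcut: relu(W2 relu(W1 x))."""
    return relu(w2 @ relu(w1 @ x))

def residual_block(x, w1, w2):
    """Same layers, but the output is relu(F(x) + x) with F(x) = W2 relu(W1 x)."""
    f = w2 @ relu(w1 @ x)      # the residual function F(x)
    return relu(f + x)         # identity shortcut adds the input back

rng = np.random.default_rng(0)
x = relu(rng.normal(size=4))   # non-negative, like the output of an earlier ReLU
zeros = np.zeros((4, 4))

# With all-zero weights the residual block passes its input through unchanged,
# while the plain block collapses it to zero.
print(residual_block(x, zeros, zeros), plain_block(x, zeros, zeros))
```

In other words, an extra residual block only has to learn a small correction F(x) around the identity, so adding depth need not make the network worse.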
Book list
1. The Elements of Statistical Learning (ESL), Trevor Hastie, Robert Tibshirani and Jerome Friedman
During the past decade, there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting (the first comprehensive treatment of this topic in any book).
This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for “wide” data (p bigger than n), including multiple testing and false discovery rates. ^{[15]}
2. Pattern Recognition and Machine Learning (PRML), Christopher M. Bishop
This is the first textbook on pattern recognition to present the Bayesian viewpoint. The book presents approximate inference algorithms that permit fast approximate answers in situations where exact answers are not feasible. It uses graphical models to describe probability distributions, at a time when no other books applied graphical models to machine learning. No previous knowledge of pattern recognition or machine learning concepts is assumed. Familiarity with multivariate calculus and basic linear algebra is required, and some experience in the use of probabilities would be helpful though not essential, as the book includes a self-contained introduction to basic probability theory. ^{[16]}
3. Machine Learning: A Probabilistic Perspective (MLAPP), Kevin P. Murphy
Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudocode for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package, PMTK (probabilistic modeling toolkit), that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students. ^{[17]}
4. Bayesian Reasoning and Machine Learning (BRML), David Barber
People who know the methods have their choice of rewarding jobs. This hands-on text opens these opportunities to computer science students with modest mathematical backgrounds. It is designed for final-year undergraduates and master's students with limited background in linear algebra and calculus. Comprehensive and coherent, it develops everything from basic reasoning to advanced techniques within the framework of graphical models. Students learn more than a menu of techniques; they develop analytical and problem-solving skills that equip them for the real world. Numerous examples and exercises, both computer based and theoretical, are included in every chapter. Resources for students and instructors, including a MATLAB toolbox, are available online. ^{[18]}
5. Foundations of Machine Learning, Mehryar Mohri and Afshin Rostamizadeh
This graduate-level textbook introduces fundamental concepts and methods in machine learning. It describes several important modern algorithms, provides the theoretical underpinnings of these algorithms, and illustrates key aspects for their application. The authors aim to present novel theoretical tools and concepts while giving concise proofs even for relatively advanced topics. Foundations of Machine Learning fills the need for a general textbook that also offers theoretical details and an emphasis on proofs. Certain topics that are often treated with insufficient attention are discussed in more detail here; for example, entire chapters are devoted to regression, multiclass classification, and ranking. The first three chapters lay the theoretical foundation for what follows, but each remaining chapter is mostly self-contained. The appendix offers a concise probability review, a short introduction to convex optimization, tools for concentration bounds, and several basic properties of matrices and norms used in the book.
The book is intended for graduate students and researchers in machine learning, statistics, and related areas; it can be used either as a textbook or as a reference text for a research seminar. ^{[19]}
6. Deep Learning, Ian Goodfellow, Yoshua Bengio and Aaron Courville
The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free. Deep learning is a form of machine learning that allows computers to learn from experience and understand the world in terms of a hierarchy of concepts, with each concept defined in terms of its relation to simpler concepts. By gathering knowledge from experience, this approach avoids the need for human operators to formally specify all of the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones.^{[20]}
7. Probabilistic Graphical Models: Principles and Techniques, Daphne Koller and Nir Friedman
Probabilistic Graphical Models discusses a variety of models, spanning Bayesian networks, undirected Markov networks, discrete and continuous models, and extensions to deal with dynamical systems and relational data. For each class of models, the text describes the three fundamental cornerstones: representation, inference, and learning, presenting both basic concepts and advanced techniques. Finally, the book considers the use of the proposed framework for causal reasoning and decision making under uncertainty. The main text in each chapter provides the detailed technical development of the key ideas. Most chapters also include boxes with additional material: skill boxes, which describe techniques; case study boxes, which discuss empirical cases related to the approach described in the text, including applications in computer vision, robotics, natural language understanding, and computational biology; and concept boxes, which present significant concepts drawn from the material in the chapter. Instructors (and readers) can group chapters in various combinations, from core topics to more technically advanced material, to suit their particular needs. ^{[21]}
8. Statistical Learning with Sparsity: The Lasso and Generalizations (SLS), Trevor Hastie, Robert Tibshirani and Martin Wainwright
This book attempts to summarize the actively developing field of statistical learning with sparsity. A sparse statistical model is one having only a small number of nonzero parameters or weights. It represents a classic case of “less is more”: a sparse model can be much easier to estimate and interpret than a dense model. In this age of big data, the number of features measured on a person or object can be large, and might be larger than the number of observations. The sparsity assumption allows us to tackle such problems and extract useful and reproducible patterns from big datasets. ^{[22]}
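A minimal sketch of sparse estimation with the lasso, fitted by cyclic coordinate descent with soft-thresholding. The synthetic data (ten features, of which only the first two affect the response) and the penalty `lam = 0.1` are assumptions; with them, the fitted model should keep only the two relevant coefficients and set the rest exactly to zero.

```python
import numpy as np

def soft_threshold(value, lam):
    """Shrink towards zero by lam, returning exactly 0 inside [-lam, lam]."""
    return np.sign(value) * max(abs(value) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, iterations=200):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 one coordinate at a time."""
    n, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(iterations):
        for j in range(d):
            # Partial residual: add feature j's own contribution back in.
            residual = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ residual / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

b = lasso_coordinate_descent(X, y, lam=0.1)
print(np.nonzero(b)[0])     # indices of the nonzero coefficients
print(b[:2].round(2))       # slightly shrunk versions of 3 and -2
```

The soft-thresholding step is what produces exact zeros: any coefficient whose correlation with the residual stays below `lam` is set to zero rather than to a small value.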
9. Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies, John D. Kelleher, Brian Mac Namee and Aoife D'Arcy
This introductory textbook offers a detailed and focused treatment of the most important machine learning approaches used in predictive data analytics, covering both theoretical concepts and practical applications. Technical and mathematical material is augmented with explanatory worked examples, and case studies illustrate the application of these models in the broader business context. These models are used in predictive data analytics applications including price prediction, risk assessment, predicting customer behavior, and document classification. The authors also explain how machine learning is often used to build predictive models by extracting patterns from large data sets. ^{[23]}
History
1946: ENIAC, the first general-purpose electronic digital computer, was unveiled.
1950: Alan Turing proposed a test, now known as the Turing test, based on the idea that a machine can be considered intelligent if a person communicating with it cannot distinguish it from another human.
1952: Arthur Samuel at IBM wrote a checkers-playing program, one of the first game-playing programs to achieve sufficient skill to challenge a respectable amateur. Samuel’s program improved its own play the more games it played, making it an early example of machine learning.
1957: Frank Rosenblatt introduced the perceptron, an early artificial neural network, and showed that combining a large number of simple classifying units in a network could create a powerful model.
1964: Joseph Weizenbaum developed the ELIZA system, which simulated a psychotherapist by using tricks such as string substitution and canned responses triggered by keywords.
1986: J. R. Quinlan introduced the ID3 decision tree algorithm, which learns discrete-valued target functions from large data sets. ^{[24]}
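Rosenblatt’s 1957 model is usually called the perceptron. The sketch below is a minimal, illustrative version of the perceptron learning rule for a single unit: whenever an example is misclassified, the weights are nudged toward it. It assumes labels of +1 and -1 and linearly separable training data; the function names `train_perceptron` and `predict` and the AND-function training set are hypothetical choices for illustration.

```python
def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Perceptron learning rule for one unit; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:
            break  # a full pass with no mistakes: training has converged
    return w, b


def predict(w, b, x):
    """Return +1 if x lies on the positive side of the learned line, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical training set: the logical AND function, which is linearly separable.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]
w, b = train_perceptron(samples, labels)
print([predict(w, b, x) for x in samples])  # → [-1, -1, -1, 1]
```

A single unit like this can only separate classes with a straight line (it cannot learn XOR, for example), which is why later work stacked many such units into multi-layer networks.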
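ID3 grows a decision tree greedily: at each node it splits on the attribute with the highest information gain, Gain(S, A) = H(S) - Σ_v (|S_v| / |S|) · H(S_v), where H is the Shannon entropy in bits and S_v is the subset of examples whose attribute A has value v. The sketch below shows only that split-selection step, not the full recursive tree builder, and assumes each example is a dictionary of discrete attribute values; the function names and the weather-style toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting on the discrete attribute attr."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """ID3's greedy choice: the attribute with the largest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: "outlook" determines the label, "windy" is irrelevant.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "yes"]
print(best_attribute(rows, labels, ["outlook", "windy"]))  # → outlook
```

Here splitting on "outlook" produces two pure subsets (gain of 1 bit), while "windy" leaves each subset evenly mixed (gain of 0), so ID3 would place "outlook" at the root.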
Algorithms
See Machine Learning Algorithm
Deep Learning
See Deep Learning
Companies providing machine learning software
See Companies providing machine learning software
Examples of Machine Learning
 screen large molecule databases and identify which (drug-like) molecules are likely to bind to a particular receptor protein
 predict the potency of a receptor agonist or antagonist
 recognize a search query, find relevant results, predict which results are most relevant to the user, and return a ranked output
 recommend similar products (e.g., Netflix, Amazon, etc.)
 predict if an applicant is creditworthy
 detect credit card fraud
 find promising trends on the stock market
 guide autonomous Mars robots
 identify relevant information (objects) in large amounts of astronomy data
See Examples of Machine Learning
References
 ↑ https://www.coursera.org/learn/machinelearning
 ↑ http://machinelearningmastery.com/whatismachinelearning/
 ↑ http://everything.explained.today/Machine_learning//
 ↑ http://machinelearningmastery.com/atourofmachinelearningalgorithms/
 ↑ http://machinelearningmastery.com/atourofmachinelearningalgorithms/
 ↑ https://www.quora.com/WhatarecurrentlythehottopicsinMachineLearningresearchandinrealapplications
 ↑ http://www.techworld.com/apps/startingbigdatainitiative3377370/
 ↑ http://www.techworld.com/apps/startingbigdatainitiative3377370/
 ↑ https://nips2015.sched.org/event/4GFD/challengesinmachinelearningciml2015openinnovationandcoopetitions
 ↑ https://scholar.google.com/citations?view_op=view_citation&hl=en&user=WLN3QrAAAAAJ&cstart=20&pagesize=80&citation_for_view=WLN3QrAAAAAJ%3A8k81klMbHgC
 ↑ http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
 ↑ https://www.quora.com/Whatarethetop10problemsinmachinelearning/
 ↑ http://www.cambridge.org/us/academic/subjects/computerscience/patternrecognitionandmachinelearning/bayesianreasoningandmachinelearning?format=HB&isbn=9780521518147
 ↑ http://www.cambridge.org/us/academic/subjects/computerscience/patternrecognitionandmachinelearning/bayesianreasoningandmachinelearning?format=HB&isbn=9780521518147
 ↑ http://www.springer.com/us/book/9780387848570
 ↑ http://www.springer.com/us/book/9780387310732
 ↑ https://mitpress.mit.edu/books/machinelearning0
 ↑ http://www.cambridge.org/us/academic/subjects/computerscience/patternrecognitionandmachinelearning/bayesianreasoningandmachinelearning
 ↑ https://mitpress.mit.edu/books/foundationsmachinelearning
 ↑ http://www.deeplearningbook.org/
 ↑ https://mitpress.mit.edu/books/probabilisticgraphicalmodels
 ↑ https://web.stanford.edu/~hastie/StatLearnSparsity_files/SLS_corrected_1.4.16.pdf
 ↑ https://play.google.com/store/books/details?id=3EtQCgAAQBAJ&source=productsearch&utm_source=HA_Desktop_US&utm_medium=SEM&utm_campaign=PLA&pcampaignid=MKTAD0930BO1&gl=US&gclid=COmIl9Ovkc4CFYKpNwodhA0O1w&gclsrc=ds
 ↑ http://www.erogol.com/briefhistorymachinelearning/