4 Machine Learning II
4.1 Multiple Linear Regression
4.2 Logistic Regression
4.2.1 Classification
4.3 Overfitting and Underfitting
- Underfitting: model too simple to capture data patterns
- Overfitting: model too complex, captures noise instead of signal
- Validation helps detect these behaviors.
4.3.1 Unsupervised Learning Use Cases
Use Case | Sample Inputs | Model Output Description | What ML Question is Being Answered? | What Business Question is Being Answered? | Example Algorithm(s) |
---|---|---|---|---|---|
Customer Segmentation | Age, income, purchase history | Cluster/group labels for each customer | What types of customers exist in my data? | How can I tailor marketing strategies to different customer types? | K-means, DBSCAN |
Topic Modeling | Articles or documents | Topics with keywords per document | What topics are being discussed? | What content themes resonate most with my audience or market? | LDA, NMF |
Anomaly Detection | Transaction logs, sensor data | Anomaly score or binary flag | Which data points are unusual? | Are there fraudulent transactions or system failures I need to act on? | Isolation Forest, Autoencoder |
Dimensionality Reduction | High-dimensional features (e.g., pixels) | 2D or 3D projections for analysis or visualization | How can I reduce feature space while preserving info? | How can I visualize or simplify complex data for human analysis or modeling? | PCA, t-SNE, UMAP |
Market Basket Analysis | Sets of purchased items | Association rules (A & B → C) | What items co-occur frequently in purchases? | Which product bundles or cross-sell offers should I promote? | Apriori, FP-Growth |
Word Embedding | Text corpus | Word vectors capturing semantic similarity | What are the contextual relationships between words? | How can I build a smarter search engine or chatbot that understands language context? | Word2Vec, GloVe |
Image Compression | Raw pixel arrays | Compressed version of the image | How can I represent this image with fewer features? | How can I reduce storage or transmission costs for image data? | Autoencoders |
4.4 Reinforcement Learning
4.4.1 Reinforcement Learning Use Cases
Use Case | Sample Inputs | Model Output Description | What ML Question is Being Answered? | What Business Question is Being Answered? | Example Algorithm(s) |
---|---|---|---|---|---|
Game Playing | Game state (e.g., board, score) | Action to take | What should I do to win the game? | How can I build an AI that outperforms humans or creates adaptive gameplay? | Q-learning, DQN |
Robotics & Control | Sensor data (angles, velocities, etc.) | Movement or control signals | How should the agent move next to reach a goal? | How can I automate physical tasks like picking, sorting, or navigating? | PPO, SAC, DDPG |
Autonomous Vehicles | Sensor input (camera, LIDAR, speed, GPS) | Driving action | What’s the optimal next driving move? | How can I develop a safe and efficient self-driving vehicle system? | Deep RL + sensor fusion |
Recommendation Systems | User history, preferences, session behavior | Recommended item | What should I recommend next? | How can I increase user retention, engagement, or sales? | Contextual Bandits, RL |
Portfolio Management | Financial indicators, stock prices | Asset allocation decision | How should I invest to maximize return? | How can I build an automated trading or portfolio optimization system? | Actor-Critic methods |
Personalized Education | Student progress and quiz results | Next learning step | What lesson or content should come next? | How can I boost student outcomes by personalizing learning pathways? | Multi-armed bandits |
Healthcare Treatment | Patient history and vitals | Treatment or intervention strategy | What care plan maximizes long-term patient health? | How can I optimize healthcare outcomes while reducing costs and readmissions? | Off-policy RL, POMDPs |
4.5 Glossary
logistic (sigmoid) function:
Softmax: