Home Contact
Topics in Data Science, Machine Learning, Artificial Intelligence
During my work and study in the field of AI/ML/DS/BD, I often struggled to structure the content properly and to keep an overview. It is also sometimes hard to find good sources of information on a topic. Over time I therefore created a hierarchical categorization of topics and added suitable links.
How this page is organized:
The topics Big Data, Artificial Intelligence, Machine Learning, Data Science, statistics, and math overlap heavily and are interrelated.
I break down the massive universe of knowledge on these topics into several three-level bullet-point lists that are connected via links. I found this method most suitable when I studied for the MBA.
Content
Data Science
- What is it: What Is Data Science?
- Comparison with pure statistics
- Comparison with pure software development
- How to become a Data Scientist in 2019 Link
- Typical tools
- Storing data with files, SQL and NoSQL databases
- Analyzing data by writing code or by using analytic tools
- Life cycle of Data Science projects
- Typical challenges in Data Science
Big Data
- What is it
- Critical opinions on technology and design decisions
- Special technologies used
- Spark
- Good course on Udemy Link
- Installing on Linux Link
- Use Case: Apache Spark @Scale: A 60 TB+ production use case Link
- Another good intro Link
- Deep Learning With Apache Spark Link
- Hadoop
- Services by Amazon
- Services by Google
- Services By Microsoft
- Criticism
- There's No Such Thing as Big Data in HR link
- Streaming technologies
- Deep Learning For Real Time Streaming Data With Kafka And Tensorflow | YongTang - ODSC East 2019 link to video
- How to get data from SQL databases into Kafka Link
- Kafka Spark Cassandra pipeline Link
Artificial Intelligence
- What is it
- Latest Trends in AI link
- Overview of categories of AI
- Machine Learning Basics
- Supervised more...
- Unsupervised
- Reinforcement Learning more...
- Deep Learning
- Various types of Neural Networks See here...
- How to set up AI projects
Robotics
- The basic problem
- Resources and Links
Optimization algorithms
Reinforcement Learning
- What is it
- Simple example with very easy code YouTube
- Using TensorFlow: Reinforcement Learning with TensorFlow
- Deep reinforcement learning: Combining reinforcement learning with neural networks
- A Brief Survey of Deep Reinforcement Learning Link
- Deep Reinforcement Learning for Dynamic Urban Transportation Problems Link
- Criticism and way forward:
- Reinforcement learning's foundational flaw link
- How to fix reinforcement learning link
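As a rough illustration of what reinforcement learning does, here is a minimal sketch of tabular Q-learning. It is not taken from any of the linked resources. The five-state "corridor" environment, the reward scheme, and all hyperparameters (`ALPHA`, `GAMMA`, `EPSILON`) are assumptions chosen only to keep the example small.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal (hypothetical toy setup)
ACTIONS = (-1, +1)    # action index 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate


def step(state, action):
    """Toy environment: reward 1.0 only when the goal state is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore sometimes, otherwise take the best known action.
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal states: index of the best action per state.
policy = [max(range(2), key=lambda i: q_table[s][i]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should prefer action index 1 (move right) in every non-terminal state, because that is the shortest way to the only reward. Deep reinforcement learning replaces the table `q` with a neural network that estimates the same values.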
Statistics
- Descriptive statistics: describe the sample and summarize it in simple terms
- Mean
- Average
- Variance
- Histogram / PMF
- Exploratory data analysis: the critical process of performing initial investigations on data in order to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.
- Statistical inference (inductive statistics): draw conclusions about the population. Overview presentation
- General approaches / frameworks
- Frequentist approach
- Bayesian approach
- Likelihood-based inference Link
- Akaike information criterion Link
- Correlation
- Point estimation
- Interval Estimation
- Hypothesis testing
- Frequentist approach Link
- Bayesian Approach Link
- Model Selection
- Frequentist approach Link
- Bayesian Approach Link
- Machine Learning vs. Statistical inference Link to Quora discussion
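The descriptive statistics listed at the start of this section (mean, variance, histogram, PMF) can be sketched with the Python standard library alone. The sample values below are made up for illustration.

```python
from collections import Counter
from statistics import mean, pvariance

# Hypothetical sample, e.g. the number of items per order.
sample = [2, 3, 3, 4, 4, 4, 5, 5, 6, 4]

sample_mean = mean(sample)
# Population variance formula applied to the sample itself (divides by n).
sample_variance = pvariance(sample)

# Histogram: how often each value occurs.
histogram = Counter(sample)
# PMF: the histogram normalized so that the probabilities sum to 1.
pmf = {value: count / len(sample) for value, count in sorted(histogram.items())}

print(sample_mean, sample_variance)
print(pmf)
```

Using `statistics.variance` instead of `pvariance` would divide by n - 1, which is the usual choice when the sample is used to estimate the variance of a larger population.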
Cumulative distribution functions
Distributions
- Probabilities: Exponential, Pareto, Normal, Lognormal
Probability
Operations on distributions
- Skewness
- PDF
- Central Limit theorem
- Distribution Framework
Hypothesis Testing
- differences in mean
- the p-value
- cross validation
- bayesian probabilities
- threshold
- chi-square test
- power
Estimation
- Confidence intervals
- Bayesian estimation
Main concepts, from: Discovering Statistics Using R
- Measurement error
- correlation
- frequency distribution
- Mean
- standard error
- confidence interval
- test statistics
- Type I and II errors
- Power
- Testing for homogeneity
- testing for normal distribution
- Shapiro-Wilk test
- Levene Test
- Hartley F test
- Correlation
- Comparing correlation
- Regression
- Logistic Regression
- Comparing two means
- Comparing several means: ANOVA, GLM 1
- Analysis of covariance: ANCOVA, GLM 2
- Factorial ANOVA, GLM 3
- Repeated measures design, GLM 4
- Mixed designs, GLM 5
- Non-parametric tests
- Multivariate analysis: MANOVA
- Exploratory factor analysis
- Categorical data
- chi-square test
- Fisher's exact test
- likelihood ratio
- Yates correction
- Multilevel linear models
- Various other
- R-squared and its limitations Link
- Basic Statistics Every Data Scientist Should Know Link
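To make the chi-square test and the Yates correction from the list above more concrete, here is a minimal sketch for a 2x2 contingency table. The function name `chi_square_2x2` and the counts are hypothetical. The p-value uses the fact that a chi-square distribution with one degree of freedom is the square of a standard normal variable.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                # Yates' continuity correction shrinks each deviation by 0.5.
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: group A (10 yes, 20 no) vs. group B (20 yes, 10 no).
stat, p = chi_square_2x2(10, 20, 20, 10)
stat_yates, p_yates = chi_square_2x2(10, 20, 20, 10, yates=True)
print(stat, p)
print(stat_yates, p_yates)
```

The corrected statistic is smaller and its p-value larger, so the Yates correction makes the test more conservative. For small expected counts, Fisher's exact test from the same list is usually the safer choice.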
Deep Learning
Supervised Machine Learning
- Classification problem
- 2018 State of the art document analysis (machine reading) Link
- Various architectural approaches
- SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving PDF
- SqueezeNet implementation in Keras Link
- Regression
- Beyond linear regression: using a polynomial fit. Link
- Algorithms I used that can be found in the scikit-learn or statsmodels libraries in Python: Ridge, LinearRegression, GradientBoostingRegressor, SupportVectorRegression, DecisionTreeRegressor, exponential, SGDRegressor, Lasso, SVR, MLPRegressor. Illustration of how to choose an algorithm
- Python vs. R code for typical machine learning algorithms Link
- Forecasting time series
- Special topics and challenges
- Overfitting and how to avoid it
- Various types of neural networks See also here
- Generative Adversarial Network (GAN) Link
- The rise of GANs Link
- The original paper: titled Generative Adversarial Networks Link
- Feed-forward neural networks: the basic, classic neural network
- Recurrent neural network (RNN)
- Wikipedia Link
- Tutorial Link
- Tutorial Link
- Tutorial Link
- Recurrent Neural Networks by Example in Python Link
- Multi-layer perceptrons (MLP)
- Convolutional neural networks. Geoffrey Hinton talk "What is wrong with convolutional neural nets?" Video
- Convolutional Neural Networks - The Math of Intelligence (Week 4) Video
- Math Behind Convolutional Neural Networks Link
- Recursive neural networks
- Variational autoencoders Link
- Deep belief networks
- Convolutional deep belief networks
- Self-Organizing Maps
- Deep Boltzmann machines
- Stacked de-noising auto-encoders
- CapsNet or Capsule Network Link
- Multimodal Deep Learning Link
- Memory-Augmented Neural Networks
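The entry "Overfitting and how to avoid it" above can be illustrated with a small holdout experiment. This is only a sketch under assumptions: the data is synthetic (a noisy straight line), and the "memorizing" model is a 1-nearest-neighbour lookup chosen as an extreme case of overfitting. Only the standard library is used, so none of the scikit-learn estimators named above are involved.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on the given data."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = list(range(40))
ys = [2 * x + 1 + rng.gauss(0, 3) for x in xs]

# Hold out every second point for validation.
train_x, train_y = xs[0::2], ys[0::2]
valid_x, valid_y = xs[1::2], ys[1::2]

slope, intercept = fit_line(train_x, train_y)


def line_model(x):
    return slope * x + intercept


def memorizing_model(x):
    # 1-nearest-neighbour lookup: reproduces the training data exactly, noise included.
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


for name, model in (("line", line_model), ("memorizing", memorizing_model)):
    print(name, mse(model, train_x, train_y), mse(model, valid_x, valid_y))
```

The memorizing model has zero error on the training points by construction, so its training error says nothing about how well it generalizes. Comparing training and validation error on held-out data is the basic check for overfitting; cross validation repeats the same idea over several splits.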
Current trends and hot topics in Artificial Intelligence
- Hot topics in AI research (2018) link
- The current trends in Artificial Intelligence (2018) link
- Important AI research papers in 2018 link
- Why Machine Learning Models Crash And Burn In Production Article
Various articles on algorithms
- A tour of the top 5 sorting algorithms with Python code Link
- Differences Between AI and Machine Learning, and Why it Matters Link
- Self-documenting code is (mostly) nonsense Link
Various articles on Data, Engineering, AI, etc.
- On the evolution of Data Engineering Link
- Data Engineering Introduction and Epochs PDF
- NoSQL Databases Overview, Types and Selection Criteria Link
- Data modeling concepts Link
- Pandas, Dask or PySpark? What Should You Choose for Your Dataset? Link
- Step by Step guide to Version Control your Machine Learning and Deep Learning tasks Link
- 14 Different Types of Learning in Machine Learning Link
- AI platforms compared Link
- Top 10 Machine Learning Interview Questions 2019 Link
- Here's why so many data scientists are leaving their jobs Link
- The most difficult thing in data science: politics Link
- Effective Microservices: 10 Best Practices Link
- Python Libraries for Interpretable Machine Learning Link
- Machine Learning Summarized in One Picture Link
- ED-12C/DO-178C vs. Agile Manifesto - A Solution to Agile Development of Certifiable Avionics Systems PDF
- Confronting the Mission-Critical Software Testing Challenge Webinar Video
- Getting started with Apache Airflow Link
- Hadoop vs. Cassandra etc. Link
- Elasticsearch SQL Link
- Google Sheets on edit trigger Link
- Developing Safety-Critical Software: A Practical Guide for Aviation Software and DO-178C Compliance (hardcover, 7 January 2013) Book
- How to Automate Hyperparameter Optimization Link
- The Most In Demand Tech Skills for Data Scientists Link
- The Differences Between Descriptive, Diagnostic, Predictive & Cognitive Analytics Link
- Google just published 25 million free datasets Link
- Tensorflow for R Link
- The Best Free Data Science eBooks Link
- Productionizing Distributed XGBoost to Train Deep Tree Models with Large Data Sets at Uber Link
- Cython-A Speed-Up Tool for your Python Function Link
- How many tech skills are enough for a data scientist? Link
- 30 Python Best Practices, Tips, And Tricks Link
- Speed up your Data Analysis with Python's Datatable package Link
- An Overview of Python's Datatable package Link
- Data Manipulation with Python Pandas and R Data.Table Link
- How To Evaluate Unsupervised Learning Models Link
- How to Extract Text from Images with Python Link
- Learn How to Quickly Create UIs in Python Link
- How to embed Bootstrap CSS & JS in your Python Dash app Link
- Airflow: how and when to use it (Advanced) Link
- An Introduction to Statistical Analysis and Modelling with Python Link
- A Recipe for Organising Data Science Projects Link
- Introduction to Vectors and Matrices using Python for Data Science Link
- What is XGBoost? And how to optimize it? Link
- Computer Vision Recipes: Best Practices and Examples Link