MLOps in Practice - How Machine Learning Moves from Notebook to Production

A Good Notebook Is Not a Production AI System

A machine learning model can look perfect inside a notebook. The dataset is clean, the charts look impressive, and the accuracy score feels exciting. During a demo, everyone may believe the project is ready.

But a notebook is not the real world.

A business cannot depend on someone manually opening Jupyter Notebook, running cells one by one, downloading predictions, and sending results to another team. That may work for a college project or a quick experiment, but not for a real company process.

A production AI system must run reliably without daily manual effort. It must handle new data, unexpected errors, security rules, version updates, monitoring, and business approvals.

This is where MLOps becomes important.

MLOps stands for machine learning operations. It is the practical discipline of taking a machine learning model from the experiment stage to production and keeping it useful after deployment.

The goal is not only to build a model. The goal is to make that model work safely and repeatedly in a real business environment.

The Real Gap Between Demo and Production

Many machine learning projects fail after the demo stage.

The model may work on the data scientist’s laptop, but the company still has many unanswered questions.

Where will new data come from?
How often will predictions run?
Who will approve a new model version?
What happens when input data changes?
How will errors be detected?
How will the model be monitored?
Can the system roll back to an older version?
Who owns the model after deployment?

These questions are not only about machine learning. They are about software engineering, data pipelines, infrastructure, security, and business responsibility.

A notebook proves that an idea may work. MLOps helps prove that the idea can survive in production.

A Simple MLOps Flow

A practical MLOps workflow usually looks like this:

Raw data
   |
   v
Data validation
   |
   v
Feature engineering
   |
   v
Model training
   |
   v
Experiment tracking
   |
   v
Model testing
   |
   v
Model registry
   |
   v
Deployment
   |
   v
Monitoring
   |
   v
Retraining or rollback

This flow keeps the machine learning project controlled.

Without this structure, teams may lose track of which model was trained, which dataset was used, why performance changed, or which version is serving real users.

Why Normal DevOps Is Not Enough

MLOps is inspired by DevOps, but machine learning has extra challenges.

In normal software development, developers write rules. If the code and input are the same, the output is usually predictable.

Machine learning is different. The model’s behavior depends on data.

A change in data can change model performance, even if the code did not change.

Example:

Software bug:
Code changed and feature broke.

ML issue:
Code did not change, but real-world data changed and predictions became weak.

This is why machine learning systems need monitoring beyond normal server health.

A model can be online and still be wrong.

A Retail Example: Stock Prediction Model

Imagine a retail company builds a model to predict which products may go out of stock.

Inside the notebook, the model uses past sales data and predicts stock risk accurately.

But production needs more than accuracy.

The company must decide:

How will daily sales data enter the system?
What happens if yesterday’s sales file is missing?
How will stock managers receive alerts?
How will wrong predictions be reported?
How often should the model retrain?
Who approves the updated model?

If these problems are not solved, the model remains only a demo.

MLOps turns the model into a working business tool.

Data Pipelines: The Starting Point of MLOps

Every machine learning system starts with data.

In experiments, data may come from a CSV file saved manually.

In production, data may come from:

Databases
Application logs
Customer activity
Sensors
Payment systems
CRM tools
Inventory systems
External APIs

A production data pipeline collects, cleans, checks, and moves this data automatically.

Example flow:

Sales database
      |
      v
Daily data extraction
      |
      v
Validation checks
      |
      v
Feature processing
      |
      v
Prediction system

If the pipeline breaks, the model may receive missing, outdated, or incorrect data.

That means MLOps starts before model training. It starts with reliable data movement.

Data Validation Before Training

Data validation checks whether the incoming data is usable.

Example:

A model expects customer age as a number.

But one day the data contains:

"twenty five"
"unknown"
NULL
-5

These values can break training or produce poor predictions.

Data validation can check:

Missing values
Wrong data types
Unexpected categories
Negative values where not allowed
Duplicate records
Unusual spikes
Empty files
Schema changes

This is important because bad data creates bad models.

A production system should detect data problems early instead of silently training on damaged data.
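
The age example above can be turned into a small check. This is a minimal sketch, not a full validation framework: the function names and the 0 to 120 age range are assumptions made for illustration.

```python
import math
from typing import Any, Optional

def validate_age(value: Any, min_age: int = 0, max_age: int = 120) -> Optional[str]:
    """Return None if the value is a usable age, otherwise a short reason."""
    if value is None:
        return "missing"
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "wrong type"
    if isinstance(value, float) and math.isnan(value):
        return "missing"
    if value < min_age or value > max_age:
        return "out of range"
    return None

def validate_rows(rows: list[dict]) -> list[tuple[int, str]]:
    """Collect (row index, reason) for every row whose age fails validation."""
    problems = []
    for index, row in enumerate(rows):
        reason = validate_age(row.get("age"))
        if reason is not None:
            problems.append((index, reason))
    return problems

# The bad values from the example above, plus one good row
rows = [
    {"age": 31},
    {"age": "twenty five"},
    {"age": "unknown"},
    {"age": None},
    {"age": -5},
]
problems = validate_rows(rows)
print(problems)
```

A pipeline could stop the training job, or quarantine the bad rows, whenever the list of problems is not empty.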

Feature Engineering Must Be Consistent

Features are processed values that the model uses for learning or prediction.

Example:

Raw data:

Order date: 2026-06-20
Customer signup date: 2025-12-10
Total purchase amount: ₹4,500

Possible features:

Day of week
Account age in months
Average order value
Purchase frequency
Days since last order

A common production mistake happens when training and live prediction calculate features differently.

For example, during training, customer spending may be calculated for the last 90 days. In production, the application may accidentally calculate it for the last 30 days.

The model then receives a feature that means something different from what it learned during training.

MLOps encourages reusable feature pipelines so that training and production use the same logic.
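
One way to keep the logic identical is to put it in a single module that both the training job and the live service import. The sketch below assumes hypothetical function names and a simple list of (date, amount) orders; the 90-day window is defined once so it cannot silently become 30 days in one place.

```python
from datetime import date, timedelta

SPEND_WINDOW_DAYS = 90  # single source of truth for the spending window

def spend_in_window(orders: list[tuple[date, float]], as_of: date,
                    window_days: int = SPEND_WINDOW_DAYS) -> float:
    """Sum order amounts in the window_days ending at as_of (inclusive)."""
    start = as_of - timedelta(days=window_days)
    return sum(amount for day, amount in orders if start < day <= as_of)

def account_age_months(signup: date, as_of: date) -> int:
    """Whole months between the signup date and as_of."""
    months = (as_of.year - signup.year) * 12 + (as_of.month - signup.month)
    if as_of.day < signup.day:
        months -= 1
    return months

def build_features(signup: date, orders: list[tuple[date, float]], as_of: date) -> dict:
    """Called by both the training pipeline and the prediction service."""
    return {
        "day_of_week": as_of.weekday(),  # Monday = 0
        "account_age_months": account_age_months(signup, as_of),
        "spend_last_90d": spend_in_window(orders, as_of),
    }

# Dates and amount from the raw data example; the February order is made up
orders = [(date(2026, 6, 20), 4500.0), (date(2026, 2, 1), 1200.0)]
features = build_features(date(2025, 12, 10), orders, date(2026, 6, 20))
print(features)
```

Because the window length lives in one constant, changing it is a deliberate code change that affects training and serving together.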

Experiment Tracking: Avoiding Final Model Confusion

Data scientists may test many models.

Example:

Logistic regression
Random forest
XGBoost
Neural network
Different learning rates
Different feature sets
Different datasets

Without tracking, the project becomes confusing.

Files may be named:

model_final.pkl
model_final_2.pkl
best_model_new.pkl
latest_best_model.pkl

This is risky.

Experiment tracking records:

Model type
Training dataset version
Feature list
Hyperparameters
Accuracy metrics
Training time
Code version
Developer notes
Generated model file

This helps the team understand exactly how a model was created.

If a model performs well, the team can reproduce it. If a model performs badly, the team can investigate.
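
Dedicated tracking tools exist, but the idea can be shown with the standard library alone. In this sketch every run becomes one JSON record, and the run ID is derived from the inputs that define the run. The function names, field names, and all example values are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

def make_run_record(model_type, dataset_version, features, params, metrics, code_version):
    """Build one experiment record; run_id is a hash of what defines the run."""
    definition = {
        "model_type": model_type,
        "dataset_version": dataset_version,
        "features": sorted(features),
        "params": params,
        "code_version": code_version,
    }
    canonical = json.dumps(definition, sort_keys=True)
    run_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return {
        "run_id": run_id,
        **definition,
        "metrics": metrics,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }

def append_run(path, record):
    """Append the record as one line of JSON to a log file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Placeholder values for illustration only
run_a = make_run_record("random_forest", "sales-2026-06",
                        ["spend_last_90d", "day_of_week"],
                        {"n_estimators": 200}, {"accuracy": 0.91}, "abc1234")
run_b = make_run_record("random_forest", "sales-2026-06",
                        ["spend_last_90d", "day_of_week"],
                        {"n_estimators": 300}, {"accuracy": 0.92}, "abc1234")
print(run_a["run_id"], run_b["run_id"])
```

Two runs with the same definition get the same ID, so a file named after the run ID says more than model_final_2.pkl ever can.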

Model Testing Beyond Accuracy

Accuracy is important, but it is not enough for production.

A model may have high accuracy but still fail in business use.

Example:

A fraud detection model may look accurate overall, but it may miss rare high-risk fraud cases.

A recommendation model may increase clicks but recommend low-quality products.

A loan model may perform well on average but behave unfairly for certain customer groups.

Production testing should check more than one number.

Useful checks include:

Prediction quality
Response speed
Memory usage
Data compatibility
Security risks
Bias and fairness
Failure behavior
Output format
Business rule compliance

A model should move to production only when it satisfies the company’s technical and business standards.
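
The fraud case shows why one number is not enough. The sketch below uses made-up labels (1,000 transactions, 20 of them fraud) and a hypothetical release gate with assumed thresholds: the model is 98.5% accurate and still misses three quarters of the fraud.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Share of actual positive cases that the model caught."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    return true_pos / actual_pos if actual_pos else 0.0

def passes_release_gate(metrics, minimums):
    """True only if every metric meets its minimum."""
    return all(metrics[name] >= floor for name, floor in minimums.items())

# Illustrative data: 20 fraud cases, the model catches only 5 of them
y_true = [1] * 20 + [0] * 980
y_pred = [1] * 5 + [0] * 15 + [0] * 980

metrics = {
    "accuracy": accuracy(y_true, y_pred),
    "fraud_recall": recall(y_true, y_pred),
}
minimums = {"accuracy": 0.95, "fraud_recall": 0.80}  # assumed business thresholds
approved = passes_release_gate(metrics, minimums)
print(metrics, approved)
```

The same gate can hold other checks from the list above, such as a maximum response time, as long as each one is expressed as a number with a limit.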

Model Registry: Controlled Storage for Models

A model registry is like a controlled library for trained models.

It stores model versions and their status.

Example statuses:

Experimental
Under review
Approved for testing
Production
Retired
Rejected

A model registry helps answer important questions:

Which model is active now?
Which dataset trained this model?
Who approved this model?
What was the previous version?
Can we roll back quickly?
Why was this model replaced?

This is useful in large companies and regulated industries.

Without a registry, teams may not know which model is actually serving users.
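
To make the idea concrete, here is a minimal in-memory sketch of a registry that uses the statuses listed above. It is not the API of any real registry product; the class names, method names, and example values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelVersion:
    version: str
    dataset_version: str
    status: str = "Experimental"
    approved_by: Optional[str] = None

class ModelRegistry:
    def __init__(self) -> None:
        self._versions: dict[str, ModelVersion] = {}
        self._production_history: list[str] = []  # order of promotion

    def register(self, version: str, dataset_version: str) -> ModelVersion:
        if version in self._versions:
            raise ValueError(f"version {version} is already registered")
        model = ModelVersion(version, dataset_version)
        self._versions[version] = model
        return model

    def promote(self, version: str, approved_by: str) -> None:
        """Make this version the production model and retire the current one."""
        model = self._versions[version]
        current = self.active()
        if current is model:
            return
        if current is not None:
            current.status = "Retired"
        model.status = "Production"
        model.approved_by = approved_by
        self._production_history.append(version)

    def active(self) -> Optional[ModelVersion]:
        for model in self._versions.values():
            if model.status == "Production":
                return model
        return None

    def previous(self) -> Optional[ModelVersion]:
        """The version that was in production before the current one."""
        if len(self._production_history) < 2:
            return None
        return self._versions[self._production_history[-2]]

registry = ModelRegistry()
registry.register("v1", dataset_version="sales-2026-05")
registry.promote("v1", approved_by="alice")
registry.register("v2", dataset_version="sales-2026-06")
registry.promote("v2", approved_by="bob")
print(registry.active(), registry.previous())
```

Even this small structure answers several of the questions above: which model is active, which dataset trained it, who approved it, and what came before it.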

Deployment Patterns in MLOps

A model can be deployed in different ways depending on the use case.

Real-Time API Deployment

A model runs as an API and responds immediately.

Example:

User makes payment
      |
      v
Fraud model checks transaction
      |
      v
Risk score returned in milliseconds

This is useful for fraud detection, recommendations, search ranking, and real-time personalization.

Batch Prediction Deployment

The model runs on a schedule and stores predictions.

Example:

Every night:
Predict tomorrow’s product demand
Store results in dashboard
Send alerts to stock team

This is useful when predictions do not need instant response.

Streaming Deployment

The model processes continuous events.

Example:

Sensor data
Clickstream events
Live transactions
Security logs

This is useful in monitoring, cybersecurity, IoT, and real-time analytics.

Edge Deployment

The model runs on a device instead of a central server.

Example:

Mobile app
Camera device
Factory machine
Vehicle system

This is useful when speed, privacy, or offline access matters.

CI/CD/CT in Machine Learning

In software development, CI/CD means continuous integration and continuous delivery.

In machine learning, there is another important idea: continuous training.

A simple view:

CI  = test code and pipeline changes
CD  = deploy approved model versions
CT  = retrain model when needed

Continuous training does not mean retraining every minute. That can be risky.

Retraining should happen based on clear conditions.

Examples:

New data reaches a required volume
Model accuracy drops below threshold
Business rules change
Scheduled monthly retraining
Major data drift is detected

The new model should still be tested before replacing the old one.
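
The retraining conditions above can be written as an explicit check that a scheduler runs, for example once a day. This is a sketch under assumed thresholds; the field names and default values are hypothetical and would come from the team's own requirements.

```python
from dataclasses import dataclass

@dataclass
class RetrainSignals:
    new_rows: int
    current_accuracy: float
    days_since_training: int
    drift_detected: bool
    business_rules_changed: bool = False

def retrain_reasons(signals: RetrainSignals, min_new_rows: int = 50_000,
                    min_accuracy: float = 0.90, max_age_days: int = 30) -> list[str]:
    """Return every reason to retrain; an empty list means keep the current model."""
    reasons = []
    if signals.new_rows >= min_new_rows:
        reasons.append("new data volume reached")
    if signals.current_accuracy < min_accuracy:
        reasons.append("accuracy below threshold")
    if signals.business_rules_changed:
        reasons.append("business rules changed")
    if signals.days_since_training >= max_age_days:
        reasons.append("scheduled retraining due")
    if signals.drift_detected:
        reasons.append("data drift detected")
    return reasons

healthy = RetrainSignals(new_rows=12_000, current_accuracy=0.93,
                         days_since_training=12, drift_detected=False)
degraded = RetrainSignals(new_rows=12_000, current_accuracy=0.87,
                          days_since_training=12, drift_detected=True)
print(retrain_reasons(healthy))
print(retrain_reasons(degraded))
```

Returning the reasons, rather than a bare yes or no, gives the team a record of why each retraining job started.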

Model Monitoring After Deployment

A deployed model needs monitoring.

Normal software monitoring checks:

Server uptime
Response time
Errors
CPU and memory usage

MLOps monitoring also checks:

Input data changes
Prediction distribution
Model accuracy
Drift
Failed predictions
Business impact

A model can be technically working but practically failing.

Example:

A demand forecasting model may continue returning predictions, but customer buying behavior may have changed after a price increase. The system is online, but the predictions may no longer be useful.

Monitoring helps detect this early.

Data Drift

Data drift happens when the distribution of live input data shifts away from the data the model was trained on.

Example:

A model was trained on customers aged 25 to 45. After a new marketing campaign, many customers are aged 18 to 22.

The model may still run, but it is now seeing a different user group.

Data drift does not always mean the model is broken, but it means the team should investigate.

Training data:
Mostly office workers

Live data:
Mostly college students

The model may need retraining or adjustment.
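
One common way to put a number on this kind of shift is the Population Stability Index (PSI), which compares the share of values in each bucket between training data and live data. The sketch below is an assumption about how a team might compute it, not the only method; the bucket edges and the sample ages are made up to match the example.

```python
import math

def bucket_shares(values, edges):
    """Share of values per bucket; edges must be sorted in ascending order."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) + 1)
    for v in values:
        index = sum(1 for e in edges if v >= e)  # number of edges at or below v
        counts[index] += 1
    return [c / len(values) for c in counts]

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples of one feature."""
    total = 0.0
    for e, a in zip(bucket_shares(expected, edges), bucket_shares(actual, edges)):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty buckets
        total += (a - e) * math.log(a / e)
    return total

edges = [25, 35, 45]                         # buckets: <25, 25-34, 35-44, 45+
training_ages = list(range(25, 46)) * 10     # customers aged 25 to 45
live_ages = [18, 19, 20, 21, 22] * 40 + [30] * 10

same_score = psi(training_ages, training_ages, edges)
drift_score = psi(training_ages, live_ages, edges)
print(same_score, drift_score)
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but that cutoff is a convention. A high score is a signal to investigate, not proof that the model is wrong.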

Concept Drift

Concept drift happens when the relationship between input and output changes.

Example:

Before a pricing change, users who reduced usage were likely to cancel. After a new cheaper plan is introduced, reduced usage may no longer mean cancellation risk.

The old pattern becomes weaker.

Old relationship:
Low usage → likely cancellation

New relationship:
Low usage → may be normal for cheaper plan

This is usually more serious than data drift, because the relationship the model learned is no longer valid even when the inputs look familiar.
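
Concept drift usually shows up only once real outcomes arrive, so one practical check is to compare accuracy on labeled data from before and after the change. In this sketch a fixed rule stands in for the trained model; the rule, the 5-hour cutoff, and the records are all invented for illustration.

```python
def predict_churn(usage_hours: float) -> bool:
    """Stand-in for a trained model that learned: low usage means churn."""
    return usage_hours < 5

def window_accuracy(records) -> float:
    """records: list of (usage_hours, actually_churned) with known outcomes."""
    correct = sum(1 for usage, churned in records if predict_churn(usage) == churned)
    return correct / len(records)

# Before the cheaper plan: low usage really did lead to cancellation
before = [(2, True), (3, True), (12, False), (15, False)]
# After the cheaper plan: the same usage levels, but different outcomes
after = [(2, False), (3, False), (4, True), (12, False)]

accuracy_before = window_accuracy(before)
accuracy_after = window_accuracy(after)
print(accuracy_before, accuracy_after)
```

The inputs cover the same range in both windows, so a check on input distributions alone would not catch this. Only the comparison against real outcomes does.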

Prediction Drift

Prediction drift happens when the model’s outputs change unexpectedly.

Example:

A fraud model usually flags 2% of transactions.

Suddenly, it flags 15%.

Possible reasons:

Real fraud increased
Input data format changed
A new user group arrived
The model is receiving bad data
A business process changed

Monitoring does not automatically know the reason. It alerts the team so they can investigate.
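
A simple alert for the fraud example compares the observed flag rate with the usual rate and asks whether the gap is larger than chance would explain. The sketch uses a z-score for a proportion; the function name, the threshold of 3, and the sample data are assumptions.

```python
import math

def flag_rate_alert(flags, baseline_rate, z_threshold=3.0):
    """Compare the observed flag rate to the baseline using a z-score."""
    n = len(flags)
    if n == 0 or not 0 < baseline_rate < 1:
        raise ValueError("need at least one prediction and a baseline between 0 and 1")
    observed = sum(flags) / n
    std_err = math.sqrt(baseline_rate * (1 - baseline_rate) / n)
    z_score = (observed - baseline_rate) / std_err
    return {
        "observed_rate": observed,
        "z_score": z_score,
        "alert": abs(z_score) > z_threshold,
    }

# 1 = transaction flagged as fraud, 0 = not flagged (illustrative data)
normal_day = [1] * 22 + [0] * 978     # 2.2% flagged
unusual_day = [1] * 150 + [0] * 850   # 15% flagged

normal_result = flag_rate_alert(normal_day, baseline_rate=0.02)
unusual_result = flag_rate_alert(unusual_day, baseline_rate=0.02)
print(normal_result)
print(unusual_result)
```

The alert says that something changed, not what changed. Finding the cause from the list above is still a job for the team.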

Reproducibility in MLOps

A model should not depend on one laptop.

If another engineer uses the same code, data, and settings, they should be able to reproduce the result or understand why it changed.

This requires versioning.

Teams may version:

Source code
Training data
Feature logic
Configuration files
Model files
Python packages
Docker images
Environment settings

Reproducibility is important for debugging, auditing, and long-term maintenance.

If a model created six months ago caused a problem, the team should be able to trace how it was built.
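
A lightweight way to make a model traceable is to save a small manifest next to every model file. The sketch below records a hash of the training data, the configuration, the code version, and the Python version. The function names, the manifest fields, and the example values are hypothetical.

```python
import hashlib
import os
import platform
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(data_path: str, config: dict, code_version: str) -> dict:
    """Describe exactly what went into one training run."""
    return {
        "data_sha256": file_sha256(data_path),
        "config": config,
        "code_version": code_version,
        "python_version": platform.python_version(),
    }

# Demo with a throwaway file standing in for the real training data
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "train.csv")
    with open(data_path, "w", encoding="utf-8") as f:
        f.write("age,churned\n31,0\n45,1\n")
    config = {"model": "random_forest", "n_estimators": 200}
    manifest_first = build_manifest(data_path, config, code_version="abc1234")
    manifest_again = build_manifest(data_path, config, code_version="abc1234")

print(manifest_first)
```

If the data file changes by even one byte, the hash changes, which makes a silently replaced dataset easy to spot months later.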

Simple Command Example: Freezing Dependencies

In Python projects, teams often save dependency versions.

pip freeze > requirements.txt

This helps another environment install the same package versions.

Install dependencies:

pip install -r requirements.txt

This is a simple step, but it supports reproducibility.

For larger production systems, teams may use containers.

Simple Docker Example for Model Service

A model API may be packaged with Docker.

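# Small official Python base image
FROM python:3.11-slim

WORKDIR /app

# Copy the dependency list first so this layer is cached between code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code and the model file
COPY . .

# Start the model API
CMD ["python", "app.py"]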

This makes the runtime environment more consistent.

Docker does not solve every MLOps problem, but it helps reduce “it works on my machine” issues.

Simple API Deployment Example

A model can be exposed through an API.

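from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

# Load the trained model once at startup, not on every request
model = joblib.load("model.pkl")

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body such as {"features": [1.0, 2.0, 3.0]}
    data = request.get_json()
    prediction = model.predict([data["features"]])

    # Assuming predict returns a NumPy array (as scikit-learn models do),
    # tolist() converts it to plain Python values that jsonify can serialize
    return jsonify({
        "prediction": prediction.tolist()[0]
    })

if __name__ == "__main__":
    # Flask's built-in server is meant for development. Binding to 0.0.0.0
    # makes the API reachable from outside a container.
    app.run(host="0.0.0.0", port=5000)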

This is a simplified example.

A production API needs more safety features such as validation, authentication, logging, rate limiting, monitoring, and error handling.
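
Input validation is usually the first of these features to add. The sketch below checks a request body before it reaches the model and returns an error message instead of raising an exception. It is written as a plain function so it can be tested without a web server; the function name and the expected feature count of 3 are assumptions.

```python
import math

EXPECTED_FEATURES = 3  # hypothetical: the number of inputs the model was trained on

def parse_features(payload):
    """Return (features, None) for a valid body, or (None, error message)."""
    if not isinstance(payload, dict):
        return None, "body must be a JSON object"
    features = payload.get("features")
    if not isinstance(features, list):
        return None, "'features' must be a list"
    if len(features) != EXPECTED_FEATURES:
        return None, f"expected {EXPECTED_FEATURES} features, got {len(features)}"
    for value in features:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return None, "features must be finite numbers"
        if not math.isfinite(value):
            return None, "features must be finite numbers"
    return [float(value) for value in features], None

good = parse_features({"features": [1, 2.5, 3]})
wrong_type = parse_features({"features": [1, "x", 3]})
wrong_length = parse_features({"features": [1, 2]})
missing_body = parse_features(None)
print(good, wrong_type, wrong_length, missing_body)
```

Inside an API handler, a non-empty error message would typically be returned with an HTTP 400 status so the caller knows the request was the problem, not the model.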

Rollback: Returning to a Stable Model

A new model is not always better.

Sometimes a new model performs well in testing but fails in production.

A rollback process allows the team to return to a previous stable version quickly.

Simple rollback idea:

New model deployed
        |
        v
Monitoring detects issue
        |
        v
Traffic moved back to old model
        |
        v
Team investigates new model

Rollback is important because production systems should not depend on hope.

Every release should have a recovery plan.
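
The flow above can be sketched as a small router that remembers the previous version, plus a check that monitoring can call. The class name, function name, and the 5% error limit are assumptions chosen for illustration.

```python
class ModelRouter:
    """Tracks which model version receives traffic and which one came before."""

    def __init__(self, stable_version: str) -> None:
        self.active = stable_version
        self.previous = None

    def deploy(self, new_version: str) -> None:
        self.previous, self.active = self.active, new_version

    def rollback(self) -> str:
        """Move traffic back to the previous version; return the failed one."""
        if self.previous is None:
            raise RuntimeError("no previous version to roll back to")
        failed = self.active
        self.active, self.previous = self.previous, None
        return failed

def check_and_rollback(router: ModelRouter, error_rate: float,
                       max_error_rate: float = 0.05):
    """Roll back when the error rate is too high; return the failed version or None."""
    if error_rate > max_error_rate and router.previous is not None:
        return router.rollback()
    return None

router = ModelRouter("v1")
router.deploy("v2")

first_check = check_and_rollback(router, error_rate=0.02)   # healthy, nothing happens
second_check = check_and_rollback(router, error_rate=0.12)  # too many errors
print(first_check, second_check, router.active)
```

The failed version is returned rather than deleted, so the team can still investigate it after traffic has moved back.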

Security in MLOps

Machine learning systems often handle sensitive data.

Examples:

Customer records
Transactions
Medical data
Employee details
Location data
Business documents
User behavior logs

Security must be part of the MLOps workflow.

Important practices:

Limit data access
Encrypt sensitive data
Avoid storing secrets in code
Log carefully, keeping personal data and secrets out of logs
Protect model endpoints
Control who can approve deployments
Review third-party packages
Remove unnecessary personal data

A model pipeline can become a security risk if it copies sensitive data into unsafe locations.
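
Two of these practices, avoiding secrets in code and logging carefully, can be shown in a few lines. The sketch reads a secret from environment variables and masks it before it is printed. The variable name MODEL_API_TOKEN and the helper names are hypothetical, and the demo passes a plain dictionary instead of the real environment.

```python
import os
from typing import Mapping, Optional

def get_required_secret(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Read a secret from the environment and fail loudly if it is missing."""
    source = os.environ if environ is None else environ
    value = source.get(name)
    if not value:
        raise RuntimeError(f"missing required setting: {name}")
    return value

def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the value is safe to log."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]

# Demo only: a stand-in for os.environ with a made-up token
fake_environment = {"MODEL_API_TOKEN": "example-token-1234"}
token = get_required_secret("MODEL_API_TOKEN", fake_environment)
masked = mask(token)
print("loaded token:", masked)
```

Failing at startup when a secret is missing is usually safer than discovering the problem on the first real request.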

Governance and Approval

Not every model should go to production automatically.

Some models need human review before release.

Approval may involve:

Data science review
Engineering review
Business owner approval
Security approval
Compliance approval
Fairness review

This is especially important for finance, healthcare, insurance, hiring, education, and safety-related systems.

MLOps creates a clear approval path so teams know who is responsible.

People Matter More Than Tools

MLOps is not only about tools.

A company may buy an expensive platform and still fail if teams do not communicate.

A production AI system usually involves:

Data scientists
Data engineers
Software developers
DevOps engineers
Cloud engineers
Security teams
Business experts
Compliance teams
Product managers

Each team has a different responsibility.

A data scientist may build the model.
A data engineer may prepare reliable data pipelines.
A software engineer may connect the model to an application.
A DevOps engineer may manage deployment and monitoring.
A business expert may confirm whether predictions are useful.

MLOps works best when responsibility is shared clearly.

Small Team MLOps

A small company does not need a huge MLOps platform on day one.

A practical small-team setup may include:

Git for code versioning
Clear folder structure
Saved training data version
requirements.txt or Docker
Basic automated tests
Simple model version naming
Cloud deployment
Monitoring dashboard
Manual approval checklist
Rollback plan

This is enough for many early projects.

The goal is not to copy big-company infrastructure. The goal is to avoid fragile manual steps.

Large Company MLOps

A larger company may need more advanced MLOps.

Possible requirements:

Model registry
Feature store
Automated retraining
Approval workflows
Advanced monitoring
Access control
Audit trails
Multi-environment deployment
A/B testing
Canary releases
Compliance reporting

The larger the business risk, the stronger the process should be.

A bank running fraud models needs more control than a small team running a simple internal sales forecast.

Warning Signs That a Team Needs MLOps

A team may need stronger MLOps when these problems appear:

Models work in demos but never reach users
Nobody knows which model is in production
Training data and production data are processed differently
A model becomes inaccurate without anyone noticing
Deployments require manual steps every time
Old experiments cannot be reproduced
Model updates cause unexpected failures
There is no rollback plan
Monitoring is missing
Business teams do not trust predictions

These warning signs show that the machine learning workflow is not mature enough.

MLOps turns repeated confusion into a controlled process.

Realistic Example: Late Delivery Prediction

Imagine a delivery company wants to predict whether a package will arrive late.

The model may use:

Distance
Weather
Traffic
Warehouse workload
Driver availability
Past delivery performance
Package type
Delivery location

Inside a notebook, the model predicts late deliveries with good accuracy.

An MLOps process turns this into a working service.

Flow:

Delivery data collected daily
        |
        v
Data validation checks missing values
        |
        v
Features are generated consistently
        |
        v
Model predicts late-delivery risk
        |
        v
Dispatch dashboard shows risk score
        |
        v
Monitoring checks prediction quality
        |
        v
New model is trained if performance drops

This helps dispatch teams act early.

They may assign another driver, notify customers, or adjust routes.

The value is not only the model. The value is the full production workflow.

Realistic Example: Customer Churn Prediction

A subscription company wants to predict which users may cancel.

The notebook model uses:

Login frequency
Last payment date
Support tickets
Feature usage
Plan type
Previous discounts
Account age

In production, the model must run regularly and send useful results to the retention team.

MLOps handles:

Daily data update
Feature calculation
Model scoring
Dashboard update
Alert generation
Accuracy monitoring
Retraining schedule

If customer behavior changes after a pricing update, monitoring may detect drift.

The company can retrain the model instead of continuing with outdated predictions.

Beginner Mistake: Only Focusing on Accuracy

Accuracy is not the only production requirement.

A model may be accurate but too slow.
A model may be accurate but hard to explain.
A model may be accurate but unfair for some groups.
A model may be accurate but expensive to run.
A model may be accurate but difficult to update.

Production AI needs more than a good score.

Useful production questions:

Is the model fast enough?
Is the output explainable enough?
Is the data reliable?
Can we monitor it?
Can we roll back?
Can we update it safely?
Does the business trust it?

MLOps helps answer these questions.

Beginner Mistake: Manual Deployment

Manual deployment may look harmless in the beginning.

Example:

Train model locally
Upload file manually
Restart server manually
Send message to team manually

This becomes risky over time.

Someone may upload the wrong file. Another person may forget a step. A new team member may not know the process.

Automated deployment reduces these risks.

Even a simple script is better than an undocumented manual process.

Beginner Mistake: No Monitoring

A model without monitoring is like a vehicle without a dashboard.

It may be moving, but you do not know whether something is wrong.

Monitoring should show:

How many predictions are made
How many requests fail
How fast the model responds
How input data changes
How predictions change
How business results change

Without monitoring, teams may discover problems only after users complain.
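
The first three items can be collected with a few counters inside the prediction service. This is a minimal sketch with hypothetical names; a real setup would normally send these numbers to a monitoring system rather than keep them in memory.

```python
from dataclasses import dataclass, field

@dataclass
class ServingStats:
    predictions: int = 0
    failures: int = 0
    latencies_ms: list = field(default_factory=list)

    def record(self, latency_ms: float, ok: bool = True) -> None:
        """Call once per request with its duration and outcome."""
        self.predictions += 1
        if not ok:
            self.failures += 1
        self.latencies_ms.append(latency_ms)

    def summary(self) -> dict:
        if self.predictions == 0:
            return {"predictions": 0, "failure_rate": 0.0, "p95_latency_ms": None}
        ordered = sorted(self.latencies_ms)
        # Nearest-rank 95th percentile, using integer math to avoid rounding surprises
        rank = (95 * len(ordered) + 99) // 100
        return {
            "predictions": self.predictions,
            "failure_rate": self.failures / self.predictions,
            "p95_latency_ms": ordered[rank - 1],
        }

# Illustrative traffic: 20 requests taking 1 to 20 ms, with the slowest one failing
stats = ServingStats()
for millis in range(1, 21):
    stats.record(float(millis), ok=(millis != 20))

report = stats.summary()
print(report)
```

Input changes, prediction changes, and business results need their own checks, but request counts, failure rate, and latency are a reasonable place to start.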

Beginner Mistake: Ignoring Business Feedback

A model may perform well technically but fail operationally.

Example:

A sales prediction model may be accurate, but the sales team may not understand how to use the output.

A churn model may rank customers, but the marketing team may not know what action to take.

A model becomes valuable only when people can use it.

MLOps should include feedback from business users, not only engineers.

Practical MLOps Checklist

Before deploying a model, check:

Data source is reliable
Data validation exists
Feature logic is consistent
Experiments are tracked
Model is tested beyond accuracy
Model version is stored
Deployment process is repeatable
Monitoring is configured
Rollback plan exists
Security review is done
Business owner approves usage
Maintenance owner is assigned

This checklist helps prevent production surprises.

MLOps Interview Points

MLOps is a common topic in AI, data science, and DevOps interviews.

Important points:

MLOps helps deploy and maintain ML models in production
It combines data science, DevOps, data engineering, and monitoring
A notebook is not enough for production
Data pipelines must be reliable
Feature consistency is important
Experiments and models should be versioned
Model registry controls approved versions
Monitoring checks drift and prediction quality
Rollback is needed for failed releases
MLOps is also about team collaboration

A strong interview answer should include one real example, such as fraud detection, delivery delay prediction, or customer churn prediction.

Practical Mindset for MLOps

MLOps is not about making a project look advanced. It is about making machine learning dependable.

A model inside a notebook is only a starting point. A production AI system needs data pipelines, testing, deployment, monitoring, security, rollback, and people who own the process.

A simple way to remember it:

Notebook proves the model can work.
MLOps proves the model can keep working.

That is the real difference between an experiment and a production AI system.

