Introduction:
Data tiering is becoming a critical pillar of every organization’s data and analytics strategy. As companies move toward AI-driven decision-making and deploy AI agents, MCP servers, and other autonomous capabilities, the value of context-rich, trusted data products increases dramatically. Simply put: high‑quality data is the fuel that powers generative, agentic, and autonomous applications across the business.
A modern data tiering (or data management) strategy is about organizing and storing data in a way that optimizes both cost and performance. Data is being generated at unprecedented speed and volume, from business transactions and master data to IoT signals, application logs, surveys, social media, and streaming sources. Not all of this data is equally valuable at all times, and not all of it needs the same level of performance or freshness. The key is to balance the cost of storing data with the cost (and business value) of accessing it.
Different consumption patterns drive different requirements:
- Real-time operational analytics (daily, weekly)
- Tactical reporting (monthly, quarterly)
- Strategic analysis (multi-year historical trends)
- Advanced AI and machine learning use cases
For each of these, it is essential to understand what data tiering is, which options exist, and in which scenarios to apply which tier. In this technical article, we focus on data tiering in the context of SAP’s Business Data Cloud, with a particular emphasis on SAP Datasphere. We align the discussion with SAP’s latest data and analytics strategy as presented at Sapphire 2026, and provide practical guidance and best practices for designing an effective tiering strategy that supports both traditional analytics and next-generation AI workloads.
1. Business & AI Context:
AI is no longer an experimental add‑on to the SAP landscape; it is becoming the execution layer of the enterprise. With SAP’s vision of the Autonomous Enterprise, AI agents, assistants, and Joule-powered workflows are expected to automate large parts of finance, supply chain, HR, and customer operations. All of that depends on one thing: trusted, well‑managed data with rich business context.
1.1 Why Data Tiering Matters in an AI-First World?
Modern AI in SAP is context-hungry. It does not just need a lot of data; it needs to know:
- Who the customer is across CRM, S/4HANA, and e‑commerce
- Which material or location a transaction relates to
- What policies or controls apply in a given process or region
- How a process (order-to-cash, procure-to-pay, hire-to-retire) is supposed to run
This business context is what SAP brings to AI, and it is exactly what SAP Business Data Cloud and SAP Datasphere are designed to provide. However, the data feeding AI agents, planning models, and analytics is exploding in volume and variety:
- Core ERP transactions and master data
- IoT, machine telemetry, and sensor streams
- Application logs and clickstreams
- Survey, social, and interaction data
- External market and partner data
Not all of this data needs to sit on the most expensive, highest‑performance storage. At the same time, some scenarios, like Joule agents making operational decisions, cannot tolerate high latency or missing data. Data tiering is how you manage this trade‑off between cost and performance without compromising AI quality and trust.
1.2 From Reporting to Autonomous Decisions
Historically, data tiering conversations in SAP environments focused on:
- Reducing database size (especially in ERP and BW)
- Managing performance for reporting and planning
- Meeting retention and compliance requirements at lower cost
In the Autonomous Enterprise, the stakes are higher. Tiering decisions now influence:
- Real-time operational analytics: e.g., S/4HANA operational KPIs, supply chain control towers, same‑day profitability
- Tactical management reporting: month‑end/quarter‑end analysis, scenario planning, rolling forecasts
- Strategic and historical analysis: multi‑year trends, structural cost analysis, sustainability reporting
- AI and agentic use cases: Joule agents executing workflows, AI copilots assisting users in SAP Fiori and line-of-business apps, RAG‑style (retrieval‑augmented generation) scenarios pulling context into LLMs, and ML models trained on long‑range historical, IoT, or behavioral data
Each of these has a different tolerance for latency, freshness, data volume, and cost. A well‑designed tiering strategy ensures that:
- Hot data is available at in‑memory speed for agents and operational analytics
- Warm data supports mid‑term analysis and planning without over‑provisioning HANA
- Cold data and archives are kept cost‑efficiently but still discoverable, governable, and usable for ML and compliance
2. Data Tiering Concepts & Terminology:
Before going into SAP Datasphere specifics, it helps to establish a clear, neutral vocabulary for data tiering that you can reuse with architects, business stakeholders, and AI teams.
2.1 What Is Data Tiering?
Data tiering is a data management strategy in which you assign data to temperature tiers, each backed by a different storage technology. The core dimensions for classifying data are:
- Access frequency: How often is the data read? Hourly, daily, monthly, rarely?
- Update requirement: Does the data need to be updated after initial load? Is it transactional (insert/update/delete) or more append‑only?
- Performance & latency: How fast do queries need to return? Does an AI agent or operational dashboard depend on sub‑second responses?
- Data Freshness: Does the data need to be real‑time, near real‑time, or is daily/weekly refresh enough?
- Retention period: How many years must the data be kept for legal, audit, or business reasons?
- Data volume: How large is the dataset today, and how fast is it growing?
- Security & compliance: Are there specific regulations (GDPR, industry rules) that affect where data can be stored and who can access it?
- Business criticality (MOST IMPORTANT): If this data is slow or unavailable, what is the impact on the business and on AI‑driven processes?
In SAP terms, data tiering is part of multi‑temperature data management: you keep the most valuable, time‑critical data on your fastest (and most expensive) infrastructure, while moving less critical data to cheaper, slower layers without losing access to it.
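The classification dimensions above can be turned into a simple decision rule. The following is a minimal sketch, not an SAP API: the `DatasetProfile` class, the `suggest_tier` function, and every threshold are hypothetical and only illustrate how a few of the dimensions (access frequency, latency, freshness, business criticality) might be combined.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    """Hypothetical profile covering a few of the dimensions listed above."""
    reads_per_day: float        # access frequency
    max_latency_seconds: float  # performance and latency requirement
    needs_live_source: bool     # freshness: must reflect the latest source state
    business_critical: bool     # business criticality


def suggest_tier(profile: DatasetProfile) -> str:
    """Suggest a tier; all thresholds are illustrative assumptions."""
    if profile.needs_live_source:
        return "remote"  # no replication delay is acceptable
    if profile.business_critical and profile.max_latency_seconds <= 1:
        return "hot"     # sub-second, critical workloads
    if profile.reads_per_day >= 1:
        return "warm"    # read regularly, but not latency-critical
    return "cold"        # rarely read, cost dominates


open_orders = DatasetProfile(reads_per_day=500, max_latency_seconds=0.5,
                             needs_live_source=False, business_critical=True)
archive = DatasetProfile(reads_per_day=0.01, max_latency_seconds=60,
                         needs_live_source=False, business_critical=False)

print(suggest_tier(open_orders))
print(suggest_tier(archive))
```

In practice the remaining dimensions (retention, volume, compliance) would add further conditions; the point of the sketch is that the rule starts from requirements rather than from a storage technology.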
2.2 The Classical 3 Tiers: Hot, Warm, Cold
Most SAP data tiering discussions (HANA, BW/4HANA, Datasphere) use the same three core concepts, with slightly different technical realizations. The remote data tier is a data access pattern rather than a storage temperature.
2.3 Data Tiering vs. Related Concepts in SAP
It is useful to separate data tiering from other, sometimes overlapping, concepts:
- Data aging (e.g., SAP S/4HANA): Framework that moves older data from hot in‑memory storage to warm disk‑based storage within the same HANA system. It reduces memory footprint while keeping data accessible. This is a mechanism that supports a tiering strategy at the application level.
- Data Tiering Optimization (DTO) in BW/4HANA: Classifies data in an ADSO as hot, warm, or cold, and maps each temperature to a specific storage technology (e.g., HANA in‑memory, HANA NSE/extension nodes, SAP IQ). DTO is a tool that applies tiering policies in BW/4HANA.
- Embedded Data Lake / data lake integration: Storing large volumes of structured/semi‑structured/unstructured data on low‑cost cloud object storage (e.g., embedded data lake in SAP Datasphere, or external data lakes). This typically supports warm/cold tiers for large volumes.
- Virtualization / federation: Data remains in the source system (S/4HANA, Ariba, ECC, BW, non‑SAP DB, data lake) and is accessed on demand via remote tables or views. Virtualization is not a tier by itself, but an access pattern that can implement hot, warm, or cold tiers without physical replication.
2.4 Analytical & AI Consumption Patterns
Tiering decisions should start from the intended consumption patterns, not from the storage technology. Key patterns include:
- Real‑time operational analytics: Examples: same‑day order‑to‑cash KPIs, supply chain control tower, production monitoring. Requirements: hot data, very low latency, typically in‑memory HANA, minimal joins across slow systems.
- Tactical reporting & planning: Examples: monthly financial close, rolling forecasts, quarterly performance reviews. Requirements: mix of hot and warm data, acceptable latency in seconds; may combine data from replicated and virtual sources.
- Strategic, historical, and regulatory analysis: Examples: multi‑year profitability, sustainability KPIs, root‑cause analysis over several years. Requirements: heavy use of warm and cold tiers; performance is important but not sub‑second; cost and retention dominate.
- AI & ML training: Examples: training predictive models, building recommendation systems, anomaly detection, LLM fine‑tuning. Requirements: large volumes of warm/cold data; batch access; cost‑efficient storage more important than interactive speed.
- AI & agent inference / context retrieval (RAG): Examples: Joule or custom agents answering questions or taking actions using business data context. Requirements: hot data products with low latency and rich semantics; limited but well‑curated warm data for extended context; some cold data may be pre‑indexed or summarized.
Each pattern maps to a different temperature mix, which later translates into concrete options in SAP Datasphere: local tables (in-memory or disk), remote tables, object storage (delta tables), and so on.
2.5 Temperature as a Policy, Not Just a Location
A subtle but important point: data temperature is a policy, not just a physical storage location. For example:
- A table partition might be “hot” for the current year and “warm” for previous years, even though the table itself still lives in SAP Datasphere.
- An object in an embedded data lake (Object store) might be “warm” when actively used for ML training, then become “cold” once the model is trained and only needs occasional retraining.
- Data can move hot → warm → cold as it ages (time‑based policies) or cold → warm → hot when certain events or AI scenarios increase its business relevance (usage‑based policies).
This policy‑driven view becomes especially important in SAP Datasphere and Business Data Cloud, where you design data products and spaces that must support changing business and AI requirements over time.
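To make the policy view concrete, here is a small sketch in which a partition's temperature is computed from its age (time-based) and then adjusted by recent usage (usage-based). The function names, the 12 and 36 month boundaries, and the read-count threshold are assumptions chosen for illustration; they are not Datasphere settings.

```python
from datetime import date

TIERS = ["cold", "warm", "hot"]  # ordered from coldest to hottest


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def temperature(partition_month: date, today: date, recent_reads: int,
                hot_months: int = 12, warm_months: int = 36,
                promote_reads: int = 100) -> str:
    """Time-based temperature with an optional usage-based promotion."""
    age = months_between(partition_month, today)
    if age < hot_months:
        base = "hot"
    elif age < warm_months:
        base = "warm"
    else:
        base = "cold"
    # Usage-based policy: heavily read data moves up one tier.
    if recent_reads >= promote_reads and base != "hot":
        return TIERS[TIERS.index(base) + 1]
    return base


today = date(2026, 6, 1)
print(temperature(date(2026, 1, 1), today, recent_reads=0))
print(temperature(date(2020, 1, 1), today, recent_reads=500))
```

The same partition can therefore change temperature without any change to its content, which is the sense in which temperature is a policy.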
2.6 Data tiering characteristics:
| Data Tier | Performance | Cost | Data Volume | Data Updates | Typical Business Scenarios |
|---|---|---|---|---|---|
| HOT | Best / Ultra-Fast | High | Low-Medium | High | Operational reporting, AI Agents, Joule context, SAC dashboards, Open Orders, Current Inventory, Planning, Master Data |
| WARM | Medium | Medium | Medium-High | High-Medium | Historical operational analytics (1–5 years), Supply Chain analytics, Procurement spend analysis, Customer analytics, Finance trending, AI/ML |
| COLD | Low | Low | Massive | Low/None | Audit, Compliance, Archived ERP/BW history, Long-term retention, Historical IoT data, Delta lake tables, AI/ML |
| REMOTE | Variable (Source Dependent) | Minimal | Low | Very High | Virtualized access to S/4HANA, BW/4HANA, SAP HANA, Snowflake, Databricks, Microsoft Fabric, data discovery and low-frequency access scenarios |
3. SAP Datasphere Architecture and Strategy for Data Tiering:
Data tiering in SAP Datasphere is built on top of SAP HANA Cloud and the Business Data Cloud architecture. At a high level, you combine local tables, spaces, storage options, and access patterns to implement hot, warm, and cold tiers without losing business context. For the remote tier, remote tables point directly to the source without copying the data.
3.1 Key Building Blocks
- Tenant Sizing/Configuration: Select storage and memory capacity based on your requirements.
- Datasphere Spaces: Logical workspaces that isolate data, users, and data storage resources (memory, disk, file storage). You can dedicate spaces to domains (Finance, Supply Chain), regions, or workloads (AI lab vs. standard reporting).
- Data Builder (Data Engineering & Modeling): Remote/local tables, views, data flows, replication flows, transformation flows, analytic models, etc. This is where you design physical/virtual placement and thus tiering.
- Connections & Data Federation Layer: Connects to SAP S/4HANA, BW/4HANA, ECC, SuccessFactors, Ariba, non‑SAP DBs, data lakes, and SaaS APIs. It is the foundation for virtual access (remote tables) and data replication, the two main tiering levers.
3.2 Storage & Access Options Relevant for Tiering
Under the hood, Datasphere uses SAP HANA Cloud as its database.
- Local Tables: Data is physically replicated and stored in Datasphere.
- Remote Tables / Virtualization: No replication; queries are pushed down to the source (e.g., S/4HANA, BW, external DB, data lake).
- View Persistency: Persisted views/materializations in Datasphere to speed up complex logic. Often used as a performance buffer between raw inbound data and reporting, instead of keeping everything hot.
3.3 Data Tiering Options in SAP Datasphere
Remote Tier: Remote tables that point to data in source systems and run queries via federation without copying data.
Hot Tier: Local tables in in-memory storage, physically stored in the SAP HANA Cloud database of your Datasphere tenant (column store).
Warm Tier: Local tables in disk storage, physically stored using SAP HANA Native Storage Extension (NSE). NSE is a built‑in warm data option in SAP HANA Cloud that keeps data on disk instead of fully in memory. It combines disk‑based storage with an in‑memory buffer cache to lower costs while keeping acceptable performance.
Cold Tier: Local tables in file storage. Local tables (delta tables) are physically stored in the object store (HANA Data Lake Files) of your Datasphere tenant for structured, semi-structured, and unstructured data sets.
4. SAP Datasphere: Extract, Transform & Load (ETL) – Best Practices
5. Design Patterns & Typical Scenarios
5.1 Pattern 1 – Operational & Agentic Core with Remote, Hot, Warm, and Cold Tiers.
Intent: Support real-time operational decisions and AI agents with the latest business context, keep monthly and quarterly analytics on warm transactional history, and move multi-year data to cold lake storage for audit, compliance, and ML training.
Common scenarios:
- S/4HANA‑centric processes: Finance, Order‑to‑Cash, Procure‑to‑Pay, inventory, production.
- A relatively small set of promoted tables: Daily Revenue, Open Orders, Customer 360 Core, etc.
- Monthly, quarterly, and yearly reporting on transactional history.
- Operational dashboards, control towers, exception monitoring.
- Joule/agent scenarios that need real‑time, deterministic context.
- Audit, regulatory, and historical intelligence use cases.
Architecture view:
Remote tier (Remote tables):
- Real-time operational data is accessed directly from source applications. This tier is used when the business needs the latest committed state without waiting for replication.
- Best suited for: critical real-time operational reports, control-tower dashboards, and time-sensitive AI agents
Warm + Hot, Data warehouse & database (Local tables-Disk/In-memory)
- Warm (default): most replicated tables from S/4 and other apps land on HANA disk/NSE. This includes older partitions of the same transactional tables (12–36 months), which are used for monthly/quarterly reporting and trend analysis.
- Hot (explicit): only the most critical slices (e.g., current year postings, open documents, key master data, control tables) are switched to in‑memory for ultra‑fast queries.
Cold tier, Data Lake storage (Local tables-Files)
- Lake storage acts as the foundation of the data fabric. It holds multi-year historical facts, logs, IoT data, archived detail, and deep history that is rarely accessed but required for audit or ML training.
- Best suited for: model/ML training and retraining, rare deep-dive analytics, and audit and compliance
Design guidance
- Keep schema, semantics, and business definitions consistent across remote, hot, warm, and cold so that data products can span all tiers seamlessly.
- Use partitioning on the warm tier so that the active 12–36 month range behaves like “hot” for most analytical queries.
- Separate your models by purpose:
- Operational / real-time models should combine remote + hot + current warm.
- Union models: Remote+Hot+Warm+Cold.
- Historical / audit / ML models should extend into the cold tier only when needed.
- Promote only the smallest business-critical slice to hot; keep the broader history warm and the full archive cold.
Example 1. SAP Datasphere native connection with SAP S/4HANA: Sales Order data with the following persisted data tiering:
Hot Tier: Current year data (2026)
Warm Tier: Previous two completed years (2024,2025)
Cold Tier: All data older (2023 and earlier)
1. UNIONing Sales Order Hot+Warm+Cold Tier Data, Fact Graphical View
2. Data Analysis View Showing – Analytic Model, Fact Model, Multi-Tier Tables and Replication Flow objects.
3. Testing
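The year-based split in this example (current year hot, previous two completed years warm, everything older cold) and the union across tiers can be sketched in plain Python. This is only an illustration of the logic, not of how Datasphere views are built: the function names and the sample rows are made up, and in Datasphere the union would be modeled as a graphical or SQL view over the tier tables.

```python
from collections import defaultdict


def tier_for_year(year: int, current_year: int) -> str:
    """Map a fiscal year to a tier using the split from the example."""
    if year > current_year:
        raise ValueError("year lies in the future")
    if year == current_year:
        return "hot"
    if year >= current_year - 2:
        return "warm"
    return "cold"


def split_by_tier(rows, current_year):
    """Route each row to the tier that owns its year."""
    tiers = defaultdict(list)
    for row in rows:
        tiers[tier_for_year(row["year"], current_year)].append(row)
    return tiers


def union_all(tiers, wanted=("hot", "warm", "cold")):
    """Combine the selected tiers and tag each row with its origin."""
    return [{**row, "tier": name}
            for name in wanted
            for row in tiers.get(name, [])]


# Made-up sample rows standing in for sales order items.
rows = [
    {"order_id": "A1", "year": 2026, "net_value": 120.0},
    {"order_id": "B7", "year": 2025, "net_value": 80.0},
    {"order_id": "C3", "year": 2024, "net_value": 45.5},
    {"order_id": "D9", "year": 2021, "net_value": 300.0},
]

tiers = split_by_tier(rows, current_year=2026)
for row in union_all(tiers):
    print(row["order_id"], row["year"], row["tier"])
```

Restricting `wanted` to `("hot", "warm")` mirrors the guidance to extend into the cold tier only when a model needs it.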
Example 2. SAP BDC formation with SAP S/4HANA: Sales Order data product with the following persisted data tiering:
5.2 Pattern 2 – Cold‑First Datasphere Object Store, Selectively Promoted → Warm/Hot
(Everything lands in the lake, then is “pulled up” by importance)
Intent: Default to landing all data in the cold tier (cheap, scalable lake storage), then promote only critical data into warm and hot tiers based on actual access patterns and performance needs.
Common scenarios
- Cloud‑scale data (IoT, clickstream, logs, partner feeds) combined with SAP business data.
- Organizations with strong cost pressure and many unknown future use cases.
- Heavy AI / ML experimentation and data science workloads.
Architecture view
Cold tier as landing zone (Local tables-Files)
- All raw and detailed data (transactions, events, logs, IoT, external feeds) lands first in the HDLF/cold lake.
- Acts as the long‑term system of record and the single source for all higher tiers.
Warm tier (curated subsets in the lakehouse / warehouse)
- Frequently used, cleansed, and modelled subsets are loaded from the lake into warm local tables in SAP Datasphere.
- Characteristics:
- Structured, governed, optimized for analytical workloads
- Drives standard BI, tactical reporting, and many AI feature sets
- Only a fraction of lake data is promoted here at any given time.
Hot tier (very selective, performance‑critical slices)
- Small number of tables/partitions further promoted from warm to in‑memory hot for:
- Executive dashboards
- Time‑critical planning and simulations
- High‑throughput AI inference and agents
- Typically:
- Recent periods (e.g., current month/quarter)
- Narrow sets of attributes and key metrics
Remote tier (optional)
- Some data may remain in source systems and be accessed virtually instead of being landed in the lake (e.g., SAP SaaS applications).
- Treated logically as warm/cold depending on latency and SLA.
Design guidance
- Treat the cold lake as default; movement to warm/hot must be justified by:
- Access frequency
- Latency requirements
- Business criticality
- Automate usage‑driven promotion:
- Monitor query logs on lake and warm tables.
- Identify top N datasets to copy or materialize in warm.
- From warm, promote an even smaller subset (the top of the top) to hot.
- For AI:
- Training and experimentation pull broadly from the cold lake.
- Production agents and data products should rely primarily on warm/hot, with controlled fallbacks to cold for rare, deep questions.
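The usage-driven promotion described in the design guidance can be sketched as a ranking over a query log. This is a hedged illustration: the log format (one dataset name per query), the dataset names, and the `promotion_candidates` function are assumptions, and a real implementation would read from whatever monitoring data your tenant exposes.

```python
from collections import Counter


def promotion_candidates(query_log, warm_n, hot_n):
    """Rank datasets by query count and pick promotion candidates.

    The top warm_n datasets are candidates for warm; the top hot_n of
    those are candidates for hot, so hot is always a subset of warm.
    """
    counts = Counter(query_log)
    ranked = [name for name, _ in counts.most_common()]
    warm = ranked[:warm_n]
    hot = warm[:hot_n]
    return warm, hot


# Hypothetical log: one entry per query, naming the dataset it touched.
query_log = (["sales_orders"] * 5 + ["inventory"] * 3
             + ["iot_raw"] * 1 + ["clickstream"] * 2)

warm, hot = promotion_candidates(query_log, warm_n=3, hot_n=1)
print("promote to warm:", warm)
print("promote to hot:", hot)
```

Query count is only one signal; latency requirements and business criticality from the list above would normally act as additional filters before anything is moved.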
Example: Sales Order data with the following persisted data tiering:
5.3 Pattern 3 – Warm‑First Datasphere Core, Selectively Promoted → Hot/Cold
(Default warm in Datasphere, tuned over time up or down)
Intent: Start by loading everything needed for analytics into the warm tier of SAP Datasphere (default disk storage), then rebalance over time: move critical subsets up to hot and aging/low‑value data down to cold.
Common scenarios
- Typical SAP Datasphere rollout where data replication lands in warm (Disk/HANA NSE) by default.
- Traditional BI / reporting, planning, and first wave of AI use cases.
- Customers that want simplicity first, then optimization.
Architecture view
Warm tier as default (Datasphere core Disk Storage)
- All relevant operational and external data needed for reporting and AI is replicated/ingested into warm local tables.
- Characteristics:
- Good general‑purpose performance
- Simple to model (everything is in one analytical environment)
- Most dashboards, planning models, and initial agent use cases
Hot tier (promoted, latency‑critical slices)
- After observing usage and SLAs, promote key tables and partitions to in‑memory hot for:
- Mission‑critical dashboards and High‑frequency queries
- AI agents needing sub‑second context
- Typical candidates:
- Current‑period transactional slices (e.g., last 30–90 days)
- High‑value master data (customers, products, locations)
- Control / security / data access lookup tables
Cold tier (demoted aging or low‑value data)
- As data ages or is rarely accessed, move older partitions from warm into lake storage (HDLF/object store).
- Characteristics:
- Read‑only or minimal updates.
- Year‑over‑year, long‑term trend analysis, audit and regulatory requirements
- Occasional ML retraining on full history
- The warehouse/lakehouse keeps a logical view that transparently spans warm + cold.
Remote tier (where appropriate): For some source systems, keep data remote and blend it with warm/hot only where necessary.
Design guidance
- Start by designing semantic models and data products against the warm tier; this keeps onboarding simple.
- Define lifecycle rules:
- Age‑based: e.g., after 12-24 months in warm, move to cold.
- Criticality‑based: e.g., if a dataset feeds a critical report/agent, promote its latest partitions to hot.
- For AI:
- Keep agent‑critical context (current state, key masters, control rules) in hot or the hottest partitions of warm.
- Use warm for general analytics, trend analysis, and feature engineering.
- Store long‑term histories in cold but link them semantically so agents and analysts can still reach them when necessary (e.g., via separate historical data products).
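For a warm-first landscape, the age-based and criticality-based lifecycle rules above can be expressed as a function that produces a list of planned moves. The sketch below is illustrative only: `Partition`, `plan_moves`, the 24 month cold boundary, and the 3 month hot window are assumptions derived from the ranges mentioned in this pattern, not product defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    dataset: str
    age_months: int
    tier: str = "warm"  # warm-first: everything lands in warm


def plan_moves(partitions, critical_datasets,
               cold_after_months=24, hot_latest_months=3):
    """Return (dataset, age_months, from_tier, to_tier) for each needed move."""
    moves = []
    for p in partitions:
        if p.dataset in critical_datasets and p.age_months < hot_latest_months:
            target = "hot"   # criticality-based rule
        elif p.age_months >= cold_after_months:
            target = "cold"  # age-based rule
        else:
            target = "warm"
        if target != p.tier:
            moves.append((p.dataset, p.age_months, p.tier, target))
    return moves


partitions = [
    Partition("sales_orders", 1),
    Partition("sales_orders", 30),
    Partition("survey_results", 1),
    Partition("survey_results", 10),
]

for move in plan_moves(partitions, critical_datasets={"sales_orders"}):
    print(move)
```

Producing a plan first, and executing it separately, makes it easier to review moves before data is physically relocated.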
Example: Sales Order data with the following persisted data tiering:
6. Best practices and Recommendations for Data Tiering:
Remote tier:
- Use remote tables when you need real‑time S/4HANA, LoB Apps, or source data with no replication delay.
- Keep remote queries narrow: always filter by time, company code, plant, etc., and push filters and aggregates down to the source to avoid large data transfers.
- Avoid overusing remote access for heavy self‑service; replicate to warm if queries become frequent or complex.
- Use remote and local tables together: keep always‑hot context (small master data, control tables) in local hot tables and join remote operational facts with local hot master data for fast, enriched views.
- If a remote dataset becomes high‑volume or heavily used: Replicate it into warm and optionally promote the most critical slice to hot.
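One way to enforce narrow remote queries is a small helper that refuses to build a query without an explicit column list and a mandatory filter. The sketch below is generic parameterized SQL generation; the table name, column names, and the choice of `FISCAL_YEAR` as the required filter are hypothetical, and whether a filter is actually pushed down depends on the source and the federation layer.

```python
def build_narrow_query(table, columns, filters,
                       required_filters=("FISCAL_YEAR",)):
    """Build a parameterized SELECT that always projects and filters.

    Returns the SQL text and the parameter list. Identifiers are
    expected to come from trusted model metadata, not from user input.
    """
    if not columns:
        raise ValueError("list the columns explicitly instead of selecting everything")
    missing = [name for name in required_filters if name not in filters]
    if missing:
        raise ValueError(f"missing mandatory filters: {missing}")
    column_list = ", ".join(f'"{c}"' for c in columns)
    where = " AND ".join(f'"{name}" = ?' for name in filters)
    sql = f'SELECT {column_list} FROM "{table}" WHERE {where}'
    return sql, list(filters.values())


sql, params = build_narrow_query(
    "SALES_ORDERS_REMOTE",                  # hypothetical remote table
    ["ORDER_ID", "NET_VALUE"],
    {"FISCAL_YEAR": 2026, "COMPANY_CODE": "1000"},
)
print(sql)
print(params)
```

A guard like this is most useful in shared, self-service layers, where a single unfiltered query against a remote table can transfer a large part of the source table.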
Hot tier, In‑memory (ultra‑fast, mission‑critical):
When to use?
- Data powers mission‑critical dashboards, planning, or AI agents.
- You need sub‑second to low‑second response and high concurrency.
- Data changes frequently and is used continuously (daily / hourly).
What to put in hot?
- Current period slices of key transactional tables (e.g., last 30–90 days).
- High‑value master data (Customer, Material, Plant, BP, cost center, etc.).
- Control and data access control tables used in joins and authorization logic.
- Persisted outputs of core semantic models and AI‑critical data products (e.g., Daily Revenue, Open Orders, Customer 360 Core).
Best practices:
- Keep hot small and focused: limit it to data that is frequently accessed, feeds critical reports or Joule/AI agents, or is needed for very recent periods.
- Partition by time and business keys: Make only the most recent partitions hot; keep older ones warm/cold.
- Materialize complex logic: Persist heavy joins and KPI logic into hot tables (via View persistency/Transformation Flows) instead of recomputing in every query.
- Minimize width: Only store required columns; move verbose or rarely used attributes to warm/cold dimensions.
- Protect hot capacity: Use separate spaces or roles to keep ad‑hoc/lab workloads off production hot tables.
Warm tier (HANA disk / NSE):
When to use?
- Default tier for replicated SAP and non‑SAP data in Datasphere.
- You need good performance for 12–36 months of history, but not in‑memory speed.
- Data is still updatable and frequently used for reporting and planning.
What to put in warm?
- 1–3 years of transactional fact tables (FI, SD, MM, PP, CO, etc.), plus less time‑critical master data and lookup tables.
- Curated, harmonised reporting models that are widely reused.
- Staging/curated tables used for feature engineering and AI training.
Best practices:
- Treat warm as your default landing zone for replication.
- SAP‑managed and customer‑managed data products can be replicated into local tables. This is ideal for monthly/quarterly reporting, cross‑domain analytics, or standard AI feature sets and model training.
- Use time‑based partitioning so:
- Recent partitions (e.g., last 12–18 months) behave near‑hot.
- Older partitions are candidates to move to cold.
- Apply HANA Cloud warm‑store sizing rules and do not let warm grow uncontrolled.
- Use warm as a “parking lot”:
- Move data from hot → warm as it ages.
- Keep still occasionally needed data here before sending it to cold.
- Monitor usage:
- Promote warm tables/partitions to hot if they become latency‑critical.
- Demote rarely used warm data to cold to free capacity.
Cold tier (HDLF / object store):
When to use?
- Data is rarely accessed, mostly for audit, compliance, or deep historical analytics.
- You need very low storage cost and can tolerate high latency.
- Ideal for long‑term retention and cloud‑scale AI/ML training.
What to put in cold?
- Detailed line‑item history older than 3–5+ years.
- Full raw IoT, log, and event streams once aggregated metrics exist in warm/hot.
- Long‑term snapshots, backups, and regulatory archives.
- Large external datasets used mainly for ML experimentation.
Best practices:
- Size the SAP Datasphere object store for the expected data volume and growth.
- Design cold primarily for: Historical analytics, large‑slice ML training and forensic investigations.
- Avoid designs where agents or business users need frequent single‑record lookups in cold.
- Expose aggregated summaries (yearly/quarterly totals, high‑level KPIs) into warm/hot so everyday analytics don’t touch cold.
- Implement automated lifecycle: Hot → Warm → Cold rules based on age and usage (e.g., after 18 months move to warm, after 3–5+ years to cold).
- Customer‑managed data products: Built by customers in SAP Datasphere using the object store, mainly from SAP sources (S/4HANA, ECC), non‑SAP data, or other data products. The flow is: replicate into the object store, build the data product, then share it via SQL on Files or Delta Sharing. Note: SAP-managed data products are stored in the SAP BDC Foundation Services (FOS) layer, which is separate from the SAP Datasphere object store, although both use HDLF technology.
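The recommendation in this list to expose aggregated summaries in warm/hot, so that everyday analytics do not touch cold, amounts to pre-aggregating cold line items once and persisting the small result in a faster tier. A minimal sketch, assuming made-up line items with `year` and `net_value` fields:

```python
from collections import defaultdict


def yearly_totals(line_items):
    """Aggregate detailed line items into one total per year."""
    totals = defaultdict(float)
    for item in line_items:
        totals[item["year"]] += item["net_value"]
    return dict(sorted(totals.items()))


# Made-up cold-tier line items.
cold_line_items = [
    {"year": 2020, "net_value": 10.0},
    {"year": 2020, "net_value": 5.0},
    {"year": 2019, "net_value": 2.5},
]

# The small summary is what would be persisted in warm or hot.
print(yearly_totals(cold_line_items))
```

Dashboards and agents then read the summary, and only audits or deep-dive investigations need to scan the detailed history in cold.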
Cross‑tier recommendations (tying it all together):
- Start from business SLAs and AI needs, not storage, and define clear policies:
- Remote/Hot = real‑time and daily/weekly decisions (including agents).
- Warm = regular analytics and planning.
- Cold = retention, ML training, rare investigations.
- Promote and demote based on real usage: instrument and review query statistics, move only proven hotspots to hot, and demote unused data to cold.
- Use partitioning, lifecycle policies, and automation instead of ad‑hoc moves to shift data hot → warm → cold (or back) based on age and actual usage.
Conclusion:
As we have seen throughout this article, data tiering is really about being intentional: put the right data in the right place for the right purpose. Hot and remote tiers serve real‑time decisions and AI agents, warm tiers power day‑to‑day reporting and planning, and cold tiers keep history available for audit and ML without wasting budget. SAP Datasphere and Business Data Cloud give you the building blocks (local in‑memory/disk tables, object storage, and remote access) to implement these tiers consistently. Once you design clear policies and align them with your business data, tables, data products, and AI use cases, data tiering stops being a technical afterthought and becomes a core part of your business, analytics, planning, and AI strategy.
References:
Introduction:Data tiering is becoming a critical pillar of every organization’s data and analytics strategy. As companies move toward AI-driven decision-making and deploy AI agents, MCP servers, and other autonomous capabilities, the value of context-rich, trusted data products increases dramatically. Simply put: high‑quality data is the fuel that powers generative, agentic, and autonomous applications across the business.A modern data tiering (or data management) strategy is about organizing and storing data in a way that optimizes both cost and performance. Data is being generated at unprecedented speed and volume, from business transactions and master data to IoT signals, application logs, surveys, social media, and streaming sources. Not all of this data is equally valuable at all times, and not all of it needs the same level of performance or freshness. The key is to balance the cost of storing data with the cost (and business value) of accessing it.Different consumption patterns drive different requirements:Real-time operational analytics (daily, weekly)Tactical reporting (monthly, quarterly)Strategic analysis (multi-year historical trends)Advanced AI and machine learning use casesFor each of these, it is essential to understand what data tiering is, which options exist, and in which scenarios to apply which tier. In this technical article, we focus on data tiering in the context of SAP’s Business Data Cloud, with a particular emphasis on SAP Datasphere. We align the discussion with SAP’s latest data and analytics strategy as presented at Sapphire 2026, and provide practical guidance and best practices for designing an effective tiering strategy that supports both traditional analytics and next-generation AI workloads.1. Business & AI Context:AI is no longer an experimental add‑on to the SAP landscape; it is becoming the execution layer of the enterprise. 
With SAP’s vision of the Autonomous Enterprise, AI agents, assistants, and Joule-powered workflows are expected to automate large parts of finance, supply chain, HR, and customer operations. All of that depends on one thing: trusted, well‑managed data with rich business context.1.1 Why Data Tiering Matters in an AI-First World?Modern AI in SAP is context-hungry. It does not just need a lot of data; it needs: Who the customer is across CRM, S/4HANA, and e‑commerce, Which material or location a transaction relates to, What policies or controls apply in a given process or region, How a process (order-to-cash, procure-to-pay, hire-to-retire) is supposed to run. This business context is what SAP brings to AI, and it is exactly what SAP Business Data Cloud and SAP Datasphere are designed to provide. However, the data feeding AI agents, planning models, and analytics is exploding in volume and variety:Core ERP transactions and master dataIoT, machine telemetry, and sensor streams Application logs and clickstreamsSurvey, social, and interaction dataExternal market and partner dataNot all of this data needs to sit on the most expensive, highest‑performance storage. At the same time, some scenarios, like Joule agents making operational decisions, cannot tolerate high latency or missing data. Data tiering is how you manage this trade‑off between cost and performance without compromising AI quality and trust.1.2 From Reporting to Autonomous DecisionsHistorically, data tiering conversations in SAP environments focused on:Reducing database size (especially in ERP and BW)Managing performance for reporting and planningMeeting retention and compliance requirements at lower costIn the Autonomous Enterprise, the stakes are higher. 
Tiering decisions now influence:
- Real-time operational analytics: e.g., S/4HANA operational KPIs, supply chain control towers, same‑day profitability
- Tactical management reporting: month‑end/quarter‑end analysis, scenario planning, rolling forecasts
- Strategic and historical analysis: multi‑year trends, structural cost analysis, sustainability reporting
- AI and agentic use cases: Joule agents executing workflows, AI copilots assisting users in SAP Fiori and line-of-business apps, RAG‑style (retrieval‑augmented generation) scenarios pulling context into LLMs, and ML models trained on long‑range historical, IoT, or behavioral data
Each of these has a different tolerance for latency, freshness, data volume, and cost. A well‑designed tiering strategy ensures that:
- Hot data is available at in‑memory speed for agents and operational analytics
- Warm data supports mid‑term analysis and planning without over‑provisioning HANA
- Cold data and archives are kept cost‑efficiently but remain discoverable, governable, and usable for ML and compliance
2. Data Tiering Concepts & Terminology:
Before going into SAP Datasphere specifics, it helps to establish a clear, neutral vocabulary for data tiering that you can reuse with architects, business stakeholders, and AI teams.
2.1 What Is Data Tiering?
Data tiering is a data management strategy in which you assign data to different temperature tiers and storage technologies, classifying it along these core dimensions:
- Access frequency: How often is the data read? Hourly, daily, monthly, rarely?
- Update requirement: Does the data need to be updated after initial load? Is it transactional (insert/update/delete) or more append‑only?
- Performance & latency: How fast do queries need to return?
  Does an AI agent or operational dashboard depend on sub‑second responses?
- Data freshness: Does the data need to be real‑time or near real‑time, or is a daily/weekly refresh enough?
- Retention period: How many years must the data be kept for legal, audit, or business reasons?
- Data volume: How large is the dataset today, and how fast is it growing?
- Security & compliance: Are there specific regulations (GDPR, industry rules) that affect where data can be stored and who can access it?
- Business criticality (most important): If this data is slow or unavailable, what is the impact on the business and on AI‑driven processes?
In SAP terms, data tiering is part of multi‑temperature data management: you keep the most valuable, time‑critical data on your fastest (and most expensive) infrastructure, while moving less critical data to cheaper, slower layers without losing access to it.
2.2 The Classical 3 Tiers: Hot, Warm, Cold
Most SAP data tiering discussions (HANA, BW/4HANA, Datasphere) use the same three core concepts, with slightly different technical realizations. The remote data tier is a data access pattern rather than a temperature tier of its own.
2.3 Data Tiering vs. Related Concepts in SAP:
It is useful to separate data tiering from other, sometimes overlapping, concepts:
- Data aging (e.g., SAP S/4HANA): Framework that moves older data from hot in‑memory storage to warm disk‑based storage within the same HANA system. It reduces memory footprint while keeping data accessible. This is a mechanism that supports a tiering strategy at the application level.
- Data Tiering Optimization (DTO) in BW/4HANA: Classifies data in an ADSO as hot, warm, or cold, and maps each temperature to a specific storage technology (e.g., HANA in‑memory, HANA NSE/extension nodes, SAP IQ).
  DTO is a tool that applies tiering policies in BW/4HANA.
- Embedded data lake / data lake integration: Storing large volumes of structured, semi‑structured, and unstructured data on low‑cost cloud object storage (e.g., the embedded data lake in SAP Datasphere, or external data lakes). This typically supports warm/cold tiers for large volumes.
- Virtualization / federation: Data remains in the source system (S/4HANA, Ariba, ECC, BW, non‑SAP DB, data lake) and is accessed on demand via remote tables or views. Virtualization is not a tier by itself, but an access pattern that can implement hot, warm, or cold tiers without physical replication.
2.4 Analytical & AI Consumption Patterns:
Tiering decisions should start from the intended consumption patterns, not from the storage technology. Key patterns include:
- Real‑time operational analytics
  Examples: same‑day order‑to‑cash KPIs, supply chain control tower, production monitoring.
  Requirements: hot data, very low latency, typically in‑memory HANA, minimal joins across slow systems.
- Tactical reporting & planning
  Examples: monthly financial close, rolling forecasts, quarterly performance reviews.
  Requirements: mix of hot and warm data, acceptable latency in seconds; may combine data from replicated and virtual sources.
- Strategic, historical, and regulatory analysis
  Examples: multi‑year profitability, sustainability KPIs, root‑cause analysis over several years.
  Requirements: heavy use of warm and cold tiers; performance is important but not sub‑second; cost and retention dominate.
- AI & ML training
  Examples: training predictive models, building recommendation systems, anomaly detection, LLM fine‑tuning.
  Requirements: large volumes of warm/cold data; batch access; cost‑efficient storage is more important than interactive speed.
- AI & agent inference / context retrieval (RAG)
  Examples: Joule or custom agents answering questions or taking actions using business data context.
  Requirements: hot data products with low latency and rich semantics; limited but well‑curated warm data for extended context; some cold data may be pre‑indexed or summarized.
Each pattern maps to a different temperature mix, which later translates into concrete options in SAP Datasphere (local tables (in-memory/disk), remote tables, object storage (Delta tables), etc.).
2.5 Temperature as a Policy, Not Just a Location:
A subtle but important point: data temperature is a policy, not just a physical storage location. For example:
- A table partition might be “hot” for the current year and “warm” for previous years, even though the table itself still lives in SAP Datasphere.
- An object in an embedded data lake (object store) might be “warm” when actively used for ML training, then become “cold” once the model is trained and only needs occasional retraining.
- Data can move hot → warm → cold as it ages (time‑based policies) or cold → warm → hot when certain events or AI scenarios increase its business relevance (usage‑based policies).
This policy‑driven view becomes especially important in SAP Datasphere and Business Data Cloud, where you design data products and spaces that must support changing business and AI requirements over time.
2.6 Data tiering characteristics:
| Data Tier | Performance | Cost | Data Volume | Data Updates | Typical Business Scenarios |
| --- | --- | --- | --- | --- | --- |
| HOT | Best / Ultra-Fast | High | Low-Medium | High | Operational reporting, AI agents, Joule context, SAC dashboards, open orders, current inventory, planning, master data |
| WARM | Medium | Medium | Medium-High | High-Medium | Historical operational analytics (1–5 years), supply chain analytics, procurement spend analysis, customer analytics, finance trending, AI/ML |
| COLD | Low | Low | Massive | Low/None | Audit, compliance, archived ERP/BW history, long-term retention, historical IoT data, Delta Lake tables, AI/ML |
| REMOTE | Variable (source dependent) | Minimal | Low | Very High | Virtualized access to S/4HANA, BW/4HANA, SAP HANA, Snowflake, Databricks, Microsoft Fabric; data discovery and low-frequency access scenarios |
3. SAP Datasphere Architecture and Strategy for Data Tiering:
Data tiering in SAP Datasphere is built on top of SAP HANA Cloud and the Business Data Cloud architecture. At a high level, you combine local tables, spaces, storage options, and access patterns to implement hot, warm, and cold tiers without losing business context. For the remote tier, remote tables point directly to the source without copying the data.
3.1 Key Building Blocks
- Tenant sizing/configuration: Select storage and memory based on your requirements.
- Datasphere spaces: Logical workspaces that isolate data, users, and data storage resources (memory, disk, file storage). You can dedicate spaces to domains (Finance, Supply Chain), regions, or workloads (AI lab vs. standard reporting).
- Data Builder (data engineering & modeling): Remote/local tables, views, data flows, replication flows, transformation flows, analytic models, etc. This is where you design physical/virtual placement and thus tiering.
- Connections & data federation layer: Connects to SAP S/4HANA, BW/4HANA, ECC, SuccessFactors, Ariba, non‑SAP DBs, data lakes, and SaaS APIs. It is the foundation for virtual access (remote tables) and data replication, the two main tiering levers.
3.2 Storage & Access Options Relevant for Tiering:
Under the hood, Datasphere uses SAP HANA Cloud as its database.
- Local tables: Physically replicate and store the data.
- Remote tables / virtualization: No replication; queries are pushed down to the source (e.g., S/4HANA, BW, external DB, data lake).
- View persistency: Persisted views/materializations in Datasphere to speed up complex logic. Often used as a performance buffer between raw inbound data and reporting, instead of keeping everything hot.
3.4. Data Tiering Options in SAP Datasphere
- Remote tier: Remote tables that point to data in source systems and run queries via federation without a data copy.
- Hot tier: Local tables in in-memory storage.
  Local tables are physically stored in the SAP HANA Cloud database of your Datasphere tenant (column store).
- Warm tier: Local tables in disk storage. Local tables are physically stored using SAP HANA Native Storage Extension (NSE), a built‑in warm data option in SAP HANA Cloud that keeps data on disk instead of fully in memory. It combines disk‑based storage with the in‑memory store to lower costs while keeping acceptable performance.
- Cold tier: Local tables in file storage. Local tables (Delta tables) are physically stored in the object store (HANA Data Lake Files) of your Datasphere tenant for structured, semi-structured, and unstructured data sets.
4. SAP Datasphere: Extract, Transform & Load (ETL) – Best Practices
5. Design Patterns & Typical Scenarios
5.1 Pattern 1 – Operational & Agentic Core with Remote, Hot, Warm, and Cold Tiers
Intent: Support real-time operational decisions and AI agents with the latest business context, keep monthly and quarterly analytics on warm transactional history, and move multi-year data to cold lake storage for audit, compliance, and ML training.
Common scenarios:
- S/4HANA‑centric processes: Finance, Order‑to‑Cash, Procure‑to‑Pay, inventory, production.
- A relatively small set of promoted tables: Daily Revenue, Open Orders, Customer 360 Core, etc.
- Monthly, quarterly, and yearly reporting on transactional history.
- Operational dashboards, control towers, exception monitoring.
- Joule/agent scenarios that need real‑time, deterministic context.
- Audit, regulatory, and historical intelligence use cases.
Architecture view:
- Remote tier (remote tables):
  - Real-time operational data is accessed directly from source applications.
  - This tier is used when the business needs the latest committed state without waiting for replication.
  - Best suited for: critical real-time operational reports, control-tower dashboards, time-sensitive AI agents
- Warm + hot, data warehouse & database (local tables, disk/in-memory):
  - Warm (default): most replicated tables from S/4 and other apps land on HANA disk/NSE.
    Older partitions of the same transactional tables (12–36 months) are used for monthly/quarterly reporting and trend analysis.
  - Hot (explicit): only the most critical slices (e.g., current year postings, open documents, key master data, control tables) are switched to in‑memory for ultra‑fast queries.
- Cold tier, data lake storage (local tables, files):
  - Acts as the foundation of the data fabric. Lake storage holds multi-year historical facts, logs, IoT data, archived detail, and deep history that is rarely accessed but required for audit or ML training.
  - Best suited for: model/ML training and retraining, rare deep-dive analytics, and audit and compliance
Design guidance:
- Keep schema, semantics, and business definitions consistent across remote, hot, warm, and cold so that data products can span all tiers seamlessly.
- Use partitioning on the warm tier so that the active 12–36 month range behaves like “hot” for most analytical queries.
- Separate your models by purpose:
  - Operational / real-time models should combine remote + hot + current warm.
  - Union models: remote + hot + warm + cold.
  - Historical / audit / ML models should extend into the cold tier only when needed.
- Promote only the smallest business-critical slice to hot; keep the broader history warm and the full archive cold.
Example 1: SAP Datasphere native connection with SAP S/4HANA: Sales Order data with the following persisted data tiering:
- Hot tier: current year data (2026)
- Warm tier: previous two completed years (2024, 2025)
- Cold tier: all older data (2023 and earlier)
1. UNION of Sales Order hot + warm + cold tier data in a graphical fact view
2. Data analysis view showing the analytic model, fact model, multi-tier tables, and replication flow objects
3. Testing
Example 2: SAP BDC formation with SAP S/4HANA: Sales Order data product with the following persisted data tiering
5.2 Pattern 2 – Cold‑First Datasphere Object Store, Selectively Promoted → Warm/Hot
(Everything lands in the lake, then is “pulled up” by importance)
Intent: Default to landing all data in the cold tier (cheap, scalable lake storage), then promote only critical data into warm and hot tiers based on actual access patterns and performance needs.
Common scenarios:
- Cloud‑scale data (IoT, clickstream, logs, partner feeds) combined with SAP business data.
- Organizations with strong cost pressure and many unknown future use cases.
- Heavy AI / ML experimentation and data science workloads.
Architecture view:
- Cold tier as landing zone (local tables, files):
  - All raw and detailed data (transactions, events, logs, IoT, external feeds) lands first in the HDLF/cold lake.
  - Acts as the long‑term system of record and single source for all higher tiers.
- Warm tier (curated subsets in the lakehouse / warehouse):
  - Frequently used, cleansed, and modelled subsets are loaded from the lake into warm local tables in SAP Datasphere.
  - Characteristics: structured, governed, and optimized for analytical workloads; drives standard BI, tactical reporting, and many AI feature sets.
  - Only a fraction of lake data is promoted here at any given time.
- Hot tier (very selective, performance‑critical slices):
  - A small number of tables/partitions are further promoted from warm to in‑memory hot for executive dashboards, time‑critical planning and simulations, and high‑throughput AI inference and agents.
  - Typically: recent periods (e.g., current month/quarter) and narrow sets of attributes and key metrics.
- Remote tier (optional):
  - Some data may remain in source systems and be accessed virtually instead of being landed in the lake (e.g., SAP SaaS applications).
  - Treated logically as warm/cold depending on latency and SLA.
Design guidance:
- Treat the cold lake as the default; movement to warm/hot must be justified by access frequency, latency requirements, and business criticality.
- Automate usage‑driven promotion:
  - Monitor query logs on lake and warm tables.
  - Identify the top N datasets to copy or materialize in warm.
  - From warm, promote an even smaller “top of the top” to hot.
- For AI:
  - Training and experimentation pull broadly from the cold lake.
  - Production agents and data products should rely primarily on warm/hot, with controlled fallbacks to cold for rare, deep questions.
Example: Sales Order data with the following persisted data tiering:
5.3 Pattern 3 – Warm‑First Datasphere Core, Selectively Promoted → Hot/Cold
(Default warm in Datasphere, tuned over time up or down)
Intent: Start by loading everything needed for analytics into the warm tier of SAP Datasphere (default disk storage), then rebalance over time: move critical subsets up to hot and aging/low‑value data down to cold.
Common scenarios:
- Typical SAP Datasphere rollout where data replication lands in warm (disk/HANA NSE) by default.
- Traditional BI / reporting, planning, and a first wave of AI use cases.
- Customers that want simplicity first, then optimization.
Architecture view:
- Warm tier as default (Datasphere core disk storage):
  - All relevant operational and external data needed for reporting and AI is replicated/ingested into warm local tables.
  - Characteristics: good general‑purpose performance; simple to model (everything is in one analytical environment); serves most dashboards, planning models, and initial agent use cases.
- Hot tier (promoted, latency‑critical slices):
  - After observing usage and SLAs, promote key tables and partitions to in‑memory hot for mission‑critical dashboards, high‑frequency queries, and AI agents needing sub‑second context.
  - Typical candidates: current‑period transactional slices (e.g., last 30–90 days), high‑value master data (customers, products, locations), and control / security / data access lookup tables.
- Cold tier (demoted aging or low‑value data):
  - As data ages or is rarely accessed, move older partitions from warm into lake storage (HDLF/object store).
  - Characteristics: read‑only or minimal updates; used for year‑over‑year and long‑term trend analysis, audit and regulatory requirements, and occasional ML retraining on full history.
  - The warehouse/lakehouse keeps a logical view that transparently spans warm + cold.
- Remote tier (where appropriate): For some source systems, keep data remote and blend it with warm/hot only where necessary.
Design guidance:
- Start by designing semantic models and data products against the warm tier; this keeps onboarding simple.
- Define lifecycle rules:
  - Age‑based: e.g., after 12–24 months in warm, move to cold.
  - Criticality‑based: e.g., if a dataset feeds a critical report/agent, promote its latest partitions to hot.
- For AI:
  - Keep agent‑critical context (current state, key masters, control rules) in hot or the hottest partitions of warm.
  - Use warm for general analytics, trend analysis, and feature engineering.
  - Store long‑term histories in cold but link them semantically so agents and analysts can still reach them when necessary (e.g., via separate historical data products).
Example: Sales Order data with the following persisted data tiering:
6. Best Practices and Recommendations for Data Tiering:
Remote tier:
- Use remote tables when you need real‑time S/4HANA, LoB app, or other source data with no replication delay.
- Keep remote queries narrow and push filters and aggregates down to the source. Always filter by time, company code, plant, etc.
  Pushing filters and aggregates down to the source avoids large data transfers.
- Avoid overusing remote access for heavy self‑service; replicate to warm if queries become frequent or complex.
- Use remote and local tables together: keep always‑hot context (small master data, control tables) in local hot tables and join remote operational facts with local hot masters for fast, enriched views.
- If a remote dataset becomes high‑volume or heavily used, replicate it into warm and optionally promote the most critical slice to hot.
Hot tier, in‑memory (ultra‑fast, mission‑critical):
When to use?
- Data powers mission‑critical dashboards, planning, or AI agents.
- You need sub‑second to low‑second response and high concurrency.
- Data changes frequently and is used continuously (daily / hourly).
What to put in hot?
- Current‑period slices of key transactional tables (e.g., last 30–90 days).
- High‑value master data (Customer, Material, Plant, BP, cost center, etc.).
- Control and data access control tables used in joins and authorization logic.
- Persisted outputs of core semantic models and AI‑critical data products (e.g., Daily Revenue, Open Orders, Customer 360 Core).
Best practices:
- Keep hot small and focused: limit it to data that is frequently accessed, feeds critical reports or Joule/AI agents, or is needed for very recent periods.
- Partition by time and business keys: make only the most recent partitions hot; keep older ones warm/cold.
- Materialize complex logic: persist heavy joins and KPI logic into hot tables (via view persistency/transformation flows) instead of recomputing them in every query.
- Minimize width: store only required columns; move verbose or rarely used attributes to warm/cold dimensions.
- Protect hot capacity: use separate spaces or roles to keep ad‑hoc/lab workloads off production hot tables.
Warm tier (HANA disk / NSE):
When to use?
- Default tier for replicated SAP and non‑SAP data in Datasphere.
- You need good performance for 12–36 months of history, but not in‑memory speed.
- Data is still updatable and frequently used for reporting and planning.
What to put in warm?
- One to three years of transactional fact tables (FI, SD, MM, PP, CO, etc.) and less time‑critical master data and lookup tables.
- Curated, harmonised reporting models that are widely reused.
- Staging/curated tables used for feature engineering and AI training.
Best practices:
- Treat warm as your default landing zone for replication. SAP‑managed and customer‑managed data products can be replicated into local tables. Warm is ideal for monthly/quarterly reporting, cross‑domain analytics, standard AI feature sets, and model training.
- Use time‑based partitioning so that:
  - Recent partitions (e.g., last 12–18 months) behave near‑hot.
  - Older partitions are candidates to move to cold.
- Apply HANA Cloud warm‑store sizing rules and do not let warm grow uncontrolled.
- Use warm as a “parking lot”:
  - Move data from hot → warm as it ages.
  - Keep data that is still occasionally needed here before sending it to cold.
- Monitor usage:
  - Promote warm tables/partitions to hot if they become latency‑critical.
  - Demote rarely used warm data to cold to free capacity.
Cold tier (HDLF / object store):
When to use?
- Data is rarely accessed, mostly for audit, compliance, or deep historical analytics.
- You need very low storage cost and can tolerate high latency.
- Ideal for long‑term retention and cloud‑scale AI/ML training.
What to put in cold?
- Detailed line‑item history older than 3–5+ years.
- Full raw IoT, log, and event streams once aggregated metrics exist in warm/hot.
- Long‑term snapshots, backups, and regulatory archives.
- Large external datasets used mainly for ML experimentation.
Best practices:
- Size the SAP Datasphere object store.
- Design cold primarily for historical analytics, large‑slice ML training, and forensic investigations.
- Avoid designs where agents or business users need frequent single‑record lookups in cold.
- Expose aggregated summaries (yearly/quarterly totals, high‑level KPIs) in warm/hot so everyday analytics don’t touch cold.
- Implement an automated lifecycle: hot → warm → cold rules based on age and usage (e.g., after 18 months move to warm, after 3–5+ years to cold).
- Customer‑managed data products: Built by customers in SAP Datasphere using the object store, mainly from SAP sources (S/4HANA, ECC), non‑SAP data, or other data products. Flow: replicated into the object store → data product → shared via SQL on Files/Delta Sharing.
  Note: SAP-managed data products are stored in the SAP BDC Foundation Services (FOS) layer, which is separate from the SAP Datasphere object store, although both use HDLF technology.
Cross‑tier recommendations (tying it all together):
- Start from business SLAs and AI needs, not storage, and define clear policies:
  - Remote/hot = real‑time and daily/weekly decisions (including agents).
  - Warm = regular analytics and planning.
  - Cold = retention, ML training, rare investigations.
- Promote and demote based on real usage: instrument and review query statistics, move only proven hotspots to hot, and demote unused data to cold.
- Use partitioning, lifecycle policies, and automation instead of ad‑hoc moves to shift data hot → warm → cold (or back) based on age and actual usage.
Conclusion:
As we’ve seen throughout this article, data tiering is really about being intentional: put the right data in the right place for the right purpose. Hot and remote tiers serve real‑time decisions and AI agents, warm tiers power day‑to‑day reporting and planning, and cold tiers keep history available for audit and ML without wasting budget. SAP Datasphere and Business Data Cloud give you the building blocks (local in‑memory/disk tables, object storage, and remote access) to implement these tiers consistently. Once you design clear policies and align them with your business data, tables, data products, and AI use cases, data tiering stops being a technical afterthought and becomes a core part of your business, analytics, planning, and AI strategy.
References:
- https://help.sap.com/docs/SAP_DATASPHERE
- https://help.sap.com/docs/hana-cloud-database
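Appendix: a lifecycle policy sketch
The age- and usage-based lifecycle rules recommended in this article can be sketched in a few lines of Python. This is only an illustration of the policy logic, not an SAP Datasphere API: the `Partition` structure, the `reads_last_30d` metric, and the `promote_threshold` value are hypothetical names chosen for the example. It assumes the yearly split used in the Sales Order example (current year hot, previous two completed years warm, older data cold) and promotes a heavily read partition by one tier.

```python
from dataclasses import dataclass

HOT, WARM, COLD = "hot", "warm", "cold"
_ORDER = [COLD, WARM, HOT]  # ascending temperature, used for one-step promotion


@dataclass(frozen=True)
class Partition:
    """One yearly slice of a fact table (hypothetical structure for this sketch)."""
    table: str
    year: int
    reads_last_30d: int = 0  # assumed usage metric, e.g. derived from query logs


def tier_by_age(partition_year: int, current_year: int, warm_years: int = 2) -> str:
    """Age-based rule: current year is hot, the previous `warm_years` are warm, older is cold."""
    age = current_year - partition_year
    if age <= 0:
        return HOT
    if age <= warm_years:
        return WARM
    return COLD


def assign_tier(p: Partition, current_year: int, promote_threshold: int = 1000) -> str:
    """Combine the age rule with a usage rule that promotes busy partitions by one tier."""
    tier = tier_by_age(p.year, current_year)
    if p.reads_last_30d >= promote_threshold and tier != HOT:
        tier = _ORDER[_ORDER.index(tier) + 1]
    return tier


partitions = [
    Partition("sales_orders", 2026, reads_last_30d=5000),
    Partition("sales_orders", 2025),
    Partition("sales_orders", 2024),
    Partition("sales_orders", 2023),
    Partition("sales_orders", 2021, reads_last_30d=2500),  # old but heavily queried
]
plan = {p.year: assign_tier(p, current_year=2026) for p in partitions}
print(plan)
# {2026: 'hot', 2025: 'warm', 2024: 'warm', 2023: 'cold', 2021: 'warm'}
```

In practice the resulting plan would only be an input to whatever mechanism actually moves or re-stores the data; keeping the decision logic separate from the movement step makes the policy easy to review and test against real usage statistics.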