How to partition an SSAS cube in Analysis Services Multidimensional

Date: 2021-11-08 05:00:23

Partitioning means dividing the data of one logical unit into separate physical chunks. This can bring several advantages, such as improved performance or easier maintenance. You can, for example, partition a table in a SQL Server database, but you can also partition the measure groups inside an Analysis Services (SSAS) Multidimensional cube. In this article, we’ll discuss how you can set up partitioning. For more information about the expected benefits, take a look at Benefits of Partitioning your SSAS Multidimensional Cube.

Note that you can also partition tables in Analysis Services Tabular. Although the concept is similar at a high level, this article focuses on Multidimensional only.

Test set-up

In this article we’ll use the free sample database Wide World Importers Data Warehouse, which you can find on GitHub. I’ve imported three tables into the Data Source View: Sales (the fact table), Date and City (two dimensions).

The Date dimension has the following structure:

The following attribute relationships are defined:

Since we don’t expect the calendar to change in format any time soon, all the relationships are defined as Rigid.

The City dimension has the following structure:

Attribute relationships are defined here too, and because continents rarely change, they are set to Rigid as well.

The cube itself is simple: one measure group with all measures from the Sales fact table (except the Tax Rate measure), linked with our two dimensions.

One important detail: the foreign key Delivery Date Key from the Sales table to the Date dimension might contain null values, so we set the Null Processing property to UnknownMember. If you don’t set this property, the cube will fail during processing.

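The effect of this setting can be pictured with a small toy model. The sketch below is plain Python, not the Analysis Services engine: `resolve_date_member`, its `null_processing` argument and the sample keys are all hypothetical, and every setting other than UnknownMember is simplified to a failed lookup.

```python
# Toy model only: it mimics the idea behind Null Processing, not SSAS itself.
UNKNOWN_MEMBER = "Unknown"


def resolve_date_member(delivery_date_key, date_members, null_processing):
    """Map a fact row's Delivery Date Key to a member of the Date dimension."""
    if delivery_date_key is None and null_processing == "UnknownMember":
        # Null keys are routed to the dimension's unknown member.
        return UNKNOWN_MEMBER
    if delivery_date_key not in date_members:
        # Simplification: any unmatched key (including a null) is an error.
        raise KeyError(f"key {delivery_date_key!r} not found in the Date dimension")
    return date_members[delivery_date_key]


# Invented dimension content for illustration.
date_members = {"2016-01-01": "1 January 2016", "2016-01-02": "2 January 2016"}

print(resolve_date_member("2016-01-01", date_members, "UnknownMember"))
print(resolve_date_member(None, date_members, "UnknownMember"))
```

In this model, a null key with any other setting raises an error, which is the toy equivalent of the processing failure described above.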

In theory, the cube is finished. You can deploy it to a server, process it and analyze the data with your favorite front-end tool.

Creating Partitions

Typically, partitions on a measure group are based on a time column. This makes sense, as time is a natural way to partition data: new data comes in every day and all time boundaries are clearly defined. Partitioning on time also lets you partition at different levels: old data can, for example, be put in yearly partitions, while new data can be partitioned at the month level or even at the day level. Most examples you can find online are based on time partitioning. In this article, we’re going to take a different angle: we are going to partition the measure group on Sales Territories. There are two possible reasons for making this choice:

Sales Territories are relatively fixed. Normally, no new ones are created on a daily basis. This means that you can define your partitions once and they don’t need much maintenance afterwards. With time partitions, you need constant maintenance to create new partitions or to merge older partitions together. Typically, you’d solve this by automating your partition management.

Later, we might want to add security on top of our Sales Territories. In other words, some users might only see data for a specific sales territory. Partitioning on those territories might give additional performance benefits.

With the following query, we can identify our potential partitions and their size:

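The original query is not reproduced in this copy of the article. As a rough sketch of the idea, the Python snippet below runs a row count per sales territory against an in-memory SQLite stand-in. The table names, column names and rows are invented for illustration; against the real data warehouse you would run an equivalent GROUP BY over the Sales fact table joined to the City dimension.

```python
import sqlite3

# In-memory stand-in for the fact table and the City dimension (invented rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City  (city_key INTEGER PRIMARY KEY, sales_territory TEXT);
    CREATE TABLE Sales (sale_key INTEGER PRIMARY KEY, city_key INTEGER);
    INSERT INTO City  VALUES (1, 'Southeast'), (2, 'Mideast'), (3, 'External');
    INSERT INTO Sales VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3);
""")

# One candidate partition per sales territory, with its number of fact rows.
PARTITION_SIZE_QUERY = """
    SELECT c.sales_territory, COUNT(*) AS row_count
    FROM Sales AS s
    JOIN City  AS c ON c.city_key = s.city_key
    GROUP BY c.sales_territory
    ORDER BY row_count DESC
"""

partition_sizes = conn.execute(PARTITION_SIZE_QUERY).fetchall()
for territory, row_count in partition_sizes:
    print(f"{territory}: {row_count} rows")
```

Territories with very few rows are candidates for being merged into a shared partition rather than getting one of their own.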

To start creating partitions, we need to go to the Partitions tab in the cube editor.

There is a default partition that will contain all the data for the measure group. I already created an aggregation design and assigned it to the partition.

When you click on the Source, you can configure where the partition fetches its data from. The default is Table Binding.

With Table Binding, you select all the data from a certain table. The other option is to use Query Binding, where you specify the query that fetches the data yourself.

You can either partition the data at the source and use Table Binding, for example with different views on top of a table, or specify a different query for each partition with Query Binding. With the first option, you might clutter your data source view with a lot of additional tables. Query Binding is the easiest option and lends itself nicely to automated partition creation.

Let’s specify the query for our first partition: the Southeast sales territory.

Don’t end your T-SQL statement with a semicolon (although this is the recommendation everywhere else), because this will cause the cube to fail during partition processing, even though the syntax check is successful.

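The partition queries themselves are not reproduced in this copy of the article. The Python sketch below shows one way such a query could be generated per territory, and how a trailing semicolon can be stripped from a hand-written query. The table and column names (`[Fact].[Sale]`, `[Dimension].[City]`, `[City Key]`, `[Sales Territory]`) are assumptions about the sample warehouse's schema, and both helper functions are hypothetical. One plausible explanation for the semicolon problem, not confirmed in this article, is that Analysis Services embeds the partition query as a subquery when it processes the partition.

```python
def quote_literal(value):
    """Quote a string as a T-SQL Unicode literal, doubling embedded quotes."""
    return "N'" + value.replace("'", "''") + "'"


def strip_statement_terminator(query):
    """Remove trailing whitespace and a trailing semicolon from a query."""
    return query.rstrip().rstrip(";").rstrip()


def build_partition_query(territories):
    """Build a partition source query for one or more sales territories.

    Assumed schema: a fact table [Fact].[Sale] keyed to [Dimension].[City].
    """
    if not territories:
        raise ValueError("a partition needs at least one sales territory")
    in_list = ", ".join(quote_literal(t) for t in territories)
    # Deliberately no terminating semicolon at the end of the statement.
    return (
        "SELECT f.* FROM [Fact].[Sale] AS f "
        "WHERE f.[City Key] IN ("
        "SELECT c.[City Key] FROM [Dimension].[City] AS c "
        f"WHERE c.[Sales Territory] IN ({in_list}))"
    )


print(build_partition_query(["Southeast"]))
print(build_partition_query(["Mideast"]))
print(strip_statement_terminator("SELECT * FROM [Fact].[Sale] ;\n"))
```

The same function covers a later partition for another territory by passing a different name. The query must still return the same columns as the fact table in the Data Source View.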

Since we modified the default partition, the partition ID will always be the name of the measure group (Sales), even if we rename the partition to “Sales – Southeast”.

If you want to avoid this, you can create a new partition first and then drop the default partition.

Let’s create a new partition for the Mideast territory.

In the wizard, you’ll first need to select a source table.

In the next step, we choose to specify a query and enter a modified version of the earlier query:

After that, we can choose the storage location of our new partition, and we can even configure the partition to be processed at another location. We’re going to leave everything at the default settings. If you want, though, you could put partitions on different disks to optimize throughput.

On the final screen, we can configure aggregations, which we’re going to copy from our other partition, and choose whether the partition should be deployed and processed right away.

Our partition is ready. You can also change the Storage Mode for a partition. A detailed explanation is out of scope for this article, but these are the high-level concepts:

MOLAP. The data is read into the model, processed and stored on disk. If there are any aggregations, they are calculated and stored on disk as well.

ROLAP. The data stays at the source and the model only functions as a metadata layer between a front-end tool and the data source. This storage mode allows for more real-time analysis but might be slower.

HOLAP. A hybrid combination of the two above.

Similar partitions were created for the other sales territories.

The last three territories (Rocky Mountain, New England and External) and the dummy value N/A are combined into a single partition: Sales – Other Regions. Those territories don’t have enough data to justify a partition of their own, so we combined them into one bigger partition.

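The grouping decision can be sketched in a few lines of Python. The row counts and the threshold below are invented for illustration, and `plan_partitions` and `check_exact_cover` are hypothetical helpers. The second function guards against a risk worth keeping in mind with query-bound partitions: Analysis Services does not verify that partition queries are mutually exclusive, so a territory assigned to two partitions would be counted twice, and one assigned to none would be missing from the cube.

```python
def plan_partitions(row_counts, min_rows):
    """Give each large territory its own partition and pool the small ones."""
    plan = {}
    small = []
    for territory in sorted(row_counts):
        if row_counts[territory] >= min_rows:
            plan[f"Sales – {territory}"] = [territory]
        else:
            small.append(territory)
    if small:
        plan["Sales – Other Regions"] = small
    return plan


def check_exact_cover(plan, territories):
    """Raise if a territory is in more than one partition or in none."""
    assigned = [t for members in plan.values() for t in members]
    duplicates = sorted({t for t in assigned if assigned.count(t) > 1})
    missing = sorted(set(territories) - set(assigned))
    if duplicates or missing:
        raise ValueError(f"overlapping: {duplicates}, missing: {missing}")


# Invented row counts; the real numbers come from the sizing query.
row_counts = {
    "Southeast": 45000, "Mideast": 30000, "Rocky Mountain": 4000,
    "New England": 3000, "External": 500, "N/A": 10,
}
plan = plan_partitions(row_counts, min_rows=10000)
check_exact_cover(plan, row_counts)
for name, members in plan.items():
    print(name, "->", members)
```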

In fact, SSAS warns us that our partitions are too small: it advises against partitioning a measure group with fewer than 2 million rows. However, we are just working with sample data, so we can safely ignore this warning.

All that is left is to deploy and process the cube.

The article continues with the advantages of partitioning in Benefits of Partitioning your SSAS Multidimensional Cube.

Conclusion

Creating partitions in Analysis Services isn’t a difficult task. In this article, we showed how you can manually create and configure partitions for a measure group. Your partition strategy determines whether you need to set up partitioning only once, as in this article, or must continually create new partitions (for example when you partition on time). If partition maintenance is an ongoing task, you might want to consider automating it.

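As a starting point for such automation, the sketch below generates an XMLA Create command for a query-bound MOLAP partition using Python's standard library. It is an illustration under assumptions, not a tested deployment script: the element layout follows the ASSL partition definition as I understand it, and every ID, name and the query itself are hypothetical. Comparing the output with a script generated by SQL Server Management Studio for an existing partition is a sensible check before running anything like it.

```python
import xml.etree.ElementTree as ET

ENGINE_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("", ENGINE_NS)
ET.register_namespace("xsi", XSI_NS)


def _tag(name):
    """Qualify an element name with the Analysis Services engine namespace."""
    return f"{{{ENGINE_NS}}}{name}"


def build_create_partition_xmla(database_id, cube_id, measure_group_id,
                                partition_id, partition_name,
                                data_source_id, query):
    """Return an XMLA Create command for a query-bound MOLAP partition."""
    create = ET.Element(_tag("Create"))

    # The parent object identifies the measure group that owns the partition.
    parent = ET.SubElement(create, _tag("ParentObject"))
    ET.SubElement(parent, _tag("DatabaseID")).text = database_id
    ET.SubElement(parent, _tag("CubeID")).text = cube_id
    ET.SubElement(parent, _tag("MeasureGroupID")).text = measure_group_id

    definition = ET.SubElement(create, _tag("ObjectDefinition"))
    partition = ET.SubElement(definition, _tag("Partition"))
    ET.SubElement(partition, _tag("ID")).text = partition_id
    ET.SubElement(partition, _tag("Name")).text = partition_name

    # Query Binding: the partition reads its rows from the given query.
    source = ET.SubElement(partition, _tag("Source"),
                           {f"{{{XSI_NS}}}type": "QueryBinding"})
    ET.SubElement(source, _tag("DataSourceID")).text = data_source_id
    ET.SubElement(source, _tag("QueryDefinition")).text = query
    ET.SubElement(partition, _tag("StorageMode")).text = "Molap"

    return ET.tostring(create, encoding="unicode")


# All identifiers below are placeholders, not values taken from the article.
xmla = build_create_partition_xmla(
    database_id="SalesCubeDb", cube_id="SalesCube", measure_group_id="Sales",
    partition_id="Sales Mideast", partition_name="Sales – Mideast",
    data_source_id="WideWorldImportersDW",
    query="SELECT * FROM [Fact].[Sale] WHERE [City Key] IN (1, 2, 3)",
)
print(xmla)
```

A scheduled job could loop over the territories (or over time periods), build one command per missing partition and send it to the server.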

The next articles in this series:

Benefits of Partitioning your SSAS Multidimensional Cube
How to optimize the dimension security performance using partitioning in SSAS Multidimensional