everything you always wanted to know about powerpivot...

33
Everything You Always Wanted to Know About PowerPivot Data Refresh but Were Afraid to Ask SQL Server Technical Article Writer: Mariano Teixeira Neto Technical Reviewers: Lee Graber, Heidi Steen, Dave Wickert First Published: September 2010 Minor updates September 2012 Applies to: SQL Server 2008 R2 Summary: The purpose of PowerPivot for SharePoint is to allow users to share their PowerPivot workbooks in a secure and scalable environment. To ensure that you are sharing and collaborating on the most recent data available to you, PowerPivot for SharePoint provides a data refresh feature that lets you automatically update PowerPivot data in an Excel workbook. This white paper describes in detail the data refresh feature in PowerPivot for SharePoint. It starts by explaining the steps for setting up a data refresh schedule in SharePoint, and then it continues with an in-depth view on how data refresh works on a SharePoint farm. Both administrators and the business analysts who author and manage PowerPivot workbook data can benefit from learning more about setting up and using data refresh in a SharePoint environment.

Upload: phungque

Post on 30-Jan-2018

221 views

Category:

Documents


2 download

TRANSCRIPT

Everything you always wanted to know about PowerPivot Data Refresh but were afraid to ask

Everything You Always Wanted to Know About PowerPivot Data Refresh but Were Afraid to Ask

SQL Server Technical Article

Writer: Mariano Teixeira Neto

Technical Reviewers: Lee Graber, Heidi Steen, Dave Wickert

First Published: September 2010

Minor updates September 2012

Applies to: SQL Server 2008 R2

Summary: The purpose of PowerPivot for SharePoint is to allow users to share their PowerPivot workbooks in a secure and scalable environment. To ensure that you are sharing and collaborating on the most recent data available to you, PowerPivot for SharePoint provides a data refresh feature that lets you automatically update PowerPivot data in an Excel workbook. This white paper describes in detail the data refresh feature in PowerPivot for SharePoint. It starts by explaining the steps for setting up a data refresh schedule in SharePoint, and then it continues with an in-depth view on how data refresh works on a SharePoint farm. Both administrators and the business analysts who author and manage PowerPivot workbook data can benefit from learning more about setting up and using data refresh in a SharePoint environment.

Copyright

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in, or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred.

2010 Microsoft Corporation. All rights reserved.

ContentsIntroduction4Creating a Schedule5Data Refresh7Schedule Details7Earliest Start Time8E-mail Notifications9Credentials9Data Sources11Infrastructure12PowerPivot Data Refresh Timer Job12Anatomy of PowerPivot Data Refresh14Parallelization15Sample Situations17Scenario 117Scenario 218Scenario 320Best Practices (Tips and Tricks)23Managing the Data Refresh Settings in a PowerPivot Service Application23Refreshing a Workbook More Than Once a Day25Pausing All Data Refresh Operations on a Given Server25Working with the Unattended Data Refresh Account26Conclusion26Appendix27

Introduction

Because PowerPivot for Microsoft Excel simplifies data integration in Excel, it is common for a PowerPivot workbook to contain data from multiple sources, including public, corporate, or local data in a single workbook. Some of this data might change slowly or not at all, while other data changes so quickly that it is virtually out of date as soon as you import it.

To help you keep your PowerPivot workbooks using the most current data, PowerPivot for SharePoint provides a data refresh feature that retrieves newer data from the external data sources you used to build the workbook. After a SharePoint administrator enables data refresh on the server, any PowerPivot workbook owner can schedule data refresh for workbooks saved to Shared Documents or some other SharePoint library. Figure 1 shows the drop-down menu that opens the data refresh schedule page.

Figure 1: Accessing PowerPivot Data Refresh from a Shared Documents library, from the context menu

PowerPivot Gallery, used for showcasing reports and workbooks, offers another way to get to the data refresh feature. SharePoint users who have Contributor permission can click the Calendar icon shown for each workbook in a PowerPivot Gallery. Note that if the user does not have sufficient privileges on the workbook, the Calendar icon is not available on the page. Figure 2 shows the icon that opens the data refresh schedule definition page.

Figure 2: Accessing PowerPivot Data Refresh from a PowerPivot Gallery, from the Calendar icon

Creating a Schedule

Regardless of how you get to the schedule definition page, the details of the page remain the same. Figure 3 shows the initial view of the page. The first step to setting up a data refresh schedule is to click Enable. This makes the page active so that you can fill in the values you want to use.

Figure 3: Enabling Data Refresh in the Manage Data Refresh page

The Manage Data Refresh page is organized into six sections.

Section

General description

Data Refresh

Enable or disable a data refresh schedule.

Schedule Details

Define the frequency and timing details of a data refresh.

Earliest Start Time

Specify the earliest start time for a data refresh.

E-mail Notifications

Specify e-mail address of the users to be notified in the event of data refresh failures.

Credentials

Provide the credentials that will be used to refresh data on your behalf.

Data Sources

Select which data sources should be automatically refreshed. You also use this section to create custom schedules that vary for each data source, or specify credentials for connecting to the data source.

Table 1: Manage Data Refresh page sections

Data Refresh

To enable or disable a data refresh schedule, select or clear the Enable check box on the Manage Data Refresh page. If this check box is selected, you can edit all parts of the data refresh schedule. If the check box is cleared, the page is read-only, and after you click OK, subsequent data refresh operations are prevented for that workbook.

Schedule Details

In the Schedule Details section, you can specify the frequency and timing details of the data refresh. There are four options to choose from:

Daily

Weekly

Monthly

Once

The Daily option enables you to schedule data refresh to occur every n day(s), every weekday, or on specific days of the week.

If you select the Also refresh as soon as possible check box, data is refreshed as soon as the server can process it. This refresh occurs in addition to the periodic data refresh schedule. This option is available for periodic schedules only (that is, daily, weekly, and monthly schedules). Select this check box if you want to verify that the data refresh will run properly. For example, you may not know whether data credentials are configured correctly. This option provides a way to test the data refresh before its scheduled execution time.

Figure 4: Daily schedule details options

The Weekly option is for scheduling data refresh on a weekly basis, such as every n week(s) or on specific days of the week.

Figure 5: Weekly schedule details options

The Monthly option schedules data refresh to run on a specific day of the month, or on the first, second, third or last specific day of the week every n month(s).

Figure 6: Monthly schedule details options

The Once option is for scheduling a one-time data refresh operation that runs as soon as the server can process the request. After the data refresh is complete, the system disables this schedule. Notice that the Also refresh as soon as possible check box is not available for this option.

Figure 7: Once schedule details option

Earliest Start Time

In the Earliest Start Time section, specify details about when you prefer data refresh to occur. You can enter a specific time before which data refresh should not start, or you can choose to refresh data after business hours. This page does not determine the time at which the data refresh actually starts; the schedule is queued and processed based on available resources. For example, if the server is busy with on-demand queries (which take precedence over data refresh jobs), the server waits to refresh your data until after those queries are processed. Another option for specifying earliest start time is to run data refresh after business hours. The definition of business hours for your organization is determined by the administrator of the PowerPivot service application (for more information, see Managing the Data Refresh Settings in a PowerPivot Service Application).

Figure 8: Earliest Start Time options

E-mail Notifications

In this section of the page, you can specify the e-mail addresses of anyone who should be notified of a data refresh failure. If you also want to receive notifications of successful data refresh operations, you can use the regular SharePoint alerting system for e-mail notification (the basis of the alert would be a new file added to the target document library).

Figure 9: E-mail Notifications option

Credentials

PowerPivot for SharePoint uses the SharePoint Secure Store Service to store any credentials used in data refresh. In the Credentials section of the schedule page, the schedule owner can specify the Windows credentials that will be used to refresh data on his or her behalf. Any data source that uses trusted or integrated security will be refreshed as this user. Note that in order for the data refresh to succeed, the credentials should have access to the data sources for this workbook. There are three options to choose from:

Use an account preconfigured by the administrator (this is the service applications unattended data refresh account).

Use a specific Windows user name and password.

Use a predefined Secure Store Service target application ID that stores the Windows credentials you want to use.

Both the PowerPivot unattended data refresh account and the target application ID that stores arbitrary Windows credentials must be set up by a SharePoint administrator in Central Administration. Because these credentials are shared among all users, for instance, this option is typically used where additional credentials would be actually used for data access. A good example for when this option is advised is when all of the data sources use SQL Server authentication (that is, the actual user names and passwords are on each data source). In this case the unattended execution account can be a low permission service account. It is normally not a good idea to have the unattended execution account be someones primary user account: Because of the way data refresh uses Windows accounts, this means that anyone can impersonate that user if he or she accesses data using a trusted connection. For more information, see Anatomy of PowerPivot Data Refresh.

Figure 10: Credentials options using an account configured by the administrator

A schedule owner can also choose to type the Windows user credentials to be used on the data refresh. These credentials are securely stored in SharePoints Secure Store Service.

Figure 11: Credentials options specifying Windows user credentials

The third option lets a schedule owner specify credentials previously saved in a Secure Store Service Target Application ID. In order to use this option, you must enter the Target Application ID used to look up the credentials in the Secure Store Service. The Target Application ID specified must be a group entry, and both the interactive user and the PowerPivot System service account must have read access.

Figure 12: Credentials options specifying a Target Application ID

Note: Setting up and maintaining Secure Store Service is outside the scope of this white paper. For more information about Secure Store Service, see the TechNet articles referenced in the appendix.

Data Sources

A workbook can have many data sources with different characteristics. You can choose to create a data refresh schedule using different options for each data source.

Figure 13: Data sources settings

The schedule definition page provides options for choosing the data sources to be refreshed and when to refresh them. It also provides fields for specifying database credentials or other non-Windows credentials used on the database connection. At least one data source must be selected in order to save the schedule. The data sources credentials are not used for impersonation, but are instead included on the connection string as UserName and Password. These credentials override those used on the connection string for the original data import.

For each data source, different settings are available. You can specify a custom schedule data source or use the general schedule specified for this workbook.

Note: The only elements that can be changed on the connection string are the UserName and Password elements. To edit any of the other elements, for example, to change the source server name, you must download the workbook to your desktop, edit it using the PowerPivot Excel add-in, and then republish to SharePoint.

Figure 14: Data Sources details

Infrastructure

PowerPivot data refresh infrastructure is designed to process data refresh jobs as long as there are resources available to do the job. This section provides an overview of the PowerPivot Data Refresh infrastructure and shows the data refresh execution flow.

PowerPivot Data Refresh Timer Job

The PowerPivot data refresh timer job can be accessed through Central Administration. From Central Administration, click Monitoring, and then click Review job definitions. Look for PowerPivot data refresh timer job, as shown in Figure 15.

Figure 15: The PowerPivot data refresh timer job

By default, the PowerPivot data refresh timer job is set to run every minute. This is how frequently the system will look for data refresh schedules to be processed. You can change the periodicity of the timer job by clicking PowerPivot Data Refresh Timer Job, as shown in Figure 15. Changing this setting causes the data refresh schedules to be processed less frequently, because the default setting is the most frequent setting available. In general, this setting shouldnt be changed.

Figure 16: The PowerPivot data refresh timer job details

This timer job is responsible for calling the PowerPivot System Service (PSS) whose task it is to check the queue of scheduled data refresh jobs. When the timer job runs, it verifies which data refresh schedules are past due according to the schedules specifications. That is, the system checks to see whether the current time is equal to or greater than the date and the earliest start time specified in the data refresh schedule. If there are runnable data refreshes, and if there are available resources to process it, a server with enough memory and CPU will kick off the data refresh process. A step-by-step walkthrough shows how data refresh works.

Anatomy of PowerPivot Data Refresh

1. First, the system looks for enabled schedules that are runnable, meaning that their schedule time period has come due. Because many jobs might be scheduled to run at the same time, the system tries to run a job as soon as it has available resources. All of the PowerPivot SharePoint servers in a farm monitor and handle data refresh jobs the same way. Ultimately, one of them detects that your job is runnable and runs it.

2. After impersonating the Windows user specified in the schedule, the system extracts the workbook from the content database using the SharePoint object model. For impersonation to work, the user must have supplied a valid Windows account in the schedule, and he or she must ensure that that account has Contribute (read/write) access rights to the workbook. The workbook is stored in a temporary folder (in the OLAP Backup folder) so that it can be used later (see step 9).

3. The system sends a request to a local Microsoft SQL Server Analysis Services engine instance to extract the embedded database from the workbook and then loads the database into it. The database is loaded as read/write (so that it can be updated). This database is used for this data refresh job only the system ensures that it is not used for querying while data refresh is underway (that is, no Analysis Services query commands are allowed to execute).

4. If one of the data sources specified in this schedule has custom data source credentials, the connection string for that data source is modified to use those credentials. This is done using an XMLA command to the data source.

5. The system impersonates the Windows user for a second time and sends processing commands to the database. This Windows account is the one that will be used for trusted or integrated security connections. In this step, Analysis Services establishes a connection to the external data sources and pulls updated data into the PowerPivot database. The processing command is not sent to all tables or dimensions. The process commands are sent to just those objects that are dependent on the data sources listed in the schedule.

6. The data source credentials (if any) are reset.

7. The PowerPivot database is saved back to the workbook.

8. If it is not set already, the embedded connections property Refresh data when opening the file is set to true. This ensures that users immediately see the new data when the workbook is opened. It also means that snapshot generation will include the new data in a PowerPivot Gallery thumbnail image.

9. Impersonating the Windows account for a third time, the workbook is saved back to the content database using the SharePoint binary object model. If the document library is a PowerPivot Gallery, the object model fires its updated file event handler, which starts the snapshot generation process.

10. The schedules status is updated with information about the job and saved in data refresh history.

11. Finally, the refreshed read/write database is unloaded and cached as a read-only database so it is available for on-demand query requests for that data.

Parallelization

In order to scale out data refresh processing, there are two ways that PowerPivot for SharePoint provides parallelization. The first one is to have more than one application server that is data refresh enabled, and the second is to allow more the one data refresh job to run concurrently on the same application server.

The PowerPivot service application administrator can specify the server role through in Central Administration. How you set the role determines whether the server is allowed to load read-only databases, process data refresh, or both. You can also specify the maximum number of concurrent data refresh jobs allowed on this server, as shown in Figures sequence 17 and 18.

To configure server roles and properties, click Manage services on server.

Figure 17: The SharePoint Central Administration page

The Server box (the drop-down menu circled in green at Figure 18) displays the name of the server youre configuring. You can pick any server from your SharePoint farm. Be sure youre viewing the services for a server that has a SQL Server Analysis Services server running on it, and then click the SQL Server Analysis Services link.

Figure 18: The SQL Server Analysis Services instance properties

Clicking SQL Server Analysis Services will lead you to the Service Instance properties page, where you can specify the server role.

Figure 19: Analysis Services Server instance settings

This page contains three settings.

Setting

Description

Enable loading of read-only PowerPivot data files

Select this check box to allow the server to load in read-only mode the PowerPivot databases that are embedded in workbooks. This option enables on-demand query operations. If you clear this check box, the server support only data refresh requests.

Enable loading of PowerPivot data for refresh

Select this check box to allow the server to process data refresh jobs.

Maximum number of concurrent refresh jobs

Specify the maximum number of concurrent data refresh jobs that can be processed at any given time on this server. Regardless of the value specified, the system will only run one job per processor. If your server has additional cores or processors, you can increase this value to run multiple data refresh jobs at the same time.

By default, this value is set to 1.

Table 2: PowerPivot Service settings

These settings are configured separately for each PowerPivot service application.

Sample Situations

One of the best ways to determine whether data refresh will work for you is to evaluate scenarios that are similar to those you encounter on a regular basis.

Scenario 1

You have prepared a workbook with sales information for your department to be used in a meeting with the marketing team that happens every Thursday morning.

To keep this workbook up to date, you upload it to your teams SharePoint document library and schedule a weekly data refresh for every Wednesday after business hours, as shown in Figure 20. Your workbook will be refreshed after hours on Wednesday, and it will be ready for your meeting on Thursday morning.

Figure 20: Configuring the general schedule for the weekly meeting workbook

Scenario 2

You have prepared a workbook that combines data from a production database and your sales database that is used on a daily basis. The sales database is updated constantly, but the production database is only updated during the weekends.

To keep your workbook up-to-date, you set up different schedules to refresh the workbook from each data source independently. For sales data, you want your workbook to show the sales figures as they become available, so you pick a schedule that retrieves sales data on a daily basis, as shown in Figure 21. For the production database, you schedule data refresh to run every Monday on a custom schedule, as shown in Figure 22.

Figure 21: Configuring the general schedule for the Scenario 2 workbook

Figure 22: Configuring the Production data source to use its own schedule

The workbook will be updated daily with data from the SalesDB data source, but the ProductionDB data will only change on Mondays when its data refresh schedule runs.

Scenario 3

You work in the Production department and have created a workbook that uses data from your production database, your departmental sales database, and external data from a third-party ERP data source. The production database is updated constantly, the departmental database is updated weekly, and the ERP is updated quarterly, on the first day of that month. For the production database, the administrator has set up a data refresh account that can be used by the whole Production team. The administration has also given you read rights to the department sales database. And for the ERP data source, the SharePoint administrator has set up a Secure Store App ID with the proper credentials to query the ERP data source called ERPrefresh.

To keep this workbook up-to-date, you can schedule a data refresh to happen daily after business hours, as shown in Figure 23, and you can specify the department sales and ERP data sources their own schedule, as shown in Figure 24.

Figure 23: Configuring the general schedule for the Scenario 3 workbook

Figure 24: Configuring the data sources schedules for the Scenario 3 workbook

This workbook will be updated every day, according to the general schedule, with data from the production data source because it is set to use the default schedule. The Department sales data source data in the workbook will be updated every Wednesday using John Smiths credentials. And the ERP data source data will be updated on day 2 of every three months to reflect the changes that happen on the first day of every quarter.

Best Practices (Tips and Tricks)

SharePoint administrators can change data refresh configuration settings to improve the data refresh experience.

Managing the Data Refresh Settings in a PowerPivot Service Application

Start SharePoint Central Administration, and then in General Application Settings, click PowerPivot Management Dashboard.

Figure 25: Accessing data refresh settings through the SharePoint Central Administration page

In the top right corner, under Actions, click Configure service application settings.

Figure 26: Configuring the service application settings

Find the Data Refresh section, as shown in Figure 27.

Figure 27: The PowerPivot Data Refresh properties for this PowerPivot service application

Here you can change the definition of business hours for this service application. You can also configure the Unattended Data Refresh Account, which should already be created in the SharePoint Secure Store Service application. For more information about SharePoint Secure Store Service, see the appendix.) This account will be used by all the data refresh schedules configured to use the first Credentials option shown in Figure 11. This section is also where you allow the use of custom Windows Credentials (the second option of the Credentials settings in a data refresh schedule, as shown in Figure 11) and specify how long to keep data refresh history.

Refreshing a Workbook More Than Once a Day

In this release of PowerPivot for SharePoint, it is not possible to configure a schedule to refresh a workbook more frequently than once a day. However, there is a way to work around this restriction by manually running data refresh whenever you need more current data:

1. In the Timing details section of the data refresh page, set up a periodic schedule (that is, daily, weekly, or monthly) and select the Also refresh as soon as possible check box (see Figures 4, 5 and 6).

2. Save the schedule. The data refresh job will process as soon as soon as system resources become available (on average it will start processing in one minute).

3. Repeat these steps any time you require an immediate update. The previous schedule will be overwritten, and it will be queued again.

There is a caveat for this procedure. If the workbook is opened in a browser right before data refresh occurs, the Excel Calculation Service (ECS) caches it to speed up loading for the next time you open it.

Pausing All Data Refresh Operations on a Given Server

All data refresh jobs are controlled by the PowerPivot data refresh timer job. This timer job must be running in order for data refresh to occur. If you want to pause data refresh without changing the configuration settings of your service application (as shown in Figure 19), you can disable the PowerPivot data refresh timer job through the Review Job Definitions page in Central Administration, as shown in Figure 28.

Figure 28: Disabling the PowerPivot data refresh timer job

After the timer job is disabled, no requests for data refresh are queued or processed.

Working with the Unattended Data Refresh Account

As a general guideline, the unattended data refresh account should have extremely limited data access. Two approaches should be considered:

You can use an account with no data rights to any databases. In this case, the database access would be through SQL Server authentication, and the user name and password would be on each data source in the connection string. You can modify these for the data refresh schedule on a data source by data source basis in the schedule.

You can use an account with only the minimum rights necessary to be able to read from the data source(s) that will be queried during data refresh. That is, the unattended data refresh account needs data reader permissions if the data import connection string includes SSPI=IntegratedSecurity or SSPI=TrustedConnection.For this connection string, there is no user name or password to store, hence nothing to pass along later (for example, during data refresh).

Using this approach, you should never specify a regular user account as the unattended data refresh account otherwise virtually any user could access that accounts data.

Regardless of the approach used, the unattended data refresh account must have read/write access (Contribute permissions) to the workbook in SharePoint.

There is only one unattended data refresh account for a single PowerPivot service application. If you need multiple unattended data refresh accounts, you must create multiple service applications (for manageability purposes, its advised to create as few service applications as possible).

Conclusion

PowerPivot data refresh is an essential feature for workbook authors who want an automated approach to getting the most current data in their workbooks. Data refresh can play a key role in how you share information and collaborate with others, and we hope that the information in this white paper helps you to take advantage of it.

For more information:

http://powerpivot.com/: PowerPivot official site

/: PowerPivot official blog site

Did this paper help you? Please give us your feedback. Tell us on a scale of 1 (poor) to 5 (excellent), how would you rate this paper and why have you given it this rating? For example:

Are you rating it high due to having good examples, excellent screen shots, clear writing, or another reason?

Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?

This feedback will help us improve the quality of white papers we release.

Send feedback.

Appendix

TechNet articles on Microsoft SharePoint 2010 Secure Store Service:

Plan the Secure Store Service (http://technet.microsoft.com/en-us/library/ee806889.aspx)

Configure the Secure Store Service (http://technet.microsoft.com/en-us/library/ee806866.aspx)

27