Tuesday, May 30, 2023

Deleting File Versions to reduce the SharePoint Storage Consumption

 


This is a fellow-up post on the MS365thinking: Don't pay more for SharePoint Storage than you have to :-) post, where I went through the options you have when reducing the SharePoint Storage consumption.


The first action could be to reduce the default number of versions from 500 to a more reasonable number, like 50.

This will minimize any future storage increase but will not delete any existing versions. So, we will have to trim the existing libraries. 



In this post I will show how you can reduce the consumption using the PnP.PowerShell command Remove-PnPFileVersion


First of all, let's provide some evidens on how SharePoint storage is calculated in a way that will convince your management that you should investigate this.

This script will create a brand-new Modern SharePoint site collection (STS#3) or use the site collection specified by you.

DummyFileVersionGenerator

The script will then create a number of major and minor versions using a file provided by you:


#how many major versions the script should create
$majorVersionCount = 30
#how many minor versions that should create per major version
$minorVersionCount = 10


It will then calculate the current amount of SP storage used in this site collection.


If the file you provided is 5MB then we would expect the storage to be 5*10*30 = 1500 MB.

Wait...I have heard that Microsoft only saves the diffs when dealing with modern office files as these are XML files behind the covers. So the 300 versions of the file will only take up a smidge more than 5 MB, right?

Yes, you are correct, but that is not the way SharePoint storage is calculated :-) as Microsoft calculates this as the aggregation of File Size for each version of the file.


The re-calculation of Storage seems to be on a schedule so you should expect that the new storage numbers will take serveral hours before it shows up in the Admin center



Once we can see the Storage being consumed on our site collection, it is time to bring out the harvester:


File Version Trimmer | PnP Samples


Please be aware that the script is a sample and NOT production grade code. That will most likely be a number of questions you will need to address before you can start trimming the file versions.

Typical questions could be:

  • Which Site Collections should not be trimmed.
  • Which archived Site Collections should be unarchived and trimmed and then archived again.
  • Are we going to use the same pruning parameters on all sites, or should we only trim the minor versions on some Site Collections.
  • Are some of the files tagged as records and should receive special treatment
  • and so on :-)

 

Have fun and remember Sharing is caring


Monday, May 22, 2023

SharePoint People search - How to clean up your results by exclude accounts


People search in SharePoint is based on the accounts you can see in the User Profiles blade in the SharePoint Admin center



And as you properly know the accounts in the User Profiles will very often be a mess of expired accounts, Meeting Room, External users, Test account and a lot more.


These are my notes for cleaning up the People/Employee search in SharePoint 



Exclude former employees

“SPS-HideFromAddressLists” originates in Exchange and is primarily used to decide which accounts that should not be shown in the Global Address list. In many companies/orgs this property is set to true/1 when somebody leaves the org.

This makes it a very useful property as it is the best indicator we have on whether an account is active or not.

"SPS-HideFromAddressLists"<>1  (show only accounts that should not be hidden in the GAL)

Exclude test, admin or similar accounts

-preferredname:admin*

-preferredname:test*


Exclude accounts which does not use a specific email domain

This filter is an excellent way to exclude external users and consultants with a full account within the org. Please note that we often exclude accounts by specifying that the field should contain a specific value.

WorkEmail:@contoso.com 


Exclude members of a specific department

-Department:External

-Department:Management

Exclude accounts which does not have a value in a given property

It is possible to define a query that will exclude the accounts that do not have any value in a specific field.

Since KQL (the Language we are using in the search queries) can't search for "Fields which does not contain any value" we have to use a little trick:

If we want to exclude the accounts that does not have a cell phone number, we simply must reverse that requirement, and hence it will be "include those accounts that have a value in the cell phone field":

(MobilPhone:0* OR MobilPhone:1* OR MobilPhone:2* OR MobilPhone:3* OR MobilPhone:4* OR MobilPhone:5* OR MobilPhone:6* OR MobilPhone:7* OR MobilPhone:8* OR MobilPhone:9*)


User Profile properties such as Department and JobTitle are based on Term Store Term sets and we can use the Term Guid values in our query.

In this example we are using the auto generated  property owstaxIdSPShDepartment and the guid specified is the root node of the Department term set: (look in the Term Store for those guids)

owstaxIdSPShDepartment:"#8ed8c9ea-7052-4c1d-a4d7-b9c10bffea6f" 


So using that in our query will insure that only account with a value in the Department field in included in our result. Neat, right?

Exclude accounts having a specific AccountName

-AccountName:Prod\B

This will enable you to exclude specific Accounts or groups of accounts like meeting rooms, cars and similar object as the AccountName often indicates which kind of account it is.

 

Search Scope

In this case search scope is defined as the list of fields/properties in the SharePoint User Profile used in the matching. By default we are matching against every field, however this will sometimes cause some rather strange results, espcially if you are not using Ranking.

Example: In this Org we have a department called QA, and would expect the members of that department to show up when doing a search for the query "QA", however Bob from Accounting shows up as number 4 result, ahead of several collegues from the QA department. Why?

Well, after going through Bobs account we can see he is a member of a security group named "Financial QA".......and that is sufficient to mess up the search results.

Of the UPA fields Description, Interests, Memberships and PeopleKeywords can contains values that causes unexpected results.

 

One option that will solve this issue could be to limit your search to only match on a set of specific fields. This will to some extend reduce the general usebility of people search as the end users might not be awere of this limitation.

Often these fields are used in such at limited search: 

FirstName

LastName

PreferredName

Department

PeopleKeywords

WorkEmail

MobilePhone

Location

Responsibilities

PastProjects

MobilePhone

Interests

Description

JobTitle

MobilePhone

Skills


Ranking and sorting

Ranking and sorting decides in which order the search results will be shown. 


Sorting by Name

FirstName

LastName

PreferredName

 

Sorting using XRANK

XRANK specifies a numeric value for each result, usually calculated by Microsoft using the selected Ranking Model. But we are able to tweak this ranking calculation by boosting results that matches a keyword. 

Example:

XRANK(cb=1.5) FirstName:{searchTerms} OR XRANK(cb=0.5) Department:{searchTerms}

In this case we are boosting an account by factor 1.5 (150%) if the query matches FirstName and by 50% if the query matches Department.

Ranking model

In most cases we are using the ”People Search social distance model” ranking model as it is a good allround model for people search.

 

Additional info: Tech and me: People Search ranking for dummies (techmikael.com)

 

Tooling

When working with User profile data and search you should be familiar with these tools:

SharePoint Administation, specificly User Profiles (raw data) and the Search Schema on the tenant level ( mapning fields from UPA to Managed Properties)

SP Query Tool, PnP-Tools/Solutions/SharePoint.Search.QueryTool at master · pnp/PnP-Tools · GitHub, a desktop app that allows you to test both your quiries and your data.

 

SP Editor, a Chrome/Edge extension , GitHub - tavikukko/Chrome-SP-Editor: Extension for creating and updating files (js, css) in SharePoint Online from Developer Tools

Similar to the SP Quiry Tool as far as Search goes, and contains a lot of other SP related features. 


Wednesday, May 3, 2023

Content Type issue with the PnP Provisioning Engine

 

Often we have a number of Content Types which inherences from each other:











In this case Contoso Document is based on Document, and Contoso is based on the Contoso Document.

The Contoso Project Document is added to the default Document Library on our Contoso Project Template site that serves as the template site for our PnP Provisioning Engine template.







We create the PnP Template and specify the Documents library as it is the only one of interest for us:

Get-PnPSiteTemplate -out C:\temp\basetemplate2.pnp -Force
-ListsToExtract "Documents" -Connection $conn


However, when applying the PnP Template on a new Site Collection we get an error:





OK, so the base type for Contoso Project Type which is the Contoso Document does not exist on this new Site Collection.


This can be resolved in at least two ways:

Either run the PnP PowerShell cmdlet Add-PnPContentTypesFromContentTypeHub and specify that the Contoso Document content type should be added to the target site

or

Add the Contoso Document content type to the document library on our Contoso Project Template Site: 







And once we have applied the PnP Provisioning template to our new site, we can delete both the Document and the Contoso Document content types from the document library, just to ensure that a creative end user doesn't manage to use the wrong content type while working with the document library.