Tuesday, May 30, 2023

Deleting File Versions to reduce the SharePoint Storage Consumption

 


This is a follow-up to the MS365thinking post "Don't pay more for SharePoint Storage than you have to :-)", where I went through the options you have for reducing SharePoint Storage consumption.


The first action could be to reduce the default number of versions from 500 to a more reasonable number, like 50.

This will minimize any future storage increase but will not delete any existing versions. So, we will have to trim the existing libraries. 
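To make the trimming idea concrete, here is a minimal sketch of the selection logic: keep the newest N versions and mark everything older for deletion. It is plain Python and does not call SharePoint; the FileVersion class, the versions_to_delete function and the sample data are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass
class FileVersion:
    label: str      # hypothetical version label, e.g. "12.0"
    size_mb: float  # size counted for this version


def versions_to_delete(versions, keep_latest):
    """Return the versions that fall outside the newest `keep_latest` entries.

    `versions` is assumed to be ordered oldest first.
    """
    if keep_latest < 0:
        raise ValueError("keep_latest must not be negative")
    cutoff = max(len(versions) - keep_latest, 0)
    return versions[:cutoff]


# Made-up history: 120 versions of a 5 MB file, oldest first.
history = [FileVersion(f"{n}.0", 5.0) for n in range(1, 121)]
doomed = versions_to_delete(history, keep_latest=50)
print(len(doomed), sum(v.size_mb for v in doomed))
```

With these made-up numbers, 70 versions fall outside the limit of 50. In a real script the list would come from the library itself and the deletion would be done with the PnP cmdlets.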



In this post I will show how you can reduce the consumption using the PnP.PowerShell cmdlet Remove-PnPFileVersion.


First of all, let's provide some evidence of how SharePoint storage is calculated, in a way that will convince your management that you should investigate this.

This script will create a brand-new Modern SharePoint site collection (STS#3) or use the site collection specified by you.

DummyFileVersionGenerator

The script will then create a number of major and minor versions using a file provided by you:


# How many major versions the script should create
$majorVersionCount = 30
# How many minor versions the script should create per major version
$minorVersionCount = 10


It will then calculate the current amount of SP storage used in this site collection.


If the file you provided is 5 MB, then we would expect the storage to be 5 * 10 * 30 = 1500 MB.

Wait...I have heard that Microsoft only saves the diffs when dealing with modern office files as these are XML files behind the covers. So the 300 versions of the file will only take up a smidge more than 5 MB, right?

Yes, you are correct, but that is not the way SharePoint storage is calculated :-) Microsoft calculates it as the sum of the file size of every version of the file.
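The arithmetic can be written down as a small sketch. The function name billed_storage_mb is made up for this example, and the numbers are the ones used in this post.

```python
def billed_storage_mb(file_size_mb, major_versions, minor_per_major):
    """Storage counted when every version is charged at the full file size."""
    version_count = major_versions * minor_per_major
    return file_size_mb * version_count


# 5 MB file, 30 major versions, 10 minor versions per major version
print(billed_storage_mb(5, 30, 10))
```

So even if only the diffs are stored behind the scenes, the number you are charged for follows the full file size multiplied by the version count.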


The recalculation of storage seems to run on a schedule, so you should expect it to take several hours before the new storage numbers show up in the Admin center.



Once we can see the Storage being consumed on our site collection, it is time to bring out the harvester:


File Version Trimmer | PnP Samples


Please be aware that the script is a sample and NOT production grade code. There will most likely be a number of questions you need to address before you can start trimming the file versions.

Typical questions could be:

  • Which Site Collections should not be trimmed?
  • Which archived Site Collections should be unarchived, trimmed and then archived again?
  • Are we going to use the same pruning parameters on all sites, or should we only trim the minor versions on some Site Collections?
  • Are some of the files tagged as records, and should they receive special treatment?
  • and so on :-)

 

Have fun and remember Sharing is caring


Monday, May 22, 2023

SharePoint People search - How to clean up your results by excluding accounts


People search in SharePoint is based on the accounts you can see in the User Profiles blade in the SharePoint Admin center.



And as you probably know, the accounts in the User Profiles will very often be a mess of expired accounts, meeting rooms, external users, test accounts and a lot more.


These are my notes for cleaning up the People/Employee search in SharePoint 



Exclude former employees

“SPS-HideFromAddressLists” originates in Exchange and is primarily used to decide which accounts should not be shown in the Global Address List (GAL). In many companies/orgs this property is set to true/1 when somebody leaves the org.

This makes it a very useful property as it is the best indicator we have on whether an account is active or not.

"SPS-HideFromAddressLists"<>1  (show only accounts that should not be hidden in the GAL)

Exclude test, admin or similar accounts

-preferredname:admin*

-preferredname:test*


Exclude accounts which do not use a specific email domain

This filter is an excellent way to exclude external users and consultants with a full account within the org. Please note that we often exclude accounts by specifying that the field should contain a specific value.

WorkEmail:@contoso.com 


Exclude members of a specific department

-Department:External

-Department:Management

Exclude accounts which do not have a value in a given property

It is possible to define a query that will exclude the accounts that do not have any value in a specific field.

Since KQL (the language we are using in the search queries) can't search for "fields which do not contain any value", we have to use a little trick:

If we want to exclude the accounts that do not have a cell phone number, we simply reverse that requirement, so it becomes "include those accounts that have a value in the cell phone field":

(MobilePhone:0* OR MobilePhone:1* OR MobilePhone:2* OR MobilePhone:3* OR MobilePhone:4* OR MobilePhone:5* OR MobilePhone:6* OR MobilePhone:7* OR MobilePhone:8* OR MobilePhone:9*)
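Typing ten nearly identical terms by hand is error prone, so here is a minimal sketch that generates the clause. The helper name has_value_clause is hypothetical, and the managed property name is passed in as-is, so check it against your own Search Schema.

```python
def has_value_clause(managed_property, prefixes="0123456789"):
    """Build a KQL clause matching values that start with any of the prefixes."""
    terms = [f"{managed_property}:{prefix}*" for prefix in prefixes]
    return "(" + " OR ".join(terms) + ")"


print(has_value_clause("MobilePhone"))
```

The same helper could be reused for other fields where every valid value starts with a known set of characters.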


User Profile properties such as Department and JobTitle are based on Term Store Term sets and we can use the Term Guid values in our query.

In this example we are using the auto-generated property owstaxIdSPShDepartment, and the GUID specified is the root node of the Department term set (look in the Term Store for those GUIDs):

owstaxIdSPShDepartment:"#8ed8c9ea-7052-4c1d-a4d7-b9c10bffea6f" 


So using that in our query will ensure that only accounts with a value in the Department field are included in our result. Neat, right?

Exclude accounts having a specific AccountName

-AccountName:Prod\B

This will enable you to exclude specific accounts or groups of accounts like meeting rooms, cars and similar objects, as the AccountName often indicates which kind of account it is.
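The individual filters above are usually combined into one query. The sketch below simply places the clauses one after another, separated by spaces, with the exclusions prefixed by "-" as in the examples above. That the clauses combine this way is an assumption, so verify the resulting query in the SP Query Tool. The function name build_people_filter is hypothetical.

```python
def build_people_filter(required, excluded):
    """Join required clauses and "-" prefixed exclusions into one query string."""
    clauses = list(required) + ["-" + term for term in excluded]
    return " ".join(clauses)


query = build_people_filter(
    required=['"SPS-HideFromAddressLists"<>1', "WorkEmail:@contoso.com"],
    excluded=["preferredname:admin*", "preferredname:test*", "Department:External"],
)
print(query)
```

Keeping the required and excluded clauses in two lists makes it easier to review which accounts you are filtering out and why.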

 

Search Scope

In this case search scope is defined as the list of fields/properties in the SharePoint User Profile used in the matching. By default we are matching against every field; however, this will sometimes cause some rather strange results, especially if you are not using Ranking.

Example: In this org we have a department called QA, and we would expect the members of that department to show up when searching for "QA". However, Bob from Accounting shows up as result number 4, ahead of several colleagues from the QA department. Why?

Well, after going through Bob's account we can see he is a member of a security group named "Financial QA"... and that is sufficient to mess up the search results.

Of the UPA fields, Description, Interests, Memberships and PeopleKeywords can contain values that cause unexpected results.

 

One option that will solve this issue could be to limit your search to only match on a set of specific fields. This will to some extent reduce the general usability of people search, as the end users might not be aware of this limitation.

Often these fields are used in such a limited search:

FirstName

LastName

PreferredName

Department

PeopleKeywords

WorkEmail

MobilePhone

Location

Responsibilities

PastProjects

Interests

Description

JobTitle

Skills


Ranking and sorting

Ranking and sorting decides in which order the search results will be shown. 


Sorting by Name

FirstName

LastName

PreferredName

 

Sorting using XRANK

XRANK specifies a numeric value for each result, usually calculated by Microsoft using the selected Ranking Model. But we are able to tweak this ranking calculation by boosting results that match a keyword.

Example:

XRANK(cb=1.5) FirstName:{searchTerms} OR XRANK(cb=0.5) Department:{searchTerms}

In this case we add a constant boost of 1.5 to the rank score of an account if the query matches FirstName, and a constant boost of 0.5 if the query matches Department. Note that cb is a constant added to the score, not a percentage.

Ranking model

In most cases we are using the “People Search social distance model” ranking model as it is a good all-round model for people search.

 

Additional info: Tech and me: People Search ranking for dummies (techmikael.com)

 

Tooling

When working with User profile data and search you should be familiar with these tools:

SharePoint Administration, specifically User Profiles (raw data) and the Search Schema on the tenant level (mapping fields from UPA to Managed Properties)

SP Query Tool, PnP-Tools/Solutions/SharePoint.Search.QueryTool at master · pnp/PnP-Tools · GitHub, a desktop app that allows you to test both your queries and your data.

 

SP Editor, a Chrome/Edge extension, GitHub - tavikukko/Chrome-SP-Editor: Extension for creating and updating files (js, css) in SharePoint Online from Developer Tools

Similar to the SP Query Tool as far as Search goes, and contains a lot of other SP related features.


Wednesday, May 3, 2023

Content Type issue with the PnP Provisioning Engine

 

Often we have a number of Content Types which inherit from each other:











In this case Contoso Document is based on Document, and Contoso Project Document is based on Contoso Document.

The Contoso Project Document is added to the default Document Library on our Contoso Project Template site that serves as the template site for our PnP Provisioning Engine template.







We create the PnP Template and specify the Documents library as it is the only one of interest for us:

Get-PnPSiteTemplate -Out C:\temp\basetemplate2.pnp -Force `
-ListsToExtract "Documents" -Connection $conn


However, when applying the PnP Template on a new Site Collection we get an error:





OK, so the base type for Contoso Project Document, which is Contoso Document, does not exist on this new Site Collection.
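The dependency problem can be illustrated with a small sketch that walks the inheritance chain and reports the parents that are neither in the template nor on the target site. This is a simplified model in plain Python, not how the PnP Provisioning Engine works internally; the function name missing_parents and the dictionary layout are hypothetical.

```python
def missing_parents(template_types, parents, available_on_target):
    """List parent content types that the template needs but the target lacks.

    template_types: content type names included in the template
    parents: mapping of content type name to its parent name
    available_on_target: names that already exist on the target site
    """
    known = set(available_on_target) | set(template_types)
    missing = []
    for name in template_types:
        parent = parents.get(name)
        # Walk up the chain until we reach something the target knows about.
        while parent is not None and parent not in known:
            if parent not in missing:
                missing.append(parent)
            parent = parents.get(parent)
    return missing


parents = {
    "Contoso Project Document": "Contoso Document",
    "Contoso Document": "Document",
}
print(missing_parents(["Contoso Project Document"], parents, {"Document"}))
```

In this example the only content type in the template is Contoso Project Document, so its base type Contoso Document is reported as missing on the target.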


This can be resolved in at least two ways:

Either run the PnP PowerShell cmdlet Add-PnPContentTypesFromContentTypeHub and specify that the Contoso Document content type should be added to the target site,

or

Add the Contoso Document content type to the document library on our Contoso Project Template Site: 







And once we have applied the PnP Provisioning template to our new site, we can delete both the Document and the Contoso Document content types from the document library, just to ensure that a creative end user doesn't manage to use the wrong content type while working with the document library.








Thursday, April 20, 2023

Surfacing a new User Profile Property in SharePoint Search

 

As part of a Document Management or Intranet project I have often been asked to add a new property to the User Profile Properties. 

Some of the more common purposes are:

Employee Number

Initials

Team (subdivision of Department)


It seems to be a fairly common request as questions about how to enable searching on those new properties occur often in the PnP Modern Search Web Parts GitHub project.



I have used these links as a reference for making a new User Property searchable:

How to create searchable user profile properties in SharePoint Online | RLV Blog (rlvision.com)







Warning: patience is required when working with Search 

Thursday, April 13, 2023

New videos on YouTube: Useful tools when working with Search

 

Both when answering questions and handling issues on the PnP Modern Search GitHub project and in my daytime job as a Senior Solution Architect with Fellowmind I work a lot with both Microsoft and SharePoint Search related issues.


When people report an issue with Microsoft Search or the PnP Modern Search web parts the root cause is often related to either:

  • The KQL query is not correct
  • The fields they are missing have not been created as Site Columns
  • The fields have not been turned into crawled properties as the users have forgotten to add content in the list/library
  • The crawled property has not been mapped or mapped incorrectly


When these kinds of issues show up in a Search Results web part or in the Out Of The Box search, they are pretty difficult to debug, hence this blog post and the Useful tools when working with Search series on YouTube:


Episode 1 is about Using the SP Editor

Episode 2 is about Using the SP Query Tool 

Episode 3 is about Using the Graph Explorer (planned for week 18, early May)

  

If you have any ideas about additional tools or methods that would be useful when debugging a search related issue, please let me know on Twitter.

Thursday, March 23, 2023

The joys of hiding content from Microsoft Search

 

It was just a question of time; sooner or later we would get a request to hide content in Microsoft Search.


"But why not just use permissions to ensure that only the right group of people can see the content", I hear you say?


Well, in this case the content is visible for everybody, but the business rule is that the content MUST be reviewed at a set interval, and if the content has not been reviewed within this deadline, the end user should not see the content in search anymore.


One option could be to have a workflow that breaks the permissions on the item level once the deadline has been reached, but that option comes with a certain smell of substandard performance and complexity.


The Microsoft Search approach looked less intrusive and gave me some options:

Option A)

I could add a Result Type in the Search and Intelligence Admin center that would "intercept" all items of the specific type (using Content type id or similar property) and only display those items that have been reviewed using the $when clause.


Option B) 

I could change the query for the verticals in order to exclude specific content using a subset of KQL (Manage search verticals).


Initially I selected Option A and used the Search Layout Designer to define the layout, but when the layout was used in the Result Type it had a hard time deciding whether the deadline date was in the past or the future.

Both blocks below were displayed...but only in the Result Type, in the Search Layout Designer it worked as expected. 😕




Due to this issue, I investigated Option B.

Step one was to get the query right, and using Graph Explorer this was pretty easy:



When I tried to negate the query I found that using the - sign as a shorthand for NOT seems to be a no-go going forward.



Works




Once that query worked, I just updated the Query for the relevant verticals, in this case only the All vertical.


Add "cacheClear=true" to your URL when testing the update, as per Manage search verticals.


Can this approach scale?

For now, this solution does solve the current problem, but I doubt very much that it can scale as additional content needs to be excluded. 
It will be interesting to see the solution patterns that will be developed in the coming months as Microsoft Search becomes more top of mind in the MS365 world.