Data Connect now lists Teams Chat Data as a Source DataSet – meaning bulk export of chat data via Data Factory is now possible
Big news for anyone wanting to export Microsoft Teams Chat Data at scale. Whilst it’s been possible to get hold of all chat messages sent by a specific user via API calls, and to be notified of new chats via change notifications, there hasn’t been anything that really meets the need for solutions that require a one-time bulk export of data.
Now, however, there could be an answer. Data Connect is Microsoft’s answer to accessing large amounts of Microsoft 365 information and working with it in Microsoft Azure. Data Connect offers a way to ingest data into Azure Data Factory where it can then be processed and worked on, or moved out into another data store.
I talked about Data Connect in the video Building applications around Microsoft Teams, around the 6:50 mark:
However, not all data is available in Data Connect and until recently there hasn’t been any Teams-based data. In fact, in that video I even say that it’s not for Teams data!
But now it is!
Excitingly, and as announced by Nik Charlebois, Senior PM for Graph on Twitter, Teams Chat data is now showing up as a selectable table in the Office 365 dataset for Azure Data Factory, with the title BasicDataSet_v0.TeamChat_v1:
This is message-level content about chats happening in Microsoft Teams. Data available includes:
- Created Date/Time
- Received Date/Time
- Sent Date/Time
- Has attachments?
- Body Preview
- Parent Folder ID
- Conversation ID
- Conversation Index
- Is Read?
- Is Draft?
- Body Content Type
- Body Content
- From Name & Email Address
- To Name & Email Address
- Sender Email Address (I guess because of send on behalf of etc)
- Information about any attachments, such as the file ID, type, size etc.
- User Object ID
- Tenant ID
The actual JSON data returned is a Microsoft Outlook Services Message, so there are a couple of other fields that don’t feel like they directly relate to Teams messages, such as Flag Status and ReplyTo data.
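To give a feel for working with this data once it lands in storage, here is a minimal Python sketch that flattens one exported record into the fields most people will care about. It assumes each record is a JSON object whose property names follow the Microsoft Graph message resource (sentDateTime, body, from, toRecipients and so on). That is my assumption rather than something confirmed for this data set, and the sample record is made up, so check the names against a real export before relying on it.

```python
import json

# Hypothetical record. Property names follow the Microsoft Graph "message"
# resource and are an assumption about what Data Connect emits.
SAMPLE_LINE = json.dumps({
    "createdDateTime": "2021-08-02T09:15:00Z",
    "sentDateTime": "2021-08-02T09:15:01Z",
    "hasAttachments": False,
    "conversationId": "example-conversation-id",
    "body": {"contentType": "html", "content": "<p>Hello</p>"},
    "from": {"emailAddress": {"name": "Example User", "address": "user@example.com"}},
    "toRecipients": [
        {"emailAddress": {"name": "Other User", "address": "other@example.com"}}
    ],
})


def _address(party):
    """Return the address from a {"emailAddress": {...}} object, or None."""
    return ((party or {}).get("emailAddress") or {}).get("address")


def summarise_message(line):
    """Flatten one JSON record into a small dict, tolerating missing fields."""
    record = json.loads(line)
    body = record.get("body") or {}
    return {
        "sent": record.get("sentDateTime"),
        "from": _address(record.get("from")),
        "to": [_address(r) for r in record.get("toRecipients") or []],
        "content_type": body.get("contentType"),
        "content": body.get("content"),
        "has_attachments": bool(record.get("hasAttachments")),
    }


print(summarise_message(SAMPLE_LINE))
```

Every lookup uses .get() so that a record missing a field (or one of the Outlook-flavoured extras) does not break the run.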
If you want more data samples to look at, check Nik Charlebois’s sample data set.
This is a really rich source of Microsoft Teams Chat data that can be made available to Azure-based applications at scale to support even the largest Microsoft 365 tenants. I fully expect that Microsoft will start recommending this approach to ISVs and developers who want to export lots of Teams Chat data from Microsoft Graph, rather than using the GET-based APIs. We may even see Microsoft further throttle those APIs now that a bulk option is available.
Good things to know
For any Data Connect activity, there is an initial “warm up” time of 45 minutes regardless of the amount of data collected. Depending on your scenario this may be a problem or may be a negligible part of how long it takes to extract large amounts of data.
It’s not free to use though. Data Connect is charged per item, in this case, per Teams chat message with a minimum charging batch size of 1000 items. This is a relatively recent change that I blogged about in May 2021: Microsoft is now charging to use Microsoft Graph Data Connect – here’s how much | The thoughtstuff Blog. The current price per object is available on the Azure Pricing page for Data Connect.
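As a rough way to budget for an export, the per-item charging can be sketched in a few lines of Python. This assumes the message count is rounded up to whole batches of 1000, which is my reading of the minimum batch size rather than a documented billing rule. The price is passed in as a parameter because it has to come from the Azure Pricing page, and the value used in the example is a placeholder, not a real price.

```python
BATCH_SIZE = 1000  # minimum charging batch size mentioned above


def billable_items(message_count, batch_size=BATCH_SIZE):
    """Round a message count up to whole batches (assumed billing behaviour)."""
    if message_count <= 0:
        return 0
    batches = (message_count + batch_size - 1) // batch_size
    return batches * batch_size


def estimate_cost(message_count, price_per_batch):
    """Estimated charge. price_per_batch must be taken from the pricing page."""
    return billable_items(message_count) // BATCH_SIZE * price_per_batch


PLACEHOLDER_PRICE = 1.0  # not a real price, look up the current one

print(billable_items(250_500))
print(estimate_cost(250_500, PLACEHOLDER_PRICE))
```

Under that rounding assumption, a small test export of a handful of messages is charged the same as one of exactly 1000.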
If you want to get started and explore this data set, I highly recommend the quickstart Build your first Microsoft Graph Data Connect application which walks you through the process of setting up Data Factory and accessing this data. All you need to change to make this work for Teams Chat data is the Table field, from BasicDataSet_v0.Message_v0 to BasicDataSet_v0.TeamChat_v1.
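For illustration, here is a small Python sketch of that one-field change applied to a dataset definition. The shape of the definition (an Office365Table dataset with typeProperties.tableName) is my understanding of how Data Factory represents it, and the dataset and linked service names are hypothetical, so compare it with the JSON your own Data Factory shows before copying anything.

```python
import copy
import json

# Assumed shape of an Office 365 dataset definition in Data Factory.
# The dataset and linked service names are hypothetical.
MESSAGE_DATASET = {
    "name": "Office365Dataset",
    "properties": {
        "type": "Office365Table",
        "linkedServiceName": {
            "referenceName": "Office365LinkedService",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"tableName": "BasicDataSet_v0.Message_v0"},
    },
}


def with_table(dataset, table_name):
    """Return a copy of the dataset definition pointing at another table."""
    updated = copy.deepcopy(dataset)
    updated["properties"]["typeProperties"]["tableName"] = table_name
    return updated


teams_chat_dataset = with_table(MESSAGE_DATASET, "BasicDataSet_v0.TeamChat_v1")
print(json.dumps(teams_chat_dataset, indent=2))
```

Working on a copy keeps the original Message_v0 definition intact, which is handy if you want to run both extractions side by side.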