Exploring the Intersection of Software Development, AI Innovation, and Entrepreneurial Success | Graph Knowledge


Microsoft Corp. is enhancing its Excel spreadsheet software with two new "rich data" types that provide a better way to access information on companies and places.  

Let's say you're writing a report on Washington state that includes historical dates and a spreadsheet of cities and populations. With added AI capabilities, Excel can now recognize rich data types beyond numbers and text strings. For example, Excel recognizes that "France" is a country and automatically associates it with additional attributes such as population and gross domestic product. These attributes can then be populated into different cells or used directly in formulas, and they stay updated with the latest data. Based on machine learning, these new data types will simplify the process of working with real-world data. In the future, we'll add organizational data types backed by the Microsoft Knowledge Graph to Excel, intelligently enhancing your spreadsheets with even richer content.

A collaboration ensued while refining the vision, and the Excel team funded Project Yellow. Right about this time, InstaFact was busy winning the second annual global Hackathon. Winning the Hackathon over thousands of other projects, and the subsequent meeting with the CEO that was the reward for it, gave the project visibility: it caught the attention of the Project Yellow team and provided a solid proof of concept.

Skydance Team

The original InstaFact team is (from left) Srivatsava Daruru, Rohit Paravastu, Silviu Cucerzan, Rajeev Kumar, and Deepak Zambre. (Photo by Scott Eklund/Red Box Pictures).

After the Hackathon was over, the team split up, and the main crew went on to fix Knowledge Graph conflation problems. We formed another team to solve the real production problem for customers in Excel: Rusty, Silviu, and me (Slava). The image below shows us in that order (Rusty, Silviu, Slava, from left).

  1. Rusty Deschenes worked at Microsoft for 20 years: https://www.linkedin.com/in/rusty-deschenes-3a340440/
  2. Silviu Cucerzan is one of the first people in the industry to work on entity recognition and was the main contributor to the Hackathon success. He has worked at Microsoft Research since 2003: https://www.linkedin.com/in/silviu-cucerzan-245a213/
  3. And me, without much experience in NLP, but with a huge desire to learn the ropes from such a wonderful team: https://www.linkedin.com/in/agafonovslava/

When we delivered the first prototype, other teams joined, and they are still working on components of Project InstaFact/Yellow/Skydance:

  • Project Yellow team, Excel
  • User Content and Insights, Skydance
  • Microsoft News Team, MSN
  • Satori, Artificial Intelligence and Research (AI&R)
  • Microsoft Research Cambridge, Artificial Intelligence and Research (AI&R)
  • People who contributed may not be listed. However, their contribution is deeply appreciated.

To bring Project Yellow to life, the team had to bring together multiple organizations and technologies to deliver the end-to-end solution of bringing new data types into Excel.

  • The Excel team provided native integration with the app, calculations, and changes to the formula language to support the vision.
  • The User Content and Insights team in Office provided Skydance, which is a new service that sends strings to the knowledge graph and a user interface that allows a user to curate the knowledge graph's results to provide deep insights.
  • The Satori team provided intelligence and data packaging, including NEMO, a powerful disambiguation engine that was built in Microsoft Research and further developed by the Satori group. I built the knowledge base for stocks from Refinitiv (Thomson Reuters) data.
  • The MSN team provided financial data from Stocks, one of the hero scenarios of the first release.

 

The result? A feature in Excel for Office 365 customers who are part of Office Insiders.

This journey starts with Stocks and Geography as the first two AI-powered data types, which will help users quickly turn complex data into action.

The new data types are being released as a preview to Office 365 subscribers enrolled in the Office Insiders program, in the English language only, starting in April 2018. Read more details in this blog post.

Without these data types, you need to search for the information online and then copy and paste it, which is tedious and slow as you toggle between the web, the report, and the data columns.

For the last few years, I have been working in the Satori Knowledge Graph team, and I could not be more excited to share the Skydance project with all of you. It integrates knowledge about data with the data itself. Kind of cool to see stocks, mutual funds, currencies, bonds, commodities, cities, states, zip codes, people, and other metadata in Excel, right? Other features include an extension of the now-common idea of 'spell-checking', using a 'knowledge graph' to do it. I participated in some projects recently where this would have been very useful. It's a step towards valuable intelligence in a commonly used business tool. I am looking forward to trying it and seeing how it might be extended.

Hundreds of millions of customers have created genuinely amazing solutions in Excel – and they've done all this while working with cells that most often contain (or evaluate to) just text and numbers. What becomes possible if Excel evolved into a world where cells weren't limited to just a single, flat piece of text, but could instead hold a far richer concept?

Today we are announcing a preview of new data types that will, over time, fundamentally transform what's possible in Excel. We're starting this journey with Stocks and Geography as the first two AI-powered data types, which will help users quickly turn complex data into action. To see new data types in action, we'll start with some text in cells. In this case, the text represents countries from different places in the world (please forgive the typos – we'll come back to those in just a moment):

This is just text in cells at this point…

Just click the data type we want to convert the text to (Geography in this case)…

You can find this feature on the Data tab…

  …and the cell is converted! It now holds a new data type – representing countries in this case. This content is rich – the cell isn't holding just a single piece of text anymore – it's been transformed into a new kind of value that has lots of information. Notice the icon next to the name of the country, signifying that this cell holds a data type – clicking that icon will display a card showing all the data in that cell:

 

Cards are the way to view the full contents of the cell

Excel uses Microsoft Knowledge Graph, the same intelligent service that powers Bing, to provide the data. It recognizes, in context, what is meant by your text and converts that to the right type of data. It even fixed the typos and capitalization mistakes in the country names!

It's easy to work with this data in the grid – if you happen to have this data in a table, you may notice a widget that lets you grab fields and pull them into a column of their own. 

It's easy to pull fields out to a column of a Table – Excel writes the formula for you.

And Excel didn't just copy that data out of the cell – it dereferenced it by writing a formula for you! You can use fields in any function as well – just use the dot "." operator to get a list of fields – as shown below, this time using city data. These are full, calculation-enabled, first-class data types in cells!

The dot operator and autocomplete are how you use fields in a formula.
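
To make the idea of dereferencing a field more concrete, here is a minimal Python sketch of a cell that holds a rich value and a resolver for dot references such as A2.Population. It is only an illustration: the RichValue class, the resolve function, and the sample numbers are assumptions made for this example, not Excel internals or real statistics.

```python
from dataclasses import dataclass, field


@dataclass
class RichValue:
    """A cell value with a display name plus named fields (hypothetical model)."""
    name: str
    fields: dict = field(default_factory=dict)


def resolve(grid, reference):
    """Resolve 'A2' to the cell value, or 'A2.Population' to one field of it."""
    cell_ref, _, field_name = reference.partition(".")
    value = grid[cell_ref]
    if not field_name:
        return value
    if not isinstance(value, RichValue):
        raise TypeError(f"{cell_ref} holds plain data and has no fields")
    # Case-insensitive field lookup is a choice made for this sketch.
    lookup = {key.lower(): val for key, val in value.fields.items()}
    return lookup[field_name.lower()]


# Placeholder numbers for illustration only.
grid = {
    "A1": "City",
    "A2": RichValue("Seattle", {"Population": 700_000, "Area": 217}),
}

population = resolve(grid, "A2.Population")
```

The point of the sketch is that the cell keeps the whole entity, and a formula only picks one field out of it, so the grid never needs a copied, stale value.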

Journey

The new data types in Excel feature demonstrates how teams from across the company contribute to building something quite remarkable – a showcase of One Microsoft at work, and a shining example of the not-so-spontaneous combustion that can come from the simple process of using a hackathon to explore an idea.

InstaFact, the 2016 Hackathon Grand Prize winner, was a perfect storm of the right project, right place, right time, and became a catalyst between multiple teams to take the concept of pulling rich data into Excel to the next level.

InstaFact (or Skydance) allowed people to access data from the knowledge graph in their Word documents and Excel spreadsheets. For example, when a user wrote a sentence such as "Peyton Manning played for" in Word, InstaFact immediately suggested the Indianapolis Colts and Denver Broncos. Similarly, when a user entered a few names such as Labrador, Boxer, Brittany, Chihuahua, and Yorkshire in the cells of a row or column in an Excel spreadsheet, the backend AI disambiguation engine figured out that the names in this example refer to dog breeds.

Machine Learning and Knowledge Graphs allowed the user to retrieve information instantly about the height, weight, fur color, and other breed characteristics by using natural language in the table header and then use the full power of Excel to examine the data retrieved from the graph.
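
The dog breed example can be sketched as a simple vote over candidate entity types. This is not how NEMO works internally; it is a toy Python illustration of the general idea that the cells of a column disambiguate each other. The CANDIDATE_TYPES table and the infer_column_type function are assumptions invented for this example.

```python
from collections import Counter

# Toy candidate table: surface string -> possible entity types (assumed data).
CANDIDATE_TYPES = {
    "labrador": {"dog breed", "region"},
    "boxer": {"dog breed", "athlete"},
    "brittany": {"dog breed", "region", "given name"},
    "chihuahua": {"dog breed", "city", "state"},
    "yorkshire": {"dog breed", "region"},
}


def infer_column_type(cells):
    """Return the entity type shared by the most cells, or None if nothing matches."""
    votes = Counter()
    for cell in cells:
        for entity_type in CANDIDATE_TYPES.get(cell.strip().lower(), ()):
            votes[entity_type] += 1
    if not votes:
        return None
    # Order by vote count, then by name, so ties resolve deterministically.
    best_type, _ = min(votes.items(), key=lambda item: (-item[1], item[0]))
    return best_type


column = ["Labrador", "Boxer", "Brittany", "Chihuahua", "Yorkshire"]
column_type = infer_column_type(column)
```

Each name alone is ambiguous (Labrador and Brittany are also regions, Chihuahua is also a city), but "dog breed" is the only type every cell supports, so it wins the vote.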

These work with other features in Excel as well – for instance, here's an example with all the U.S. states – with some U.S. Census data showing the % of population change pulled out into a column. You can create a Map Chart in a single click.

A map of the U.S. showing % population change

If you want to filter that column of States by time zone, you can tell the Filter to operate on Time Zone by changing the selection in the dropdown at the top…

The Select field dropdown lets you work with a field from a column of data types

…and just like that, the table and map update.

The Filter is applied, and the map updates without needing to pull fields into the grid.

It's not just States or Countries either – we support things like Zip Codes, Cities, and other types like Stocks, Index Funds, and other financial data (to use these, just type in some ticker symbols, fund names, or company names, and hit the Stock button). This data is refreshable as well – for example, many of the Stocks will fetch up to date prices when the market is open, and you Refresh.

As you try this feature, you may notice the intelligent conversions sometimes aren't sure what to convert to. In those cases, Excel will ask you to specify which data type should be returned from the service. For example, the city Portland works fine when it's in a list of other cities that are nearby, but when it's in a blank grid, with no other textual context, Excel will ask you which Portland you meant. You can always change the data type via the right-click menu as well. Read more about how to use data types in this article.

Ambiguous cases are detected.
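
The Portland behavior can be illustrated with a small Python sketch: resolve a city from its neighbors when the context gives a clear winner, and otherwise return the options so the user can choose. The GAZETTEER data and the resolve_city function are hypothetical and only show the decision rule, not the production service.

```python
# Toy gazetteer: city name -> list of (display name, state). Assumed data.
GAZETTEER = {
    "portland": [("Portland, Oregon", "Oregon"), ("Portland, Maine", "Maine")],
    "salem": [("Salem, Oregon", "Oregon"), ("Salem, Massachusetts", "Massachusetts")],
    "eugene": [("Eugene, Oregon", "Oregon")],
    "bend": [("Bend, Oregon", "Oregon")],
}


def resolve_city(name, neighbors=()):
    """Return (resolved, options); resolved is None when the user must choose."""
    candidates = GAZETTEER.get(name.strip().lower(), [])
    if len(candidates) == 1:
        return candidates[0][0], []
    # Collect every state that the neighboring cells could refer to.
    context_states = [
        state
        for neighbor in neighbors
        for _, state in GAZETTEER.get(neighbor.strip().lower(), [])
    ]
    # Score each candidate by how many neighbors could share its state.
    scored = sorted(
        ((context_states.count(state), display) for display, state in candidates),
        reverse=True,
    )
    if len(scored) >= 2 and scored[0][0] > scored[1][0]:
        return scored[0][1], []
    # No clear winner: hand back all options so the caller can ask the user.
    return None, sorted(display for _, display in scored)


with_context = resolve_city("Portland", ["Eugene", "Bend", "Salem"])
without_context = resolve_city("Portland")
```

With Oregon cities nearby, the first call resolves to Portland, Oregon. In a blank grid, both candidates tie, so the second call returns no answer and a list of both Portlands to pick from.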

With the introduction of these two data types, Excel moves beyond just Text and Numbers into much richer data types that don't always have to be a single, flat value. For now, we're starting with these two domains of data, but we'll be adding more over time, including those based on data unique to your organization. Also, data types are only one of a new wave of intelligent features coming to Excel; Insights was another that we launched recently. Read about Insights and other smart features in Excel.

Our team built and tested different machine learning models to see which ones gave the most accurate results. If you write "Rehnquist, Thomas, and Kennedy" without context, the tool knows you're referring to three Supreme Court justices and that "Thomas," a widespread first and last name, is Clarence Thomas. But because algorithms and data can be wrong, the tool mitigates ambiguity by sometimes also giving users a list of facts to choose from.

We can help people not just by correcting spelling, but by giving them facts and new types of knowledge and making everyone's lives easier.

The geography and stocks data types allow users to pull information from Microsoft's extensive Knowledge Graph and insert it inside their spreadsheets. The general idea is to make Excel smart enough to understand some entries and offer additional information, Kirk Koenigsbauer, Microsoft's corporate vice president for the Office team, wrote in a blog post today.

For example, after adding a list of cities to a sheet, clicking the Geography button brings up a list of all the data Microsoft has on those locations, which can be accessed directly from within Excel. This covers not just cities, states, and countries but also counties and zip codes. For a city, it includes information such as its population, area, the median income of its residents, and so on.
Excel also draws on Microsoft's artificial intelligence capabilities to help resolve any ambiguous entries. For example, if a user enters the common place name "Springfield" in a list of cities, Excel will reference the correct one depending on the context, or else it will prompt the user to choose which one they mean, such as "Springfield, Massachusetts" or "Springfield, Missouri."
 
We faced multiple challenges in dealing with the biggest Knowledge Graph in the world. Microsoft's graph is first in terms of size, even bigger than Google's :) However, the full graph is not exposed outside of Microsoft, and only about 10-20% of it is in use, mostly for Bing.com.
Here are some of the NLP challenges we faced with disambiguation:
  1. Handling table format; invariance properties
  2. Forcing segment consistency for some columns/rows in tables (at the same time with semantic consistency)
  3. Speed (fast answers regardless of input type/size)
  4. Employing local context aggressively
  5. Incremental disambiguation
  6. Usable confidence measure / in-segment ranking
  7. Improving coverage beyond Wikipedia (~5M entities, ~6.5M topics), plus 1M more entities from other knowledge sources.
  8. Handling any new domain of interest
  9. Case-insensitivity for entity mentions in table cells
  10. Spelling correction with entity context
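
Two of the challenges above, case-insensitivity and spelling correction with entity context, can be sketched in a few lines of Python. This is a simplified stand-in that uses the standard library's difflib for fuzzy matching, not the technique used in NEMO. The VOCABULARY table, the correct_mention function, and the 0.75 cutoff are assumptions chosen for the example.

```python
import difflib

# Toy vocabulary per entity type (assumed data, not the real knowledge graph).
VOCABULARY = {
    "country": ["France", "Germany", "Brazil", "Turkey", "Jordan"],
    "person": ["Francis", "Jordan", "Gerald"],
}


def correct_mention(text, entity_type, cutoff=0.75):
    """Map a cell's text to a canonical name of the given type, or None."""
    names = VOCABULARY[entity_type]
    by_lower = {name.lower(): name for name in names}
    key = text.strip().lower()
    # Exact match after case folding: handles 'BRAZIL' or 'brazil'.
    if key in by_lower:
        return by_lower[key]
    # Fuzzy match restricted to names of the expected entity type.
    close = difflib.get_close_matches(key, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[close[0]] if close else None


fixed = correct_mention("frnace", "country")
```

Restricting the candidates to the entity type inferred for the column is what "entity context" means in this sketch: a typo in a column of countries is only compared against country names, which keeps the correction from drifting to an unrelated word.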
One of our team's other great contributions to Excel was the Fantasy Football template in 2019:
  1. Players
  2. Draft Board
  3. Team Schedule
  4. Player Positions
  5. Player Team
Excel: The original lifehack, dating back to the offline era of fantasy football, is still in use today by savvy and data-driven GMs. But if you’re better at slicing and dicing last season’s stats than spreadsheet formatting, let someone else do the heavy lifting and download one of the millions of free templates available in the application or searchable on Bing. Here is the download link: https://aka.ms/msftFFLfile
 
 
The Microsoft Knowledge Graph has financial stock history data going back 40 years...

How can we improve Excel for Windows (Desktop Application)?

All this work happened because we at Microsoft listened to UserVoice, where more than 1,000 people voted for the feature: https://excel.uservoice.com/forums/304921-excel-for-windows-desktop-application/suggestions/32223604-pull-current-stock-prices-and-historical-data-into
 
Here are a couple of examples for Office 365 or Microsoft 365 subscribers:
  • =STOCKHISTORY("MSFT", TODAY()-5, TODAY(), 0, 2, 0, 1, 2, 3, 4, 5)
  • =STOCKHISTORY("USDCAD", TODAY()-5, TODAY(), 0, 2, 0, 1, 2, 3, 4, 5)
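
The numeric arguments in these formulas are easier to read with names attached. The Python sketch below builds the same formula string from named options. The mapping reflects my understanding of the STOCKHISTORY arguments (interval, then headers, then up to six property codes) and should be checked against the official function reference. The stockhistory_formula helper itself is hypothetical and is not part of Excel or any library.

```python
# Argument codes as I understand them from the STOCKHISTORY documentation.
INTERVALS = {"daily": 0, "weekly": 1, "monthly": 2}
HEADERS = {"none": 0, "headers": 1, "identifier_and_headers": 2}
PROPERTIES = {"date": 0, "close": 1, "open": 2, "high": 3, "low": 4, "volume": 5}


def stockhistory_formula(
    symbol,
    days_back,
    interval="daily",
    headers="identifier_and_headers",
    properties=("date", "close", "open", "high", "low", "volume"),
):
    """Build a STOCKHISTORY formula string covering the last `days_back` days."""
    args = [
        f'"{symbol}"',            # stock: ticker symbol or currency pair
        f"TODAY()-{days_back}",   # start_date
        "TODAY()",                # end_date
        str(INTERVALS[interval]),
        str(HEADERS[headers]),
    ]
    # Each property becomes one output column, in the order given.
    args += [str(PROPERTIES[name]) for name in properties]
    return "=STOCKHISTORY(" + ", ".join(args) + ")"


formula = stockhistory_formula("MSFT", 5)
```

With the defaults, the helper reproduces the first example above: daily data for the last five days, with the instrument identifier and headers, returning date, close, open, high, low, and volume columns.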
 
"Financial investments are among the more important things we need to track in everyday life, and millions of people choose Excel to manage their budgets and track their assets. To help make this seamless, last year we introduced Stocks, a Data Type in Excel powered by artificial intelligence (AI), which turns a stock ticker into an interactive entity with layers of rich information like price, change, currency, and much more."
  
 
Our work was presented in the earnings call with Microsoft investors by Amy Hood, the Chief Financial Officer (CFO) of Microsoft Corporation.
My role as a technical project lead in Skydance was to build a scalable ground infrastructure, including a service for NER training and an NLP pipeline:
  • Microsoft researcher Silviu-Petru Cucerzan was instrumental in this project and built the core functionality, called NEMO. I helped him bring AI and machine learning to Bing and built a finance knowledge base for his patented invention. He is well known in the community for his entity recognition work: https://www.microsoft.com/en-us/research/people/silviu/
  • Silviu was my mentor and manager at Microsoft for a few years and returned to Microsoft Research afterwards. Here are some of his other ML projects: https://www.aclweb.org/anthology/people/s/silviu-cucerzan/
  • Shipped Historical Stocks Pipelines for Excel and integrated them with the Satori Knowledge Graph for conflation, data mining, and data science.
  • Created a scalable service for the state-of-the-art natural language processing project NEMO; improved the backend with 15% less loading time, 5% less memory, and 50% faster performance without a change in precision or recall.
  • Created the Knowledge Graph Machine Learning pipeline for web table extraction over 800M tables from the web and Wikipedia data.
  • Mentored and led a team of four to determine a content type and generate attributions or encumbrances automatically. Drove the team that developed the internal Encumbrance Knowledge Graph Admin portal and the encumbrance rules for machine learning in the graph on Bing.com.

References:

  1. Expanded Skydance: https://www.microsoft.com/en-us/microsoft-365/blog/2018/03/29/new-in-march-rich-data-types-intelligent-search-and-expanded-datacenters/
  2. Microsoft Hackathon: https://news.microsoft.com/features/microsoft-hackathon-2016-winner-instafact-uses-bing-knowledge-graph-to-help-people-do-more/#sm.0000ioucc6oncehr10tgptcfnlfyf
  3. Excel Fantasy Football: https://techcommunity.microsoft.com/t5/excel-blog/fantasy-football-draft-materials-brought-to-you-by-excel/ba-p/236551
