Incomplete data is as useless as no data.
Especially true in the case of the patent services industry, insights generated from data can prove both useless and absolutely worthless if the data on the basis of which insights were built was incomplete or deficient.
To give you an instance, let us consider that automaker Toyota wants to get a portfolio analysis done to figure out patents with great market potential. They reach out to an analytics firm and state their request.
The analysts at the firm, fetch patent data from Thomson Innovation (Now known as Derwent) before performing further analysis. Thomson database returns 1698 results and the analysts perform further analysis on the accumulated data set.
The data takes the form of a report, each section highlighting insights that Toyota could use to plan its next business move.
But there is something amiss. The executives at Toyota figured there is no mention of patents of a particular technology, which according to them have very high market potential and were used to protect the core innovations being developed by the company.
There might have been a mistake. With confusion in the room, the report finds its way on multiple desks and someone points out, “The analysis is performed for only 1698 patent families. According to company records, we have a little over 2340 active patent families in our portfolio”.
640+ patent families not finding their way into the report, no wonder the insights were both incomplete and valueless to the company. Maybe the patent families with the highest monetization potential did not find their way to the report and if not cross-referenced, it could have been the case that those patent families would have never met their true potential(metaphorically) and stayed hidden beneath, akin to diamonds in a coal mine.
The implications of incomplete data are huge.
But why does that happen?
Why do Patent databases like Derwent and others not show up complete patent records?
Lack of Assignee Name Standardization
What does that mean? you might ask.
Usually, in patent records, the assignee field of a patent might contain multiple variants of a company’s name. Even a small patentee might be represented by two or more versions of their name containing different corporate designations such as LLC, CO, and Corp.
For instance, a hypothetical company Xyzal could be represented as Xyzal Inc, Xyzal LLC, or Xyzal corp.
While the small patentees face this scenario, the case gets even worse for large corporations with multiple names and branches spanning across the world. For instance, Daimler with its multiple entities has patent records in assignee fields ranging from Daimlerchrysler Ag, Stuggart, De to Daimler-Benz Aktiengesellschaft.
Familiar with the unstructured nature of assignee names and how virtually all companies have multiple variants of their name represented on patents, researchers have to input all possible combinations that they could think of.
But shouldn’t this be the job of machines?
Shouldn’t patent database providers do a thorough job so that researchers do not have to go through the woes of figuring every potential variant of the assignee name?
It should be the case but it isn’t.
Challenges due to lack of Assignee Name Standardization
Though commercial patent databases facilitate assignee names normalization using an algorithm to eliminate redundancies like punctuation anomalies and obvious typographical errors– this does not resolve all the issues.
There are multiple different challenges that researchers have to face with respect to Assignees when working with patent data, including –
- Unassigned patent – Where the Assignee details are not available
- Unknown Assignees – where the Assignee names not available in the published data
- Unclean Assignee Names (Misspelt or different variations in Assignee names)
- Multiple Assignees or Inventor names appearing in the Assignee field
- Mergers and Acquisitions of companies (subsidiaries with different company names)
For a researcher who is bound on time with extremely tight deadlines, these issues can impact the quality of results delivered.
This is a major problem for the final client, the one who hopes to utilize insights gathered from data to make better business decisions.
Incomplete data can lead to bad business decisions.
Also Read: How Xiaomi Built a Patent War Chest — A Complete Patent Portfolio Analysis
How are we trying to solve this problem?
We at GreyB understand the value the insights hold for the clients. Such that the issues don’t become a hindrance in our client’s path to better business decisions, we adopted a mix of unconventional strategies to solve the above-mentioned problems which help us deliver the best results to our clients.
Want to know how we solve these issues? Let’s have a look.
How we Find Assignee Data for Patents with unknown Assignees?
Unknown Assignee- This aspect is seen in US patent applications that do not have an assignee name until right before grant.
We use different strategies to locate unknown Assignees like
- Merging the Assignment and Reassignment data– We merge the latest assignment and reassignment data to figure the current owner of transacted IP, thus enabling us to figure all the patents that are currently assigned to a corporation.
- Locate assignee from US Assignments Database (Only for US records) – One good strategy to figure assignee data is by searching through the US assignment database as there are good chances that an application may already have had an assignment event at the PTO, which could be viewable from the assignment database.
US Assignment database is a highly valuable resource to track ownership data and manual efforts are put to find updated ownership information such as change of name, mergers, or acquisitions. Referring to the database, time and again, helps us in detecting the transacted patents.
For instance, the entire patent portfolio belonging to Corus Group would now belong to Tata Steel as a result of M&A. When portfolio analysis of Tata steel is carried out, the patents that belong to Corus Group are taken into account. Having a track of acquisition data goes a long way in delivering clients the best results when it comes to portfolio analysis for competitive analysis.
- Multiple Assignee names or Inventor name: There are many instances when Multiple inventors or Assignee names that as of present as co-applicants, show up in the Assignee field. To remove such inconsistencies, we use automated tools to rectify the problem.
How do we deal with Unclean Assignee Names?
Commercial databases, based on automated algorithms, though take an attempt at standardization, there are multiple instances when things go amiss.
For instance, assignee names may be in the form of misspellings of assignee names or there might be subtle differences in the naming of a company.
To rectify the situation, we perform manual assignee name standardization and increase accuracy.
For instance, a patent from the portfolio of Volkswagen might have the assignee name as Wolkswagen which could result in the patent never being considered in Volkswagen’s patents. By putting in manual efforts, such inconsistencies are eliminated.
How do we deal with Multiple Variations of an Assignee Name?
We previously mentioned the example of Daimler which has multiple variants of its name on the assignee field of its patents. To rectify the problem, we used standardization techniques- both manual and automatic, such that no patents get missed out just because a researcher missed to input another variant of a company name.
How we Keep Standardization and Normalization Woes at bay?
The above strategies, we highlighted, to curb the issues are a perfect combination of automated algorithms and manual intelligence. This is not it, as we have created our own databases of companies which include private as well as public sector organizations.
The database is an outcome of the Company Names Normalization strategies and additional unconventional efforts put in to identify the parent companies and the subsidiaries. Exploring through SEC filings, annual reports of companies, press releases, and referring to various companies’ databases constitute a few efforts that have helped us build the database.
How does this Database Benefits the End Client, You?
The end client is the one who benefits the most due to the presence of our database. Built to efficiently organize data while removing maximum redundancies, our database of normalized company names is monitored and updated frequently.
Referring to our databases while conducting analyses makes everything from prior art searches to patent landscape portfolio evaluations more accurate and more reliable, which in turn directly benefits you, the client to get better results.
After all, our actions are dependent on data, ain’t it?
Why should you ask your Analytics partner to adopt such practices?
Incomplete data had never been much fruitful and the lack of proper standardization and normalization practices by current databases has led to a scenario where all the data you have access to is 90% times deficient.
And deficient data is bad data, which has its own set of consequences. While spending so much money to get quality insights, you definitely do not want your analytics partners to work on incomplete data, thus fetching you bad results.
Here’s a suggestion – Ask your analytics partner to adopt such practices too. It will be of great benefit.
To Better Results.
Authored by: Geetika Dube, Project Manager, Report Generation.
No related posts found