Introduction: NLP and news categorization
In applications of business intelligence, news articles are an important source of relevant and timely information. Methods from natural language processing (NLP) and text mining can be used to analyze these data and extract relevant insights. For example, news articles can be used to gauge public sentiment (see my previous blog post).
NLP methods can also help the analyst explore a large collection of news articles more efficiently by detecting events and trends [Panagiotou et al. 2022], or summarizing key points [Ma et al. 2022].
Removing irrelevant results such as fake news [Capuano et al. 2023] can also reduce the data deluge. Conversely, the analyst greatly benefits from a system that helps them focus on the news most relevant to their domain.
One approach to identifying the articles that are most likely to contain relevant information is automated news categorization. Technology and trend scouts, who collect data to inform innovation strategy, are particularly interested in news that belongs to one or more of the following categories, which we can also refer to as genres:
- Market research reports provide in-depth analysis of a specific industry or market landscape, including assessments of competitive dynamics, trend forecasts, and opportunities. Example: “Global Plug-In Hybrid Electric Vehicle Market Report 2023: Rapid Adoption of EVs Points to Sustained and Substantial Growth”
- Startup news covers new technology companies, venture capital funding, new products and services from startups, startup accelerators and incubators, and stories of startup success and failure. Example: “Figure raises $70M to commercialize humanoids”
- News on business relations, partnerships, and mergers and acquisitions focuses on strategic alliances between companies, joint ventures, and company acquisitions. These stories provide insight into how the competitive landscape may be changing and what new synergies or capabilities may emerge. Example: “Mullen Automotive Inc. (NASDAQ: MULN) Announces Strategic Partnership With Amerit Fleet Solutions”
- Consumer and product news includes reviews and announcements of new technology products, coverage of major product fairs and shows, and insights into what features and designs are most appealing to customers. Example: “2023 BYD Atto 3 review”
- Legal news reports on new legislation, regulations, government policies, lawsuits, and intellectual property issues that could impact technology and business. Example: “France bans short-haul flights on domestic routes to curb emissions”
A simple technique for automated news categorization is the search for keywords. For example, we can expect a news article that contains one of the keywords “startup”, “venture capital”, or “angel investor” to be an article that belongs to the genre of startup news.
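The keyword search described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation evaluated in this report: the function name `categorize`, the dictionary `GENRE_KEYWORDS`, and the legal keywords are assumptions made for the example. Only the three startup keywords come from the text.

```python
import re

# Keyword lists per genre. Only the startup keywords come from the text;
# the legal keywords are hypothetical and chosen for illustration.
GENRE_KEYWORDS = {
    "startup news": ["startup", "venture capital", "angel investor"],
    "legal news": ["lawsuit", "legislation", "regulation"],
}


def categorize(text, genre_keywords=GENRE_KEYWORDS):
    """Return the sorted list of genres with at least one keyword in the text."""
    lowered = text.lower()
    matches = []
    for genre, keywords in genre_keywords.items():
        for keyword in keywords:
            # Word boundaries avoid hits inside longer words;
            # the optional "s" also accepts simple plurals.
            pattern = r"\b" + re.escape(keyword) + r"s?\b"
            if re.search(pattern, lowered):
                matches.append(genre)
                break  # one hit is enough for this genre
    return sorted(matches)


print(categorize("An angel investor backs a robotics startup"))
print(categorize("Figure raises $70M to commercialize humanoids"))
```

The second call returns an empty list: the startup example from the list above contains none of the three keywords, which hints at the limits of this technique.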
The goal of this report is to compare the performance, in terms of accuracy and runtime, of traditional keyword search for news categorization with state-of-the-art methods from machine learning.
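To make the two comparison criteria concrete, here is a hedged sketch of how accuracy and runtime could be measured for any classifier on the same labeled articles. The helper `evaluate`, the baseline `is_startup_news`, and the binary labels are assumptions for illustration. The two headlines are the startup and product examples from the list above.

```python
import time


def evaluate(classify, labeled_examples):
    """Return (accuracy, elapsed_seconds) for a classifier over (text, label) pairs."""
    start = time.perf_counter()
    predictions = [classify(text) for text, _ in labeled_examples]
    elapsed = time.perf_counter() - start
    correct = sum(
        prediction == label
        for prediction, (_, label) in zip(predictions, labeled_examples)
    )
    return correct / len(labeled_examples), elapsed


def is_startup_news(text):
    # Hypothetical single-keyword baseline for a binary "startup news?" task.
    return "startup" in text.lower()


# Label True means the headline belongs to the startup news genre.
examples = [
    ("Figure raises $70M to commercialize humanoids", True),
    ("2023 BYD Atto 3 review", False),
]

accuracy, seconds = evaluate(is_startup_news, examples)
print(f"accuracy={accuracy:.2f}, runtime={seconds:.6f}s")
```

Any other classifier with the same call signature, for example a machine learning model wrapped in a function, can be passed to `evaluate` in place of the baseline.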
Trend signal detection
In addition to the news categories listed in the previous section, we want to detect trend signals, which we understand as news articles that describe events, claim facts, or reflect on opinions that point to the potential development of significant change in the landscape of innovation and technology. In other words, trend signals can be understood as precursors to emerging trends.
The news genre of trend signals is very broad, and may refer to any of the following sub-categories. Some of those sub-categories may overlap substantially with one or more of the news genres defined in the previous section.
- Science and Technology
1a. Novel materials or methods. News articles discussing the development and launch of new, innovative manufacturing techniques, as well as newly-created materials that can improve products, services, or technologies. Example: ‘Smart plastic’ material is step forward toward soft, flexible robotics and electronics
1b. Advancements in efficiency or effectiveness. Articles covering successful improvements to existing products or technologies regarding functionality, adaptability, performance, or usability. Example: New battery tech boosts EV range by 20%
1c. Innovative applications of existing technologies. Articles reporting creative new uses of current technologies or products, giving them alternative purposes. Can relate to recycling or repurposing byproducts, materials, applications or systems. Example: 3 Surprising Uses for Depleted EV Batteries
1d. Scientific discoveries and breakthroughs. Articles covering major discoveries in scientific research, new inventions, advancements or discoveries that solve problems, reduce costs, or enable new applications. Example: Scientists break world record for solar power window material
- Economics and Politics
2a. Startups. Articles profiling innovative new startups with proprietary technologies, techniques, designs or materials that address industry challenges, reshape manufacturing, or promote sustainability. May detail a startup’s work, partnerships, or funding.
2b. Mergers, acquisitions, and partnerships. Articles covering companies investing in startups, collaborating strategically, merging, being acquired, going public, or raising capital to enable product development or launches.
2c. Policy changes, new legislation, and funding opportunities. Articles announcing government decisions, policies, public contracts, funding programs, laws, regulations, or economic policies affecting specific industries or markets. Policies may relate to changes in political leadership.
2d. New market entrants. Articles covering the emergence of new competitors in existing markets, including large firms expanding into new sectors or new firms entering established markets. Example: Tesla opens its EV charging network to the masses
- Society and Markets
3a. “Hype” or “buzz” surrounding technologies or high-tech products. Articles discussing temporary surges of attention for contemporary emerging technologies or products among researchers, industry players, policymakers, or users. Example: Five technology trends that will define the future of EVs
3b. Events or claims influencing public opinion of technologies or players. Articles reporting unforeseen news or events that negatively or positively impact public perception of specific technologies, companies, or industry leaders. Could cover accidents, lawsuits, misconduct allegations, product defects, or reputation-building announcements. Example: 10 Dirty Truths Of Electric Cars Nobody Is Talking About
3c. Launch of new high-tech products. Articles announcing the release of new technology products, systems, materials, techniques, features or designs that provide additional functionality, applications or benefits. Example: The New Abarth 500e: The Scorpion Stings Again, Now In Full Electric Mode
Trend signals can be strong signals, i.e., signals about events that are widely reported on. As a result, many trend signals may point to the same event, and cluster analysis can help identify those events.
What makes trend signal detection a powerful concept, however, is that trend signals are not defined by signal strength and can therefore also be early signals or weak signals. More traditional methods for trend detection based on time series analysis have a difficult time detecting early signals because there is not yet an emerging trend that could be identified in a robust manner. Similarly, weak signals are often drowned out by noise, which makes them difficult to detect by unsupervised analysis of the data stream alone.
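For strong signals, grouping articles that point to the same event can be illustrated with a deliberately simple sketch: greedy clustering of headlines by word overlap (Jaccard similarity). This is an assumption chosen for brevity, not the cluster analysis used in this report. The function names, the threshold of 0.4, and the second headline (a made-up paraphrase of the legal news example above) are all hypothetical.

```python
import re


def tokens(headline):
    """Lowercased word set, ignoring words of one or two characters."""
    return {w for w in re.findall(r"[a-z0-9]+", headline.lower()) if len(w) > 2}


def jaccard(a, b):
    """Share of words two token sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_headlines(headlines, threshold=0.4):
    """Greedy single pass: join the first cluster whose seed headline is similar enough."""
    clusters = []  # list of (seed_tokens, member_headlines)
    for headline in headlines:
        current = tokens(headline)
        for seed, members in clusters:
            if jaccard(current, seed) >= threshold:
                members.append(headline)
                break
        else:
            # No similar cluster found, so this headline seeds a new one.
            clusters.append((current, [headline]))
    return [members for _, members in clusters]


headlines = [
    "France bans short-haul flights on domestic routes to curb emissions",
    "France bans short-haul domestic flights to cut emissions",  # made-up paraphrase
    "2023 BYD Atto 3 review",
]
clusters = cluster_headlines(headlines)
print(clusters)
```

A cluster with many members suggests a widely reported event. A weak signal, by contrast, would end up in a cluster of its own, which is why clustering alone cannot separate it from noise.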
Read the full article on Medium.