New method accelerates data retrieval in huge databases

March 15, 2023 · 7 Mins Read

Hashing is a core operation in most online databases, like a library catalog or an e-commerce website. A hash function generates a code that directly determines where a piece of data will be stored, so that code alone is enough to find and retrieve the data later.

However, because traditional hash functions generate codes randomly, two pieces of data can sometimes be hashed to the same value. This causes collisions: a search for one item points the user to many pieces of data with the same hash value, and it takes much longer to find the right one, resulting in slower searches and reduced performance.
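
To make those mechanics concrete, here is a minimal Python sketch of a traditional hash table; the ten-slot table and helper names like `slot_for` are illustrative assumptions, not anything from the research.

```python
import hashlib

NUM_SLOTS = 10

def slot_for(key: str) -> int:
    # A traditional hash: scramble the key, then map it to one of NUM_SLOTS buckets.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SLOTS

buckets: list[list[str]] = [[] for _ in range(NUM_SLOTS)]
for key in ["alice", "bob", "carol", "dave", "erin", "frank"]:
    buckets[slot_for(key)].append(key)  # colliding keys pile up in the same bucket

def lookup(key: str) -> bool:
    # Jump straight to the bucket, then scan past any colliding entries.
    return key in buckets[slot_for(key)]
```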

Certain types of hash functions, known as perfect hash functions, are designed to place the data in a way that prevents collisions. But they are time-consuming to construct for each dataset and take more time to compute than traditional hash functions.

Since hashing is used in so many applications, from database indexing to data compression to cryptography, fast and efficient hash functions are critical. So, researchers from MIT and elsewhere set out to see if they could use machine learning to build better hash functions.

They found that, in certain situations, using learned models instead of traditional hash functions could result in half as many collisions. These learned models are created by running a machine-learning algorithm on a dataset to capture specific characteristics. The team’s experiments also showed that learned models were often more computationally efficient than perfect hash functions.

“What we found in this work is that in some situations, we can come up with a better tradeoff between the computation of the hash function and the collisions we will face. In these situations, the computation time for the hash function can be increased a bit, but at the same time, its collisions can be reduced very significantly,” says Ibrahim Sabek, a postdoc in the MIT Data Systems Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Their research, which will be presented at the 2023 International Conference on Very Large Databases, demonstrates how a hash function can be designed to significantly speed up searches in a huge database. For instance, their technique could accelerate computational systems that scientists use to store and analyze DNA, amino acid sequences, or other biological information.

Sabek is the co-lead author of the paper with Department of Electrical Engineering and Computer Science (EECS) graduate student Kapil Vaidya. They are joined by co-authors Dominick Horn, a graduate student at the Technical University of Munich; Andreas Kipf, an MIT postdoc; Michael Mitzenmacher, professor of computer science at the Harvard John A. Paulson School of Engineering and Applied Sciences; and senior author Tim Kraska, associate professor of EECS at MIT and co-director of the Data, Systems, and AI Lab.

Hashing it out

Given a data input, or key, a traditional hash function generates a random number, or code, that corresponds to the slot where that key will be stored. To use a simple example, if there are 10 keys to be put into 10 slots, the function would generate an integer between 1 and 10 for each input. It is highly probable that two keys will end up in the same slot, causing collisions.
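
A quick balls-in-bins simulation (scaled up so the averages are stable) shows just how probable those collisions are; the numbers come from this synthetic model, not from the paper's experiments.

```python
import random

n = 100_000                    # n keys hashed into n slots
counts = [0] * n
for _ in range(n):
    counts[random.randrange(n)] += 1   # a random slot, like a traditional hash

colliding = sum(c for c in counts if c > 1) / n
print(f"keys sharing a slot: {colliding:.0%}")  # about 63%; only ~1/e of keys sit alone
```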

Perfect hash functions provide a collision-free alternative. Researchers give the function some extra knowledge, such as the number of slots the data are to be placed into. Then it can perform additional computations to figure out where to put each key to avoid collisions. However, these added computations make the function harder to create and less efficient.
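
The crudest way to see that construction cost is the naive approach below: keep trying hash seeds until, by luck, no two keys collide. Real perfect-hashing algorithms are far more sophisticated, so treat this purely as an illustration of paying up-front work for collision-free placement.

```python
import hashlib
from itertools import count

def seeded_slot(key: str, seed: int, num_slots: int) -> int:
    digest = hashlib.md5(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_slots

def find_perfect_seed(keys: list[str]) -> int:
    # Up-front construction cost: try seeds until every key gets its own slot.
    for seed in count():
        if len({seeded_slot(k, seed, len(keys)) for k in keys}) == len(keys):
            return seed

keys = ["ant", "bee", "cat", "dog", "elk", "fox"]
seed = find_perfect_seed(keys)  # quick for six keys, expensive for millions
assert len({seeded_slot(k, seed, len(keys)) for k in keys}) == len(keys)
```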

“We were wondering if we know more about the data — that it will come from a particular distribution — can we use learned models to build a hash function that can actually reduce collisions?” Vaidya says.

A data distribution shows all possible values in a dataset and how often each value occurs. The distribution can be used to calculate the probability that a particular value is in a data sample.
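
An empirical CDF built from a sorted sample gives exactly that probability estimate; this small sketch (with made-up numbers) shows the idea.

```python
import bisect

sample = sorted([3, 7, 7, 12, 18, 25, 31, 31, 40, 52])

def ecdf(value: float) -> float:
    # Fraction of sampled values <= value: an estimate of P(X <= value).
    return bisect.bisect_right(sample, value) / len(sample)

print(ecdf(18))  # 0.5, since half of the sample is at or below 18
```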

The researchers took a small sample from a dataset and used machine learning to approximate the shape of the data's distribution, or how the data are spread out. The learned model then uses this approximation to predict the location of a key in the dataset.

They found that learned models were easier to build and faster to run than perfect hash functions and that they led to fewer collisions than traditional hash functions if data are distributed in a predictable way. But if the data are not predictably distributed because gaps between data points vary too widely, using learned models might cause more collisions.
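
The sketch below captures the idea under simple assumptions: keys are one-dimensional numbers with nearly constant gaps (the "predictable" case), the learned model is a single least-squares line fit to a small sample's empirical CDF, and the predicted CDF value is scaled to a slot index. The collision rates it prints come from this synthetic setup, not from the paper's datasets.

```python
import random

def collision_rate(slots: list[int], n: int) -> float:
    counts = [0] * n
    for s in slots:
        counts[s] += 1
    return sum(c for c in counts if c > 1) / len(slots)

def fit_linear_cdf(sample: list[float]) -> tuple[float, float]:
    # Least-squares line approximating key -> empirical CDF of the sample.
    xs = sorted(sample)
    m = len(xs)
    ys = [(i + 1) / m for i in range(m)]
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

# Predictably distributed keys: nearly constant gaps between neighbors.
n = 100_000
keys = [10.0 * i + 5.0 + random.uniform(-2.0, 2.0) for i in range(n)]
slope, intercept = fit_linear_cdf(random.sample(keys, 1_000))

def learned_slot(key: float) -> int:
    cdf = min(max(slope * key + intercept, 0.0), 1.0)  # clamp the prediction
    return min(int(cdf * n), n - 1)

learned = collision_rate([learned_slot(k) for k in keys], n)
random_baseline = collision_rate([random.randrange(n) for _ in keys], n)
print(f"learned: {learned:.0%}, traditional (random): {random_baseline:.0%}")
```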

“We may have a huge number of data inputs, and the gaps between consecutive inputs are very different, so learning a model to capture the data distribution of these inputs is quite difficult,” Sabek explains.

Fewer collisions, faster results

When data were predictably distributed, learned models could reduce the ratio of colliding keys in a dataset from 30 percent to 15 percent, compared with traditional hash functions. They were also able to achieve better throughput than perfect hash functions. In the best cases, learned models reduced the runtime by nearly 30 percent.

As they explored the use of learned models for hashing, the researchers also found that throughput was impacted most by the number of sub-models. Each learned model is composed of smaller linear models that approximate the data distribution for different parts of the data. With more sub-models, the learned model produces a more accurate approximation, but it takes more time.
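
Here is a rough sketch of that structure under the same one-dimensional assumptions as above: the sorted sample is split into chunks, each chunk gets its own small linear model, and a search over chunk boundaries routes each key to the sub-model responsible for it. Raising `num_submodels` tightens the approximation at the cost of more models to store and search.

```python
import bisect

def fit_submodels(sample: list[float], num_submodels: int):
    """Fit one linear CDF model per contiguous chunk of the sorted sample."""
    xs = sorted(sample)
    n = len(xs)
    size = max(n // num_submodels, 2)
    bounds, models = [], []
    for start in range(0, n, size):
        chunk = xs[start:start + size]
        lo, hi = chunk[0], chunk[-1]
        y_lo, y_hi = start / n, (start + len(chunk)) / n
        slope = (y_hi - y_lo) / (hi - lo) if hi > lo else 0.0
        bounds.append(hi)                       # this sub-model covers keys <= hi
        models.append((slope, y_lo - slope * lo))
    return bounds, models

def predict_cdf(key: float, bounds: list[float], models) -> float:
    # Route the key to its sub-model, then evaluate that line.
    i = min(bisect.bisect_left(bounds, key), len(models) - 1)
    slope, intercept = models[i]
    return min(max(slope * key + intercept, 0.0), 1.0)

sample_keys = [float(v) for v in range(0, 1000, 7)]  # hypothetical sample
bounds, models = fit_submodels(sample_keys, num_submodels=8)
print(predict_cdf(350.0, bounds, models))            # roughly 0.35
```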

“At a certain threshold of sub-models, you get enough information to build the approximation that you need for the hash function. But after that, it won’t lead to more improvement in collision reduction,” Sabek says.

Building off this analysis, the researchers want to use learned models to design hash functions for other types of data. They also plan to explore learned hashing for databases in which data can be inserted or deleted. When data are updated in this way, the model needs to change accordingly, but changing the model while maintaining accuracy is a difficult problem.

“We want to encourage the community to use machine learning inside more fundamental data structures and algorithms. Any kind of core data structure presents us with an opportunity to use machine learning to capture data properties and get better performance. There is still a lot we can explore,” Sabek says.

“Hashing and indexing functions are core to a lot of database functionality. Given the variety of users and use cases, there is no one size fits all hashing, and learned models help adapt the database to a specific user. This paper is a great balanced analysis of the feasibility of these new techniques and does a good job of talking rigorously about the pros and cons, and helps us build our understanding of when such methods can be expected to work well,” says Murali Narayanaswamy, a principal machine learning scientist at Amazon, who was not involved with this work. “Exploring these kinds of enhancements is an exciting area of research both in academia and industry, and the kind of rigor shown in this work is critical for these methods to have a large impact.”

This work was supported, in part, by Google, Intel, Microsoft, the U.S. National Science Foundation, the U.S. Air Force Research Laboratory, and the U.S. Air Force Artificial Intelligence Accelerator.

Source: MIT
