“Elastic linking” is a very powerful way to search massive databases in real-time. Learn how Riskified uses this technology as part of the fraud review process.
Most of you have heard of Tinder: the highly addictive (or so I’ve been told) dating application. Since launching in 2012, the original swiping app has generated over 20 billion matches which, yes, is far more than the number of humans on the planet.
A lot of Tinder’s success can be attributed to very advanced algorithms which ensure that people with high probabilities of mutual interest are shown to each other. That’s right: the profiles displayed to Tinder users are not shown in random order. They are placed very deliberately.
The specifics of the algorithms are kept secret to prevent users from gaming the system and competitors from stealing it. But we do know a big part of their technology is built on a platform called Elastic, and that Tinder generates 280 million queries on this system every day.
“Elastic linking” is a very powerful way to search massive databases in real-time, and Riskified uses this technology as part of the fraud review process. In this post I’ll explore the power and versatility of this technology, by showing how two companies as different as Tinder and Riskified use it to cross-check data in real-time.
The set-up: Getting the data ready
Tinder has a lot of information about each of their users. Most swipers log in using their Facebook credentials, which delivers personal info like college, favorite bands, number of friends, etc. Additionally, Tinder users specify their physical location, age and gender, plus what they’re looking for in a partner. Many users even go one step further by writing information in a bio. This might provide Tinder with hard data like height and profession, plus a variety of keywords that hint at a user’s personality. All in all, Tinder ends up with hundreds of pages of information about each user.
Riskified also has no shortage of data. For each order we have hundreds of data points related to behavioral analytics alone. This is in addition to details like the full name of the buyer, email address, shipping, billing and IP addresses, device, phone number, user agent, product(s) in the cart and many more.
Both Riskified and Tinder use Elasticsearch to cross-check the swiper or the order against their database. To do this, they summarize all their data into a few dozen ‘nodes’ that are used as search dimensions. For Riskified, it’s crucial that these nodes cast a wide enough net to identify a shopper we’ve seen before even if they’ve changed their address, name, email, device, or all of the above.
But simply matching based on these nodes is no easy task: each Tinder query can generate up to 60,000 potential match suggestions, and a Riskified order may have nodes in common with an equally staggering number of orders. Choosing the best of these options is where elastic linking comes in.
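To make the idea of nodes as search dimensions concrete, here is a minimal sketch of how a candidate-retrieval query could be assembled. It uses the standard Elasticsearch `bool` query with `should` and `term` clauses, but the field names, the `build_link_query` helper and the sample order are illustrative assumptions, not Riskified’s or Tinder’s actual schema.

```python
def build_link_query(order, node_fields, size=100):
    """Build an Elasticsearch-style query body that matches any shared node.

    `order` and `node_fields` are hypothetical; real node names are not public.
    """
    should = [
        {"term": {field: order[field]}}
        for field in node_fields
        if order.get(field)  # skip nodes that are missing on this order
    ]
    return {
        "size": size,
        # A candidate qualifies if it shares at least one node value.
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
    }


order = {"email": "bob.smith@example.org", "ip": "203.0.113.7", "phone": None}
query = build_link_query(order, ["email", "ip", "phone"])
```

A query like this only gathers candidates. Ranking them is a separate step.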
How Tinder uses linking to show the best potential matches
Tinder is very secretive about the details of their algorithms. But it’s likely that they work something like this:
Mr. A and Ms. B (soon to be Mrs. A?) are swiping in San Francisco. Tinder’s algorithms would decide to match them for one of two general reasons:
- Mr. A has previously matched with users similar to Ms. B, and vice versa
- Users with characteristics similar to Mr. A have previously matched with users similar to Ms. B.
Of course, the first is a stronger case for introducing Mr. A and Ms. B. But that doesn’t narrow it down much; in a densely populated city there could be thousands of matches for each swiper that fit the criteria. So Tinder has to decide on factors that influence the weight of each similarity.
The algorithm assesses both the quantity and quality of the nodes that connect two users. For instance, just because Ms. B once matched with someone who went to LSU, doesn’t mean men who attended LSU are her type. But if she swiped right on a man who used the phrase ‘I enjoy the opera’ when describing himself, and Mr. A also uses the word ‘opera’ in his bio, we might be onto something.
Rarity of the node observation is also important. For example, Mr. A matching with a 5’5” woman doesn’t tell Tinder much, because that’s a pretty common height for a woman. However, if he matched with a woman who’s 6’3”, and had a nice conversation with her on the app, Tinder might reasonably infer that he likes tall women.
Riskified: The order matchmaker
The principles behind Riskified’s cross-reference search are similar.
For each transaction we run an elastic query to get a list of similar historical orders from across our entire merchant ecosystem. A shopper may be new to one merchant, but we may have seen her shopping before at a different site. Just like with Tinder, we’ve decided that some nodes are more important than others: a link between email addresses is very strong, even stronger than full name, since there can be two Bob Smiths but only one firstname.lastname@example.org. The rarity of an observation is also important. A full name match is worth less if the name is ‘Bob Smith’, and more if it’s ‘Ephraim Rinsky’.
Along these lines, if a shipping address belongs to a one-family home then we’ll see it very rarely, so an order with the same node value is quite likely to be the same shopper. But if it’s the address of a large office building, the link will carry less weight.
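One simple way to picture this is to multiply a fixed importance for each node type by a rarity factor for the specific value. The sketch below uses an inverse-frequency weight, borrowed from text search, as a stand-in. The base weights, counts and function names are made up for illustration and are not Riskified’s real model.

```python
import math

# Hypothetical base weights: email outranks full name, as described above.
BASE_WEIGHT = {"email": 3.0, "full_name": 1.0, "shipping_address": 1.5}


def rarity(times_seen, total_orders):
    """Inverse-frequency weight: values seen in few orders score higher."""
    return math.log(total_orders / max(times_seen, 1))


def node_score(node, times_seen, total_orders):
    """Strength of one shared node: type importance times value rarity."""
    return BASE_WEIGHT[node] * rarity(times_seen, total_orders)


total = 1_000_000  # illustrative number of historical orders
common_name = node_score("full_name", 5_000, total)   # a very common name
rare_name = node_score("full_name", 2, total)         # a name seen twice
house = node_score("shipping_address", 3, total)      # one-family home
office = node_score("shipping_address", 4_000, total) # large office building
```

With these numbers the rare name outscores the common one, and the one-family home outscores the office building, which is the behavior the paragraphs above describe.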
Our models also consider combinations of different node matches. An email address or IP match is strong, but not conclusive: An IP address could belong to a public library or university, meaning it’s used by thousands of people a day, and emails can be hacked during account takeover attacks. But if we’ve seen both the same email and IP address in a previous order, it’s a near certainty that both orders were placed by the same shopper.
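A rough way to see why two imperfect links add up to near certainty is a noisy-OR combination: treat each match as an independent piece of evidence and ask how likely it is that all of them are wrong at once. The probabilities below are invented for illustration, and the independence assumption is a simplification rather than a description of Riskified’s models.

```python
from math import prod


def combined_confidence(match_probs):
    """Noisy-OR: probability that at least one link is a true match.

    Assumes the individual links are independent, which is a simplification.
    """
    return 1.0 - prod(1.0 - p for p in match_probs)


# Hypothetical per-node confidences that a match means "same shopper".
email_only = combined_confidence([0.90])
email_and_ip = combined_confidence([0.90, 0.80])
```

Here an email match alone gives 0.90, while email and IP together give 0.98, because both links would have to be misleading at the same time.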
Finally, timing matters. A link between two orders based on shipping address, for example, is weaker if the orders were placed three years apart than if they were placed the same week. The longer the duration between them, the more likely that the person currently living at that address isn’t the one who lived there when the older order was placed.
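The effect of timing can be sketched as a decay factor applied to a link’s weight. The example below uses exponential decay with a one-year half-life. Both the shape of the curve and the half-life are assumptions chosen for illustration, not values taken from Riskified’s system.

```python
def time_decay(days_apart, half_life_days=365.0):
    """Multiplier in (0, 1] that halves every `half_life_days`."""
    return 0.5 ** (days_apart / half_life_days)


same_week = time_decay(5)          # orders placed five days apart
three_years = time_decay(3 * 365)  # orders placed three years apart
```

With a one-year half-life, a shipping-address link between orders three years apart keeps only one eighth of its weight, while a link between orders placed the same week is almost untouched.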
Linking is an important factor for our order review, but by no means the only one we consider. For merchants conducting in-house fraud review, I recommend checking out the relevant industry guide to learn about fraud trends specific to their vertical. For further information about Riskified, request a demo of our fraud solution.