Give a chestnut! Tableau Tips (74): Use Benford's law to verify the authenticity of data

published: 2021-05-31

Benford's law, also known as the first-digit law, states that in many sets of real-life data, numbers whose first digit is 1 make up about 30% of the total, close to three times the 1/9 that an even spread of first digits would suggest. The larger the digit, the lower the probability that a number starts with it.

The value of Benford's law for data work is that it can be used to check whether there is a problem with the data source.

When fraudsters falsify data, they may not think of creating fake data that conforms to Benford's Law. In some cases, Benford's law can be used to detect falsified data or verify the authenticity of data.

 

So, how do you use Benford's law to verify the authenticity of data in Tableau? Here, I will share the method with you.

 

In this issue of "Give a Chestnut", the Tableau technique that Ada wants to share with you is: use Benford’s law to verify the authenticity of the data.

This example uses Tableau's built-in "Sample - Superstore" data source and verifies its sales data.

Step 1: Create calculated fields

First, we need to create two calculated fields: "First Digit" and "Benford's Law".

◆ First Digit: LEFT(STR([Sales]),1)

◆ Benford's Law: LOG(INT([First Digit])+1)-LOG(INT([First Digit]))
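The "First Digit" field takes the first character of the number's text form, so a value below 1 such as 0.444 would yield "0". As a minimal Python sketch of the same idea (an illustration added here, not part of the Tableau workbook; the function name `first_digit` is hypothetical), the version below instead returns the first non-zero digit, assuming that is the behavior wanted for values below 1:

```python
def first_digit(value: float) -> int:
    """Return the first significant (non-zero) digit of a non-zero number."""
    # Scan the text form and skip the sign, leading zeros, and decimal point.
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("zero has no leading digit")


print(first_digit(261.96))  # first character is already the leading digit
print(first_digit(0.444))   # leading "0." is skipped
```

Whether values below 1 should be handled this way, or simply filtered out, depends on the data set being checked.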

 

Tips: Benford's law states that in base b, the probability that a number starts with the digit n is log_b(n + 1) − log_b(n). Benford's law applies not only to a single leading digit but also to leading sequences of several digits.
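To make the formula concrete, here is a small Python sketch (an illustration added here; the helper name `benford_probability` is hypothetical) that evaluates log_b(n + 1) − log_b(n) for single leading digits and, as an example of the multi-digit case, for two-digit prefixes in base 10:

```python
import math


def benford_probability(n: int, base: int = 10) -> float:
    """Probability that a number starts with the digit sequence n in the given base."""
    return math.log(n + 1, base) - math.log(n, base)


# Single leading digits 1..9 and two-digit prefixes 10..99 (base 10)
single = {d: benford_probability(d) for d in range(1, 10)}
two_digit = {d: benford_probability(d) for d in range(10, 100)}

for digit, p in single.items():
    print(f"{digit}: {p:.1%}")
```

The single-digit shares fall from about 30.1% for 1 to about 4.6% for 9, and each set of probabilities sums to 1.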

Step 2: Create the view

Drag "First Digit" to the Columns shelf, and drag "Number of Records" to the Rows shelf.

 

Change the quick table calculation of "Number of Records" to "Percent of Total".

 

The view now shows the distribution of first digits in the Sales field, which basically conforms to Benford's law.
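The same percent-of-total comparison can be sketched outside Tableau. The Python snippet below is only an illustration and does not use the Superstore data: as a stand-in it takes the first 100 powers of 2 (an assumption, chosen because the data set is deterministic), counts the leading digits, and prints each digit's share next to the value Benford's law predicts.

```python
import math
from collections import Counter

values = [2 ** n for n in range(100)]  # stand-in data, not Superstore sales

# Leading digit of each positive integer, like the first-digit calculated field
counts = Counter(int(str(v)[0]) for v in values)

total = sum(counts.values())
for digit in range(1, 10):
    observed = counts[digit] / total                      # percent of total
    expected = math.log10(digit + 1) - math.log10(digit)  # Benford's law
    print(f"{digit}: observed {observed:.1%}, expected {expected:.1%}")
```

In this stand-in data set, 30 of the 100 values start with 1, close to the 30.1% that Benford's law predicts.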

 

Next, we can add a reference distribution to compare the data against Benford's law more precisely.

Step 3: Check the distribution

Drag the "Benford's Law" field to "Detail" on the Marks card.

 

Change the aggregation of the pill to "Minimum".

 

Switch to the Analytics pane and drag "Distribution Band" onto the "Cell" option in the view.

 

In the edit dialog box, change the "Computation - Value" setting. Type "80,100,120" in the "Percentages" field (this defines bands from 80% to 100% and from 100% to 120% of the expected value), and in the "Percent of" field, select "Minimum (Benford's Law)".
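To see what the 80,100,120 setting computes, the following Python sketch (an illustration added here; the helper names `benford_bands` and `classify` are hypothetical) derives the three band edges for a digit as 80%, 100%, and 120% of its expected Benford share and reports where an observed share falls:

```python
import math


def benford_bands(digit: int, percentages=(80, 100, 120)) -> list:
    """Band edges as percentages of the expected Benford share for a digit."""
    expected = math.log10(digit + 1) - math.log10(digit)
    return [expected * p / 100 for p in percentages]


def classify(observed: float, digit: int) -> str:
    """Describe where an observed share falls relative to the band edges."""
    low, mid, high = benford_bands(digit)
    if observed < low:
        return "below 80% of expected"
    if observed < mid:
        return "80% to 100% of expected"
    if observed < high:
        return "100% to 120% of expected"
    return "above 120% of expected"


print(benford_bands(1))
print(classify(0.28, 1))  # 0.28 is a made-up observed share for digit 1
```

For digit 1 the edges are roughly 24.1%, 30.1%, and 36.1%, so a made-up share of 28% lands in the 80% to 100% band.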

 

Step 4: Configure the appearance

The following settings configure the appearance of the reference bands so that the results are easier to read.

Set "Label" to "None", "Line" to the thinnest available line, and "Fill" to "Stoplight", and check "Fill Below". Click "OK" when the configuration is complete.

 

Finally, click the "Show Mark Labels" button in the toolbar to display the percentage figures.

The resulting view shows that, although Superstore is the demonstration data that ships with Tableau, it is realistic data that conforms to Benford's law.

The blue bars show the actual percentage for each first digit, and the bands show how that percentage compares with the expected Benford value: a bar that reaches the green band exceeds 100% of the expected Benford value, while a bar that ends in the yellow band falls between 80% and 100% of it.

 

Quickly open your Tableau and give it a try!