A lot of developers think that Group by and distinct works similarly when it comes to getting unique records. and they assume that the performance remains the same in both of the cases.
But this is not the case.
In some cases, Group by and distinct behave similarly but not in every case.
I have discussed this question with plenty of candidates during their interview and my colleagues and most of them are confused about their actual use and performance.
So in this post, I will be explaining the difference between group by and distinct. When these two behave similarly and when differently.
We will be discussing the difference in terms of the results fetched by both of the terms.
Both of the keywords can be used to fetch distinct records from the database but the actual functionality is totally different.
Many people assume that distinct only apply to the very first column after the distinct keyword and fetch the unique records only based on that column only.
But in actual distinct apply to all the columns you have selected in the query. It makes the entire record unique you have fetched not only the single column.
Group by works in a similar way. It makes the record unique on the basis of columns you have included in the group by statement.
Let us understand this by an example-
Assume that we have the following table in our MySQL database –
students- +------+---------+ | id | name | +------+---------+ | 1 | Disha | | 2 | Albert | | 3 | Disha | | 4 | David | | 5 | Andrew | | 6 | Rohit | | 7 | Albert | +------+---------+
Now we will be discussing each case of Group by and distinct with example queries on this table.
mysql> select distinct name from students; +---------+ | name | +---------+ | Disha | | Albert | | David | | Andrew | | Rohit | +---------+
This query will select only the unique name from the table. We have selected only the “name” column. So (Disha) and (Disha) are duplicate records and distinct will result in only one of them.
let’s add the id column to the select statement
mysql> select distinct name, id from students; +--------+-----+ | name | id | +--------+-----+ | Disha | 1 | | Albert | 2 | | Disha | 3 | | David | 4 | | Andrew | 5 | | Rohit | 6 | | Albert | 7 | +--------+-----+
In the above query, you can see that the result contains all the records because it will check the distinctness of the record on the basis of the combination of “id” and “name” columns, not the name only. So in this case (Disha, 1) and (Disha, 3) will be treated to different unique records and the result will include both of the records.
Let’s check another case which many developers have doubts about.
mysql> select id, distinct name from students;
Some people think that this will gives us a list of distinct names but this is not the case as I previously stated the distinct keyword unifies the whole row you have selected. So distinct must be the first keyword after select, otherwise, you will get a Mysql error.
#1054 - Unknown column 'distinct' in 'field list'
Now when we have seen how distinct keyword works, let’s switch to the group by so that you can understand where both of these are similar and where they differ.
mysql> select id, name from students group by name; +------+---------+ | id | name | +------+---------+ | 1 | Disha | | 2 | Albert | | 4 | David | | 5 | Andrew | | 6 | Rohit | +------+---------+
In this query, we get a list of the unique names with id. By default, we get the first occurrence of duplicate records. As in the example, you can see that these are two records for the name “Disha” (1, Disha) and (3, Disha). After group by we will be getting (1, Disha) as it is the first occurrence of records for “Disha”.
Now let’s add the id column to the group by clause and see the result
mysql> select id, name from students group by id, name; +------+---------+ | id | name | +------+---------+ | 1 | Disha | | 2 | Albert | | 3 | Disha | | 4 | David | | 5 | Andrew | | 6 | Rohit | | 7 | Albert | +------+---------+
This query is similar to select distinct id, name from students; and will result in the same output. because we are grouping the records with a combination of id and name. So (1, Disha) and (3, Disha) are treated as two different records now.
Now after result of above queries you might have get an idea that in what conditions group by works as alternative of distinct and in what conditions these two are totally different.
I am the owner of acmeextension. I am a passionate writter and reader. I like writting technical stuff and simplifying complex stuff.