0

If I have a table Application_user with 100 columns, what is the performance difference between two queries:
Query 1:
select * from Application_user;
Query 2:
select a.col1.a.col2,a.col3........ from Application_user a;

Performance difference between two queries. And reason behind it

9
  • 3
    Performance isn't the main reason you should avoid using SELECT * in a query...
    – Dai
    Commented Dec 9, 2023 at 15:10
  • Yes but this was my interview question, performance difference Commented Dec 9, 2023 at 15:13
  • If it's a column store database, reading fewer columns results in reading fewer pages from the disk. If you have a single index covering the chosen columns, it results in reading fewer pages from disk. Usually (in row store databases) there is no difference. But, as a whole, you're better of learning about join methods (hash, nested loop, merge, etc), indexes, and execution plans.
    – MatBailie
    Commented Dec 9, 2023 at 15:13
  • 1
    I'd expect same execution plan.
    – jarlh
    Commented Dec 9, 2023 at 19:23
  • 1
    * will be expanded internally at parse time to the full list of visible columns for that table or view. Perhaps the interviewer was thinking of the cost of doing the dictionary lookup to find all the columns, but that will be negligible and the parser has to do the same checks either way. So short answer is no difference. Commented Dec 10, 2023 at 12:56

3 Answers 3

1

Others have commented sufficiently on the code-breaking danger of SELECT *. As your question regards performance however, there are some nuances to the proper answer.

  1. If you are using an Exadata or other engineered system that supports Hybrid Columnar Compression (HCC), selecting more columns than you really need will have a negative performance impact, as it prevents column projection from reducing the I/O back to the database server, and there is a cost to reassembling rows from columnar-stored data.

  2. If your table is overly wide (more than 254 columns or rows exceed the block size) and you have chaining as a result, selecting columns from the right-end of the column list where it may be broken off to a different block will incur a steep performance penalty for the single-block lookups required to obtain those columns. If you only need a few columns and they are in the original block it's far better to list the columns needed so that extra block read can by avoided.

  3. If there are any LOBs in the table in columns you don't need, pulling them unnecessarily will add a lot of extra processing that you don't need.

  4. If you are fetching results over the network, the more columns you pull the more network overhead you will incur.

  5. If you have a concatenated (multi-column) index that includes all the columns your query really needs, you can potentially avoid hitting the table altogether. Similarly several indexes that collectively have all the columns you need can also be used to avoid hitting the table by being joined together. But if you use *, unless the index includes every column in the table, you won't qualify for these possible optimizer tricks, so again, you could suffer in performance.

So, there a numerous performance considerations for listing only the columns actually needed instead of being lazy and defaulting to *. However, all of the above only applies to the situation where you don't need all the columns. If you do need them all, then there is no performance difference between specifying them all individually and using *. Oracle has to look up the columns in the dictionary either way.

So, don't use select *. It's a convenient time-saver for ad-hoc exploration of tables, but shouldn't be used in code. The one exception where it is just fine is within an EXISTS subquery: WHERE EXISTS (SELECT * ....). Oracle knows you don't actually need any of the columns in this situation and won't actually try to access any of the columns except only to determine if a row exists. It also can't break anything when used in this way. But I'd avoid it in all other situations.

1

In case of Oracle:

There were times when evaluation of SELECT COUNT(*) from .. was slower than SELECT COUNT(1) from .... Oracle had to perform additional lookup to translate '*' onto column list and this information was then discarded.

Those times are long gone. If there is any performance difference between SELECT * and SELECT col1, col2, ... then those differences are hardly measurable.

Otherwise SELECT * is known SQL anti-pattern. It does not provide stable interface and addition of column can break application which is not ready to deal with it.

More over - in rare cases - if you return more columns than is necessary, a high frequency query can return big result set exceeding bandwidth of networking adapter or memory limit of GC on app server side.

0

MySQL...

The question is imprecise. Does the second formulation include all the columns?

If it does include all the columns, then there is no noticeable performance difference. (The * converts to the second formulation, but the * requires a tiny bit of network transit time.)

If the second is missing some columns, then it may may a big difference if the omitted column(s) are TEXT or BLOB. The second may be faster.

And, yes, "best practice" says not to use *.

Not the answer you're looking for? Browse other questions tagged or ask your own question.