Featured Post

What is the purpose of the php.ini file?

  The PHP configuration file,   php.ini , is the final and most immediate way to affect PHP's functionality. The php.ini file is read ea...

SOLID Principles in PHP

This post is to provide an easy to understand idea about the implementation of one of the most “talked about” principles of Object Oriented Programming - SOLID principles. As a programmer, there might come a time when we are so consumed in finishing the task at hand that we forget the world around us. And in the very same world, somewhere lost are the programming principles that we read about and planned to use next time. So let’s try to look at the SOLID principles in one of the simplest way possible so that we can practice them while coding.

So let’s dig in to the SOLID principles at a glance.

  1. Single Responsibility: A Class should be responsible for a single task.
  2. Open-Close Principle: A Class should be open to extension and close to modification.
  3. Liskov Substitution: A derived Class can be substituted at places where base Class is used.
  4. Interface Segregation: Don’t make FAT Interfaces. i.e. Classes don’t have to override extra agreements that are not needed for that Class simply because it is there in interface.
  5. Dependency Inversion: Depend on abstractions, not on concretions. Not only high level Classes but low level Classes also depend on the abstractions in order to decouple the code.

ENUM and SET Constraints

 ENUM and SET columns provide an efficient way to define columns that can contain only a given set of values. See Section 11.3.5, “The ENUM Type”, and Section 11.3.6, “The SET Type”.

Unless strict mode is disabled (not recommended, but see Section 5.1.11, “Server SQL Modes”), the definition of a ENUM or SET column acts as a constraint on values entered into the column. An error occurs for values that do not satisfy these conditions:

  • An ENUM value must be one of those listed in the column definition, or the internal numeric equivalent thereof. The value cannot be the error value (that is, 0 or the empty string). For a column defined as ENUM('a','b','c'), values such as '''d', or 'ax' are invalid and are rejected.

  • SET value must be the empty string or a value consisting only of the values listed in the column definition separated by commas. For a column defined as SET('a','b','c'), values such as 'd' or 'a,b,c,d' are invalid and are rejected.

Errors for invalid values can be suppressed in strict mode if you use INSERT IGNORE or UPDATE IGNORE. In this case, a warning is generated rather than an error. For ENUM, the value is inserted as the error member (0). For SET, the value is inserted as given except that any invalid substrings are deleted. For example, 'a,x,b,y' results in a value of 'a,b'.

The ENUM Type

 An ENUM is a string object with a value chosen from a list of permitted values that are enumerated explicitly in the column specification at table creation time.

See Section 11.3.1, “String Data Type Syntax” for ENUM type syntax and length limits.

The ENUM type has these advantages:

  • Compact data storage in situations where a column has a limited set of possible values. The strings you specify as input values are automatically encoded as numbers. See Section 11.7, “Data Type Storage Requirements” for storage requirements for the ENUM type.

  • Readable queries and output. The numbers are translated back to the corresponding strings in query results.

and these potential issues to consider:

  • If you make enumeration values that look like numbers, it is easy to mix up the literal values with their internal index numbers, as explained in Enumeration Limitations.

  • Using ENUM columns in ORDER BY clauses requires extra care, as explained in Enumeration Sorting.

Comparison of B-Tree and Hash Indexes

 Understanding the B-tree and hash data structures can help predict how different queries perform on different storage engines that use these data structures in their indexes, particularly for the MEMORY storage engine that lets you choose B-tree or hash indexes.

B-Tree Index Characteristics

A B-tree index can be used for column comparisons in expressions that use the =>>=<<=, or BETWEEN operators. The index also can be used for LIKE comparisons if the argument to LIKE is a constant string that does not start with a wildcard character. For example, the following SELECT statements use indexes:

SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
SELECT * FROM tbl_name WHERE key_col LIKE 'Pat%_ck%';

In the first statement, only rows with 'Patrick' <= key_col < 'Patricl' are considered. In the second statement, only rows with 'Pat' <= key_col < 'Pau' are considered.

The following SELECT statements do not use indexes:

SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';
SELECT * FROM tbl_name WHERE key_col LIKE other_col;

In the first statement, the LIKE value begins with a wildcard character. In the second statement, the LIKE value is not a constant.

If you use ... LIKE '%string%' and string is longer than three characters, MySQL uses the Turbo Boyer-Moore algorithm to initialize the pattern for the string and then uses this pattern to perform the search more quickly.

InnoDB and MyISAM Index Statistics Collection

 Storage engines collect statistics about tables for use by the optimizer. Table statistics are based on value groups, where a value group is a set of rows with the same key prefix value. For optimizer purposes, an important statistic is the average value group size.

MySQL uses the average value group size in the following ways:

  • To estimate how many rows must be read for each ref access

  • To estimate how many rows a partial join produces, that is, the number of rows produced by an operation of the form

    (...) JOIN tbl_name ON tbl_name.key = expr

As the average value group size for an index increases, the index is less useful for those two purposes because the average number of rows per lookup increases: For the index to be good for optimization purposes, it is best that each index value target a small number of rows in the table. When a given index value yields a large number of rows, the index is less useful and MySQL is less likely to use it.

The average value group size is related to table cardinality, which is the number of value groups. The SHOW INDEX statement displays a cardinality value based on N/S, where N is the number of rows in the table and S is the average value group size. That ratio yields an approximate number of value groups in the table.

For a join based on the <=> comparison operator, NULL is not treated differently from any other value: NULL <=> NULL, just as N <=> N for any other N.

However, for a join based on the = operator, NULL is different from non-NULL values: expr1 = expr2 is not true when expr1 or expr2 (or both) are NULL. This affects ref accesses for comparisons of the form tbl_name.key = expr: MySQL does not access the table if the current value of expr is NULL, because the comparison cannot be true.

For = comparisons, it does not matter how many NULL values are in the table. For optimization purposes, the relevant value is the average size of the non-NULL value groups. However, MySQL does not currently enable that average size to be collected or used.

For InnoDB and MyISAM tables, you have some control over collection of table statistics by means of the innodb_stats_method and myisam_stats_method system variables, respectively. These variables have three possible values, which differ as follows:

  • When the variable is set to nulls_equal, all NULL values are treated as identical (that is, they all form a single value group).

    If the NULL value group size is much higher than the average non-NULL value group size, this method skews the average value group size upward. This makes index appear to the optimizer to be less useful than it really is for joins that look for non-NULL values. Consequently, the nulls_equal method may cause the optimizer not to use the index for ref accesses when it should.

  • When the variable is set to nulls_unequalNULL values are not considered the same. Instead, each NULL value forms a separate value group of size 1.

    If you have many NULL values, this method skews the average value group size downward. If the average non-NULL value group size is large, counting NULL values each as a group of size 1 causes the optimizer to overestimate the value of the index for joins that look for non-NULL values. Consequently, the nulls_unequal method may cause the optimizer to use this index for ref lookups when other methods may be better.

  • When the variable is set to nulls_ignoredNULL values are ignored.

If you tend to use many joins that use <=> rather than =NULL values are not special in comparisons and one NULL is equal to another. In this case, nulls_equal is the appropriate statistics method.

Verifying Index Usage

 Always check whether all your queries really use the indexes that you have created in the tables. Use the EXPLAIN statement, as described in Section 8.8.1, “Optimizing Queries with EXPLAIN”.


Popular Posts