Make SQL Server Collation Act Like SQLite

The default SQL Server database collation ("SQL_Latin1_General_CP1_CI_AS") treats some distinct Unicode values as equal. If you have an nvarchar field that is part of a primary key or unique index, you can run into some surprising duplicate-key errors.

In particular, I was loading data from a SQLite database into an Azure SQL (SQL Server) database. I had removed all the duplicates as far as SQLite was concerned, but SQL Server still rejected some records as duplicates. From what I can tell, one record used single-byte characters for the word "Final" and the other used double-byte characters.
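Before loading, one way to spot keys like these is to group them by a "folded" form and look for groups with more than one member. The Python sketch below assumes the collisions come from case and character-width differences. It uses NFKC normalization plus `casefold()` as a rough stand-in for a case- and width-insensitive comparison; this is not SQL Server's actual collation algorithm, and the helper names `approx_ci_wi_key` and `find_likely_collisions` are hypothetical. It also shows that SQLite, with its default BINARY collation, accepts both spellings as distinct primary keys.

```python
import sqlite3
import unicodedata
from collections import defaultdict


def approx_ci_wi_key(s: str) -> str:
    # Hypothetical helper: approximates a case-insensitive, width-insensitive
    # comparison. NFKC maps full-width forms (e.g. U+FF26 "Ｆ") to their ASCII
    # equivalents, and casefold() removes case differences. Accents are kept.
    # This is NOT SQL Server's real collation logic, only an approximation.
    return unicodedata.normalize("NFKC", s).casefold()


def find_likely_collisions(keys):
    # Hypothetical helper: group keys by their folded form and keep only
    # groups with more than one distinct original spelling.
    groups = defaultdict(list)
    for k in keys:
        groups[approx_ci_wi_key(k)].append(k)
    return {folded: ks for folded, ks in groups.items() if len(ks) > 1}


FULLWIDTH_FINAL = "\uff26\uff49\uff4e\uff41\uff4c"  # "Ｆｉｎａｌ" in full-width letters

# SQLite's default BINARY collation compares raw bytes, so both spellings
# are accepted as distinct primary-key values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("Final",), (FULLWIDTH_FINAL,), ("Draft",)],
)
keys = [row[0] for row in conn.execute("SELECT name FROM t ORDER BY rowid")]

collisions = find_likely_collisions(keys)
print(collisions)  # {'final': ['Final', 'Ｆｉｎａｌ']}
```

NFKC also folds other compatibility characters (ligatures, superscripts and the like), so this check may flag a few pairs that SQL Server would actually keep apart; treat its output as a list of candidates to inspect, not a definitive answer.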

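The solution in this case was to change the collation of the field to one that uses a binary sort, so values are compared by their underlying code points instead of by case- and width-insensitive rules.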

“Latin1_General_100_BIN” seems to work swimmingly. No more strange collisions.
