Change Tracking Mixed-State Graphs in EF Core

Real life relationships can be hard and sometimes, in EF Core, they can be hard as well. EF Core's change tracker has very specific behavior with respect to related data but it may not always be what you expect. I want to review some of these behaviors so you have a bit of guidance at hand, although I always recommend that you do some integration testing to be sure that your EF Core code does what you're anticipating.

There are a number of tools at hand to help you out. In fact, you could discover the behavior without ever calling SaveChanges because the key to the behavior is in the change tracker itself. Whatever SQL it executes for you is simply a manifestation of the knowledge stored in the change tracker. However, I still need a database to perform queries, so I'll use the SQLite provider. Why not InMemory? Because some of its persistence behaviors are different from a real database. For example, the InMemory provider updates key properties for new objects when they're tracked, whereas for many databases, those keys aren't available until after the database generates key values for you.

In a previous CODE Magazine article called Tapping into EF Core's Pipeline (https://www.codemag.com/Article/2103051), one of the taps I wrote about was the ChangeTracker.DebugView introduced in EF Core 5. I'll use that to explore the change tracker as I walk through a number of persistence scenarios with related data.

Starting with a Simple One-to-Many

For this example, I'll adopt the small book publishing house data model from my recently released Pluralsight Course, EF Core 6 Fundamentals. This publisher only publishes books written by one author, therefore I have a straightforward one-to-many relationship between author and book. One author can have many books but a book can only ever have one author. My initial classes are defined in the most common way, where Author has a list of Books and the Book type has both a navigation property back to Author along with an AuthorId foreign key.

How your classes are designed can impact behavior. I chose this design deliberately as a stake in the ground for what to expect from the change tracker.

public class Author
{
    public int AuthorId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public List<Book> Books { get; set; } = new List<Book>();
}

public class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
    public Author Author { get; set; }
    public int AuthorId { get; set; }
}

My context is configured to expose DbSets for Author and Book, configure my SQLite database, and seed some author and book data. If you want to try this out, the full code is available in the download for this article and on a repository at github.com/Julielerman/CodeMagEFC6Relationships.
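For readers who don't have the download at hand, here is a minimal sketch of what such a context might look like. It is not the code from the repository: the database file name and the placeholder seed rows are assumptions (only Ruth Ozeki with AuthorId 2 appears later in this article), and it relies on the Author and Book classes shown above plus the Microsoft.EntityFrameworkCore.Sqlite package.

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical sketch of the context described in the text, not the repository's version.
public class PubContext : DbContext
{
    public DbSet<Author> Authors => Set<Author>();
    public DbSet<Book> Books => Set<Book>();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseSqlite("Data Source=PubDatabase.db"); // file name is an assumption

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Seed data must carry explicit key values. The first author and the
        // book are placeholders made up for this sketch.
        modelBuilder.Entity<Author>().HasData(
            new Author { AuthorId = 1, FirstName = "Placeholder", LastName = "Author" },
            new Author { AuthorId = 2, FirstName = "Ruth", LastName = "Ozeki" });

        modelBuilder.Entity<Book>().HasData(
            new Book { BookId = 1, AuthorId = 1, Title = "Placeholder Title" });
    }
}
```

Calling context.Database.EnsureCreated() once would create the SQLite file and apply the seed rows.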

There is so much behavior to explore with this one setup. But it's also interesting to experiment with different combinations of navigation properties and foreign keys. For example, if Book had AuthorId but not an Author navigation property, some behavior would be different. It's also possible to minimize the classes and define relationships in the Fluent API mappings; for example, you could remove the Books property from Author, and the Author and AuthorId properties from Book and still have a mapped relationship.
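As a rough illustration of that last variation, the sketch below maps a one-to-many relationship between two classes that have no navigation or foreign key properties at all. The Slim* names and the database file name are invented for this example; the foreign key lives only in the model as a shadow property named AuthorId.

```csharp
using Microsoft.EntityFrameworkCore;

// Pared-down classes: no Books list, no Author navigation, no AuthorId property.
public class SlimAuthor
{
    public int SlimAuthorId { get; set; }
    public string LastName { get; set; } = "";
}

public class SlimBook
{
    public int SlimBookId { get; set; }
    public string Title { get; set; } = "";
}

public class SlimPubContext : DbContext
{
    public DbSet<SlimAuthor> Authors => Set<SlimAuthor>();
    public DbSet<SlimBook> Books => Set<SlimBook>();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseSqlite("Data Source=SlimPub.db"); // file name is an assumption

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // One author to many books; the foreign key is a required shadow property.
        modelBuilder.Entity<SlimAuthor>()
            .HasMany<SlimBook>()
            .WithOne()
            .HasForeignKey("AuthorId")
            .IsRequired();
    }
}
```

With this mapping, code that needs to set the foreign key has to go through the change tracker, for example context.Entry(book).Property("AuthorId").CurrentValue = 2, which is one reason the tracking behavior differs from the classes used in the rest of this article.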

And then even more behavior differences are introduced with nullability. For example, Book.AuthorId is a straightforward integer which, by default, is non-nullable. There's nothing here to prevent you from leaving AuthorId's value at 0. However, the default mappings infer the non-nullable AuthorId to mean that, in the database, a Book must have an Author and therefore AuthorId can't be 0. Your code must control that rule to avoid database inconsistencies (and database errors).

My goal here is to show you that there are so many variations to persist data just on this one specific setup and leave you with the knowledge and tools to determine what to expect from your own domain and data models.

Persisting When Objects are Tracked

Whether or not the change tracker is already aware of the related data affects how it treats that data. Let's look at a few scenarios where a new book is added to an author's collection of books while these objects are being tracked by an in-scope DbContext.

In this first scenario, I've used an instance of PubContext to retrieve an author from the database with a FirstOrDefault query. The context stays in scope and is tracking the author. I then create a new book and add it to the author's Books list. Then, instead of calling SaveChanges, I'm calling ChangeTracker.DetectChanges to get the change tracker to update its understanding of the entities it's tracking. SaveChanges internally calls DetectChanges, so I'm just using it explicitly and avoiding an unneeded interaction with the database. Then I use ChangeTracker's DebugView.ShortView to get a simple look at what the context thinks about its entities.

void AddBookToExistingAuthorTracked()
{
    using var context = new PubContext();
    var author = context.Authors.FirstOrDefault();
    var book = new Book { Title = "A Great Book!" };
    author.Books.Add(book);
    context.ChangeTracker.DetectChanges();
    var dv = context.ChangeTracker.DebugView.ShortView;
}

This is such a straightforward scenario that I'm not surprised by the contents of the DebugView.

Author {AuthorId: 1} Unchanged
Book {BookId: -2147482647} Added FK {AuthorId: 1}

The Author is Unchanged and the Book is marked as Added. It knows that its AuthorId foreign key is 1, and it has a temporary key value that will be fixed up after it gets the new value from the database.

Keep in mind that this temporary key is known only by the context. If you were to debug the book object directly, its BookId is still 0. If you were to call SaveChanges, the newly generated value of BookId would be returned from the database and BookId would get updated in the object and in context's details.
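One way to see that split for yourself is to compare the object's property with the tracker's entry for the same property. This is only a sketch: it assumes a PubContext like the one described earlier, seeded with at least one author, and it uses an object initializer to create the book.

```csharp
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;

using var context = new PubContext();
context.Database.EnsureCreated(); // creates and seeds the SQLite file if it isn't there yet

var author = context.Authors.First();
var book = new Book { Title = "A Great Book!" };
author.Books.Add(book);
context.ChangeTracker.DetectChanges();

// The tracker's entry for the key property, as opposed to the property on the object
var keyEntry = context.Entry(book).Property(b => b.BookId);

Console.WriteLine($"BookId on the object: {book.BookId}");
Console.WriteLine($"Tracker holds a temporary key: {keyEntry.IsTemporary}");
```

IsTemporary reports whether the value the tracker holds for that property is a placeholder that will be replaced once the database generates the real key.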

Now, as a matter of witnessing a change in EF Core's behavior, I'll modify the code to do something that will result in a problematic side effect. This is an example of the kind of mistake that's easily made with EF Core if you aren't aware of these nuances.

We know the true state of these objects: The Author is Unchanged and the Book is Added.

What if IntelliSense prompted me with the Update method and I thought “ahhh, I'm updating this author with a new book, so I'll call that method first”? So I've added a call to Update in my logic.

author.Books.Add(book);
context.Authors.Update(author);
context.ChangeTracker.DetectChanges();

The context has determined that because I used the Update method, the author's state should now be set to Modified. It doesn't care that I didn't change any values. It's simply responding to my instruction to Update. And if that object is the root of a graph, as it is in this case, it will apply that Modified state to every object in that graph with one exception: Objects with no key values (like the Book) are, by default, always marked Added. So now the DebugView shows:

Author {AuthorId: 1} Modified
Book {BookId: -2147482647} Added FK {AuthorId: 1}

On SaveChanges, that means you'll get an unneeded command sent to the database to update all of the properties of the author row. This may not seem like a problem in a demo, but could result in performance issues in production. Additionally, if you're using row versioning in this table for auditing purposes, this action will result in misinformation because the user didn't really update the author. EF Core was mistakenly told that they did.
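If you want to see which properties Update has flagged before anything goes to the database, you can ask the entry directly. The following is a sketch under the same assumptions as before (a seeded PubContext); it simply lists the properties the tracker now considers modified.

```csharp
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;

using var context = new PubContext();
context.Database.EnsureCreated(); // creates and seeds the SQLite file if it isn't there yet

var author = context.Authors.First();
context.Authors.Update(author); // no values were edited

// Every property flagged here will be written by the UPDATE command
var flagged = context.Entry(author).Properties
    .Where(p => p.IsModified)
    .Select(p => p.Metadata.Name)
    .ToList();

Console.WriteLine($"State: {context.Entry(author).State}");
Console.WriteLine($"Flagged as modified: {string.Join(", ", flagged)}");
```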

There's something else interesting to show you about the effect of DetectChanges on graphs. Because I'm now using an explicit DbContext method - Update - the ChangeTracker is immediately aware of the author object being passed in. But only the author, which is the root object, not the entire graph.

To be clear: If I call the method with DetectChanges commented out:

context.Update(author);
//context.ChangeTracker.DetectChanges();
var dv = context.ChangeTracker.DebugView.ShortView;

The ChangeTracker is only aware of the Author and not the new Book.

Author {AuthorId: 1} Modified

DetectChanges is critical for ensuring that changes to the graph are comprehended and, again, calling SaveChanges would have rectified that. But this gives you greater insight into the workings of the ChangeTracker API.

In the long run, because the context was still tracking the objects, that call to Update wasn't even necessary. But when I called it anyway, it had a side-effect of forcing EF Core to update the Author that had never been edited.

Same Mixed State Graph, Disconnected

What if the context weren't tracking the objects? For example, when you're writing an ASP.NET Core app and handling data coming in from a request, you're dealing with a new context on each request.

In an ASP.NET Core API controller method, the JSON coming in via a request is automatically transformed into the expected object. To emulate that, I've just created the resulting object, named existingAuthor, an Author graph with an existing Author whose AuthorId is 2, and a new Book.

var existingAuthor = new Author { AuthorId = 2, FirstName = "Ruth", LastName = "Ozeki" };
existingAuthor.Books.Add(new Book { Title = "A Tale for the Time Being" });

An important attribute to note about this graph is that the incoming book not only has no BookId but it also doesn't have an AuthorId. I've created it this way because it's quite possible for incoming data to be set up this way and I want to be able to handle that scenario.

Let's explore what happens with the various options for getting a context to be aware of the state of this graph so I can get the author's new book into the database.

Adding the Mixed State Graph

First, of course, I'll need a new PubContext. Then, well, I want to add this graph to the context, right? Let's try that.

using var context = new PubContext();
context.Authors.Add(existingAuthor);

After calling DetectChanges, the DebugView shows me that the ChangeTracker thinks that the Author and the Book are both new and need to be added to the database!

Author {AuthorId: 2} Added
Book {BookId: -2147482647} Added FK {AuthorId: 2}

But wait! The context was smart enough to know that an object with no key value should be added, so why doesn't it also assume that an object with key that has a value must already exist in the database? Well, there are too many reasons that this assumption could fail. EF Core is simply following your instructions: You called Authors.Add, so you must therefore have wanted to Add (i.e., insert) the author!

There's something else interesting to note. The book's AuthorId is now 2. Recall that the book came in without any AuthorId property. Because I passed the entire graph into the context, when I called DetectChanges, EF Core figured out that because of the relationship, the Book.AuthorId should use the key value from the Author object.

The Add method is a problem because I can't insert that author into the database. A row with AuthorId 2 already exists, so the database will reject the INSERT.

Using Update with the Mixed State Graph

What's my next option? Well, another conclusion might be that the author has changed because they have a new book. This reasoning might lead me to the Update method.

context.Authors.Update(existingAuthor);

This isn't the correct path either. You saw this problem earlier. Explicitly calling Update causes every object in the graph to be marked as Modified (except for any that don't have a key value). Again, the Author would be marked Modified and the Book marked Added and you'll get a needless command to update all of the Author's properties sent to the database.

On the other hand, if you know that Author was updated, or you're not concerned about the extra database trip or about audit data, Update would be a safe bet.

Focusing on the Graph's Added Object

You learned (above) that when the root is already being tracked and DetectChanges hasn't run, the Update (and Add and Remove) methods only acknowledge the root of the graph. So what if I pass the book to the context.Add method instead of the author?

var book = existingAuthor.Books[0];
context.Add(book);
//context.ChangeTracker.DetectChanges();

Because the tracker is only aware of the book, it isn't able to read the Author's key property and apply it to Book.AuthorId as you saw it do earlier. The DebugView shows that AuthorId is still 0:

Book {BookId: -2147482647} Added FK {AuthorId: 0}

What if I added the DetectChanges back into the logic? Well, there's a surprise. That doesn't work either! Book.AuthorId is still 0.

The fact that DetectChanges doesn't fix the foreign key also means that calling SaveChanges - which calls DetectChanges - causes the resulting database command to fail because my design is such that a Book must have an Author. An AuthorId value of 0 causes a foreign key constraint error in the database. Notice that the failure is in the database. EF Core won't protect you from making this mistake, which means that, again, you need to ensure that your code enforces your rules.

Did you find it strange that pushing the graph into the context using the author object pulled in the entire graph but pushing it in via the book object didn't? Consider these objects more closely. The book was a member of the Author.Books property. When I attached the author, the context was able to traverse into the book object. However, even though the Book class has an Author navigation property, that property wasn't populated with the author object. So when I called context.Add(book), the context wasn't able to detect the author object.

All of these details are hard to keep in your head, even if you knew them once or twice before! I always create integration tests to make sure I haven't forgotten a behavior.

Taking More Control over the Graph's State

There's a way to make this pattern work, however: by explicitly setting the foreign key property, because you do have easy access to it. DetectChanges is redundant because the Add method sets the state of the Book immediately. Of course, SaveChanges will call that anyway, but again, it's an important behavior to be aware of.

var book = existingAuthor.Books[0];
book.AuthorId = existingAuthor.AuthorId;
context.Add(book);
//context.ChangeTracker.DetectChanges();

One thing I like about just setting the foreign key is that I'm not relying on “magic” to have success with my persistence logic.

Tracking Single Entities with the Entry Method

Here's another place you may be surprised with how EF Core reacts to our incoming graph.

Given that I'm a fan of explicit logic, I'm also a fan of the very explicit DbContext.Entry method. The beauty of the Entry method is that it only pays attention to the root of whatever graph you use as its parameter. It's the only clean way to separate an entity from a graph with EF Core and, because of this strict behavior, you don't have to make guesses about what will happen within a graph.

Yet, it still may surprise you with the mixed state graph. When I use Entry to start tracking the book in my graph:

var book = existingAuthor.Books[0];
context.Entry(book).State = EntityState.Added;

the Entry method ignores the author that's connected to that book. As expected, it sets the state of that book to Added. But because the ChangeTracker is now unaware of the Author object, it can't read existingAuthor.AuthorId to set the book's foreign key property and therefore, book's AuthorId is still 0.

Book {BookId: -2147482647} Added FK {AuthorId: 0}

As you just learned above, your code needs to take responsibility for the foreign key. Therefore, I'll just set it myself before calling the Entry method:

var book = existingAuthor.Books[0];
book.AuthorId = existingAuthor.AuthorId;
context.Entry(book).State = EntityState.Added;

Now there's no question about AuthorId being 2.

Book {BookId: -2147482647} Added FK {AuthorId: 2}

Yes, there is an extra line of code here, but this makes my logic dependable and gives me confidence. My integration tests give me a lot more confidence, though!

The Handy Attach Method

So far, I've shown you quite a few unsuccessful or “assisted” ways to get that new book inserted into the database when it's part of a mixed state graph. I hope you appreciate this better understanding of not only what to avoid, but also why.

Is there a “best” way to do this or just an easy and dependable pattern? I think the Entry method, along with the explicit code to ensure that the FK is set, is dependable and memorable. As I said, I do prefer the explicit logic. It makes me feel more in control of my code's behavior and less dependent on under-the-covers magic.

There is one other dependable pattern, which is the Attach method. I'll pass my graph into Authors.Attach.

context.Authors.Attach(existingAuthor);
context.ChangeTracker.DetectChanges();

Attach tells the ChangeTracker to just start tracking the graph but not to bother setting EntityState. The “one true rule” you've seen repeatedly now for Add and Update also applies to Attach: Any objects in the graph that have no key value are marked as Added. Now the existingAuthor is connected and being tracked but has the default state of Unchanged. The shiny new book with no BookId becomes Added.

Author {AuthorId: 2} Unchanged
Book {BookId: -2147482647} Added FK {AuthorId: 2}

In this case, Attach is an excellent way to get the correct state for both objects and the correct value for the Book's AuthorId foreign key property. SaveChanges will only send one command, the correct command: an INSERT for the new book.

This works perfectly for this scenario. You know that the Author is unchanged and you're also relying on the rule that any new objects will be handled correctly.

Dealing with Unknown State

I've pointed out a few times now that there's a problem with calling Update on entities that haven't been edited. They get sent to the database with an UPDATE command, which can be a wasted call and possibly have a bad effect on performance.

However, there's an interesting use for Update: when you have objects coming in and you have no idea if they need to be added, updated, or ignored. Notice that I am leaving “deleted” out of that list. You must supply some way of detecting whether an object is to be deleted. In typical controller methods, including those generated by the Visual Studio templates, you do have explicit methods for inserting, updating, and deleting, so there's no mystery.

What if your logic is different from these controller methods and a request passes in an Author object with - or maybe even without - attached objects? I'll use the same code that represents an existing Author to demonstrate:

var existingAuthor = new Author { AuthorId = 2, FirstName = "Ruth", LastName = "Ozeki" };

Here's where your own logic can be written to glean some attributes. For example, Author has an AuthorId value of 2. If your business logic is such that you know that any object with an ID value present must have come from the database, then you can at least determine that this object is either unchanged or has been edited.

In this case, if you are given no other clues, updating this Author covers your bases. If any of the data was changed - for example, maybe the user fixed a typo in the Author's last name - the Update will be sure to get that into the database. If none of the data was changed, but you have no idea if that's the case, suddenly that “unnecessary update” may be seen as not a terrible thing. That depends on what kind of performance expectations you have. Maybe the unnecessary updates are too taxing and you have determined that it's smarter to first grab that record from the database and compare its values to the incoming Author before deciding to ignore it or update it. The controller template that uses EF Core with actions does make a quick trip to the database, but it does that to verify that the row indeed exists in the database before calling for an update and potentially triggering a database error for a nonexistent row.
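The "grab the record first and compare" option could be sketched along these lines. The class and method names are hypothetical, and this is only one way to do it: it fetches the stored row with Find, then lets CurrentValues.SetValues copy the incoming scalar values onto the tracked entity, which marks as modified only the properties whose values actually differ.

```csharp
using Microsoft.EntityFrameworkCore;

public static class AuthorPersistence
{
    // Hypothetical helper: returns false when no matching row exists.
    public static bool SaveIncomingAuthor(PubContext context, Author incoming)
    {
        // The extra trip to the database (skipped if the author is already tracked)
        var tracked = context.Authors.Find(incoming.AuthorId);
        if (tracked is null)
        {
            return false; // the caller decides whether to insert or report "not found"
        }

        // Copies matching scalar properties; unchanged values stay Unchanged.
        // Related objects such as incoming.Books are not touched by this call.
        context.Entry(tracked).CurrentValues.SetValues(incoming);

        // Sends an UPDATE only if at least one property was marked modified
        context.SaveChanges();
        return true;
    }
}
```

The trade-off is the one described above: an additional query on every request in exchange for never sending an UPDATE for an author that wasn't edited.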

The bottom line is driven by how much magic you will accept and what type of load there is on your application. From there, you can use your knowledge of EF Core's behavior to choose your path.

So Many Variants to Explore

Keep in mind that although you've learned about a lot of the ChangeTracker behavior in response to various ways of tracking this graph, I've focused on a particular scenario: a mixed state graph where one object came from the database and wasn't modified and the other object was new. I also used pretty standard class definitions. My Author class has a list of books and the Book class has both the Author navigation property and a non-nullable AuthorId foreign key property. And with that, you saw many different effects of persisting these objects, whether they were being tracked from retrieval to saving, or they were disconnected from their original context, as with a Web application.

If you start tweaking other factors, such as removing navigation properties or FK properties or changing the nullability of the foreign key, you'll have another batch of resulting behavior to be aware of.

Remember that I chose to only use DetectChanges directly and not SaveChanges. That's a nice way of testing things out without bothering with the database. Some of those scenarios where it was necessary to call DetectChanges to get the expected behavior will be solved by calling SaveChanges.

As I stated earlier, I can never keep all of these behaviors in my head. I depend on integration tests to make sure things are working as I expect them to. And because the disconnected examples above never query or save, you can write tests for them without a database, not even an InMemory one. The context still needs a provider configured, but nothing connects to it. You can just build assertions about the state of the entities within the ChangeTracker.
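A test along those lines might look like the following sketch. It uses plain exceptions rather than a particular test framework, the class and method names are made up, and it assumes a PubContext configured as described earlier. It checks the Attach scenario purely through the ChangeTracker, so no query or save is issued.

```csharp
using System;
using Microsoft.EntityFrameworkCore;

public static class ChangeTrackerChecks
{
    public static void AttachMarksOnlyTheNewBookAsAdded()
    {
        using var context = new PubContext();

        // The same disconnected graph used in the article: existing author, new book
        var existingAuthor = new Author { AuthorId = 2, FirstName = "Ruth", LastName = "Ozeki" };
        var book = new Book { Title = "A Tale for the Time Being" };
        existingAuthor.Books.Add(book);

        context.Authors.Attach(existingAuthor);

        Expect(context.Entry(existingAuthor).State == EntityState.Unchanged,
            "author should be Unchanged");
        Expect(context.Entry(book).State == EntityState.Added,
            "book should be Added");
        // Read the foreign key through the tracker, as DebugView does
        Expect(context.Entry(book).Property(b => b.AuthorId).CurrentValue == 2,
            "book's AuthorId should have been fixed up to 2");
    }

    private static void Expect(bool condition, string message)
    {
        if (!condition) throw new InvalidOperationException(message);
    }
}
```

The same shape works for the Add, Update, and Entry scenarios: build the graph, hand it to the context, and assert on the states and foreign key values you expect.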