Real life relationships can be hard and sometimes, in EF Core, they can be hard as well. EF Core's change tracker has very specific behavior with respect to related data but it may not always be what you expect. I want to review some of these behaviors so you have a bit of guidance at hand, although I always recommend that you do some integration testing to be sure that your EF Core code does what you're anticipating.
There are a number of tools at hand to help you out. In fact, you could discover the behavior without ever calling SaveChanges because the key to the behavior is in the change tracker itself. Whatever SQL EF Core executes for you is simply a manifestation of the knowledge stored in the change tracker. However, I still need a database to perform queries, so I'll use the SQLite provider. Why not InMemory? Because some of its persistence behaviors are different from a real database. For example, the InMemory provider updates key properties for new objects when they're tracked, whereas for many databases, those keys aren't available until after the database generates key values for you.
In a previous CODE Magazine article, Tapping into EF Core's Pipeline (https://www.codemag.com/Article/2103051), one of the taps I wrote about was ChangeTracker.DebugView, introduced in EF Core 5. I'll use that to explore the change tracker as I walk through a number of persistence scenarios with related data.
Starting with a Simple One-to-Many
For this example, I'll adopt the small book publishing house data model from my recently released Pluralsight course, EF Core 6 Fundamentals. This publisher only publishes books that have a single author, so I have a straightforward one-to-many relationship between author and book. One author can have many books but a book can only ever have one author. My initial classes are defined in the most common way, where Author has a list of Books and the Book type has both a navigation property back to Author and an AuthorId foreign key.
How your classes are designed can impact behavior. I chose this design deliberately as a baseline for what to expect from the change tracker.
public class Author
{
    public int AuthorId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public List<Book> Books { get; set; } = new List<Book>();
}

public class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
    public Author Author { get; set; }
    public int AuthorId { get; set; }
}
My context is configured to expose DbSets for Author and Book, configure my SQLite database, and seed some author and book data. If you want to try this out, the full code is available in the download for this article and on a repository at github.com/Julielerman/CodeMagEFC6Relationships.
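A minimal sketch of such a context might look like the following. This is not the code from the download: the database file name, the seed rows other than Ruth Ozeki's name and key, and the expression-bodied DbSet properties are all assumptions made for illustration, and the sketch presumes that the Microsoft.EntityFrameworkCore.Sqlite package is referenced. The entity classes are repeated so that the block compiles on its own.

```csharp
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

public class Author
{
    public int AuthorId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public List<Book> Books { get; set; } = new List<Book>();
}

public class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
    public Author Author { get; set; }
    public int AuthorId { get; set; }
}

public class PubContext : DbContext
{
    public DbSet<Author> Authors => Set<Author>();
    public DbSet<Book> Books => Set<Book>();

    // Hypothetical file name; any SQLite connection string would do.
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseSqlite("Data Source=PubDatabase.db");

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Seed rows must carry explicit key and foreign key values.
        // The first author and the book title are placeholder data.
        modelBuilder.Entity<Author>().HasData(
            new Author { AuthorId = 1, FirstName = "Sample", LastName = "Author" },
            new Author { AuthorId = 2, FirstName = "Ruth", LastName = "Ozeki" });
        modelBuilder.Entity<Book>().HasData(
            new Book { BookId = 1, AuthorId = 1, Title = "Sample Title" });
    }
}
```

With a context like this, calling context.Database.EnsureCreated() once would create the SQLite file and insert the seed rows.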
There is so much behavior to explore with this one setup. But it's also interesting to experiment with different combinations of navigation properties and foreign keys. For example, if Book had AuthorId but not an Author navigation property, some behavior would be different. It's also possible to minimize the classes and define relationships in the Fluent API mappings; for example, you could remove the Books property from Author, and the Author and AuthorId properties from Book and still have a mapped relationship.
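To make the minimized variant concrete, here is a hedged sketch of a relationship mapped purely through the Fluent API. The context name and database file name are hypothetical, and the IsRequired call is an assumption added to keep the "a book must have an author" rule; without any foreign key property on Book, EF Core tracks the key as a shadow property.

```csharp
using Microsoft.EntityFrameworkCore;

public class Author
{
    public int AuthorId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
}

// Hypothetical context: neither class exposes a navigation or FK property.
public class MinimalPubContext : DbContext
{
    public DbSet<Author> Authors => Set<Author>();
    public DbSet<Book> Books => Set<Book>();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseSqlite("Data Source=MinimalPub.db");

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Author>()
            .HasMany<Book>()   // no Books collection on Author
            .WithOne()         // no Author navigation on Book
            .IsRequired();     // a Book must still have an Author
    }
}
```

Because the foreign key is no longer a property of Book, code that needs to read or set it has to go through the change tracker, which is one reason the behaviors described in this article shift when the class design changes.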
And then even more behavior differences are introduced with nullability. For example, Book.AuthorId is a straightforward integer which, by default, is non-nullable. There's nothing here to prevent you from leaving AuthorId's value at 0. However, the default mappings infer the non-nullable AuthorId to mean that, in the database, a Book must have an Author and therefore AuthorId can't be 0. Your code must control that rule to avoid database inconsistencies (and database errors).
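One way your code could control that rule is a small guard that runs before a book is handed to the context. The following is only a sketch: BookRules and EnsureHasAuthor are hypothetical names, and the check assumes that real author keys are always positive integers generated by the database.

```csharp
using System;

public class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
    public int AuthorId { get; set; }
}

public static class BookRules
{
    // Hypothetical helper: rejects a book whose foreign key was never set.
    // Assumes that valid AuthorId values are always greater than zero.
    public static void EnsureHasAuthor(Book book)
    {
        if (book.AuthorId <= 0)
        {
            throw new InvalidOperationException(
                $"Book '{book.Title}' has no AuthorId and can't be saved.");
        }
    }
}
```

Failing early like this surfaces the problem in your own code, with a message you control, rather than as a foreign key constraint error coming back from the database.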
My goal here is to show you how many variations there are in how data gets persisted with just this one specific setup, and to leave you with the knowledge and tools to determine what to expect from your own domain and data models.
Persisting When Objects are Tracked
Whether or not the change tracker is already aware of the related data affects how it treats that data. Let's look at a few scenarios where a new book is added to an author's collection of books while these objects are being tracked by an in-scope DbContext.
In this first scenario, I've used an instance of PubContext to retrieve an author from the database with a FirstOrDefault query. The context stays in scope and is tracking the author. I then create a new book and add it to the author's Books list. Then, instead of calling SaveChanges, I'm calling ChangeTracker.DetectChanges to get the change tracker to update its understanding of the entities it's tracking. SaveChanges internally calls DetectChanges, so I'm just using it explicitly and avoiding an unneeded interaction with the database. Then I use ChangeTracker's DebugView.ShortView to get a simple look at what the context thinks about its entities.
void AddBookToExistingAuthorTracked()
{
    using var context = new PubContext();
    var author = context.Authors.FirstOrDefault();
    var book = new Book { Title = "A Great Book!" };
    author.Books.Add(book);
    // DetectChanges stands in for SaveChanges so nothing is sent to the database
    context.ChangeTracker.DetectChanges();
    var dv = context.ChangeTracker.DebugView.ShortView;
}
This is such a straightforward scenario that I'm not surprised by the contents of the DebugView.
Author {AuthorId: 1} Unchanged
Book {BookId: -2147482647} Added FK {AuthorId: 1}
The Author is Unchanged and the Book is marked as Added. The context knows that the book's AuthorId foreign key is 1, and the book has a temporary key value that will be fixed up after the context gets the new value from the database.
Keep in mind that this temporary key is known only by the context. If you were to debug the book object directly, its BookId is still 0. If you were to call SaveChanges, the newly generated value of BookId would be returned from the database and BookId would get updated in the object and in the context's tracking details.
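If you want to see the difference between the object's key and the tracker's temporary key in code rather than in DebugView, the property entry exposes both. The sketch below is an illustration under assumptions: KeyDemoContext and TemporaryKeyDemo are hypothetical names, the book is added directly instead of through a queried author, and the SQLite file named in the connection string is never opened because nothing is queried or saved.

```csharp
using Microsoft.EntityFrameworkCore;

public class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
    public int AuthorId { get; set; }
}

// Hypothetical context holding only what this sketch needs.
public class KeyDemoContext : DbContext
{
    public DbSet<Book> Books => Set<Book>();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseSqlite("Data Source=KeyDemo.db");
}

public static class TemporaryKeyDemo
{
    public static (int OnObject, int InTracker, bool IsTemporary) Inspect()
    {
        using var context = new KeyDemoContext();
        var book = new Book { Title = "A Great Book!", AuthorId = 1 };
        context.Books.Add(book); // tracked as Added; no database call is made

        // The property entry reports the tracker's view of BookId.
        var keyEntry = context.Entry(book).Property(b => b.BookId);
        return (book.BookId, keyEntry.CurrentValue, keyEntry.IsTemporary);
    }
}
```

The first value in the returned tuple is what you would see when debugging the object, and the second is the value the context is holding on the book's behalf until SaveChanges replaces it.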
Now, as a matter of witnessing a change in EF Core's behavior, I'll modify the code to do something that will result in a problematic side effect. This is an example of the kind of mistake that's easily made with EF Core if you aren't aware of these nuances.
We know the true state of these objects: The Author is Unchanged and the Book is Added.
What if IntelliSense prompted me with the Update method and I thought “ahhh, I'm updating this author with a new book, so I'll call that method first”? So I've added a call to Update in my logic.
author.Books.Add(book);
context.Authors.Update(author);
context.ChangeTracker.DetectChanges();
The context has determined that because I used the Update method, the author's state should now be set to Modified. It doesn't care that I didn't change any values. It's simply responding to my instruction to Update. And if that object is the root of a graph, as it is in this case, it will apply that Modified state to every object in that graph with one exception: Objects with no key values (like the Book) are, by default, always marked Added. So now the DebugView shows:
Author {AuthorId: 1} Modified
Book {BookId: -2147482647} Added FK {AuthorId: 1}
On SaveChanges, that means you'll get an unneeded command sent to the database to update all of the properties of the author row. This may not seem like a problem in a demo, but could result in performance issues in production. Additionally, if you're using row versioning in this table for auditing purposes, this action will result in misinformation because the user didn't really update the author. EF Core was mistakenly told that they did.
There's something else interesting to show you about the effect of DetectChanges on graphs. Because I'm now using an explicit DbContext method - Update - the ChangeTracker is immediately aware of the author object being passed in. But only the author, which is the root object, not the entire graph.
To be clear: If I call the method with DetectChanges commented out:
context.Update(author);
//context.ChangeTracker.DetectChanges();
var dv = context.ChangeTracker.DebugView.ShortView;
The ChangeTracker is only aware of the Author and not the new Book.
Author {AuthorId: 1} Modified
DetectChanges is critical for ensuring that changes to the graph are discovered and, because SaveChanges calls DetectChanges, calling SaveChanges would have rectified that. But this gives you greater insight into the workings of the ChangeTracker API.
In the long run, because the context was still tracking the objects, that call to Update wasn't even necessary. But when I called it anyway, it had a side-effect of forcing EF Core to update the Author that had never been edited.
Same Mixed State Graph, Disconnected
What if the context weren't tracking the objects? For example, when you're writing an ASP.NET Core app and handling data coming in from a request, you're dealing with a new context on each request.
In an ASP.NET Core API controller, the JSON coming in via a request is automatically transformed into the object type that the action method expects. To emulate that, I've just created the resulting object, named existingAuthor, an Author graph with an existing Author whose AuthorId is 2, and a new Book.
var existingAuthor = new Author { FirstName = "Ruth", LastName = "Ozeki", AuthorId = 2 };
existingAuthor.Books.Add(new Book { Title = "A Tale for the Time Being" });
An important attribute to note about this graph is that the incoming book not only has no BookId but it also doesn't have an AuthorId. I've created it this way because it's quite possible that incoming data is set up this way and I want to be able to handle that scenario.
Let's explore what happens with the various options for getting a context to be aware of the state of this graph so I can get the author's new book into the database.
Adding the Mixed State Graph
First, of course, I'll need a new PubContext. Then, well, I want to add this graph to the context, right? Let's try that.
using var context = new PubContext();
context.Authors.Add(existingAuthor);
After calling DetectChanges, the DebugView shows me that the ChangeTracker thinks that the Author and the Book are both new and need to be added to the database!
Author {AuthorId: 2} Added
Book {BookId: -2147482647} Added FK {AuthorId: 2}
But wait! The context was smart enough to know that an object with no key value should be added, so why doesn't it also assume that an object with a key value must already exist in the database? Well, there are too many reasons that this assumption could fail. EF Core is simply following your instructions: You called Authors.Add, so you must therefore have wanted to Add (i.e., insert) the author!
There's something else interesting to note. The book's AuthorId is now 2. Recall that the book came in without any AuthorId value. Because I passed the entire graph into the context, when I called DetectChanges, EF Core figured out that because of the relationship, the Book.AuthorId should use the key value from the Author object.
The Add method is a problem because I can't insert that author into the database. A row with that AuthorId already exists, so the insert will cause an error in the database.
Using Update with the Mixed State Graph
What's my next option? Well, another conclusion might be that the author has changed because they have a new book. This reasoning might lead me to the Update method.
context.Authors.Update(existingAuthor);
This isn't the correct path either. You saw this problem earlier. Explicitly calling Update causes every object in the graph to be marked as Modified (except for any that don't have a key value). Again, the Author would be marked Modified and the Book marked Added and you'll get a needless command to update all of the Author's properties sent to the database.
On the other hand, if you know that Author was updated, or you're not concerned about the extra database trip or about audit data, Update would be a safe bet.
Focusing on the Graph's Added Object
You learned (above) that without DetectChanges, the Update (and Add and Remove) methods only acknowledge the root of the graph. So what if I pass the book to the context.Add method instead of the author?
var book = existingAuthor.Books[0];
context.Add(book);
//context.ChangeTracker.DetectChanges();
Because the tracker is only aware of the book, it isn't able to read the Author's key property and apply it to Book.AuthorId as you saw it do earlier. The DebugView shows that AuthorId is still 0:
Book {BookId: -2147482647} Added FK {AuthorId: 0}
What if I added the DetectChanges back into the logic? Well, there's a surprise. That doesn't work either! Book.AuthorId is still 0.
The fact that DetectChanges doesn't fix the foreign key also means that calling SaveChanges - which calls DetectChanges - causes the resulting database command to fail because my design is such that a Book must have an Author. An AuthorId value of 0 causes a foreign key constraint error in the database. Notice that the failure is in the database. EF Core won't protect you from making this mistake, which means that, again, you need to ensure that your code enforces your rules.
Did you find it strange that pushing the graph into the context using the author object pulled in the entire graph but pushing it in via the book object didn't? Consider these objects more closely. The book was a member of the Author.Books property. When I attached the author, the context was able to traverse into the book object. However, even though the Book class has an Author navigation property, that property wasn't populated with the author object. So when I called context.Add(book), the context wasn't able to detect the author object.
All of these details are hard to keep in your head, even if you knew them once or twice before! I always create integration tests to make sure I haven't forgotten a behavior.
Taking More Control over the Graph's State
There's a way to make this pattern work, however: explicitly set the foreign key property yourself, because you do have easy access to its value. DetectChanges is redundant here because the Add method sets the state of the Book immediately. Of course, SaveChanges will call DetectChanges anyway, but again, it's an important behavior to be aware of.
var book = existingAuthor.Books[0];
book.AuthorId = existingAuthor.AuthorId;
context.Add(book);
//context.ChangeTracker.DetectChanges();
One thing I like about just setting the foreign key is that I'm not relying on “magic” to have success with my persistence logic.
Tracking Single Entities with the Entry Method
Here's another place you may be surprised with how EF Core reacts to our incoming graph.
Given that I'm a fan of explicit logic, I'm also a fan of the very explicit DbContext.Entry method. The beauty of the Entry method is that it only pays attention to the root of whatever graph you use as its parameter. It's the only clean way to separate an entity from a graph with EF Core and, because of this strict behavior, you don't have to make guesses about what will happen within a graph.
Yet, it still may surprise you with the mixed state graph. When I use Entry to start tracking the book in my graph:
var book = existingAuthor.Books[0];
context.Entry(book).State = EntityState.Added;
the Entry method ignores the author that's connected to that book. As expected, it sets the state of that book to Added. But because the ChangeTracker is now unaware of the Author object, it can't read existingAuthor.AuthorId to set the book's foreign key property and therefore, book's AuthorId is still 0.
Book {BookId: -2147482647} Added FK {AuthorId: 0}
As you just learned above, your code needs to take responsibility for the foreign key. Therefore, I'll just set it myself before calling the Entry method:
var book = existingAuthor.Books[0];
book.AuthorId = existingAuthor.AuthorId;
context.Entry(book).State = EntityState.Added;
Now there's no question about AuthorId being 2.
Book {BookId: -2147482647} Added FK {AuthorId: 2}
Yes, there's an extra line of code here, but it makes my logic dependable and gives me confidence. My integration tests give me a lot more confidence, though!
The Handy Attach Method
So far, I've shown you quite a few unsuccessful or “assisted” ways to get that new book inserted into the database when it's part of a mixed state graph. I hope you appreciate this better understanding of not only what to avoid, but also why.
Is there a “best” way to do this or just an easy and dependable pattern? I think the Entry method, along with the explicit code to ensure that the FK is set, is dependable and memorable. As I said, I do prefer the explicit logic. It makes me feel more in control of my code's behavior and less dependent on under-the-covers magic.
There is one other dependable pattern, which is the Attach method. I'll pass my graph into Authors.Attach.
context.Authors.Attach(existingAuthor);
context.ChangeTracker.DetectChanges();
Attach tells the ChangeTracker to just start tracking the graph but not to bother setting EntityState. The “one true rule” you've seen repeatedly now for Add and Update also applies to Attach: Any objects in the graph that have no key value are marked as Added. Now the existingAuthor is connected and being tracked but has the default state of Unchanged. The shiny new book with no BookId becomes Added.
Author {AuthorId: 2} Unchanged
Book {BookId: -2147482647} Added FK {AuthorId: 2}
In this case, Attach is an excellent way to get the correct state for both objects and the correct value for the Book's AuthorId foreign key property. SaveChanges will only send one command, the correct command: an INSERT for the new book.
This works perfectly for this scenario. You know that the Author is unchanged and you're also relying on the rule that any new objects will be handled correctly.
Dealing with Unknown State
I've pointed out a few times now that there's a problem with calling Update on entities that haven't been edited. They get sent to the database with an UPDATE command, which can be a wasted call and possibly have a bad effect on performance.
However, there's an interesting use for Update: when you have objects coming in and you have no idea if they need to be added, updated, or ignored. Notice that I am leaving “deleted” out of that list. You must supply some way of detecting whether an object is to be deleted. In typical controller methods, including those generated by the Visual Studio templates, you do have explicit methods for inserting, updating, and deleting, so there's no mystery.
What if your logic is different from these controller methods and a request passes in an Author object with - or maybe even without - attached objects? I'll use the same code that represents an existing Author to demonstrate:
var existingAuthor = new Author { FirstName = "Ruth", LastName = "Ozeki", AuthorId = 2 };
Here's where your own logic can be written to glean some attributes. For example, Author has an AuthorId value of 2. If your business logic is such that you know that any object with an ID value present must have come from the database, then you can at least determine that this object is either unchanged or has been edited.
In this case, if you are given no other clues, updating this Author covers your bases. If any of the data was changed - for example, maybe the user fixed a typo in the Author's last name - the Update will be sure to get that into the database. If none of the data was changed, but you have no idea if that's the case, suddenly that “unnecessary update” may be seen as not a terrible thing. That depends on what kind of performance expectations you have. Maybe the unnecessary updates are too taxing and you have determined that it's smarter to first grab that record from the database and compare its values to the incoming Author before deciding to ignore it or update it. The controller template that uses EF Core with actions does make a quick trip to the database, but it does that to verify that the row indeed exists in the database before calling for an update and potentially triggering a database error for a nonexistent row.
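The "grab the record and compare" option can be sketched with the change tracker's own value-copying API, so that EF Core does the comparison for you. This is one possible approach under assumptions, not the pattern from the article's download: AuthorSync and ApplyIncoming are hypothetical names, and the context is trimmed to the Author type only.

```csharp
using Microsoft.EntityFrameworkCore;

public class Author
{
    public int AuthorId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class PubContext : DbContext
{
    public DbSet<Author> Authors => Set<Author>();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseSqlite("Data Source=PubDatabase.db"); // hypothetical file name
}

public static class AuthorSync
{
    // Returns the resulting state of the stored author,
    // or null when no author with that key can be found.
    public static EntityState? ApplyIncoming(PubContext context, Author incoming)
    {
        // Find returns an already tracked instance if there is one;
        // otherwise it queries the database by key.
        var stored = context.Authors.Find(incoming.AuthorId);
        if (stored == null) return null;

        // Copies the incoming scalar values onto the tracked entity.
        // Only properties whose values actually differ are marked as modified.
        context.Entry(stored).CurrentValues.SetValues(incoming);
        return context.Entry(stored).State;
    }
}
```

If the method reports Unchanged, SaveChanges has nothing to send for that author. If it reports Modified, the resulting UPDATE command targets only the properties that changed. The cost is the extra query, which is exactly the trade-off described above.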
The bottom line is driven by how much magic you will accept and what type of load there is on your application. From there, you can use your knowledge of EF Core's behavior to choose your path.
So Many Variants to Explore
Keep in mind that although you've learned about a lot of the ChangeTracker behavior in response to various ways of tracking this graph, I've focused on a particular scenario: a mixed state graph where one object came from the database and wasn't modified and the other object was new. I also used pretty standard class definitions. My Author class has a list of books and the Book class has both the Author navigation property and a non-nullable AuthorId foreign key property. And with that, you saw many different effects of persisting these objects, whether they were being tracked from retrieval to saving, or they were disconnected from their original context, as with a Web application.
If you start tweaking other factors, such as removing navigation properties or FK properties or changing the nullability of the foreign key, you'll have another batch of resulting behavior to be aware of.
Remember that I chose to only use DetectChanges directly and not SaveChanges. That's a nice way of testing things out without bothering with the database. Some of those scenarios where it was necessary to call DetectChanges to get the expected behavior will be solved by calling SaveChanges.
As I stated earlier, I can never keep all of these behaviors in my head. I depend on integration tests to make sure things are working as I expect them to. And because the disconnected examples above never send anything to the database, you can write tests for them without a database to talk to and without swapping in the InMemory provider. You can just build assertions about the state of the entities within the ChangeTracker.
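As an illustration of what such an assertion-friendly check could look like, here is a sketch that captures the Attach behavior described earlier. The names AttachBehaviorCheck and Run are hypothetical, no test framework is assumed, and the context is a trimmed stand-in for PubContext whose SQLite file is never opened because nothing is queried or saved.

```csharp
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

public class Author
{
    public int AuthorId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public List<Book> Books { get; set; } = new List<Book>();
}

public class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
    public Author Author { get; set; }
    public int AuthorId { get; set; }
}

public class PubContext : DbContext
{
    public DbSet<Author> Authors => Set<Author>();
    public DbSet<Book> Books => Set<Book>();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseSqlite("Data Source=PubDatabase.db"); // hypothetical file name
}

public static class AttachBehaviorCheck
{
    public static (EntityState AuthorState, EntityState BookState, int BookAuthorId) Run()
    {
        // The same mixed state graph: an existing author plus a brand-new book.
        var existingAuthor = new Author { AuthorId = 2, FirstName = "Ruth", LastName = "Ozeki" };
        var newBook = new Book { Title = "A Tale for the Time Being" };
        existingAuthor.Books.Add(newBook);

        using var context = new PubContext();
        context.Authors.Attach(existingAuthor);

        // Report what the change tracker decided, without calling SaveChanges.
        return (context.Entry(existingAuthor).State,
                context.Entry(newBook).State,
                newBook.AuthorId);
    }
}
```

A test in the framework of your choice could then assert that the author is Unchanged, the book is Added, and the book's AuthorId matches the author's key, which pins down the behavior so that a future EF Core upgrade can't change it unnoticed.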