[Fact]
public void Parser_parses_xml_in_correct_order()
{
    string xml = "<outer><inner /></outer>";
    var parser = new Parser();

    // Arrange : record expectations
    var mocks = new MockRepository();
    IHandler handler = mocks.CreateMock<IHandler>();
    handler.StartDocument();
    handler.StartElement("outer");
    handler.StartElement("inner");
    handler.EndElement("inner");
    handler.EndElement("outer");
    handler.EndDocument();
    mocks.ReplayAll();

    // Act
    parser.ParseXml(xml, handler);

    // Assert
    mocks.VerifyAll();
}
In the sample above, we are testing how the Parser class interacts with IHandler. In order for the Parser to be correct, it should call specific methods of the handler in a particular sequence.
The mock does a pretty good job here, but the code sample has the flaw we discussed in the previous article: the test relies on the implementation details of the Handler class and is therefore fragile in the face of refactoring.
\\nLet’s see how we can write the same test using a hand-written stub. First, we need the stub itself:
public class HandlerStub : IHandler
{
    private List<Tuple<Action, string>> actionsCalled = new List<Tuple<Action, string>>();

    public void StartDocument()
    {
        actionsCalled.Add(new Tuple<Action, string>(Action.DocumentStarted, null));
    }

    public void StartElement(string elementName)
    {
        actionsCalled.Add(new Tuple<Action, string>(Action.ElementStarted, elementName));
    }

    /* Other methods */

    internal enum Action
    {
        DocumentStarted,
        ElementStarted,
        ElementEnded,
        DocumentEnded
    }
}
The stub basically records all the interactions that come from the outside world and provides methods to validate those interactions.
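The article doesn't show the verification half of the stub. Here is a minimal sketch of how the fluent WasCalled()/With... methods used in the test below could be implemented as members of HandlerStub (my own illustration; it uses xUnit's Assert, consistent with the [Fact] tests in this series):

// Verification members that could live inside HandlerStub (illustrative sketch)
private int assertionIndex; // position of the next expected interaction

public HandlerStub WasCalled()
{
    assertionIndex = 0;
    return this;
}

public HandlerStub WithStartDocument() { return AssertNext(Action.DocumentStarted, null); }
public HandlerStub WithStartElement(string name) { return AssertNext(Action.ElementStarted, name); }
public HandlerStub WithEndElement(string name) { return AssertNext(Action.ElementEnded, name); }
public HandlerStub WithEndDocument() { return AssertNext(Action.DocumentEnded, null); }

private HandlerStub AssertNext(Action expectedAction, string expectedElementName)
{
    Assert.True(assertionIndex < actionsCalled.Count, "Fewer interactions were recorded than expected");
    Assert.Equal(expectedAction, actionsCalled[assertionIndex].Item1);
    Assert.Equal(expectedElementName, actionsCalled[assertionIndex].Item2);
    assertionIndex++;
    return this;
}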
\\nNow, we can use this stub in our test:
[Fact]
public void Parser_parses_xml_in_correct_order()
{
    // Arrange
    string xml = "<outer><inner /></outer>";
    var parser = new Parser();
    var handlerStub = new HandlerStub();

    // Act
    parser.ParseXml(xml, handlerStub);

    // Assert
    handlerStub.WasCalled()
        .WithStartDocument()
        .WithStartElement("outer")
        .WithStartElement("inner")
        .WithEndElement("inner")
        .WithEndElement("outer")
        .WithEndDocument();
}
As you can see, we achieved the same result with no mocks whatsoever.
\\nThe distinction between the former and the latter versions of the test might seem subtle, but it is rather significant. With mocks, you have to mimic the IHandler interface with every test that uses it, which leads to code duplication and thus to brittle design. Every change in IHandler would cause cascade failures throughout all tests that use it.
\\nWith stubs, you pack the fragility into a single stub implementation. You abstract your tests from the internal details of the dependency. If you change the Handler interface, you need to adjust the stub only once; you don’t have to touch any tests that use it.
\\nThat’s the power of a reliable test suite. Hand-written stubs help create unit tests that verify the end result of an interaction without knowing the internal implementation details of that interaction. They help us adhere to the most important TDD rule, which is keeping the tests one level of abstraction above the code they check.
\\nI have to make a note here. Although I strongly advocate you prefer stubs over mocks, there are situations where you are better off choosing mocks. In the cases where you need to create only a single unit test that uses a dependency, there are no effective differences between a test that relies on mocks and the one that employs stubs.
In both cases, you would need to change the code only once should a refactoring occur. Because of that, mocks are the preferable choice as they require less up-front effort than stubs. But whenever you notice you have more than one test substituting the same dependency with mocks, you should switch to a hand-written stub instead.
\\nIf you enjoyed this article, be sure to check out my Pragmatic Unit Testing Pluralsight course too.
\\nEven in situations where you need to test the correctness of the interactions between classes, you can check the end result of those interactions, not the behavior that led to that result.
Stubs help us do that. Although using them requires more up-front effort, it pays off greatly. Tests that rely on stubs are less fragile in the face of refactoring and thus lay the groundwork for building a solid unit test suite.
\\nThe source code for the code sample can be found here.
\\nStubs vs Mocks
\\nIn the previous articles, we discussed what causes the pain while writing unit tests (mocks), and how to make TDD painless (get rid of the mocks). Today, I want to set the groundwork and discuss why mocks actually cause so much pain to us, developers.
\\nThe most important TDD rule
\\nA quick note before we start: in this series, I refer to both test-first and code-first unit testing approaches as TDD for the sake of brevity. It’s not quite accurate, though, as TDD is about the former, not just writing tests in general. Just keep in mind that I use the TDD term as a synonym to unit testing.
Okay, so what is the most important TDD rule? The rule that underlies everything we have been talking about in this article series? It's this: unit tests should operate with concepts that are one level of abstraction above the code those unit tests verify. Let's elaborate on that statement.
\\nHere’s an example of a unit test that relies on mocks:
[Fact]
public void Order_calculates_shipment_amount()
{
    var rateCalculator = new Mock<IRateCalculator>();
    Order order = new Order(rateCalculator.Object);

    order.CalculateShipmentAmount();

    rateCalculator.Verify(x => x.Calculate(order));
}
It checks calculation logic by verifying that an Order instance calls the IRateCalculator.Calculate method. We already discussed that such tests are brittle and thus not maintainable. But why is it so?
\\nThe reason is that such unit tests check the implementation and not the end result of the system under test (SUT). They go down to how the method is implemented and dictate the right way of doing this. The problem here is that there’s no such thing as the only right way of implementing the calculation logic.
\\nWe can do it in many different manners. We could decide to make Order use another calculator or even implement the calculation logic in the Order class itself. All this would result in breaking the test, although the end result could remain exactly the same.
\\nThe problem with unit tests using mocks is that they operate on the same level of abstraction as the code of the system under test. That is the reason why they are so fragile and that is the reason why it is hard to build a solid unit test suite with such tests.
The only way to avoid the brittleness is to check the end result of the method, not how this result was achieved. In other words, we need to step away from the actual implementation and rise one level above the code we are unit-testing.
In the example above, we shouldn't think about how the Order class implements the CalculateShipmentAmount method; all we need to do is verify that the resulting amount matches the expectation.
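A version of the test that follows this advice might look like the sketch below. The exact API is an assumption on my part (the article doesn't show how Order exposes the calculated amount); the point is that the assertion targets the resulting value rather than the call to IRateCalculator:

[Fact]
public void Order_calculates_shipment_amount()
{
    // A real calculator (or a simple hand-written stub returning a known rate) instead of a mock
    Order order = new Order(new RateCalculator());

    order.CalculateShipmentAmount();

    // Verify the outcome, not the interaction; ShipmentAmount and the expected value are illustrative
    Assert.Equal(10m, order.ShipmentAmount);
}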
Although unit tests that use mocks are the most prone to breaking this rule, the rule itself doesn't refer to the use of mocks specifically. It is possible to write unit tests that don't rely on mocks and still dictate to the system under test how it should implement its behavior.
\\nHere’s an example:
[Fact]
public void GetById_returns_an_order()
{
    var repository = new OrderRepository();

    repository.GetById(5);

    Assert.Equal("SELECT * FROM dbo.[Order] WHERE OrderID = 5", repository.SqlStatement);
}
The unit test above checks that the OrderRepository class prepares a correct SQL statement when fetching an Order instance from the database. The problem with this unit test is the same as with those that use mocks: it verifies the implementation details, not the end result. It insists the SQL statement should be the exact string specified in the test, although there could be myriad different variations of this query leading to the same result.
The solution here is the same as before: the test should operate with concepts that reside on a higher level of abstraction. That means we need to verify that the Order instance the repository returns matches our expectations, without paying attention to how the repository does its job.
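Such a test could be shaped along the following lines. This is a sketch under assumptions: a DB test helper with an InsertOrder method (similar to the DB class used in the integration-testing post below) and illustrative Id/Amount properties on Order:

[Fact]
public void GetById_returns_an_order()
{
    // Arrange: put a known order into the test database
    using (var db = new DB())
    {
        db.InsertOrder(id: 5, amount: 100m);
    }
    var repository = new OrderRepository();

    // Act
    Order order = repository.GetById(5);

    // Assert: check the returned object, not the SQL text
    Assert.NotNull(order);
    Assert.Equal(5, order.Id);
    Assert.Equal(100m, order.Amount);
}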
Tests that operate on the same level of abstraction as the SUT have several drawbacks. Aside from being fragile in the face of refactoring, they force programmers to do the same work twice, especially when following the test-first approach.

While writing such tests, developers are constantly thinking about how the system under test is going to be implemented. They write an expectation of how it should be implemented, then implement it with the exact same code immediately after that.

Low-level tests fully mimic the SUT's implementation and don't add any value. While refactoring, you find they break every time you change any of the implementation details they depend upon, so you have to apply the same refactoring twice: in the code itself and in the tests covering that code.
Moreover, in many cases, such tests compel you to expose the internal state of the SUT and thus break encapsulation. In the example with OrderRepository, we had to make its SqlStatement property public in order to verify it in the unit test.
\\nIt is no coincidence that low-level unit tests lead to poor encapsulation. In order to verify the SUT’s behavior, they need the SUT to expose its internal details to them.
That is actually a good litmus test that can help ensure you follow the "one level of abstraction above" rule while writing unit tests. If the test forces you to expose state that you would keep private otherwise, chances are you are creating a low-level test.
Low-level tests are the root cause of why TDD can be so painful. The heavy use of such tests is also the underlying reason why there is such a notion as test-induced design damage.
\\nDon’t write tests that operate on the same level of abstraction as the code they cover. It’s hard to overestimate how important that is. This rule is the dividing line between fragile tests and a solid and reliable test suite that helps grow your software project.
\\nIf you enjoyed this article, be sure to check out my Pragmatic Unit Testing Pluralsight course too.
\\nThe most important TDD rule
\\nUnit testing is good at checking the correctness of your code in isolation, but it’s not a panacea. Only integration tests can give us confidence that the application we develop actually works as a whole. They are also a good substitute for mocks in the cases where you can’t test important business logic without involving external dependencies.
\\nIntegration testing or how to sleep well at nights
\\nIntegration tests operate on a higher level of abstraction than unit tests. The main difference between integration and unit testing is that integration tests actually affect external dependencies.
\\nThe dependencies integration tests work with can be broken into two types: the ones that are under your control, and the ones that you don’t have control over.
Databases and the file system fall into the first category: you can programmatically change their internal state, which makes them perfectly suitable for integration testing.

The second type comprises dependencies such as an SMTP server or an enterprise service bus (ESB). In most cases, you can't just wipe out the side effects introduced by invoking an email gateway, so you still need to somehow fake these dependencies even in integration tests. However, you don't need mocks to do that. We'll discuss this topic in a minute.

It's almost always a good idea to employ both unit and integration testing. The reason is that, with unit tests alone, you can't be sure that different parts of your system actually work with each other correctly. Also, it's hard to unit test business logic that doesn't belong to domain objects without introducing mocks.
A single integration test cuts across several layers of your code base at once, resulting in a better return on investment per line of test code. At the same time, integration testing is not a substitute for unit testing, because it doesn't provide the same granularity unit tests do. You can't cover all possible edge cases with integration tests alone, because that would lead to significant code duplication.
\\nA reasonable approach here is the following:
\\nEmploy unit testing to verify all possible cases in your domain model.
\\nWith integration tests, check only a single happy path per application service method. Also, if there are any edge cases which cannot be covered with unit tests, check them as well.
\\nAlright, let’s look at a concrete example of how we can apply integration testing. Below is a slightly modified version of the method from the previous post:
public HttpResponseMessage CreateCustomer(string name, string email, string city)
{
    Customer existingCustomer = _repository.GetByEmail(email);
    if (existingCustomer != null)
        return Error("Customer with such email address already exists");

    Customer customer = new Customer(name, city);
    _repository.Save(customer);

    if (city == "New York")
    {
        _emailGateway.SendSpecialGreetings(customer);
    }
    else
    {
        _emailGateway.SendRegularGreetings(customer);
    }

    return Ok();
}
How can integration tests help us in this situation?
First of all, they can verify that the customer was in fact saved in the database. Secondly, there's an important business rule here: customers' emails must be unique. Integration testing can help us with that as well. Furthermore, we send different types of greeting emails depending on the city the customer's in. That is also worth checking.
\\nLet’s start off with testing a happy path:
[Fact]
public void Create_customer_action_creates_customer()
{
    var emailGateway = new FakeEmailGateway();
    var controller = new CustomerController(new CustomerRepository(), emailGateway);

    controller.CreateCustomer("John Doe", "[email protected]", "Some city");

    using (var db = new DB())
    {
        Customer customerFromDb = db.GetCustomer("[email protected]");
        customerFromDb.ShouldExist()
            .WithName("John Doe")
            .WithEmail("[email protected]")
            .WithCity("Some city")
            .WithState(CustomerState.Pending);

        emailGateway
            .ShouldSendNumberOfEmails(1)
            .WithEmail("[email protected]", "Hello regular customer!");
    }
}
Note that we pass a real customer repository instance to the controller and a fake email gateway. Here, the repository is a dependency we have control over, whereas email gateway is the dependency we need to fake.
\\nAlso note the DB class and the heavy use of extension methods, such as ShouldExist, WithName and so on. The DB class is a utility class that helps gather all test-specific database interactions in a single place, and the extension methods allow us to check the customer’s properties in a narrow and readable way.
\\nThe test also verifies that an appropriate email was sent to the newly created customer. In this particular case, the email should be sent with \\"Hello regular customer\\" subject. We’ll look at the fake email gateway closer shortly.
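The article doesn't list the implementation of these assertion helpers, but they are straightforward extension methods. Here is a minimal sketch of what they could look like (my own illustration, using xUnit's Assert; the gateway helpers rely on the SentEmails list shown further down):

public static class TestAssertionExtensions
{
    public static Customer ShouldExist(this Customer customer)
    {
        Assert.NotNull(customer);
        return customer;
    }

    public static Customer WithName(this Customer customer, string name)
    {
        Assert.Equal(name, customer.Name);
        return customer;
    }

    /* WithEmail, WithCity and WithState follow the same pattern */

    public static FakeEmailGateway ShouldSendNumberOfEmails(this FakeEmailGateway gateway, int number)
    {
        Assert.Equal(number, gateway.SentEmails.Count);
        return gateway;
    }

    public static FakeEmailGateway WithEmail(this FakeEmailGateway gateway, string to, string subject)
    {
        Assert.Contains(gateway.SentEmails, email => email.To == to && email.Subject == subject);
        return gateway;
    }
}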
\\nHere’s another test:
[Fact]
public void Cannot_create_customer_with_duplicated_email()
{
    CreateCustomer("[email protected]");
    var controller = new CustomerController(
        new CustomerRepository(),
        new FakeEmailGateway());

    HttpResponseMessage response = controller.CreateCustomer("John", "[email protected]", "LA");

    response.ShouldBeError("Customer with such email address already exists");
}
Here we verify that no two customers can have the same email.
The two tests shown above allow us to make sure that all three layers (controllers, the domain model, and the database) work together correctly. They immediately let us know if there's anything wrong with the database structure, object-relational mappings, or SQL queries, and thus give us true confidence that our application works as a whole.

Of course, it's not a 100% guarantee, because there still could be issues with ASP.NET Web API routing or SMTP server settings. But I would say that integration testing, in conjunction with unit testing, provides about 80% of the assurance possible.
\\nAlright, and finally here’s the third integration test which verifies that New Yorkers receive a special greetings letter:
[Fact]
public void Customers_from_New_York_get_a_special_greetings_letter()
{
    var emailGateway = new FakeEmailGateway();
    var controller = new CustomerController(new CustomerRepository(), emailGateway);

    controller.CreateCustomer("John", "[email protected]", "New York");

    emailGateway
        .ShouldSendNumberOfEmails(1)
        .WithEmail("[email protected]", "Hello special customer!");
}
Now, let’s take a closer look at the fake email gateway we used in the tests above:
public class FakeEmailGateway : IEmailGateway
{
    private readonly List<SentEmail> _sentEmails;
    public IReadOnlyList<SentEmail> SentEmails
    {
        get { return _sentEmails.ToList(); }
    }

    public FakeEmailGateway()
    {
        _sentEmails = new List<SentEmail>();
    }

    public void SendRegularGreetings(Customer customer)
    {
        _sentEmails.Add(new SentEmail(customer.Email, "Hello regular customer!"));
    }

    public void SendSpecialGreetings(Customer customer)
    {
        _sentEmails.Add(new SentEmail(customer.Email, "Hello special customer!"));
    }

    public class SentEmail
    {
        public string To { get; private set; }
        public string Subject { get; private set; }

        public SentEmail(string to, string subject)
        {
            To = to;
            Subject = subject;
        }
    }
}
As you can see, it just records recipients and subjects for all emails sent during its lifespan. We then use this information in our integration tests.
This implementation has a significant flaw which makes it no better a solution than the mocks I argued against in the first article: it's too brittle, as it mimics every single method in the IEmailGateway interface. This stub would either stop compiling or report a false positive every time you make a change to the interface.

It also introduces a lot of code duplication: the letters' subjects are typed in directly, resulting in virtually no advantage over mocks.
A much better approach is to extract a separate class - an email provider - with a single method that the email gateway uses to send an email. This provider can then be substituted with a stub:
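The article doesn't show the extracted provider itself, but based on the FakeEmailProvider below, it could look roughly like this (the greeting texts are illustrative):

public interface IEmailProvider
{
    void Send(string to, string subject, string body);
}

public class EmailGateway : IEmailGateway
{
    private readonly IEmailProvider _provider;

    public EmailGateway(IEmailProvider provider)
    {
        _provider = provider;
    }

    public void SendRegularGreetings(Customer customer)
    {
        // The gateway keeps the domain-specific knowledge of what to send...
        _provider.Send(customer.Email, "Hello regular customer!", "Thank you for joining us!");
    }

    public void SendSpecialGreetings(Customer customer)
    {
        // ...while the provider is the only piece that touches the SMTP server
        _provider.Send(customer.Email, "Hello special customer!", "Thank you for joining us!");
    }
}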
\\nUnlike the previous version, this stub is stable as it doesn’t depend on the actual emails sent by the email gateway. It also doesn’t introduce any code duplication - it just records all incoming emails as is:
public class FakeEmailProvider : IEmailProvider
{
    public void Send(string to, string subject, string body)
    {
        _sentEmails.Add(new SentEmail(to, subject));
    }

    /* Other members */
}
Integration testing is a good alternative to unit testing with mocks because of two reasons:
\\nUnlike unit tests with mocks, integration tests do verify that your code works with external dependencies correctly.
For the dependencies you don't have control over, integration testing makes use of stubs instead of mocks. In most cases, stubs are the less brittle and thus more maintainable choice. This point ties into an old discussion about behavior vs state verification (also known as the classicist and mockist schools). We'll return to this topic in the following articles.
\\nIn the next post, we’ll talk about the most important TDD rule.
\\nIf you enjoyed this article, be sure to check out my Pragmatic Unit Testing Pluralsight course too.
\\nIntegration testing or how to sleep well at nights
Last week, we nailed down the root cause of the problems related to so-called test-induced design damage - the damage we have to inflict on our design in order to make the code testable. Today, we'll look at how we can mitigate that damage, or, in other words, do painless TDD.
\\nHow to do painless TDD
\\nSo is it possible to not damage the code while keeping it testable? Sure it is. You just need to get rid of the mocks altogether. I know it sounds like an overstatement, so let me explain myself.
\\nSoftware development is all about compromises. They are everywhere: CAP theorem, Speed-Cost-Quality triangle, etc. The situation with unit tests is no different. Do you really need to have 100% test coverage? If you develop a typical enterprise application, then, most likely, no. What you do need is a solid test suite that covers all business critical code; you don’t want to waste your time on unit-testing trivial parts of your code base.
\\nOkay, but what if you can’t test a critical business logic without introducing mocks? In that case, you need to extract such logic out of the methods with external dependencies. Let’s look at the method from the previous post:
public class CustomerController : ApiController
{
    private readonly ICustomerRepository _repository;
    private readonly IEmailGateway _emailGateway;

    public CustomerController(ICustomerRepository repository,
        IEmailGateway emailGateway)
    {
        _emailGateway = emailGateway;
        _repository = repository;
    }

    [HttpPost]
    public HttpResponseMessage CreateCustomer([FromBody] string name)
    {
        Customer customer = new Customer();
        customer.Name = name;
        customer.State = CustomerState.Pending;

        _repository.Save(customer);
        _emailGateway.SendGreetings(customer);

        return Ok();
    }
}
What logic is really worth testing here? I argue that only the first three lines of the CreateCustomer method are. The Repository.Save method is important, but mocks don't actually help us ensure it works; they just verify that the CreateCustomer method calls it. The same goes for the SendGreetings method.

So, we just need to extract those lines into a constructor (or, if there were more complex logic involved, into a factory):
[HttpPost]
public HttpResponseMessage CreateCustomer([FromBody] string name)
{
    Customer customer = new Customer(name);

    _repository.Save(customer);
    _emailGateway.SendGreetings(customer);

    return Ok();
}

public class Customer
{
    public Customer(string name)
    {
        Name = name;
        State = CustomerState.Pending;
    }
}
Unit-testing of those lines becomes trivial:
[Fact]
public void New_customer_is_in_pending_state()
{
    var customer = new Customer("John Doe");

    Assert.Equal("John Doe", customer.Name);
    Assert.Equal(CustomerState.Pending, customer.State);
}
Note that there’s no arrange section in this test. That’s a sign of highly maintainable tests: the fewer arrangements you do in a unit test, the more maintainable is becomes. Of course, it’s not always possible to get rid of them completely, but the general rule remains: you need to keep the Arrange section as small as possible. In order to do this, you need to stop using mocks in your unit tests.
\\nBut what about the controller? Should we just stop unit-testing it? Exactly. The controller now doesn’t contain any essential logic itself, it just coordinates the work of the other actors. Its logic became trivial. Testing of such logic brings more costs than profits, so we are better off not wasting our time on it.
\\nThat brings us to some interesting conclusions regarding how to make TDD painless. We need to unit test only a specific type of code which:
\\nDoesn’t have external dependencies.
\\nExpresses your domain.
By external dependencies, I mean objects that depend on the external world's state. For example, repositories depend upon data in the database, file managers depend upon files in the file system, etc. Mark Seemann, in his book Dependency Injection in .NET, describes such dependencies as volatile. Other types of dependencies (strings, DateTime, or even domain classes) don't count as such.
Here's how we can illustrate the different types of code (the original post includes a diagram with four quadrants - the domain model, trivial code, controllers, and the "Mess" - based on whether the code contains domain logic and whether it has external dependencies):
\\nSteve Sanderson wrote a brilliant article on that topic, so you might want to check it out for more details.
Generally, the maintainability cost of code that both contains domain logic and has external dependencies (the "Mess" quadrant in the diagram) is too high. That's what we had in the controller at the beginning: it depended on external dependencies and did domain-specific work at the same time. Code like that should be split up: the domain logic should be placed in domain objects, so that the controller is left with coordination, putting it all together.

Of the remaining three types of code in your code base (the domain model, trivial code, and controllers), you need to unit test only the domain-related code. That is where you get the best return on your investment; that is the trade-off I advocate you make.
\\nHere are my points on that:
Not all code is equally important. Your application contains business-critical parts (the domain model), on which you should focus most of your efforts.

The domain model is self-contained; it doesn't depend on the external world. That means you don't need any mocks to test it, nor should you use them. Because of that, unit tests aimed at the domain model are easy to implement and maintain.
\\nFollowing these practices helps build a solid unit test suite. The goal of unit-testing is not 100% test coverage (although in some cases it is reasonable). The goal is to be confident that changes and new code don’t break existing functionality.
I stopped using mocks a long time ago. Since then, not only has the quality of the code I write not dropped, but I also have a maintainable and reliable unit test suite that helps me evolve the code bases of the projects I work on. TDD became painless for me.
You might argue that even with these guidelines, you can't always get rid of the mocks in your tests completely. An example here may be the necessity of keeping customers' emails unique: we do need to mock the repository in order to test such logic. Don't worry, I'll address this case (and some others) in the next article.
\\nIf you enjoyed this article, be sure to check out my Pragmatic Unit Testing Pluralsight course too.
\\nLet’s summarize this post with the following:
\\nDon’t use mocks
\\nExtract the domain model out of methods with external dependencies
\\nUnit-test only the domain model
\\nIn the next post, we’ll discuss integration testing and how it helps us in the cases where unit tests are helpless.
\\nHow to do painless TDD
\\nI’m going to write a couple of posts on the topic of TDD. Over the years, I’ve come to some conclusions of how to apply TDD practices and write tests in general that I hope you will find helpful. I’ll try to distil my experience with it to several points which I’ll illustrate with examples.
\\nTest-induced design damage or why TDD is so painful
I'd like to make a quick note before we start. TDD, which stands for Test-Driven Development, is not the same thing as writing unit tests. While TDD implies the latter, it emphasizes the test-first approach, in which you write a unit test before the code it tests. In this article, I'm talking about both TDD and writing tests in general. I also refer to both of these notions as "TDD" for the sake of brevity, although it's not quite accurate.
Have you ever felt like adhering to TDD practices, or even just unit testing your code after you wrote it, brings more problems than it solves? Did you notice that, in order to make your code testable, you need to mess it up first? And the tests themselves look like a big ball of mud that is hard to understand and maintain?
\\nDon’t worry, you are not alone. There was a whole series of discussions on whether or not TDD is dead, in which Martin Fowler, Kent Beck, and David Heinemeier Hansson tried to express their views and experiences with TDD.
\\nThe most interesting takeaway from this discussion is the concept of test-induced design damage introduced by David Hansson. It generally states that you can’t avoid damaging your code when you make it testable.
\\nHow is it so? Let’s take an example:
[HttpPost]
public HttpResponseMessage CreateCustomer([FromBody] string name)
{
    Customer customer = new Customer();
    customer.Name = name;
    customer.State = CustomerState.Pending;

    var repository = new CustomerRepository();
    repository.Save(customer);

    var emailGateway = new EmailGateway();
    emailGateway.SendGreetings(customer);

    return Ok();
}
The method is pretty simple and self-describing. At the same time, it's not testable. You can't unit-test it because there's no isolation here. You don't want your tests to touch the database because they'd be too slow, nor do you want them to send real emails every time you run the test suite.

In order to test the business logic in isolation, you need to inject the dependencies into the class from the outside:
public class CustomerController : ApiController
{
    private readonly ICustomerRepository _repository;
    private readonly IEmailGateway _emailGateway;

    public CustomerController(ICustomerRepository repository,
        IEmailGateway emailGateway)
    {
        _emailGateway = emailGateway;
        _repository = repository;
    }

    [HttpPost]
    public HttpResponseMessage CreateCustomer([FromBody] string name)
    {
        Customer customer = new Customer();
        customer.Name = name;
        customer.State = CustomerState.Pending;

        _repository.Save(customer);
        _emailGateway.SendGreetings(customer);

        return Ok();
    }
}
Such approach allows us to isolate the method’s business logic from external dependencies and test it appropriately. Here’s a typical unit test aimed to verify the method’s correctness:
[Fact]
public void CreateCustomer_creates_a_customer()
{
    // Arrange
    var repository = new Mock<ICustomerRepository>();
    Customer savedCustomer = null;
    repository
        .Setup(x => x.Save(It.IsAny<Customer>()))
        .Callback((Customer customer) => savedCustomer = customer);

    Customer emailedCustomer = null;
    var emailGateway = new Mock<IEmailGateway>();
    emailGateway
        .Setup(foo => foo.SendGreetings(It.IsAny<Customer>()))
        .Callback((Customer customer) => emailedCustomer = customer);

    var controller = new CustomerController(repository.Object, emailGateway.Object);

    // Act
    HttpResponseMessage message = controller.CreateCustomer("John Doe");

    // Assert
    Assert.Equal(HttpStatusCode.OK, message.StatusCode);
    Assert.Equal(savedCustomer, emailedCustomer);
    Assert.Equal("John Doe", savedCustomer.Name);
    Assert.Equal(CustomerState.Pending, savedCustomer.State);
}
Does it seem familiar? I bet you created plenty of those. I did a lot.
Clearly, such tests just don't feel right. In order to test a simple behavior, you have to create tons of boilerplate code just to isolate that behavior. Note how big the Arrange section is: it contains 11 lines, compared to 5 lines in the Act and Assert sections combined.
\\nSuch unit tests also break very often without any good reason - you just need to slightly change the signature of one of the interfaces they depend upon.
Do such tests help find regression defects? In some simple cases, yes. But more often than not, they don't give you enough confidence when refactoring your code base. The reason is that such unit tests report too many false positives. They are too fragile. After a while, developers start ignoring them. It is no wonder; it's hard to keep trusting a boy who cries wolf all the time.
\\nSo why exactly does it happen? What makes tests brittle?
\\nThe reason is mocks. Test suites with a large number of mocked dependencies require a lot of maintenance. The more dependencies your code has, the more effort it takes to test it and fix the tests as your code base evolves.
\\nUnit-testing doesn’t incur design damage if there are no external dependencies in your code. To illustrate this point, consider the following code sample:
public double Calculate(double x, double y)
{
    return x * x + y * y + Math.Sqrt(Math.Abs(x + y));
}
How easy is it to test it? As easy as this:
[Fact]
public void Calculate_calculates_result()
{
    // Arrange
    double x = 2;
    double y = 2;
    var calculator = new Calculator();

    // Act
    double result = calculator.Calculate(x, y);

    // Assert
    Assert.Equal(10, result);
}
Or even easier:
[Fact]
public void Calculate_calculates_result()
{
    double result = new Calculator().Calculate(2, 2);
    Assert.Equal(10, result);
}
That brings us to the following conclusion: the notion of test-induced design damage belongs to the necessity of creating mocks. When mocking external dependencies, you inevitably introduce more code, which itself leads to a less maintainable solution. All this results in increasing of maintenance costs, or, simply put, pain for developers.
\\nIf you enjoyed this article, be sure to check out my Pragmatic Unit Testing Pluralsight course too.
Alright, we now know what causes so-called test-induced design damage and the pain we feel when we do TDD. But how can we mitigate that pain? Is there a way to do this? We'll talk about it in the next post.
\\nTest-induced design damage or why TDD is so painful
\\nThe third most important software development principle is Encapsulation.
Encapsulation is all about hiding information accidental to the clients' needs. The principle basically states the following:
\\nThe data and the logic operating this data should be packed together in a single cohesive class.
\\nThe collaboration surface (interface) of a class should be as small as possible.
\\nClasses can only communicate with their immediate neighbors.
Keeping the data and the logic together prevents duplication and helps maintain invariants. Without it, logic quickly spreads across the application, making it extremely hard to maintain.

Encapsulation allows us to decrease coupling throughout the code and thus make it simpler. It also helps create intention-revealing interfaces, which give us high-level information about the methods' behavior.
There are two other principles which I personally consider part of the overall Encapsulation principle: Tell Don't Ask and the Law of Demeter. They both restate the rules described above, although the Law of Demeter does it in a more formal way.

Often, a lack of encapsulation goes side by side with an Anemic Domain Model. A tendency to move all the logic operating on the data into external services breaks encapsulation and often leads to code duplication and an increase in accidental complexity.
\\nLet’s take some code and see how we can refactor it towards better encapsulation.
\\nHere’s a code sample:
public class Person
{
    public Employer Employer { get; set; }
    public string JobTitle { get; set; }
    public string City { get; set; }
}

public class Employer
{
    public string Name { get; set; }
    public string City { get; set; }
}
What problems do you see here? The classes only contain data with no methods, which makes them anemic. It also means that the logic working with this data resides somewhere else. That breaks the first rule of the Encapsulation principle which states that we should pack the data and the logic together.
\\nLet’s take a look at one of the client methods:
public double GetManagerCoderRatio(IList<Person> persons)
{
    int coders = persons.Count(person => person.JobTitle == "Programmer"
        || person.JobTitle == "Software Developer"
        || person.JobTitle == "Coder");

    int managers = persons.Count(person => person.JobTitle == "CTO"
        || person.JobTitle == "CFO"
        || person.JobTitle == "Manager");

    return managers / (double)coders;
}
This code sample has another problem: the method makes decisions based entirely upon the data of a single object. Namely, it decides whether a person is a manager or a coder. Clearly, such decisions should be made by the Person class itself. Let's change our code:
public class Person
{
    public Employer Employer { get; set; }
    public string JobTitle { get; set; }
    public string City { get; set; }

    public bool IsCoder
    {
        get
        {
            return JobTitle == "Programmer"
                || JobTitle == "Software Developer"
                || JobTitle == "Coder";
        }
    }

    public bool IsManager
    {
        get
        {
            return JobTitle == "CTO"
                || JobTitle == "CFO"
                || JobTitle == "Manager";
        }
    }
}

public double GetManagerCoderRatio(IList<Person> persons)
{
    int coders = persons.Count(person => person.IsCoder);
    int managers = persons.Count(person => person.IsManager);

    return managers / (double)coders;
}
As you can see, the decision-making process based on the Person's state is now encapsulated together with that state.
\\nAlright, let’s look at another client method:
public void AcquareCompany(IList<Person> employees, Employer newEmployer)
{
    foreach (Person employee in employees)
    {
        employee.Employer = newEmployer;
        employee.JobTitle = "Consultant"; // All employees now independent consultants with x3 wage
        employee.City = newEmployer.City; // Remote work is not allowed
    }
}
As we can see here, the Person class has an invariant: no person can work remotely, meaning that their city should be the same as the employer’s one.
\\nAlthough the AcquareCompany method maintains this invariant, the invariant itself can be easily broken. To do it, we just need to assign a person a new employer without changing the person’s city.
\\nIt is a good practice to make code easy to use correctly and hard to use incorrectly. The Person class doesn’t adhere to this practice, so let’s refactor it:
public class Person
{
    public Employer Employer { get; private set; }
    public string JobTitle { get; private set; }
    public string City { get; private set; }

    public void ChangeEmployer(Employer newEmployer, string jobTitle)
    {
        Employer = newEmployer;
        City = newEmployer.City;
        JobTitle = jobTitle;
    }

    /* IsCoder, IsManager properties */
}

public void AcquareCompany(IList<Person> employees, Employer newEmployer)
{
    foreach (Person employee in employees)
    {
        employee.ChangeEmployer(newEmployer, "Consultant");
    }
}
Note the following implications from the change made:
We decreased the collaboration surface of the Person class, and thus decreased coupling between the class and the client method. Instead of three properties (Employer, JobTitle, City), we now have a single ChangeEmployer method. All the property setters are made private.

We created an intention-revealing interface. Instead of a set of low-level operations, Person now has a high-level method (ChangeEmployer) which fully conveys its purpose.
\\nThe Person’s invariant is now protected and can’t be broken.
\\nEncapsulation is another principle that helps decrease accidental complexity and keep software maintainable.
\\nNext time, we’ll look at what coupling and cohesion actually mean (with examples), as well as some common misunderstandings of the DRY principle. Stay tuned!
\\nEncapsulation revisited
\\nToday, I’m going to discuss the KISS principle. I consider it the second most valuable software development principle.
The acronym KISS stands for "Keep it simple, stupid" (or "Keep it short and simple", according to other sources). It means that you should keep your system as simple as possible, because the simpler your application is, the easier it is to maintain.
This principle correlates with Yagni, but they are not the same thing. While Yagni states that you need to cut off the code you don't need right now, KISS is about making the remaining code simple.
\\nSimplicity allows us to read and understand the source code more easily, and the ability to reason about the source code is vital for further development.
\\nHere are three characteristics that can be attributed to any software system: Correctness, Performance, Readability. How would you arrange them by their importance?
\\nIt’s quite a tricky question, because the answer depends on several things, such as whether or not you will ever need to touch this system. But I believe for every typical enterprise application with a long development and support period the answer would be pretty much the same. So, what is it?
\\nI guess one can think the order of importance is the following: Correctness > Readability > Performance. That is, the correctness is more important than readability which is itself more important than performance.
\\nNow, think about it. Is it better to have an unreadable application that does things correctly than an application with bugs whose source code is simple and straightforward? In the short term, yes, it’s better to have a flawless system. But what if you need to add or change something in it? If you can’t understand the code, how can you implement the change?
\\nIt turns out that if you are ever going to enhance or evolve your system (which is true for most of enterprise systems), you are better off having one that is readable than the one that is bugless. The situation with the latter is unstable; most likely, you will introduce new bugs with the very first changes as you can’t fully understand it. On the other hand, with a readable and simple source code, you can find and fix bugs pretty quickly. That is, the order rather looks like this: Readability > Correctness > Performance.
"There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies."

— C. A. R. Hoare
Simplicity shouldn’t be confused with easiness. These two concepts are completely different.
An easy solution is one that doesn't take you much effort to implement. At the same time, a simple solution is usually far more difficult to achieve. It's easy to create a complicated system that is hard to understand; it takes a lot of time to boil such a system down to several core aspects and remove the unnecessary complexity. Here's a nice talk on that topic: Simple Made Easy.
\\nThis notion strongly correlates with the concept of technical debt. It’s easier to introduce technical debt than it is to remove it. A complicated system often contains a lot of technical debt, so you have to get rid of it in order to make the system simpler.
Not all systems can be made easy to understand. If you develop a CAD system, you can't make it as simple as a To-Do List application, no matter how hard you try. You just have to understand 2D and 3D graphics and the underlying math in order to build such an application.
\\nThat brings us to two different types of complexity: accidental and essential. Essential complexity relates to the problem domain itself - the problem we need to solve in order to create the application. Nothing can decrease the essential complexity of a software system.
Accidental complexity, on the other hand, refers to the issues developers have to address on their way to solving the essential problems. Things such as cache management, optimization practices, and other concerns that are not directly related to the application's domain model are considered accidental complexity.

Most software engineering practices are aimed at reducing accidental complexity, and so is the KISS principle. The KISS principle has two main implications:
\\nIf you need to choose between two solutions, choose the simplest one.
\\nConstantly work on simplifying your code base.
\\nThe first statement seems obvious, but too often I saw developers choosing a more complicated solution without any good reason. I guess, such decisions were dictated by a feeling that the simpler solution just isn’t solid enough for the problem they were working on.
\\nIn reality, the contrary is the case. The simpler your solution is, the better you are as a software developer. Most software developers can write code that works. Creating code that works and is as simple as possible - that is the true challenge.
"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away."

— Antoine de Saint-Exupery
A human’s mind can only handle 5-9 objects in memory. Throwing unnecessary complexity to code reduces your ability to reason about it and thus reduces your productivity. Even more, such reduction is not linear, but rather exponential. Increasing the unnecessary complexity of the code for just a little bit makes you times less productive when you work with it. Simplicity is the thing that is always worth investing in.
\\nKISS revisited
\\nI’m starting a new blog post series about the most valuable principles in software development. Not that I think you might not know them, but I rather want to share my personal experience and thoughts on that topic. The order in which I put those principles reflects their significance relative to each other, as it appears to be in my opinion.
\\nThat is quite a large subject and I’m going to dilute it with articles on other topics, so it might take a while.
\\nOkay, let’s start.
The abbreviation YAGNI stands for "You aren't gonna need it" and basically means that you shouldn't do anything that is not needed right now, because chances are you will end up investing money in a feature that no one actually needs. Not only will you lose the time spent on its development, but you will also increase the total cost of ownership of your code base, because the more features a piece of software contains, the more effort it takes to maintain it.
If I were asked what single most important thing distinguishes a good developer from a great one, I would answer that it is embracing and following the Yagni principle. I strongly believe that it is vital for a developer to fully understand this principle and, more importantly, to possess the skill of following it.
\\nNothing can save you more time than not writing code. Nothing can boost your productivity as much as following the Yagni principle.
I consider following the Yagni principle a separate skill. Just like other development skills, it can be trained, and it requires practice.
Every software developer makes lots of little decisions in their day-to-day work. It's important to incorporate "Yagni thinking" into this routine.
\\n\\"I have implemented the AddCommentToPhoto method. I guess I also need to create an AddCommentToAlbum function. Although we don’t need it right now, I bet we’ll need it in future if we decide to allow users to comment their photo albums.\\"
\\n\\"I’ve finished the CanonizeName method on the Album class. I think I’m better off extracting this method in a utility class because, in future, other classes might need it too.\\"
These are examples of thinking ahead, which is the opposite of "Yagni thinking". You might have noticed that in both cases there's a reference to some "future". In most cases, when you find somebody making such statements, it is a strong sign they don't follow the Yagni principle.
We, as software developers, are good at making generalizations and thinking ahead. While this helps us greatly in our day-to-day job, the trait has a downside: we tend to think that some amount of work done up-front can save us a lot more effort in the future. The reality is, it almost never does.

This may seem controversial, but in practice, laying the foundation for functionality that we don't need right now but might need in the future almost never pays off. We end up just wasting our time.
Every time you are about to make a decision, ask yourself: is the method/class/feature I'm going to implement really required right now? Or do you just assume it will be at some point in the future? If the answer to the first question is no, be sure to drop it.
\\nIt might seem easy to do, but the truth is it usually isn’t. We are used to creating reusable code. We do that even when we don’t have to, just in case. I remember my own mental shift being really tough. It required me to re-evaluate a lot of the coding habits I had at that moment.
\\nEventually, it pays off. It is amazing how that shift helps increase the overall productivity. Not only does it speed up the development, but it also allows for keeping the code base simple and thus more maintainable.
In some cases, thinking ahead can lead you into situations you don't want to be in. I'd like to point them out.
\\nAnalysis paralysis. When you think of a new feature, you can be easily overwhelmed by the details involved in its development. The situation gets even worse when you try to take into account all the possible future implications of the decisions you make. In an extreme case, it can completely paralyze the development process.
Framework on top of a framework. Another extreme situation that thinking ahead may lead you into is building your own framework.
\\nSeveral years ago, I joined a project aimed to automate some business activities. By that moment, it had a rich functionality that allowed the users to configure almost any process they needed. The only problem with that solution was that the users didn’t want to do that.
\\nThe project ended up with a single pre-defined configuration that hardly ever changed later on. Every time someone needed to change a process, they asked developers to do that. Developers then changed the configuration and redeployed the whole application.
\\nNo one used the brand new highly configurable framework except the developers themselves. The project was implemented with a huge overhead just to enable capabilities that programmers could achieve by modifying the source code in the first place.
Following the Yagni principle helps a lot in such cases. Moreover, it often becomes a matter of life and death for the project's success.
The Yagni principle is applicable not only to developers. To achieve success, it is important to also apply it on the business level. This strongly correlates with the notion of building the right thing: focusing on the current users' needs and not trying to anticipate what they might want in the future.

Projects that adhere to the Yagni principle on all levels have a much better chance of succeeding. Moreover, following it on the business level pays off the most, because it helps save a tremendous amount of work.
\\nAlthough business decisions are often deemed to be an area software developers can’t have an influence on, it is not always the case. Even if it’s so, don’t let it stop you from trying.
\\nIf you see that a feature that is going to be implemented is not needed right now, state it. Tell business people that they are better off not implementing it. There are two major points I use in such conversations:
We don't know what the real requirements are. If we implement the feature now, odds are we'll need to change the implementation in the future and thus waste a lot of the effort we put in up-front.

Adding functionality always adds maintenance costs. Every feature increases the time and effort required to implement all subsequent features, especially if that feature interacts with other existing features (i.e. is cross-cutting). Adding the feature later, when the need for it becomes obvious, helps speed up the development of the other functionality.
\\nDo you follow the Yagni principle? If yes, how accurately do you do that? Here’s a simple test that can help you evaluate that. Don’t take it too seriously, though.
Let's say you need to create a function that calculates the sum of 2 and 3. How would you implement it? Think about it for a moment before looking at the answer below.
\\nHere’s how most developers would do that:
public int Sum(int x, int y)
{
    return x + y;
}
Do you see a problem here? There is a premature generalization in this code: instead of a method that calculates the sum of 2 and 3 which was requested by the problem definition, we implemented a function that returns a sum of any two numbers.
\\nHere’s the version that follows the Yagni principle:
public int SumTwoAndThree()
{
    return 2 + 3;
}
Of course, it’s a toy example, but it shows how deep in our minds resides the habit of making generalizations
\\nNothing helps gain efficiency as much as following the Yagni principle. It helps avoid a whole set of problems that relate to overengineering.
Although I fully advocate adhering to it, there are some situations in which this principle is not fully applicable. Namely, if you develop a 3rd party library, you need to think at least a little bit ahead of the current users' needs, as future changes can potentially break too much of the existing code that uses the library.
\\nBesides that exception, following the Yagni principle is a must-have practice in software development.
\\nYAGNI revisited
\\nHard coding is often considered an anti-pattern. Having values that can change over time hard-coded in the source code requires recompilation every time these values actually change.
\\nWhile this statement is true, I think that hard coding should be the default choice when developing an application.
When you work on a project or feature, there are always some magic numbers or strings that could potentially change in the future. The first impulse is to make such values configurable so that you can easily modify them later.

In most cases, such decisions complicate further maintenance. What we have here is the classic "simplicity vs agility" dilemma. As the application grows, it becomes easier to change some of its parameters because they are extracted into the configuration file, but at the same time the overall burden of maintenance increases.

In an extreme case, you might end up having tens or even hundreds of configurable parameters, which makes it extremely hard to maintain the application. Such a situation is called configuration hell.
\\nJust as with many other software design decisions, we need to appeal to the YAGNI principle. Do all these parameters really need to be configurable right now? Or do we just make some up-front arrangement? If the latter is the case, we are better off cutting the configuration file and keeping it small up to the moment when the need for it becomes apparent.
Hard coding should be the default choice unless the necessity for the opposite is proven. By hard coding, I don't mean you should spread magic numbers and strings across your project's source code. They still need to be gathered and put in a single place as constants. It just means you should remove them from the config file.
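For example, instead of an appSettings entry, a value that rarely changes can simply become a well-named constant (the class and values below are illustrative):

// All "magic" values gathered in one place in the source code rather than in the config file
public static class AppConstants
{
    public const int MaxLoginAttempts = 5;
    public const string SpecialGreetingsCity = "New York";
}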
\\nLet’s take some example and see how we could apply what was said above in practice.
\\nMy favourite logging library is NLog. It has a tremendous amount of features, each of which is easily configurable.
\\nHere’s a configuration file for a typical NLog setup:
<nlog>
  <variable name="logFile" value="C:\logs\log-${shortdate}.txt"/>

  <targets>
    <target name="trace" xsi:type="AsyncWrapper" queueLimit="5000" overflowAction="Block">
      <target name="file" xsi:type="File" encoding="utf-8"
              layout="Date: ${longdate}\r\n Level: ${level}\r\n Message: ${message}\r\n"
              fileName="${logFile}.txt"
              archiveFileName="${logFile}.{#####}.txt"
              archiveAboveSize="2097152"
              maxArchiveFiles="200"
              archiveNumbering="Sequence"/>
    </target>
  </targets>

  <rules>
    <logger name="*" minlevel="Warn" writeTo="trace"/>
  </rules>
</nlog>
While the setup itself is quite reasonable, I’d like to raise a question: is it really necessary to keep all these settings in the config file? Are we going to change them? In most cases, the answer is no. Even if you are doubtful of it, that also means \\"no\\" due to the YAGNI principle.
\\nLuckily, NLog allows us to use its Configuration API in order to configure it in code. So, instead of counting on the config file, we can easily move the settings to the source code. Let’s take a closer look at the example and see which of the settings we can get rid of.
\\nFirst of all, you can see in the targets section that we use an async wrapper for the actual target. Do we really want it to be configurable? No, such setting barely ever gets changed. Okay, what about the other target? It sets a lot of useful things, e.g. log entry’s layout, file name, maximum log file size and so on. Do we really need to have an opportunity to change them without recompilation? Most likely, no.
What about the rules? This part is not as obvious as the one with targets. The ability to change the minimum log level required to trigger a rule seems sensible because we might want to adjust it on the fly for debugging purposes. Nevertheless, odds are we will never actually do it in practice. So, we are better off removing this setting as well.
Alright, what have we ended up with? Only a single setting is left, which is the path to the log file itself:
<appSettings>
  <add key="LogFilePath" value="C:\logs\log-${shortdate}.txt" />
</appSettings>
All the other settings now reside in the source code:
// The only remaining setting is read from the appSettings section.
string fileName = ConfigurationManager.AppSettings["LogFilePath"];

string layout = "Date: ${longdate}\r\n" +
                "Level: ${level}\r\n" +
                "Message: ${message}\r\n";

var config = new LoggingConfiguration();

var target = new FileTarget { FileName = fileName, Layout = layout /* Other params */ };
var asyncWrapper = new AsyncTargetWrapper(target)
{
    QueueLimit = 5000,
    OverflowAction = AsyncTargetWrapperOverflowAction.Block
};
var rule = new LoggingRule("*", LogLevel.Warn, asyncWrapper);
config.AddTarget("asyncWrapper", asyncWrapper);
config.LoggingRules.Add(rule);

LogManager.Configuration = config;
As you can see, we removed the nlog section completely and moved the remaining setting to the appSettings section. It is now an ordinary member of our configuration file.
\\nThis single setting is the only one that really needs to have different values depending on the environment being used in. What we did here is we reduced the configuration surface and thus made the solution more maintainable at the cost of making it less flexible. And I strongly believe this is a good trade-off.
Later on, we might find ourselves changing one of the hard-coded settings too often. That would signal that we do have a good reason to move it to the configuration file. But up to that point, just make hard coding your default choice.
\\nI apply this rule regularly to all of the settings that could potentially be moved to configuration files. It helps keep them small and maintainable. Also, I noticed that even if I occasionally need to change one of such settings, making the change directly in the source code is sufficient in most cases.
\\nI need to note that the content of the article is applicable to in-house software only. 3rd party library development is a different story.
Also, I really appreciate all the feedback I got on this topic; I didn’t expect there would be so much discussion here. But please don’t confuse the main point of the article - which is making hard coding your default choice - with making hard coding the only choice. If you do need to extract some values from the code and make them configurable, go ahead and do it. The only thing I advocate is that you ask yourself whether such extraction is really needed.
\\nToday, I’d like to discuss the differences between interfaces, abstractions and .NET interfaces, as well as what the term \\"implementation details\\" means.
\\nSince .NET came out, the interface language feature it introduced slowly captured the term \\"Interface\\". Nowadays, the C# keyword \\"interface\\" is often taken for the only possible way to introduce an API. The terms \\"Interface\\" and \\".NET Interface\\" are used interchangeably by many.
\\nMoreover, it is often considered a bad practice to work with classes directly, especially in the case of Dependency Injection:
public Service(IPersonRepository repository) // "Good" practice
{
}

public Service(PersonRepository repository) // "Bad" practice
{
}
But is that really the case? To answer this question, we need to step back and recall what the term \\"Interface\\" really means.
An API is a set of functions client code calls in order to use the functionality a class introduces. There’s a simple thing that distinguishes the interface of a class from its implementation details, and that is the "public" (and also "internal") keyword. Every time you add a public method to a class, you change its interface. And every time you call a public method, you use the API of that class.
\\nWhile .NET interfaces do represent API, i.e. something you can code against, it is incorrect to attribute the term \\"Interface\\" exclusively to them. Every class has its own interface that is defined as a set of public methods.
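To illustrate (the class below is hypothetical), the public members of a class are its interface, while the private ones are implementation details:

public class OrderProcessor
{
    // Part of the class's interface: this is what clients code against.
    public void Process(Order order)
    {
        Validate(order);
        /* ... */
    }

    // Implementation detail: can change freely without affecting clients.
    private void Validate(Order order)
    {
        /* ... */
    }
}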
Always depend upon abstractions, not implementations.
I guess this well-known phrase from the Dependency inversion principle added a fair part to the overall confusion. Often, .NET interfaces are used to introduce an abstraction, but the truth is, .NET interfaces don’t automatically make your entities abstract. An abstraction is something that depends on the context it is used in. .NET interfaces, on the other hand, are just a language feature which can be used - along with other language features - to introduce an abstraction.
There’s virtually no difference between using a .NET interface and a class. If your code depends on a .NET interface, it doesn’t mean it depends on an abstraction. Furthermore, a dependency on a class doesn’t automatically make your code dependent on implementation details.
This diagram shows the relationships between those notions. .NET interfaces and classes that are well designed have a solid and clean interface, which makes them a good abstraction of the concept they describe.
\\nOn the other hand, poorly designed classes tend to expose their internal structure to the clients and thus break encapsulation principles. As for poorly designed .NET interfaces, there’s actually a special term: Header interfaces. It means that, rather than introducing some high-level cohesive set of methods that belong to a specific role, the interface just mimics the structure of a class. And, of course, whether or not a class or .NET interface is designed poorly, depends on the context.
\\nWhile it’s true that, from a client code perspective, there’s no difference in using a .NET interface or a class, it is not always the case if we look at the whole picture.
\\nHow often do you see an interface with a single implementation? Frequently, they are created mechanically just for the sake of \\"loose coupling\\":
public interface IPersonRepository
{
    void Save(Person person);
    Person GetById(int id);
}

public class PersonRepository : IPersonRepository
{
    public void Save(Person person)
    {
        /* ... */
    }

    public Person GetById(int id)
    {
        /* ... */
    }
}
But the thing is, just as with abstractions, using a .NET interface doesn’t automatically make your code more decoupled. Also, such an approach contradicts the YAGNI principle. Adding a new .NET interface should always be justified. Creating a .NET interface with a single implementation is a design smell.
\\nThe last thing I’d like to mention is unit testing. There’s some uncertainty in the question of whether or not the creation of .NET interfaces for the purpose of mocking is a smell. I personally tend to avoid such situations, but it’s not always possible. I’ll try to describe this topic in a future post.
The term "Interface" != .NET interface. Depending on a .NET interface doesn’t mean your code depends on an abstraction.
\\nClass != implementation details. Depending on a class doesn’t make your code dependent on implementation details.
\\nI guess most developers heard the guideline stating that, when designing methods, you should return the most specific type and accept the most generic one. Is it always applicable? Let’s try to look at it from different perspectives.
\\nLet’s start with some simple example. Let’s say we have the following class hierarchy:
public abstract class Person
{
}

public class Employee : Person
{
}

public class Customer : Person
{
}
And the following method:
public Employee GetMostValuableEmployee()
{
    /* Method body */
}
In this concrete example, it makes sense to return an Employee object. The Employee class might introduce some functionality specific to employees, so it is a good idea to enable the client code to work with it. Changing the return type to a more generic one puts too many restrictions on the method’s consumers. In the worst case, the client code would need to manually cast the Person object to an Employee, so it is better not to make them do that.
\\nIt is not always possible to return a value of a specific type (e.g. Employee or Customer). For example, if you have some polymorphic collection with objects of both types, you might have no choice other than returning an object of Person type:
public Person GetWhoEnteredOfficeLast()
{
    /* Method body */
}
That brings us to the first part of the guideline: make your methods return values of the most specific type possible.
\\nLet’s look at another method:
public void SavePerson(Person person)
{
    /* Method body */
}
It’s a good decision to use the Person type for the input parameter instead of Employee or Customer because it enables the client code to use this method for operating a wider range of objects. It means that the client can pass this method customers and employees, and it will work for any of them.
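For instance, a hypothetical usage sketch (client code that is not part of the original example) shows that both subclasses can be passed to the same method:

// Both calls compile because Employee and Customer derive from Person.
var employee = new Employee();
var customer = new Customer();

SavePerson(employee);
SavePerson(customer);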
\\nOf course, it’s not always possible to use generic types for input parameters. Otherwise, we would always use Object as the type of the method parameters. That brings us to the second part of the guideline: make your methods accept parameters of the most generic types possible.
\\nOkay, but why bother? Why do we need to follow these guidelines? It turns out, there’s some higher level reason behind these rules. That is, you should always try to make it easier for the clients of your code to use your API, even if the client is you. Good software design is all about simplicity. The simpler and easier you make it, the better.
\\nAlright, the example with a person is rather obvious. But what about collections? Does this principle apply to them as well?
\\nYou might have heard about another popular guideline stating that you should prefer interfaces over concrete types when dealing with collections. That means that rather than returning a List (or Dictionary, or HashSet) object:
public List<Employee> GetAllEmployees()
{
    /* Method body */
}
You need to introduce an IList interface:
public IList<Employee> GetAllEmployees()
{
    /* Method body */
}
Or even IEnumerable:
public IEnumerable<Employee> GetAllEmployees()
{
    /* Method body */
}
But List is a more specific type than IList and IEnumerable interfaces. Isn’t there a contradiction between these guidelines?
\\nTo answer this question, we need to step back and answer another one. Do we make it easier for our users to use the API if we introduce an interface rather than a concrete collection type for return values?
\\nIt turns out that it depends on two things:
\\nWhether or not our software is a 3rd party library.
\\nWhether or not the method we are talking about is a part of its public API.
If we build in-house software whose source code we have full access to, there’s no need to introduce an interface. In this case, we should follow the guideline that stands for returning objects of the most concrete type, which is the List class.
Some developers advocate using interfaces in every situation where we need to return a collection. They argue that such a decision will pay off later, when we decide to change the type of the collection.
\\nClearly, doing so breaks YAGNI principle in the case of an in-house development and leads to premature generalization. If you have full access to the code you develop, the effective price of changing it is close to zero. There’s no need to introduce an interface in advance because you can always do it later when it becomes clear that you have to.
\\nOn the other hand, the situation with 3rd party libraries is quite different. If you develop a redistributable library, your users might not be able to stand breaking changes you introduce, so you need to plan your API at least a little bit ahead of time. That means that you do need to anticipate some future changes in it and thus make steps towards some generalization even if it’s not required right now.
\\nIt means that you should always prefer interfaces over concrete collection types for methods that are parts of a redistributable library’s public API. This statement is true for both returning value and input parameter types.
\\nNote that even in case of a redistributable library, the rule above is applicable only to the methods that comprise its public API. For private methods, it is still better to use a concrete collection type.
\\nThere’s another case where you would want to use interfaces even if you work with an in-house software. If a method returns a collection which your client code cannot change, it is a good design decision to convey this by using a read-only interface. But don’t use IEnumerable to represent a read-only collection. In most cases, IReadOnlyList or IReadOnlyCollection would be a better choice for that purpose.
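As a sketch (LoadEmployees is a hypothetical data-access helper, not a method from the article), note that List<T> already implements IReadOnlyList<T>, so exposing the read-only interface costs nothing:

public IReadOnlyList<Employee> GetAllEmployees()
{
    List<Employee> employees = LoadEmployees(); // hypothetical helper

    // Callers can enumerate and index the result, but cannot add or remove
    // items through the returned interface.
    return employees;
}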
\\nIf you develop an in-house software, do use the specific type for return values and the most generic type for input parameters even in case of collections.
\\nIf a method is a part of a redistributable library’s public API, use interfaces instead of concrete collection types to introduce both return values and input parameters.
\\nIf a method returns a read-only collection, show that by using IReadOnlyList or IReadOnlyCollection as the return value type.
I guess you already know about the safe navigation operator (the ?. operator) coming up in C# 6. While it’s nice syntactic sugar for quite a few cases, I’d like to point out some misuses of it that I’m sure we will see once C# 6 is released.

I remember reading on one of the forums that the C# team should make the ?. operator the default behavior for navigation instead of the . operator. Let’s talk about that a little. Why not use it as the default option, really? Or, in other words, why not use the ?. operator everywhere, just in case?

One of the main purposes of your code is to reveal your intention to other developers. Using the ?. operator as the default option leads to a situation where developers are unable to understand whether a method is designed to return null or someone just put the check there because it was easy to do.
public void DoSomething(Order order)
{
    string customerName = order?.Customer?.Name;
}
Does this method really expect the order parameter to be null? Is the Customer property also supposed to turn into null at some point? The code conveys that yes, both the order parameter and the Customer property can be null. But that is no longer the case if the author put the checks there without conscious intent.
Putting the ?. operator in code without an actual need for it leads to confusion and misunderstanding.
This strongly correlates with the lack of non-nullable reference types in C#. It would be much easier to follow the program flow if they were introduced. Also, such code would simply be illegal:
public void DoSomething(Order! order)
{
    Customer customer = order?.Customer; // Compiler error: order can't be null
}
Another possible misuse of the ?. operator is relying on nulls too much. Look at this code sample:
List<string> list = null;
int count;
if (list != null)
{
    count = list.Count;
}
else
{
    count = 0;
}
It has an obvious design smell. Instead of using null, it is preferable to leverage the Null Object design pattern:
// An empty list is an example of the Null Object design pattern usage
List<string> list = new List<string>();
int count = list.Count;
Now, with the new ?. operator, the design smell isn’t so obvious anymore:
List<string> list = null;
int count = list?.Count ?? 0;
In most cases, the Null Object pattern is a much better choice than null. Not only does it allow for eliminating null checks, but it also helps express the domain model better throughout the code. The ?. operator might assist with eliminating null checks, but it can never substitute for proper domain modeling.
It might seem extremely easy to write such code:
public void Process()
{
    int result = DoProcess(new Order(), null);
}

private int DoProcess(Order order, Processor processor)
{
    return processor?.Process(order) ?? 0;
}
Whereas, it would be more convenient to introduce a Null Object:
public void Process()
{
    var processor = new EmptyProcessor();
    int result = DoProcess(new Order(), processor);
}

private int DoProcess(Order order, Processor processor)
{
    return processor.Process(order);
}
Often, code like the following is shown as an example of a good use for the new ?. operator:
public void DoSomething(Customer customer)
{
    string address = customer?.Employees
        ?.SingleOrDefault(x => x.IsAdmin)?.Address?.ToString();
    SendPackage(address);
}
While it’s true that such an approach allows you to reduce the number of lines in the method, it implies that such a sequence of member invocations is itself acceptable.
\\n\\nThe code above breaks encapsulation principles. From a domain modeling perspective, it is much more convenient to introduce a separate method:
public void DoSomething(Customer customer)
{
    Contract.Requires(customer != null);

    string address = customer.GetAdminAddress();
    SendPackage(address);
}
This keeps the objects’ encapsulation in place and eliminates the need for the null checks altogether. The use of the ?. operator may hide problems with entities’ encapsulation. It is better to resist the temptation to use the ?. operator in such cases, even if it seems extremely easy to chain properties and methods one after another.
?. operator in C# 6: use cases

So, what are the actual use cases for the new ?. operator? First of all, legacy code. If you work with code or a library you don’t have access to (or just don’t want to touch), you might have no choice other than working with the model it introduces. In this case, the new safe navigation operator helps you reduce the amount of code required to work with the legacy code base. Another good example is raising an event:
protected void OnNameChanged(string name)
{
    NameChanged?.Invoke(name);
}
Other valid use cases boil down to the ones that don’t fall into the three categories of misuse described earlier.
\\n\\nWhile the safe navigation operator can help reduce the number of lines in some cases, it can also hide design smells which would be more apparent without it.
You can use a simple thought experiment to decide whether or not you should use the ?. operator: ask yourself whether the code would be acceptable without it. If it is, then use the operator. Otherwise, try to refactor the code and remove the design flaws rather than hide them.
We often think that relational and NoSQL databases are somewhat incompatible. But what if we could use both within a single domain model? I’d like to show how to combine SQL Server and MongoDB together and discuss what benefits we could get from it.
Relational databases are a good choice for many software development projects, but there are some particular use cases they are not very good at.
One of those use cases is many-to-many relationships. I think they are the toughest part of the overall paradigm mismatch between OOP and relational stores. Converting hierarchical data structures into flat tables requires quite a bit of work and thus usually turns out to be an expensive operation.
\\nThe problem emerges when the number of relations exceeds some limit. And that limit is usually quite moderate.
\\nLet’s take a concrete example. Let’s say we need to model a network of nodes with relations between them:
\\nOn the application side, the code can look as simple as this:
public class Node
{
    public int Id { get; private set; }
    public string Name { get; private set; }
    public List<Node> Links { get; private set; }

    public Node(string name)
    {
        Name = name;
        Links = new List<Node>();
    }

    public void AddLink(Node node)
    {
        Links.Add(node);
        node.Links.Add(this);
    }
}
On the database side, we just need two tables to handle this situation: one for nodes themselves and the other for links between them.
\\nIf a typical node has not more than, say, twenty connections to other nodes, this solution works well in terms of performance and scalability. The performance becomes worse with a larger number of connections. And if there are thousands of them, it degrades drastically.
\\nOkay, so how could we mitigate the problem? The reason why the performance got bad is that in order to persist the data, we need to flatten it first. This solution leads to creating N+1 rows in the database: one for the node and N for its connections.
\\nIf we could store all the data associated with a node in a single row, that could potentially solve our problem.
\\nHere’s where NoSQL databases come into play. They introduce the concept of document which is exactly what we need here. We can store all the information attached to a node within a single document and thus eliminate the need of flattening the data.
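To illustrate the idea (the class below is only a sketch of a possible document shape, not the model from the source code on GitHub), all of a node’s links can live inside a single document:

// A possible shape for the per-node MongoDB document:
// the node id plus the ids of all nodes it is linked to.
public class NodeLinksDocument
{
    public int NodeId { get; set; }
    public List<int> LinkedNodeIds { get; set; }
}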
\\nOf course, theoretically, we could just move all the data to some NoSQL storage, but what if we don’t want to get rid of our relational database completely? How can we combine them together?
\\nUsing an ORM, of course! NHibernate has a lot of extension points we can resort to in order to mix our own custom logic in the process of saving and retrieving data. With it, we can override the default persistence behavior for certain entities. We can store nodes in SQL Server just as we did before. Their links, at the same time, can be persisted in MongoDB.
\\nLet’s see how we can do it. First, we need to register an event listener that is being executed after a node is loaded:
private static ISessionFactory BuildSessionFactory(string connectionString)
{
    FluentConfiguration configuration = Fluently.Configure()
        .Database(MsSqlConfiguration.MsSql2012.ConnectionString(connectionString))
        .Mappings(m => m.FluentMappings.AddFromAssembly(Assembly.GetExecutingAssembly()))
        .ExposeConfiguration(x =>
        {
            x.EventListeners.PostLoadEventListeners = new IPostLoadEventListener[]
            {
                new EventListener()
            };
        });

    return configuration.BuildSessionFactory();
}
This event listener builds the node up with the links from MongoDB:
internal class EventListener : IPostLoadEventListener
{
    public void OnPostLoad(PostLoadEvent ev)
    {
        var networkNode = ev.Entity as MongoNode;

        if (networkNode == null)
            return;

        var repository = new NodeLinkRepository();
        IList<NodeLink> linksFromMongo = repository.GetLinks(networkNode.Id);

        HashSet<NodeLink> links = (HashSet<NodeLink>)networkNode
            .GetType()
            .GetProperty("LinksInternal", BindingFlags.NonPublic | BindingFlags.Instance)
            .GetValue(networkNode);
        links.UnionWith(linksFromMongo);
    }
}
To save the links, we need to use an interceptor:
internal static ISession OpenSession()
{
    return _factory.OpenSession(new Interceptor());
}

internal class Interceptor : EmptyInterceptor
{
    public override void PostFlush(ICollection entities)
    {
        IEnumerable<MongoNode> nodes = entities.OfType<MongoNode>();

        if (!nodes.Any())
            return;

        var repository = new NodeLinkRepository();
        Task[] tasks = nodes.Select(x => repository.SaveLinks(x.Id, x.Links)).ToArray();
        Task.WaitAll(tasks);
    }
}
The PostFlush method is called every time NHibernate finishes synchronizing the objects in memory with the database. If an exception takes place while storing data in SQL Server (due to, for example, a unique constraint violation), the links don’t get saved in MongoDB.
For the saving part, we could also use an event listener, but interceptors have an advantage here: the ability to process all nodes in a single batch, as the MongoDB driver allows us to access the database asynchronously.
\\nYou can find the full source code on GitHub.
\\nOkay, so what are the performance benefits, exactly? I have executed some performance tests. For them, I used the most common use cases for this domain model which are creating links between nodes and fetching an existing node with all its links into memory. I created 1000 nodes and linked them with each other. After that, I loaded each of them into memory one by one.
\\nHere are the average results for 10 runs on my machine (the results are displayed in seconds):
\\nAs you can see, the hybrid solution that uses both SQL Server and MongoDB is almost 6 times more performant on saves and more than 4 times faster on reads than the one with SQL Server only.
\\nAnd there’s still some room for further performance optimization with MongoDB. Actually, I was able to increase the performance of writes by 1.5 times by executing them in batches using PLINQ. That gives us about 8.5x performance speedup on writes. The only problem with the use of PLINQ is that such approach is unstable due to some problems with the connection pooling in MongoDB driver. That might be changed after the issue is fixed.
\\nThere are some limitations in this approach.
Firstly, as we store data in two different databases, we can’t use transactions and thus lose the benefits they provide. While the links don’t get stored if an exception takes place on the SQL Server side, the reverse is not true. That means that, in order to revert the SQL Server update after an exception on the MongoDB side, we need to introduce some compensation logic.
\\nSecondly, MongoDB limits the size of documents to 16 Mb each, so if you want to store more than 16Mb of links for each node, you need to flatten the structure of the collections in MongoDB and thus lose some performance benefits. Frankly, 16 Mb is quite a lot, so I don’t think this particular limitation is a deal breaker for any but the largest projects.
\\nCombining SQL Server with MongoDB using NHibernate can be a good solution if you have lots of hierarchical data and don’t want to get rid of your relational database completely. It can help you to significantly increase the performance in the cases relational databases are not very good at.
\\nYou can find the full source code for this tutorial on GitHub.
\\nToday, I’d like to discuss a particular case with validating input data using NHibernate event listeners.
\\nIn this post, Peter van Ooijen writes about an interesting example of using NHibernate event listeners. In short, he proposes to utilize them to create a single point in application where all validation logic takes place:
\\nWhile such approach seems compelling, I’d like to raise a question: is it really an appropriate place to put the validation logic in?
There are several things ORMs are really good at. One of them (and the most important one) is hiding the complexity related to object-relational mapping from the domain model. Ideally, an ORM would allow us to create a clean separation of concerns in which domain classes and the database don’t interact with each other directly.
\\nSeparation of concerns also means that neither ORM nor database should contain any of the domain knowledge, to which the validation logic belongs.
\\nOkay, so what’s exactly the problem with validation logic inside the NHibernate event listeners?
The first problem with this approach is the lack of explicitness. That is, if you follow DDD principles, you want your code base to reflect your domain as closely and as explicitly as possible. The domain your software solves problems in is the thing you should focus most of your efforts on.

If your domain contains some sophisticated validation rules (and even if they are rather simple), you need to express them in a lucid way. Not only should you clearly define the rules themselves, but you should also explicitly indicate where they are invoked.
\\nPutting invocation of those rules in the infrastructure layer, which NHibernate event listeners represent, leads to the growth of ambiguity and breaks the Separation of Concerns principle.
\\nThe second, and I guess the biggest problem here is the lack of invariant support. In most cases, you want your domain entities to be in a valid state during all their lifespan. That makes it extremely easy to follow the program flow as in every single point of your application, you are sure that the objects you operate don’t contain any invalid data and you don’t have to perform any checks on them. Check out my primitive obsession article to read more on this topic.
Creating a non-static Validate method in domain classes assumes that they can remain in an invalid state, which means the domain objects don’t maintain their invariants. It also creates temporal coupling: you must not forget to invoke Validate() before you use a domain object.
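As a hypothetical sketch of this anti-pattern (the names are made up, not taken from the referenced post), the class below can be constructed in an invalid state and relies on the caller remembering to validate it:

public class Customer
{
    public string Name { get; set; }
    public int NumberOfEmployees { get; set; }

    // Temporal coupling: nothing forces clients to call this method
    // before using the object, so it may be used while still invalid.
    public bool Validate()
    {
        return !string.IsNullOrWhiteSpace(Name) && NumberOfEmployees > 0;
    }
}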
\\nA better way of doing this is using guard clauses. They can protect a domain object’s state from being corrupted in the first place:
public class Customer : Entity
{
    public string Name { get; private set; }
    public int NumberOfEmployees { get; private set; }

    public Customer(string name, int numberOfEmployees)
    {
        Contracts.Require(!string.IsNullOrWhiteSpace(name));
        Contracts.Require(numberOfEmployees > 0);

        Name = name;
        NumberOfEmployees = numberOfEmployees;
    }
}
That way you are always sure your domain objects keep their invariants and you can use them freely wherever you want.
\\nAnother issue here is that this approach assumes data in the database can be invalid.
\\nWhile it’s fine to have some checks on read operations that could validate data coming from the database, invalid data itself shouldn’t be treated as something acceptable. Your database resides in your \\"trusted boundary\\". Just like your domain objects shouldn’t contain any invalid data, your database shouldn’t contain it either.
\\nWhat those checks could do is they could inform developers that there is something wrong with the data in the database. A DBA or developers could then transform this data throughout the database to make it valid.
\\nBut this is an exceptional case, and, of course, developers should try to keep data in database valid in the first place.
\\nBut what if your domain objects\' invariants have changed? For example, your customers now must have at least 5 employees instead of one. The code itself is an easy part, you can just change the guard clause to reflect the new requirement:
Contracts.Require(numberOfEmployees >= 5);
But what to do with the data?
\\nWell, the data in the database needs to be migrated as well. Ask your product owner what to do with the existing customers that don’t have enough employees and create a migration script to implement this change. Your application’s database is a part of your software, so it needs to be changed every time your invariants change just like your code does.
Okay, I think it is now clear that the approach described above is not the best way to deal with validation. But what is, then? Where should validation be performed?
I strongly believe that validation should take place at the borders of your domain. Your domain model, as well as your database, is inside your trusted boundary. That is, the boundary into which no invalid data can sneak, so you can freely make assumptions (supported by preconditions, postconditions, and invariants) about its validity.
If invalid data is inside your domain model, it is simply too late to do anything about it. The best place to invoke validation is the application layer (e.g. controllers or presenters):
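A minimal sketch of what that could look like (an ASP.NET MVC controller is assumed here; _repository is a hypothetical injected dependency, and the error handling is deliberately simplistic):

[HttpPost]
public ActionResult CreateCustomer(string name, int numberOfEmployees)
{
    // Validate at the boundary; the domain model only ever receives
    // data that has already been checked.
    if (string.IsNullOrWhiteSpace(name) || numberOfEmployees <= 0)
        return View("Error");

    var customer = new Customer(name, numberOfEmployees);
    _repository.Save(customer);

    return RedirectToAction("Index");
}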
\\nI wrote on a similar topic in my code contracts vs input validation article, so check it out to read more about input data validation.
\\nIt is interesting to talk about valid use cases for NHibernate event listeners. I think they do a really great job in two situations:
When you need to add audit parameters to every domain object: who created or updated it and when.
\\nWhen you need to fire your domain events. You can look at an example of it here.
\\nI don’t claim this is an exhaustive list, but I can’t come up with any other use cases right now.
\\nLet’s recap:
\\nDomain objects should maintain their invariants.
\\nValidation needs to be invoked explicitly at the domain model borders.
\\nDatabase shouldn’t contain invalid data.
CQRS is a pretty well-defined concept. Often, people say that you either follow CQRS or not, meaning that it is some kind of binary choice. In this article, I’d like to show that there is some wiggle room in this notion and what different types of CQRS can look like.
With type 0, you don’t have any CQRS whatsoever. That means you have a domain model and you use your domain classes both for serving commands and for executing queries.
\\nLet’s say you have a Customer class:
public class Customer
{
    public int Id { get; private set; }
    public string Name { get; private set; }
    public IReadOnlyList<Order> Orders { get; private set; }

    public void AddOrder(Order order)
    {
        /* ... */
    }

    /* Other methods */
}
With the type 0 of CQRS you end up with CustomerRepository class looking like this:
public class CustomerRepository
{
    public void Save(Customer customer) { /* ... */ }
    public Customer GetById(int id) { /* ... */ }
    public IReadOnlyList<Customer> Search(string name) { /* ... */ }
}
Search method here is a query. It is used for fetching customers\' data from database and returning it to a client (a UI layer or a separate application accessing your server through some API). Note that this method returns a list of domain objects.
\\nThe advantage of such approach is obvious: it has no code overhead. In other words, you have a single model that you use for both commands and queries and don’t have to duplicate the code at all.
\\nThe disadvantage here is that this single model is not optimized for read operations. If you need to show a list of customers in UI, you usually don’t want to display their orders. Instead, you most likely prefer to show only a brief information such as id, name and the number of orders.
\\nThe use of a domain class for transferring customers\' data from the database to UI leads to loading all their orders into memory and thus introduces a heavy overhead because UI needs the order count field only, not the orders themselves.
\\nThis type of CQRS is good for small applications with little or no performance requirements. For other types of applications, we need to move further.
With type 1 of CQRS, you have separate class structures for read and write operations. That means you create a set of DTOs to transfer the data you fetch from the database.
\\nThe DTO for Customer can look like this:
public class CustomerDto
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int OrderCount { get; set; }
}
The Search method now returns a list of DTOs instead of a list of domain objects:
public class CustomerRepository
{
    public void Save(Customer customer) { /* ... */ }
    public Customer GetById(int id) { /* ... */ }
    public IReadOnlyList<CustomerDto> Search(string name) { /* ... */ }
}
The Search method can use either an ORM or plain ADO.NET to get the data needed. This should be determined by performance requirements in each particular case. There’s no need to fall back to ADO.NET if a method’s performance is good enough.
\\nDTOs introduce some duplication as we need to come up with the same concept twice: once for commands in a form of a domain class and once more for queries in a form of a DTO. But at the same time, they allow us to create clean and explicit data structures that perfectly align with our needs for read operations as they only contain data clients need to display. And the more explicit we are with our code, the better.
\\nI would say that this type of CQRS is sufficient for most of enterprise applications as it gives a pretty good balance between code complexity and performance. Also, with this approach, we have some flexibility in terms of what tool to use for queries. If the performance of a method is not critical, we can use ORM and save developers\' time; otherwise, we may fall back to ADO.NET (or some lightweight ORM like Dapper) and write complex and optimized queries on our own.
\\nIf we want to continue separating our read and write models, we need to move further.
Type 2 of CQRS proposes using separate models and separate sets of APIs for serving read and write requests.
\\nThat means that, in addition to DTOs, we extract all the read logic out of our model. Repository now contains only methods that regard to commands:
public class CustomerRepository
{
    public void Save(Customer customer) { /* ... */ }
    public Customer GetById(int id) { /* ... */ }
}
And the search logic resides in a separate class:
public class SearchCustomerQueryHandler
{
    public IReadOnlyList<CustomerDto> Execute(SearchCustomerQuery query)
    {
        /* ... */
    }
}
This approach introduces more overhead compared to the previous one in terms of the code required to handle the complexity, but it is a good solution if you have a heavy read workload.
In addition to the ability to write optimized queries, type 2 of CQRS allows us to easily wrap the read portion of the API with some caching mechanism, or even move the read API to another server and set up a load-balancer/failover cluster. It works great if you have a massive disparity between the write and read workloads in your system, as it allows you to scale the read part drastically.

If you need even more read performance, you need to move to type 3 of CQRS.
Type 3 is the one considered to be true CQRS by many. To scale read operations even further, we can create a separate data storage optimized specifically for the queries we have in our system. Often, such storage might be a NoSQL database like MongoDB or a replica set with several instances of it:
\\nThe synchronization goes in background mode and can take some time. Such data storages are considered to be eventually consistent.
\\nA good example here could be indexing of customers\' data with Elastic Search. Often we don’t want to use full-text search capabilities built into SQL Server as they don’t scale much. Instead, we could use non-relational data storage optimized specifically for searching customers.
\\nAlong with the best scalability for read operations, this type of CQRS brings the highest overhead. Not only should we segregate our read and write model logically, i.e. use different classes and even assemblies for it, but we also need to introduce database-level separation.
There are different types of CQRS you can leverage in your software; there’s nothing wrong with sticking to type 1 and not moving further to types 2 or 3, as long as type 1 meets your application’s requirements.
\\nI’d like to emphasize this once more: CQRS is not a binary choice. There are some different variations between not separating reads and writes at all (type 0) and separating them completely (type 3).
\\nThere should be a balance between the degree of segregation and complexity overhead it introduces. The balance itself should be found in each concrete software application apart, often after several iterations. I strongly believe that CQRS itself should not be implemented \\"just because we can\\"; it should only be brought to the table to meet concrete requirements, namely, to scale read operations of the application.
\\nIn this article, I’d like to clarify the differences in DTO vs Value Object vs POCO where DTO stands for Data Transfer Object, and POCO is Plain Old CLR Object, also known as POJO in Java environment.
\\nFirst of all, I want to make a note regarding Value Object. There’s a similar concept in C#, namely Value Type. It’s just an implementation detail of how objects are being stored in memory and I’m not going to touch this. Value Object, which I’m going to discuss is a DDD concept. Check out this article to read more about it.
\\nAlright, let’s start.
\\nYou might have noticed that such notions as DTO, Value Object and POCO are often used interchangeably. But are they really synonyms?
A DTO is a class representing some data with no logic in it. DTOs are usually used for transferring data between different applications or between different layers within a single application. You can look at them as dumb bags of information whose sole purpose is to get that information to a recipient.
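For instance, a typical DTO is nothing more than a bag of public getters and setters (a made-up example):

public class PersonDto
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}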
\\nOn the other hand, Value Object is a full member of your domain model. It conforms to the same rules as Entity. The only difference between Value Object and Entity is that Value Object doesn’t have its own identity. It means that two Value Objects with the same property set should be considered the same whereas two Entities differ even if their properties match.
\\nValue Objects do contain logic and, typically, they are not used for transferring data between application boundaries.
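A minimal Value Object sketch (a simplified stand-alone version; a real DDD code base would typically derive it from a shared ValueObject base class) might look like this:

public class Address
{
    public string Street { get; private set; }
    public string City { get; private set; }

    public Address(string street, string city)
    {
        Street = street;
        City = city;
    }

    // Equality is based on the values, not on an identity field.
    public override bool Equals(object obj)
    {
        var other = obj as Address;
        if (other == null)
            return false;

        return Street == other.Street && City == other.City;
    }

    public override int GetHashCode()
    {
        return (Street ?? string.Empty).GetHashCode() ^ (City ?? string.Empty).GetHashCode();
    }
}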
\\nPOCO (Plain Old CLR Object) is a term created as an analogy for POJO only because \\"POJO\\" itself can’t be used in .NET as the letter \\"J\\" in it stands for \\"Java\\". Thus, POCO has the same semantics as POJO.
\\nPOJO was introduced by Martin Fowler and others to oppose JavaBeans and other heavy-weight enterprise constructions that gained a lot of popularity back in early 2000’s.
\\nThe primary goal of POJO is to show that domain can be successfully modeled without complexity related to the execution environment (and JavaBeans brought a lot of it in its early versions). Moreover, the execution environment shouldn’t have anything to do with domain modeling at all.
\\nThere’s no direct analogy for JavaBeans in .NET because Microsoft has never introduced the same concept, but we can come up with some made up parallel to help express this concept.
You can think of the Component class from the System.ComponentModel namespace as an opposite of POCO. There are a lot of classes in .NET that inherit from Component, for example, DbCommand from System.Data and EventLog from System.Diagnostics.
\\nOf course, in most cases, you wouldn’t create a domain class inheriting from Component. It just doesn’t make any sense, because such approach brings a lot of unnecessary complexity, thus contradicting the YAGNI principle.
\\nAnother good example of non-POCO approach is Entity Framework before 4.0 version. Every class EF generated inherited from EntityObject base class and thus brought a lot of complexity specific to Entity Framework. Since version 4.0, Entity Framework introduced POCO data model which allows for use of classes that don’t inherit from EntityObject.
That said, POCO stands for using classes that are as simple as possible for domain objects. This notion helps us conform to YAGNI, KISS, and other best practices. POCO classes can contain logic.
\\nAre there any connections between these terms? There are a few.
\\nFirst of all, DTO and Value Object represent different concepts and can’t be used interchangeably. On the other hand, POCO is a superset for DTO and Value Object:
\\nIn other words, Value Object and DTO shouldn’t inherit any heavy-weight enterprise components and thus they are POCO. At the same time, POCO is a wider set: it can be Value Object, Entity, DTO or any other class you might create as long as it doesn’t inherit complexity accidental to your domain.
\\nHere are properties for each of them:
Note that a POCO may or may not have its own identity. It depends on what type of POCO it is: Value Object or Entity. Also, a POCO may or may not contain logic in it. That depends on whether or not the POCO is a DTO.
\\nAlright, I hope I made it at least a little bit clearer. I’d like to summarize this topic with the following:
\\nDTO != Value Object
\\nDTO ⊂ POCO
\\nValue Object ⊂ POCO
\\nDid you think about how we think? How do we come up with a solution and how we decide whether it’s good or bad? It seems like a very interesting topic, so let’s dive in!
\\nOne of the popular beliefs is that our thought process is very similar to a computer program: when we decide to create something - a software, an article, etc. - we just separate the total amount of work into pieces and implement them step by step. The thinking itself takes place at the very beginning here, i.e. before we start to fulfill the tasks.
While that might be the case for simple tasks in which we know what to do ahead of time, in most cases the creation process is completely different: we try to do something with a plan in mind, we encounter unexpected difficulties, and we start over with a new plan.
That is why agile techniques shine: they make the process of changing plans through constant evaluation and decision making a first-class citizen in our day-to-day work. Instead of pretending that we can predict every change that can happen to a software project, we now assume we know very little about what it should look like and thus should include regular learning and re-evaluation in our development process.
\\nI’d like to focus on the \\"Evaluation & Making a Decision\\" step of the diagram as it is where our process of thinking takes place.
\\nMany years ago, when I went to school, I noticed at one of my painting classes that while I can remember a picture and then recognize it among the others, I can’t just sit and paint it right away. Although the process of recognizing always goes smoothly, the process of recreating a picture out of the memory takes much more time and effort.
\\nIt seemed that these two distinct actions - memorization and recreation - are not symmetric. And, as I learned afterward, they really aren’t.
\\nIt turned out that when we memorize something, be it a picture, a chess board layout, or a software architecture, we don’t save it in our memory as is. What we do save is a hash of it. That is, if there was some method which allowed us to retrieve any information out of our brain, all it would give us is a set of information that is totally irrelevant to the memorized objects.
\\nWhen we see an object or encounter some situation, our brain first converts it to a hash and then uses pattern matching to identify it. Each of hashes saved in our memory has some action items attached to it: the experience gained during our life. We use those items to decide how to react to a particular stimulus:
Over time, evolution optimized our minds, making it extremely easy to memorize and then recognize a visual image. The reverse operation, on the other hand, was hardly used at all. It was a matter of life and death to quickly recognize a tiger, but painting a tiger on a rock certainly wasn’t as important.
\\nThat said, our mentation is inherently nonlinear. Whether we code or draw a picture, the process of thinking remains very similar. First, we see a picture or a code sample written so far, then, we search for similar samples in our knowledge base. After that, we look for the action items attached to it and make a change according to one of them. Then, the process repeats.
You might have noticed that when you write code, you can’t see exactly what your code will look like in advance. That is true even for simple tasks because that is how our brains work: you constantly re-evaluate the situation, and if there are any action items attached to it in your knowledge base, you apply them.
\\nAll said above has an important corollary: \\"outside in\\" learning doesn’t work. You can’t learn OOP principles without writing a program using classes and you can’t learn functional programming principles without writing a program using a functional language.
I remember how I studied math at university. A professor gave us a lecture describing the details of, say, combinatorial analysis, and of course we didn’t understand a word. After that, we had a workshop in which the professor explained how to solve particular problems by applying the theory he had taught us at the lecture, and all of a sudden, the theory started making sense.
\\nOnly after we were given several specific problems and solutions to them did we start to understand the theory behind those solutions. Now it seems clear that it would be much more productive if we had started with some minimum amount of theory, switch to practice and only after that proceed to more in-depth lectures.
\\nWe learn by example. They form our knowledge base - set of hashes in our memory. Experience and practice help us attach action items to those hashes: what steps we should take to solve a concrete problem or improve a concrete solution. We use those hashes as building blocks to make generalizations. We derive general rules by grouping several separate facts by their similarity or by action items attached to them.
\\nDid you think about how mathematicians prove theorems? In most cases, they are already sure (or have a reasonable suspicion) that the theorem is correct. They infer that suspicion from a bunch of facts they know. The difficulty then, is to come up with a formal proof, not with the idea of the theorem itself.
The power of our intellect can be represented as the speed and accuracy with which we can search our knowledge base. Experience, on the other hand, is the knowledge base itself: it represents how many hashes we store in memory and how many action items are connected to them.
\\nIt is interesting to compare human and computer intelligence. Let’s take chess as an example of clashing these two completely different types of thinking.
\\nThe basic algorithm behind a chess program is to look at several (10 or even 15) moves ahead and crunch all possible positions trying to find the best move. Of course, we can’t do such thing. Moreover, if you try to mimic a chess program and find all the moves in advance, you will inevitably fail.
We are not computers; we think in a completely different way. Do you know how Kasparov managed to beat Deep Blue (then the best chess program) back in the ’90s? He didn’t try to think ten moves ahead; that is simply impossible. What he did was recognize concrete patterns on the board and act according to the action items from his huge knowledge base.
\\nI see only one move ahead, but it is always the correct one. – Jose R. Capablanca.
Another telling example is the game of Go. It is one of the few board games in which top computer programs can’t get even close to top human players. The point here is that the tree of possible moves in this game is much wider than in chess. That means the classic exhaustive search algorithms used in chess programs lose to the human method of memorization and pattern matching.
One possible way to overcome this would probably be to mimic the human process of learning: build a neural network and train it with as large a knowledge base as possible, but I’m not familiar with such initiatives.
What conclusion can we draw from this? The key to becoming a master in something is to absorb as many separate facts (hashes) and implications of those facts (action items) as possible. There’s no universal algorithm for that, only the hard work of learning by example.
\\nGreat chess masters become great not because they think ahead more moves than others, but because they store a huge knowledge base in their memory. Similarly, great software developers become great when they get to know almost all of the problems they might encounter while developing software. Software design principles help generalize that knowledge base but can never substitute it.
If you use Resharper, you are probably using some (or maybe most) of its features already. But what I often see is that some really useful features are left unnoticed. I want to describe those lesser-known yet very useful features that can help you in your day-to-day work.
\\nMoving methods and properties up and down helps organizing code a lot. It becomes really helpful if you get used to it: at some point in the future, you won’t be able to even imagine how to code without this feature:
\\nNot only does it work on the class level, but it also helps to move statements inside a method:
\\nAnd even in the method’s declaration:
\\nThe only problem with this feature is that by default it’s mapped to very unhandy shortcuts - Ctrl + Shift + Alt + Up/Down/Left/Right - which makes it almost completely unusable.
\\nTo fix it, I have remapped it to Alt + Up/Down/Left/Right. With this change, the feature became #1 assistant in my Resharper arsenal.
\\nTo change the mappings, go to Tools → Options → Environment → Keyboard and filter the commands by \\"ReSharper_MoveUp\\", \\"ReSharper_MoveDown\\", \\"ReSharper_MoveLeft\\", \\"ReSharper_MoveRight\\" keywords. You need to assign the shortcuts for both Text Editor and Global contexts:
\\nThis is another feature I use a lot while coding. It allows you to jump to the next or previous method in the class and is especially useful in conjunction with the previous one:
\\nHowever, by default, it is bound to Alt + Up/Down keys we just used for moving code up and down, so I rebound it to Ctrl + Up/Down shortcuts.
If you think about it a little, such a binding starts making more sense than the default one. In Windows, the Alt key is usually used for *altering* something, whereas the Ctrl key is used for *jumping* to some distant piece of the interface.
\\nTo change the mappings, go to keyboard settings (Tools → Options → Environment → Keyboard) and filter commands by \\"ReSharper_GotoNextMethod\\" and \\"ReSharper_GotoPrevMethod\\" keywords.
\\nAnother invaluable feature is Go to Next/Previous error. It allows you to quickly iterate through all compiler errors in the current solution:
\\nJust as with previous two features, the default key bindings for it is rather unhandy (Shift + Alt + PageDown/PageUp in Visual Studio scheme and Alt + F12 / Shift + Alt + F12 in IDEA scheme), so I changed the bindings to Ctrl + Shift + Down/Up.
\\nTo do it, filter the commands by \\"ReSharper_GotoNextErrorInSolution\\" and \\"ReSharper_GotoPrevErrorInSolution\\" in the keyboard settings.
\\nThe fourth feature I encourage you to use (if you aren’t already) is Go to Containing Method/Class:
\\nIt allows you to navigate to the method which the current line of code belongs to. Or to the parent class if the selected line doesn’t belong to any method (or a method name is selected).
\\nThe default binding is Ctrl + [, and, unlike the previous default shortcuts, this one fits it perfectly.
\\nNumber 5 useful Resharper feature is Select Containing Method/Class, which is accessible by Ctrl + Shift + [.
\\nIt is much the same as the previous one but instead of navigating to the containing method it allows us to select it. And if you want to select the whole class, just press Ctrl + Shift + [ twice:
\\nAbility to see all the recent edits is very useful if you are lost in the source code and want to go back to the previously edited code:
\\nThe default mapping for this feature is Ctrl + Shift + Comma in Visual Studio scheme and Ctrl + Shift + Alt + Backspace in IDEA scheme. To me, Visual Studio version is fine, but if you use IDEA scheme, you might want to change the bindings to Ctrl + Shift + Comma or any other shortcut you find suitable.
\\nTo change it, filter the commands in the Keyboard settings window by \\"ReSharper_GotoRecentEdits\\" keyword.
\\nThere’s also another similar feature which allows you to go to the last edit location (Ctrl + Shift + Backspace), but I found myself using it quite rarely as the Go to Recent Edits feature covers all the use cases this feature may address.
\\nThis is another must-have feature which allows you to locate the current class in the solution explorer:
\\nThe default binding for it is Shift + Alt + L.
The last lesser-known tip I recommend arming yourself with is the ability to quickly surround any line of code with curly braces.
\\nTo do it, select a line (or several lines) of code and press Alt + Enter. In most cases, the required option will be the first in the list, so all you’ll need to do is press Enter button again:
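For example, applying the quick-fix to the hypothetical snippet below:

// Before
if (isValid)
    Process(order);

// After applying the "surround with braces" quick-fix
if (isValid)
{
    Process(order);
}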
To wrap up, I’ve created a small PDF file that I hope will help you remember the features we discussed today: link to PDF file
\\nMost of its content intersects with the existing Keymaps PDF JetBrains introduced:
\\nBut I have changed the mappings to comply with the shortcuts we set in this article.
\\nThese 8 features, in conjunction with other well known Resharper shortcuts, have given me a huge productivity boost. I hope it will help you as well.
Functional C#: Handling failures and input errors

The topic described in this article is a part of my Applying Functional Principles in C# Pluralsight course.

In this article, I’m going to write about how to deal with failures and invalid input in a functional way.
\\nThe concept of validation and error processing is well known, but the code required to handle it can become really annoying in languages like C#. This article is inspired by Railway Oriented Programming, which was introduced by Scott Wlaschin in his talk at NDC Oslo. I encourage you to watch the full video, as it gives invaluable insights into how awkward our day-to-day C# code can be.
\\nLook at the sample below:
\\n[HttpPost]\\npublic HttpResponseMessage CreateCustomer(string name, string billingInfo)\\n{\\n Customer customer = new Customer(name);\\n \\n _repository.Save(customer);\\n \\n _paymentGateway.ChargeCommission(billingInfo);\\n \\n _emailSender.SendGreetings(name);\\n \\n return new HttpResponseMessage(HttpStatusCode.OK);\\n}\\n
It seems easy and straightforward: first we create a customer instance, then save it, after that charge a commission, and finally send a greeting e-mail. The problem with this code is that it handles the happy path only, i.e. the path in which everything goes just fine.
\\nWhen you start taking into account potential failures, input errors and logging routine, the method starts turning into boilerplate code:
\\n[HttpPost]\\npublic HttpResponseMessage CreateCustomer(string name, string billingInfo)\\n{\\n Result<CustomerName> customerNameResult = CustomerName.Create(name);\\n if (customerNameResult.Failure)\\n {\\n _logger.Log(customerNameResult.Error);\\n return Error(customerNameResult.Error);\\n }\\n \\n Result<BillingInfo> billingInfoResult = BillingInfo.Create(billingInfo);\\n if (billingInfoResult.Failure)\\n {\\n _logger.Log(billingInfoResult.Error);\\n return Error(billingInfoResult.Error);\\n }\\n \\n Customer customer = new Customer(customerNameResult.Value);\\n \\n try\\n {\\n _repository.Save(customer);\\n }\\n catch (SqlException)\\n {\\n _logger.Log(\\"Unable to connect to database\\");\\n return Error(\\"Unable to connect to database\\");\\n }\\n \\n _paymentGateway.ChargeCommission(billingInfoResult.Value);\\n \\n _emailSender.SendGreetings(customerNameResult.Value);\\n \\n return new HttpResponseMessage(HttpStatusCode.OK);\\n}\\n
Even worse, if we need to handle failures in both Save and ChargeCommission methods, we end up creating compensation logic so that the changes could be rolled back if one of the methods fails:
\\n[HttpPost]\\npublic HttpResponseMessage CreateCustomer(string name, string billingInfo)\\n{\\n Result<CustomerName> customerNameResult = CustomerName.Create(name);\\n if (customerNameResult.Failure)\\n {\\n _logger.Log(customerNameResult.Error);\\n return Error(customerNameResult.Error);\\n }\\n \\n Result<BillingInfo> billingInfoResult = BillingInfo.Create(billingInfo);\\n if (billingInfoResult.Failure)\\n {\\n _logger.Log(billingInfoResult.Error);\\n return Error(billingInfoResult.Error);\\n }\\n \\n try\\n {\\n _paymentGateway.ChargeCommission(billingInfoResult.Value);\\n }\\n catch (FailureException)\\n {\\n _logger.Log(\\"Unable to connect to payment gateway\\");\\n return Error(\\"Unable to connect to payment gateway\\");\\n }\\n \\n Customer customer = new Customer(customerNameResult.Value);\\n try\\n {\\n _repository.Save(customer);\\n }\\n catch (SqlException)\\n {\\n _paymentGateway.RollbackLastTransaction();\\n _logger.Log(\\"Unable to connect to database\\");\\n return Error(\\"Unable to connect to database\\");\\n }\\n \\n _emailSender.SendGreetings(customerNameResult.Value);\\n \\n return new HttpResponseMessage(HttpStatusCode.OK);\\n}\\n
You can see that our 5 lines have turned into 35 - the method has become 7 times longer! It is now really hard to follow the program flow. Those 5 lines of meaningful code are now buried under the bulk of boilerplate orchestration.
\\nCan it be fixed? Luckily, yes. Let’s go through the method and see what we can do with it.
\\nYou might have noticed that we use the technique I described in my primitive obsession article: instead of using the raw name and billingInfo strings, we wrap them with the CustomerName and BillingInfo classes. That gives us an opportunity to put all the relevant validation logic in one place and comply with the DRY principle.
\\nThe static Create method returns a special class named Result, which encapsulates all the information regarding the operation’s outcome: an error message in case it failed and an object instance in case it succeeded.
\\nAlso, note that potential failures are wrapped with try/catch statements. Such an approach breaks one of the best practices I wrote about in my Exceptions for flow control post: if you know how to deal with exceptions, catch them at the lowest level possible.
\\nIt means that the ChargeCommission and Save methods should catch known exceptions themselves and return a result, just as the static Create methods do. Let’s refactor the code:
\\n[HttpPost]\\npublic HttpResponseMessage CreateCustomer(string name, string billingInfo)\\n{\\n Result<CustomerName> customerNameResult = CustomerName.Create(name);\\n if (customerNameResult.Failure)\\n {\\n _logger.Log(customerNameResult.Error);\\n return Error(customerNameResult.Error);\\n }\\n \\n Result<BillingInfo> billingInfoResult = BillingInfo.Create(billingInfo);\\n if (billingInfoResult.Failure)\\n {\\n _logger.Log(billingInfoResult.Error);\\n return Error(billingInfoResult.Error);\\n }\\n \\n Result chargeResult = _paymentGateway.ChargeCommission(billingInfoResult.Value);\\n if (chargeResult.Failure)\\n {\\n _logger.Log(chargeResult.Error);\\n return Error(chargeResult.Error);\\n }\\n \\n Customer customer = new Customer(customerNameResult.Value);\\n Result saveResult = _repository.Save(customer);\\n if (saveResult.Failure)\\n {\\n _paymentGateway.RollbackLastTransaction();\\n _logger.Log(saveResult.Error);\\n return Error(saveResult.Error);\\n }\\n \\n _emailSender.SendGreetings(customerNameResult.Value);\\n \\n return new HttpResponseMessage(HttpStatusCode.OK);\\n}\\n
As you can see, now both ChargeCommission and Save methods return Result objects.
\\nThe purpose of the Result class is pretty simple and very similar to that of the Maybe monad we discussed earlier: it allows us to reason about the code without looking into the implementation details. Here’s what it looks like (I omitted some details for brevity):
\\npublic class Result\\n{\\n public bool Success { get; private set; }\\n public string Error { get; private set; }\\n public bool Failure { /* ... */ }\\n \\n protected Result(bool success, string error) { /* ... */ }\\n \\n public static Result Fail(string message) { /* ... */ }\\n \\n public static Result<T> Ok<T>(T value) { /* ... */ }\\n}\\n \\npublic class Result<T> : Result\\n{\\n public T Value { get; private set; }\\n \\n protected internal Result(T value, bool success, string error)\\n : base(success, error)\\n {\\n /* ... */\\n }\\n}\\n
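The /* ... */ placeholders above hide the plumbing. For readers who want something compilable, here is one way those omitted details - including the Result.Combine and Fail&lt;T&gt; members that the chained example below relies on - could be filled in. This is a sketch of my own, not necessarily the exact code behind the article:
\\npublic class Result\\n{\\n public bool Success { get; private set; }\\n public string Error { get; private set; }\\n public bool Failure { get { return !Success; } }\\n \\n protected Result(bool success, string error)\\n {\\n Success = success;\\n Error = error;\\n }\\n \\n public static Result Ok()\\n {\\n return new Result(true, string.Empty);\\n }\\n \\n public static Result Fail(string message)\\n {\\n return new Result(false, message);\\n }\\n \\n public static Result<T> Ok<T>(T value)\\n {\\n return new Result<T>(value, true, string.Empty);\\n }\\n \\n public static Result<T> Fail<T>(string message)\\n {\\n return new Result<T>(default(T), false, message);\\n }\\n \\n // Returns the first failure encountered, or a success if every result succeeded\\n public static Result Combine(params Result[] results)\\n {\\n foreach (Result result in results)\\n {\\n if (result.Failure)\\n return result;\\n }\\n \\n return Ok();\\n }\\n}\\n \\npublic class Result<T> : Result\\n{\\n public T Value { get; private set; }\\n \\n protected internal Result(T value, bool success, string error)\\n : base(success, error)\\n {\\n Value = value;\\n }\\n}\\n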
Now, we can apply the same principle that functional languages use. That is where the actual magic happens:
\\n[HttpPost]\\npublic HttpResponseMessage CreateCustomer(string name, string billingInfo)\\n{\\n Result<BillingInfo> billingInfoResult = BillingInfo.Create(billingInfo);\\n Result<CustomerName> customerNameResult = CustomerName.Create(name);\\n \\n return Result.Combine(billingInfoResult, customerNameResult)\\n .OnSuccess(() => _paymentGateway.ChargeCommission(billingInfoResult.Value))\\n .OnSuccess(() => new Customer(customerNameResult.Value))\\n .OnSuccess(\\n customer => _repository.Save(customer)\\n .OnFailure(() => _paymentGateway.RollbackLastTransaction())\\n )\\n .OnSuccess(() => _emailSender.SendGreetings(customerNameResult.Value))\\n .OnBoth(result => Log(result))\\n .OnBoth(result => CreateResponseMessage(result));\\n}\\n \\n
If you are familiar with functional languages, you might have noticed that the OnSuccess extension method is actually a Bind method. I named it that way just to make it clear how exactly it works.
\\nWhat the OnSuccess method basically does is check the previous Result instance: if it is successful, it executes the delegate passed in; otherwise, it just returns the previous result. Thus, the chain continues until one of the operations fails, and if one does, the remaining operations are skipped.
\\nThe OnFailure method, as you might have guessed, is executed only if the previous operation failed. It is a perfect fit for the compensation logic we need to perform if the database call wasn’t successful.
\\nOnBoth is placed at the end of the chain. Its main use cases are logging a failure and creating the resulting response message.
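\\nTo make the chain above a bit less magical, here is a minimal sketch of what such extension methods could look like, assuming the Result class shown earlier. The CreateCustomer example also relies on a few more overloads (ones taking Action, Func<T>, and Func<T, Result>), which follow exactly the same pattern and are omitted here:
\\nusing System;\\n \\npublic static class ResultExtensions\\n{\\n // Runs the next step only if the previous one succeeded;\\n // otherwise the failure is propagated down the chain untouched\\n public static Result OnSuccess(this Result result, Func<Result> func)\\n {\\n if (result.Failure)\\n return result;\\n \\n return func();\\n }\\n \\n // Runs the compensation action only if the previous step failed\\n public static Result OnFailure(this Result result, Action action)\\n {\\n if (result.Failure)\\n action();\\n \\n return result;\\n }\\n \\n // Always runs, regardless of the outcome; useful for logging and for\\n // converting the final Result into a response message\\n public static T OnBoth<T>(this Result result, Func<Result, T> func)\\n {\\n return func(result);\\n }\\n}\\n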
\\nSo what we have here is the exact same behavior as before, but with much less boilerplate code. As you can see, the program flow has become much easier to follow.
\\nWhat about the Command-Query Separation principle? The approach described above implies using return value (which is, in our case, an instance of the Result class) even if the method itself is a command (i.e. changes the object’s state). Is there a conflict with CQS?
\\nNo, there isn’t. Moreover, this approach increases readability in the same way following the CQS principle does. Not only does it allow you to know if a method is a command or a query, but it also allows you to see whether or not the method may fail.
\\nDesigning for failures extends the potential range of information you get from the method’s signature. Instead of 2 results (void for commands and some type for queries) you now have 4 of them.
\\nThe method is a command and it can’t fail:
\\npublic void Save(Customer customer)\\n
The method is a query and it can’t fail:
\\npublic Customer GetById(long id)\\n
The method is a command and it can fail:
\\npublic Result Save(Customer customer)\\n
The method is a query and it can fail:
\\npublic Result<Customer> GetById(long id)\\n
And when I write that a method can’t fail, I don’t mean it can’t fail under any circumstances. Of course, there is always a chance of getting some kind of exception that wasn’t expected in the first place. What I mean is that the method is designed to always succeed, i.e. the developer assumes that any exception thrown within that method is unexpected (see also the Exceptions for flow control in C# article for more details about expected and unexpected exceptions).
\\nWith this approach, exceptions become the thing they were originally intended to be: they signal that there’s something wrong with your system. From that moment on, they become a really helpful assistant in building software.
\\nExposing your intent is crucial if you want to increase the readability of your code. Introducing the Result class helps show whether or not a method can fail. The OnSuccess, OnFailure, and OnBoth methods, in turn, help you remove boilerplate code, resulting in a clean and lean design.
\\nIn conjunction with the other three techniques - immutability, getting rid of primitive obsession, and non-nullable reference types - this approach introduces a powerful programming model that can significantly increase your productivity.
\\nThe topic described in this article is a part of my Applying Functional Principles in C# Pluralsight course.
\\nThis is the third article in my Functional C# series.
\\nFunctional C#: Non-nullable reference types
\\nLook at the code example below:
\\nCustomer customer = _repository.GetById(id);\\nConsole.WriteLine(customer.Name);\\n
Looks pretty familiar, doesn’t it? But what issues do you see in this code?
\\nThe problem here is that we don’t know for sure whether the GetById method can return null. If there is any chance it can, we’ll be getting a NullReferenceException at run-time. Even worse, a significant amount of time could pass between getting the customer instance and using it. Therefore, the exception will be hard to debug because we won’t be able to easily find out where exactly the customer instance became null.
\\nThe faster we receive feedback, the less time it takes us to fix the problem. Of course, the quickest feedback possible can only be given by the compiler. How cool would it be to just write this code and let the compiler do all the required checks?
\\nCustomer! customer = _repository.GetById(id);\\nConsole.WriteLine(customer.Name);\\n
Where Customer! stands for non-nullable type, i.e. a type whose instances can’t turn into null in any way. How cool would it be to be sure that the compiler will tell you if there are any possible code paths that return null?
\\nYep, very cool. Or even better:
\\nCustomer customer = _repository.GetById(id);\\nConsole.WriteLine(customer.Name);\\n
That is, make all the reference types non-nullable by default (just as value types are) and if we want to introduce a nullable type then put it this way:
\\nCustomer? customer = _repository.GetById(id);\\nConsole.WriteLine(customer.Name);\\n
Can you imagine a world without all those annoying null reference exceptions? Me neither.
\\nUnfortunately, non-nullable reference types can’t be introduced in C# as a language feature. Such design decisions have to be made from day one; otherwise, they break almost every existing code base. Check out these articles to learn more about this topic: Eric Lippert’s article, and an interesting but probably not realizable design proposal.
\\nBut don’t worry. Although we can’t make the compiler help us leverage the power of non-nullable reference types, there are still some workarounds we can resort to. Let’s look at the Customer class we ended up with in the previous post:
\\npublic class Customer\\n{\\n public CustomerName Name { get; private set; }\\n public Email Email { get; private set; }\\n \\n public Customer(CustomerName name, Email email)\\n {\\n if (name == null)\\n throw new ArgumentNullException(\\"name\\");\\n if (email == null)\\n throw new ArgumentNullException(\\"email\\");\\n \\n Name = name;\\n Email = email;\\n }\\n \\n public void ChangeName(CustomerName name)\\n {\\n if (name == null)\\n throw new ArgumentNullException(\\"name\\");\\n \\n Name = name;\\n }\\n \\n public void ChangeEmail(Email email)\\n {\\n if (email == null)\\n throw new ArgumentNullException(\\"email\\");\\n \\n Email = email;\\n }\\n}\\n
We moved all the email and customer name validations to separate classes, but we couldn’t do anything with the null checks. As you can see, they are the only checks remaining.
\\nSo, how can we get rid of them?
\\nBy using an IL rewriter, of course! There’s a great NuGet package named NullGuard.Fody built exactly for that purpose: it weaves null checks into your assemblies all over your code base, making your classes throw an exception whenever a null value comes in as a parameter or comes out as a method result.
\\nTo start using it, install the package NullGuard.Fody and mark your assembly with this attribute:
\\n[assembly: NullGuard(ValidationFlags.All)]\\n
From now on, every method and property in the assembly automatically gets a null validation check for every input parameter and output value. Our Customer class can now be written as simply as this:
\\npublic class Customer\\n{\\n public CustomerName Name { get; private set; }\\n public Email Email { get; private set; }\\n \\n public Customer(CustomerName name, Email email)\\n {\\n Name = name;\\n Email = email;\\n }\\n \\n public void ChangeName(CustomerName name)\\n {\\n Name = name;\\n }\\n \\n public void ChangeEmail(Email email)\\n {\\n Email = email;\\n }\\n}\\n
Or even simpler:
\\npublic class Customer\\n{\\n public CustomerName Name { get; set; }\\n public Email Email { get; set; }\\n \\n public Customer(CustomerName name, Email email)\\n {\\n Name = name;\\n Email = email;\\n }\\n}\\n
This is what gets compiled under the hood:
\\npublic class Customer\\n{\\n private CustomerName _name;\\n public CustomerName Name\\n {\\n get\\n {\\n CustomerName customerName = _name;\\n \\n if (customerName == null)\\n throw new InvalidOperationException();\\n \\n return customerName;\\n }\\n set\\n {\\n if (value == null)\\n throw new ArgumentNullException();\\n \\n _name = value;\\n }\\n }\\n \\n private Email _email;\\n public Email Email\\n {\\n get\\n {\\n Email email = _email;\\n \\n if (email == null)\\n throw new InvalidOperationException();\\n \\n return email;\\n }\\n set\\n {\\n if (value == null)\\n throw new ArgumentNullException();\\n \\n _email = value;\\n }\\n }\\n \\n public Customer(CustomerName name, Email email)\\n {\\n if (name == null)\\n throw new ArgumentNullException(\\"name\\", \\"[NullGuard] name is null.\\");\\n if (email == null)\\n throw new ArgumentNullException(\\"email\\", \\"[NullGuard] email is null.\\");\\n \\n Name = name;\\n Email = email;\\n }\\n}\\n
As you can see, the validations are equivalent to the ones we wrote manually, except that there are also validations for return values, which is a good thing.
\\nSo how do we state that a value of some type can be null? We need to use the Maybe monad:
\\npublic struct Maybe<T>\\n{\\n private readonly T _value;\\n \\n public T Value\\n {\\n get\\n {\\n Contracts.Require(HasValue);\\n \\n return _value;\\n }\\n }\\n \\n public bool HasValue\\n {\\n get { return _value != null; }\\n }\\n \\n public bool HasNoValue\\n {\\n get { return !HasValue; }\\n }\\n \\n private Maybe([AllowNull] T value)\\n {\\n _value = value;\\n }\\n \\n public static implicit operator Maybe<T>([AllowNull] T value)\\n {\\n return new Maybe<T>(value);\\n }\\n}\\n
As you can see, the input values for the Maybe class are marked with AllowNull attribute. That tells our null guard weaver that it shouldn’t emit null checks for these particular parameters.
\\nWith Maybe, we can write the following code:
\\nMaybe<Customer> customer = _repository.GetById(id);\\n
And it now becomes obvious that the GetById method might not find the customer. From now on, we can reason about the code without stepping into it!
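\\nFor completeness, this is how calling code might consume such a value, assuming the Maybe<T> struct shown above - a sketch rather than code from the article:
\\nMaybe<Customer> customerOrNothing = _repository.GetById(id);\\n \\nif (customerOrNothing.HasNoValue)\\n{\\n // Handle the missing customer explicitly instead of hitting\\n // a NullReferenceException somewhere later on\\n return;\\n}\\n \\nCustomer customer = customerOrNothing.Value; // Safe: HasValue was checked above\\nConsole.WriteLine(customer.Name);\\n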
\\nMoreover, you now can’t accidentally mix up a nullable value with a non-nullable one; doing so would lead to a compiler error:
\\nMaybe<Customer> customer = _repository.GetById(id);\\nProcessCustomer(customer); // Compiler error\\n \\nprivate void ProcessCustomer(Customer customer)\\n{\\n // Method body\\n}\\n
Of course, you need to consciously decide which assemblies should be weaved. It probably isn’t a good idea to enforce those rules in a WPF presentation layer, as there are a lot of system components that are inherently nullable. In such an environment, null checks just won’t add any value because you can’t do anything about those nulls.
\\nAs for domain assemblies, it totally makes sense to introduce such an enhancement for them. Moreover, they would benefit from this approach the most.
\\nOne little note about the Maybe monad. You might want to name it Option to follow F# naming conventions. I personally prefer to call it Maybe, but I would say there is a 50/50 split between programmers who name it Maybe and those who prefer the name Option. Of course, it’s just a matter of taste.
\\nOkay, fast feedback at run-time is good, but it is still only run-time feedback. It would be great if there were a way to analyze the code statically and provide the feedback even faster, at compile time.
\\nThere is such a way: Resharper’s Code Annotations. You can use the NotNull attribute to mark a method’s parameters and return values as non-nullable. That allows Resharper to raise a warning if you pass a null to a method whose parameters are not allowed to be null.
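\\nFor illustration, here is what the annotations might look like on a hypothetical class, assuming the JetBrains.Annotations package is referenced:
\\nusing System;\\nusing JetBrains.Annotations;\\n \\n// Hypothetical class used only to illustrate the annotations\\npublic class ReportFormatter\\n{\\n // Resharper warns at call sites that pass a possibly-null title,\\n // and at call sites that needlessly check the result for null\\n [NotNull]\\n public string Format([NotNull] string title)\\n {\\n if (title == null)\\n throw new ArgumentNullException(\\"title\\");\\n \\n return \\"*** \\" + title + \\" ***\\";\\n }\\n}\\n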
\\nWhile such an approach can be pretty helpful, it suffers from several problems.
\\nFirst of all, in order to state that a parameter can’t be null, you have to take an action, i.e. mark it with an attribute. It would be better to apply the reversed technique: mark a parameter only if you want it to be nullable. In other words, make non-nullability the default and opt parameters out if necessary, just as we do with NullGuard.
\\nSecondly, a warning is only a warning. Of course, we could enable the \\"treat warnings as errors\\" setting in Visual Studio, but still, the use of the Maybe monad leaves much less wiggle room for potential bugs, as it prevents illegal use of non-nullable types.
\\nThat’s why, although Code Annotations are very useful in some cases, I personally tend not to use them.
\\nThe approach described above is a really powerful one.
\\nIt helps reduce the number of bugs by providing fast feedback when a null value unexpectedly sneaks in.
\\nIt significantly increases code readability. You don’t need to step into the method to find out whether or not it can return a null value.
\\nThe null checks are there by default meaning that all of your methods and properties are null-safe unless you specify otherwise. It makes code much cleaner as you don’t need to specify NotNull attributes all over your code base.
\\nNext time, we’ll discuss how to handle exceptions in a functional way. Stay tuned.
\\nThe topic described in this article is a part of my Applying Functional Principles in C# Pluralsight course.
\\nThis is the second article in my Functional C# blog post series.
\\nFunctional C#: Primitive obsession
\\nPrimitive obsession stands for using primitive types to model a domain. For example, this is how a Customer class might look in a typical C# application:
\\npublic class Customer\\n{\\n public string Name { get; private set; }\\n public string Email { get; private set; }\\n \\n public Customer(string name, string email)\\n {\\n Name = name;\\n Email = email;\\n }\\n}\\n
The problem here is that when you want to enforce validation rules specific for your domain, you inevitably end up putting validation logic all over your source code:
\\npublic class Customer\\n{\\n public string Name { get; private set; }\\n public string Email { get; private set; }\\n \\n public Customer(string name, string email)\\n {\\n // Validate name\\n if (string.IsNullOrWhiteSpace(name) || name.Length > 50)\\n throw new ArgumentException(\\"Name is invalid\\");\\n \\n // Validate e-mail\\n if (string.IsNullOrWhiteSpace(email) || email.Length > 100)\\n throw new ArgumentException(\\"E-mail is invalid\\");\\n if (!Regex.IsMatch(email, @\\"^([\\\\w\\\\.\\\\-]+)@([\\\\w\\\\-]+)((\\\\.(\\\\w){2,3})+)$\\"))\\n throw new ArgumentException(\\"E-mail is invalid\\");\\n \\n Name = name;\\n Email = email;\\n }\\n \\n public void ChangeName(string name)\\n {\\n // Validate name\\n if (string.IsNullOrWhiteSpace(name) || name.Length > 50)\\n throw new ArgumentException(\\"Name is invalid\\");\\n \\n Name = name;\\n }\\n \\n public void ChangeEmail(string email)\\n {\\n // Validate e-mail\\n if (string.IsNullOrWhiteSpace(email) || email.Length > 100)\\n throw new ArgumentException(\\"E-mail is invalid\\");\\n if (!Regex.IsMatch(email, @\\"^([\\\\w\\\\.\\\\-]+)@([\\\\w\\\\-]+)((\\\\.(\\\\w){2,3})+)$\\"))\\n throw new ArgumentException(\\"E-mail is invalid\\");\\n \\n Email = email;\\n }\\n}\\n
Moreover, the exact same validation rules tend to get into the application layer:
\\n[HttpPost]\\npublic ActionResult CreateCustomer(CustomerInfo customerInfo)\\n{\\n if (!ModelState.IsValid)\\n return View(customerInfo);\\n \\n Customer customer = new Customer(customerInfo.Name, customerInfo.Email);\\n // Rest of the method\\n}\\n \\npublic class CustomerInfo\\n{\\n [Required(ErrorMessage = \\"Name is required\\")]\\n [StringLength(50, ErrorMessage = \\"Name is too long\\")]\\n public string Name { get; set; }\\n \\n [Required(ErrorMessage = \\"E-mail is required\\")]\\n [RegularExpression(@\\"^([\\\\w\\\\.\\\\-]+)@([\\\\w\\\\-]+)((\\\\.(\\\\w){2,3})+)$\\", \\n ErrorMessage = \\"Invalid e-mail address\\")]\\n [StringLength(100, ErrorMessage = \\"E-mail is too long\\")]\\n public string Email { get; set; }\\n}\\n
Apparently, such an approach breaks the DRY principle, which calls for a single source of truth: you should have a single authoritative source for each piece of domain knowledge in your software. In the example above, there are at least three of them.
\\nTo get rid of primitive obsession, we need to introduce two new types that aggregate all the validation logic currently spread across the application:
\\npublic class Email\\n{\\n private readonly string _value;\\n \\n private Email(string value)\\n {\\n _value = value;\\n }\\n \\n public static Result<Email> Create(string email)\\n {\\n if (string.IsNullOrWhiteSpace(email))\\n return Result.Fail<Email>(\\"E-mail can\'t be empty\\");\\n \\n if (email.Length > 100)\\n return Result.Fail<Email>(\\"E-mail is too long\\");\\n \\n if (!Regex.IsMatch(email, @\\"^([\\\\w\\\\.\\\\-]+)@([\\\\w\\\\-]+)((\\\\.(\\\\w){2,3})+)$\\"))\\n return Result.Fail<Email>(\\"E-mail is invalid\\");\\n \\n return Result.Ok(new Email(email));\\n }\\n \\n public static implicit operator string(Email email)\\n {\\n return email._value;\\n }\\n \\n public override bool Equals(object obj)\\n {\\n Email email = obj as Email;\\n \\n if (ReferenceEquals(email, null))\\n return false;\\n \\n return _value == email._value;\\n }\\n \\n public override int GetHashCode()\\n {\\n return _value.GetHashCode();\\n }\\n}\\n \\npublic class CustomerName\\n{\\n public static Result<CustomerName> Create(string name)\\n {\\n if (string.IsNullOrWhiteSpace(name))\\n return Result.Fail<CustomerName>(\\"Name can\'t be empty\\");\\n \\n if (name.Length > 50)\\n return Result.Fail<CustomerName>(\\"Name is too long\\");\\n \\n return Result.Ok(new CustomerName(name));\\n }\\n \\n // The rest is the same as in Email\\n}\\n
The beauty of this approach is that whenever validation logic (or any other logic attached to those classes) changes, you need to change it in one place only. The fewer duplications you have, the fewer bugs you get, and the happier your customers become!
\\nNote that the constructor in the Email class is private, so the only way to create an instance is through the Create method, which performs all the necessary validations. By doing this, we make sure that an Email instance is in a valid state from the very beginning and all its invariants are met.
\\nThis is how the controller can use those classes:
\\n[HttpPost]\\npublic ActionResult CreateCustomer(CustomerInfo customerInfo)\\n{\\n Result<Email> emailResult = Email.Create(customerInfo.Email);\\n Result<CustomerName> nameResult = CustomerName.Create(customerInfo.Name);\\n \\n if (emailResult.Failure)\\n ModelState.AddModelError(\\"Email\\", emailResult.Error);\\n if (nameResult.Failure)\\n ModelState.AddModelError(\\"Name\\", nameResult.Error);\\n \\n if (!ModelState.IsValid)\\n return View(customerInfo);\\n \\n Customer customer = new Customer(nameResult.Value, emailResult.Value);\\n // Rest of the method\\n}\\n
The instances of Result<Email> and Result<CustomerName> explicitly tell us that the Create method may fail and if it does, we can know the reason by examining the Error property.
\\nThis is how the Customer class can look after the refactoring:
\\npublic class Customer\\n{\\n public CustomerName Name { get; private set; }\\n public Email Email { get; private set; }\\n \\n public Customer(CustomerName name, Email email)\\n {\\n if (name == null)\\n throw new ArgumentNullException(\\"name\\");\\n if (email == null)\\n throw new ArgumentNullException(\\"email\\");\\n \\n Name = name;\\n Email = email;\\n }\\n \\n public void ChangeName(CustomerName name)\\n {\\n if (name == null)\\n throw new ArgumentNullException(\\"name\\");\\n \\n Name = name;\\n }\\n \\n public void ChangeEmail(Email email)\\n {\\n if (email == null)\\n throw new ArgumentNullException(\\"email\\");\\n \\n Email = email;\\n }\\n}\\n
Almost all of the validations have been moved to the Email and CustomerName classes. The only checks left are null checks. They can still be pretty annoying, but we’ll see how to handle them in a better way in the next article.
\\nSo, what benefits do we get by getting rid of primitive obsession?
\\nWe create a single authoritative knowledge source for every domain problem we solve in our code. No duplication, only clean and DRY code.
\\nA stronger type system. The compiler works for us with doubled effort: it is now impossible to mistakenly assign an email to a customer name field; that would result in a compiler error.
\\nNo need to validate values passed in. If we get an object of type Email or CustomerName, we are 100% sure that it is in a correct state.
\\nThere’s one detail I’d like to point out. Some people tend to wrap and unwrap primitive values multiple times during a single operation:
\\npublic void Process(string oldEmail, string newEmail)\\n{\\n Result<Email> oldEmailResult = Email.Create(oldEmail);\\n Result<Email> newEmailResult = Email.Create(newEmail);\\n \\n if (oldEmailResult.Failure || newEmailResult.Failure)\\n return ;\\n \\n string oldEmailValue = oldEmailResult.Value;\\n Customer customer = GetCustomerByEmail(oldEmailValue);\\n customer.Email = newEmailResult.Value;\\n}\\n
Instead, it is better to use the custom types across the whole application, unwrapping them only when the data leaves the domain boundaries, i.e. when it is saved to the database or rendered to HTML. In your domain classes, try to use them as much as possible. That results in cleaner and more maintainable code:
\\npublic void Process(Email oldEmail, Email newEmail)\\n{\\n Customer customer = GetCustomerByEmail(oldEmail);\\n customer.Email = newEmail;\\n}\\n
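\\nUnwrapping then happens only at the boundary, right before the data leaves the domain. A tiny sketch (the persistence helper is hypothetical):
\\nCustomer customer = GetCustomerByEmail(oldEmail);\\nstring rawEmail = customer.Email; // Uses the implicit Email -> string operator defined earlier\\nSaveEmailColumn(rawEmail); // Hypothetical data-access call that expects a primitive\\n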
Unfortunately, creating custom types in C# is not as neat as in functional languages like F#. That will probably change in C# 7 if we get record types and pattern matching, but until then we have to deal with the overall clunkiness of this approach.
\\nBecause of that, I find some really simple primitives not worth wrapping. For example, a money amount whose single invariant is that the amount can’t be negative could probably still be represented as a decimal. That would lead to some duplication of validation logic, but - again - it is probably the simpler design decision even in the long run.
\\nAs usual, appeal to common sense and weigh the pros and cons in every single situation. And don’t hesitate to change your mind, even multiple times.
\\nWith immutable and non-primitive types, we are getting closer to designing applications in C# in a functional way. Next time, I’ll show how to mitigate the billion dollar mistake.
\\nThe topic described in this article is a part of my Applying Functional Principles in C# Pluralsight course.
\\nI’m starting a series of articles in which I want to show how to program in C# in a more functional way.
\\nFunctional C#: Immutability
\\nThe biggest problem of enterprise software development is code complexity. Code readability is probably the first goal you should try to achieve on your way to building a software project. Without it, you are not able to make a qualified judgment about the correctness of your software or at least your ability to reason about it is significantly reduced.
\\nDo mutable objects increase or reduce code readability? Let’s look at an example:
\\n// Create search criteria\\nvar queryObject = new QueryObject<Customer>(name, page: 0, pageSize: 10);\\n \\n// Search customers\\nIReadOnlyCollection<Customer> customers = Search(queryObject);\\n \\n// Adjust criteria if nothing found\\nif (customers.Count == 0)\\n AdjustSearchCriteria(queryObject, name);\\n \\n// Is queryObject changed here?\\nSearch(queryObject);\\n
Has the query object been changed by the time we search customers for the second time? Maybe yes. But maybe not. It depends on whether or not we found anything for the first time and on whether or not AdjustSearchCriteria method changed the criteria. To find out what exactly happened, we need to look at the AdjustSearchCriteria method code. We can’t know it for sure by just looking at the method signature.
\\nNow compare it to the following code:
\\n// Create search criteria\\nvar queryObject = new QueryObject<Customer>(name, page: 0, pageSize: 10);\\n \\n// Search customers\\nIReadOnlyCollection<Customer> customers = Search(queryObject);\\n \\nif (customers.Count == 0)\\n{\\n // Adjust criteria if nothing found\\n QueryObject<Customer> newQueryObject = AdjustSearchCriteria(queryObject, name);\\n Search(newQueryObject);\\n}\\n
It is now clear that AdjustSearchCriteria method creates new criteria that are used to perform a new search.
\\nSo, what are the problems with mutable data structures?
\\nIt is hard to reason about the code if you don’t know for sure whether or not your data is changed.
\\nIt is hard to follow the flow if you need to look not only at the method itself, but also at the methods it calls.
\\nIf you are building a multithreaded application, following and debugging the code becomes even harder.
\\nIf you have a relatively simple class, you should always consider making it immutable. This rule of thumb correlates with the notion of Value Objects: value objects are simple and easily made immutable.
\\nSo how do we build immutable types? Let’s take an example. Let’s say we have a class named ProductPile representing a bunch of products we have for sale:
\\npublic class ProductPile\\n{\\n public string ProductName { get; set; }\\n public int Amount { get; set; }\\n public decimal Price { get; set; }\\n}\\n
To make it immutable, we need to mark its properties as read-only and create a constructor:
\\npublic class ProductPile\\n{\\n public string ProductName { get; private set; }\\n public int Amount { get; private set; }\\n public decimal Price { get; private set; }\\n \\n public ProductPile(string productName, int amount, decimal price)\\n {\\n Contracts.Require(!string.IsNullOrWhiteSpace(productName));\\n Contracts.Require(amount >= 0);\\n Contracts.Require(price > 0);\\n \\n ProductName = productName;\\n Amount = amount;\\n Price = price;\\n }\\n}\\n
Let’s say we need to reduce the product amount by one when we sell one of the items. Instead of changing the existing object we need to create a new one based on the current:
\\npublic class ProductPile\\n{\\n public string ProductName { get; private set; }\\n public int Amount { get; private set; }\\n public decimal Price { get; private set; }\\n \\n public ProductPile(string productName, int amount, decimal price)\\n {\\n Contracts.Require(!string.IsNullOrWhiteSpace(productName));\\n Contracts.Require(amount >= 0);\\n Contracts.Require(price > 0);\\n \\n ProductName = productName;\\n Amount = amount;\\n Price = price;\\n }\\n \\n public ProductPile SubtractOne()\\n {\\n return new ProductPile(ProductName, Amount - 1, Price);\\n }\\n}\\n
So what do we get here?
\\nWith an immutable class, we need to validate its code contracts only once, in the constructor.
\\nWe are absolutely sure that objects are always in a correct state.
\\nObjects are automatically thread-safe.
\\nThe code’s readability increases, as there’s no need to step into methods to make sure they don’t change anything.
\\nOf course, everything comes at a price. While small and simple classes benefit from immutability the most, such an approach is not always applicable to larger ones.
\\nFirst of all, there are performance considerations. If your object is quite big, the necessity to create a copy of it with every single change may hurt the performance of your application.
\\nA good example here is immutable collections. Their authors took potential performance problems into account and added a Builder class that allows you to mutate the collection. After the preparation is done, you can finalize it by converting it to an immutable collection:
\\nvar builder = ImmutableList.CreateBuilder<string>();\\nbuilder.Add(\\"1\\"); // Adds item to the existing object\\nImmutableList<string> list = builder.ToImmutable();\\nImmutableList<string> list2 = list.Add(\\"2\\"); // Creates a new object with 2 items\\n
Another issue is that some classes are inherently mutable, and trying to make them immutable brings more problems than it solves.
\\nBut don’t let these issues keep you from creating immutable data types. Consider pros and cons of every design decision and always take common sense into account.
\\nIn most cases, you will be able to benefit from immutability, especially when you keep your classes small and simple.
\\nThe use of exceptions for flow control was raised quite a few times already (here’s a c2 discussion and here is a great question on SO). I’d like to summarize this topic and provide some common use cases along with code examples to handle them.
\\nGenerally, code is read more often than it is written. Most best practices aim to simplify understanding of and reasoning about the code: the simpler the code, the fewer bugs it contains, and the easier it becomes to maintain the software.
\\nThe use of exceptions for program flow control hides the programmer’s intention; that is why it is considered a bad practice:
\\npublic void ProcessItem(Item item)\\n{\\n if (_knownItems.Contains(item))\\n {\\n // Do something\\n throw new SuccessException();\\n }\\n else\\n {\\n throw new FailureException();\\n }\\n}\\n
It is hard to reason about the ProcessItem function because you can’t tell what the possible outcomes are just by looking at its signature. Such an approach makes you look at the source code and thus breaks the method’s encapsulation. Moreover, it violates the principle of least astonishment by throwing an exception even in case of success.
\\nIn this particular example the solution is obvious - return a boolean value instead of throwing exceptions, as in the sketch below - but let’s dive deeper and look at more complex use cases.
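\\nA minimal version of the same method rewritten with a return value (keeping the original class members as assumptions) could look like this:
\\npublic bool ProcessItem(Item item)\\n{\\n if (!_knownItems.Contains(item))\\n return false;\\n \\n // Do something\\n \\n return true;\\n}\\n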
\\nPerhaps the most common practice for exceptions is to use them to state that input validation has failed.
\\npublic class EmployeeController : Controller\\n{\\n [HttpPost]\\n public ActionResult CreateEmployee(string name, int departmentId)\\n {\\n try\\n {\\n ValidateName(name);\\n Department department = GetDepartment(departmentId);\\n \\n // Rest of the method\\n }\\n catch (ValidationException ex)\\n {\\n // Return view with error\\n }\\n }\\n \\n private void ValidateName(string name)\\n {\\n if (string.IsNullOrWhiteSpace(name))\\n throw new ValidationException(\\"Name cannot be empty\\");\\n \\n if (name.Length > 100)\\n throw new ValidationException(\\"Name length cannot exceed 100 characters\\");\\n }\\n \\n private Department GetDepartment(int departmentId)\\n {\\n using (EmployeeContext context = new EmployeeContext())\\n {\\n Department department = context.Departments\\n .SingleOrDefault(x => x.Id == departmentId);\\n \\n if (department == null)\\n throw new ValidationException(\\"Department with such Id does not exist\\");\\n \\n return department;\\n }\\n }\\n}\\n
Apparently, such an approach has some benefits: it allows you to quickly \\"return\\" from any method right to the catch block in the CreateEmployee method.
\\nNow, let me show you another code sample:
\\npublic static Employee FindAndProcessEmployee(IList<Employee> employees, string taskName)\\n{\\n Employee found = null;\\n \\n foreach (Employee employee in employees)\\n {\\n foreach (Task task in employee.Tasks)\\n {\\n if (task.Name == taskName)\\n {\\n found = employee;\\n goto M1;\\n }\\n }\\n }\\n \\n // Some code\\n \\n M1:\\n found.IsProcessed = true;\\n \\n return found;\\n}\\n
What do these two code examples have in common? Yep, both of them allow you to easily jump through the code to the point where you need to be, leaving irrelevant code paths behind.
\\nThe only problem with such code is that it drastically decreases readability. Both code samples make it really difficult to follow the program flow. That is why many developers equate exceptions used this way with the \\"goto\\" statement.
\\nWith exceptions, it is unclear where exactly they are caught: you can wrap your validation routine with a try/catch statement right where you call it, but you can also put a try/catch block a couple of levels higher. You can never know for sure whether it was done intentionally or not:
\\npublic Employee CreateEmployee(string name, int departmentId)\\n{\\n // Is it a bug or the method was placed without try/catch intentionally?\\n ValidateName(name);\\n \\n // Rest of the method\\n}\\n
The only way to find out is to analyze the whole call stack. Exceptions used for validation make your code much less readable because they don’t express the developer’s intention clearly enough. It’s impossible to look at the code and say what can go wrong and how we react to it.
\\nIs there a better solution? Of course!
\\n[HttpPost]\\npublic ActionResult CreateEmployee(string name, int departmentId)\\n{\\n if (!IsNameValid(name))\\n {\\n // Return view with error\\n }\\n \\n if (!IsDepartmentValid(departmentId))\\n {\\n // Return view with another error\\n }\\n \\n Employee employee = new Employee(name, departmentId);\\n // Rest of the method\\n}\\n
Making all the validations explicit makes your intention obvious. It’s much easier to see what is going on in this method.
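\\nThe IsNameValid and IsDepartmentValid helpers are not shown above; one possible implementation, reusing the rules from the earlier ValidateName and GetDepartment methods (System.Linq is assumed for Any), could look like this:
\\nprivate bool IsNameValid(string name)\\n{\\n return !string.IsNullOrWhiteSpace(name) && name.Length <= 100;\\n}\\n \\nprivate bool IsDepartmentValid(int departmentId)\\n{\\n using (EmployeeContext context = new EmployeeContext())\\n {\\n return context.Departments.Any(x => x.Id == departmentId);\\n }\\n}\\n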
\\nSo what are the use cases for exceptions? The main goal of exceptions is - surprise! - to signal an exceptional situation in your software. An exceptional situation is one in which you don’t know what to do, and the best option you have is to terminate the current operation entirely.
\\nExamples of exceptional situations include database connectivity problems, missing required configuration files, and so on. Validation isn’t such a situation because validation logic, by definition, expects incoming data to possibly be incorrect (see my previous post).
\\nAnother valid use case for exceptions is a code contract violation. You, as the class author, expect its clients to meet the code contracts. A situation in which a method’s contract is not met is unexpected, or, in other words, exceptional.
\\nWhether or not a situation is exceptional depends on the context. The developer of a redistributable library might not know how to deal with database connectivity problems because they can’t see the context in which the library is being used.
\\nThere’s not much a library developer can do if the database is unavailable, so throwing an exception is an appropriate decision. Take Entity Framework or NHibernate as an example: they expect the database to always be available, and if it’s not, they throw an exception.
\\nOn the other hand, the developer who uses the library might expect the database to go off-line from time to time and design their application with database failure in mind. If the database fails, the client application can retry the same operation or display a message suggesting the user try again later.
\\nThus, a situation may be exceptional from the lower level’s point of view and at the same time be expected from the client code’s perspective. How should we deal with exceptions thrown by libraries in such a case?
\\nSuch exceptions should be caught at the lowest level possible. If they are not, your code will suffer the same drawbacks as the \\"goto\\" code sample: it won’t be possible to know where a particular exception is being processed without analysing the whole call stack.
\\npublic void CreateCustomer(string name)\\n{\\n Customer customer = new Customer(name);\\n bool result = SaveCustomer(customer);\\n \\n if (!result)\\n {\\n MessageBox.Show(\\"Error connecting to the database. Please try again later.\\");\\n }\\n}\\n \\nprivate bool SaveCustomer(Customer customer)\\n{\\n try\\n {\\n using (MyContext context = new MyContext())\\n {\\n context.Customers.Add(customer);\\n context.SaveChanges();\\n }\\n return true;\\n }\\n catch (DbUpdateException ex)\\n {\\n return false;\\n }\\n}\\n
As you can see in the sample above, the SaveCustomer method expects problems with the database and intentionally catches such errors. It returns a boolean value that states the operation’s status, which is then processed by the calling method.
\\nThe SaveCustomer method has a clear signature which tells us that there can be problems with saving a customer, that we expect them, and that you should check the return value to make sure everything went fine.
\\nThere’s a widely known best practice that correlates with all of this really well: you shouldn’t wrap such code in a generic exception handler. A generic exception handler basically states that you expect any exception that could possibly be thrown, and that just can’t be the case.
\\nIf you do expect some exceptions, it is for a very limited range of exception types that you know you can safely handle. Using a generic exception handler leads to situations where you swallow unexpected exceptions, often leaving your application in an inconsistent state.
\\nThere’s one situation where generic exception handlers are applicable, though. You can put one at the topmost level to catch all exceptions that were not handled by your code in order to log them. You shouldn’t in any way try to handle them. All you can do is log the exception and gracefully shut the application down.
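\\nOne possible way to wire such a top-level handler in a console application is shown below; the Logger facade and the RunApplication entry point are assumptions made for the sake of the example:
\\nstatic void Main()\\n{\\n // Catch-all at the very top: log the unexpected exception and let the\\n // process terminate; do not attempt to recover from an unknown state\\n AppDomain.CurrentDomain.UnhandledException += (sender, args) =>\\n {\\n Logger.Log((Exception)args.ExceptionObject); // Hypothetical logging facade\\n };\\n \\n RunApplication(); // Hypothetical application entry point\\n}\\n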
\\nHow often do you see a code like this?
\\npublic bool CreateCustomer(int managerId, string addressString, string departmentName)\\n{\\n try\\n {\\n Manager manager = GetManager(managerId);\\n Address address = CreateAddress(addressString);\\n Department department = GetDepartment(departmentName);\\n \\n CreateCustomerCore(manager, address, department);\\n return true;\\n }\\n catch (Exception ex)\\n {\\n _logger.Log(ex);\\n return false;\\n }\\n}\\n
This is an example of incorrect use of a generic exception handler. It implies that any exception coming from the method body signals an error in the process of customer creation. But what is the problem with such an approach?
\\nThe problem (besides the \\"goto\\" issues we discussed earlier) is that the exception might not be one we are concerned about. It could be of type ArgumentException, which we expect, but it could also be a ContractViolationException, and if it is, then we are hiding a bug by pretending that we know how to deal with it.
\\nSometimes such an approach is applied by developers who want to protect their software from unexpected failures. But the truth is that this solution masks potential bugs in your software, making them hard to reveal.
\\nThe best way to deal with unexpected exceptions is to discard the current operation completely. If your application is stateless (for example, if you are running a background job), you can just restart the process, because in this case no data has been corrupted.
\\nBut if your application has state, or there is any chance it could be left with corrupted data, you should log the exception details and crash the process, preventing the inconsistent behavior from spreading.
\\nLet’s recap:
\\nThrow an exception to state an unexpected situation in your software.
\\nUse return values for input validation.
\\nIf you know how to deal with exceptions a library throws, catch them at the lowest level possible.
\\nIf you face an unexpected exception, discard the current operation completely. Don’t pretend you know how to deal with it.
\\nInput validation rules are often mistaken for code contracts. In this post, I’ll try to cover their differences and show what their common use cases are.
\\nWhat is the purpose of having a code contract, namely preconditions, invariants and postconditions? The idea is closely related to the Fail Fast principle: the faster you notice an unexpected behavior, the quicker you fix it.
\\nIn the vast majority of cases, it is much more efficient to crash the application than to try to automatically recover from an error, because you never know what the error is and how it can affect your application. Allowing an application to continue running after a bug has occurred may corrupt its persisted state.
\\nSo, what are contract preconditions and how do they differ from input validation checks? The key point here is that a precondition violation always indicates a bug in the client code. Invalid input data, on the other hand, does not indicate a bug in your system.
\\nThat is the main point; all the other guidelines and best practices grow from that statement. Let’s dive deeper and try to draw a line between the two.
\\nYou can think of contract preconditions as a protective shield placed inside your code to ensure that everything goes fine. Input validation, on the other hand, is a shield placed to defend you against the outside world:
\\nRed signals represent invalid interactions (or data) coming from users or other applications; green signals stand for valid interactions. Your goal as a developer is to make sure that no invalid data can reach your code.
\\nIf a red signal appears to be inside of your system, then either you didn’t filter input data well or your own code generates them in some cases.
\\nIn either case, it is a bug and it is better to locate this bug as soon as possible. That’s when code contracts show up. They allow you to stop red signals from spreading across the application and quickly find out the cause of the bug:
\\nWhen your application has a rich set of contract preconditions (and, preferably, postconditions and invariants), invalid interactions are doomed. They are being caged as soon as they appear making it extremely easy to debug and fix the code generating them.
\\nIn contrast, input validation is a mechanism designed to protect your system from the infiltration of invalid data. Such validation makes no assumptions about the data coming in: the data is allowed to be invalid, and that in itself is a valid situation.
\\nIndeed, if a user enters \\"Fifteen\\" in a numeric field, you don’t want your system to crash. Instead, you want it to politely inform the user of the error.
\\nOn the contrary, contract preconditions do assume that the data inside of your system is in a correct state. If it’s not then there is a bug in your system.
\\nNow that the difference between contract preconditions and input validation is hopefully clear, let’s talk about what can be considered a contract and why.
\\nA code contract is a public agreement proposed by the service code. It says that if the client follows some rules - preconditions - then the service guarantees to provide some results described in postconditions.
\\nThat leads to the following attributes every contract precondition should have:
\\nThey should be public. That is, every client developer is able to get to know them before writing a line of code.
\\nThey should be easy to check, meaning that client developers shouldn’t need to write complex algorithms to emulate them before calling a method.
\\nThey should not rely on non-public state because that will restrict client code in its ability to check them.
\\nThey should be stable. That is, a precondition validation result shouldn’t depend on volatile class members.
\\nLet’s go through these points. The first one is quite simple; it means that the preconditions should be somehow described so that the client developer can read them before using the method. The best way to do it is to explicitly specify them at the top of the method’s code:
\\npublic string GetName(int position)\\n{\\n Contracts.Require(position, x => x >= 0);\\n \\n // Rest of the method\\n}\\n
The second one means that service class shouldn’t make client developers write complex calculations in order to comply with the contract. If you do need a complex contract, provide a separate method that can be used by your clients to check the preconditions:
\\npublic int Devote(int amount)\\n{\\n Contracts.Require(() => CanDevote(amount));\\n \\n return DevoteCore(amount);\\n}\\n
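\\nThe Contracts helper used throughout these samples is not part of the BCL. A minimal sketch matching the overloads used in this article (the exception type echoes the ContractViolationException mentioned earlier) could look like this:
\\nusing System;\\n \\npublic static class Contracts\\n{\\n public static void Require(bool precondition)\\n {\\n if (!precondition)\\n throw new ContractViolationException();\\n }\\n \\n public static void Require(Func<bool> precondition)\\n {\\n Require(precondition());\\n }\\n \\n public static void Require<T>(T value, Func<T, bool> predicate)\\n {\\n Require(predicate(value));\\n }\\n}\\n \\npublic class ContractViolationException : Exception\\n{\\n}\\n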
The next point follows from that one: you can’t make your preconditions depend on non-public methods or state, otherwise your clients won’t be able to comply with them.
\\nThe last one stands for contract stability. If a class’s preconditions depend on volatile variables - for example, file existence - then the clients won’t be able to do anything to meet them:
\\npublic string ReadFile(string filePath)\\n{\\n Contracts.Require(() => File.Exists(filePath));\\n \\n // Rest of the method\\n}\\n
In this code example, the file might have been deleted or be inaccessible, and your client can’t do anything about it. Its absence doesn’t mean there’s a bug in the client code, so that check is not a contract precondition and you shouldn’t introduce it as one.
\\nIt is still an exceptional situation and you can use regular exceptions to deal with it:
\\npublic string ReadFile(string filePath)\\n{\\n if (!File.Exists(filePath))\\n throw new ArgumentException();\\n \\n // Rest of the method\\n}\\n
Contract preconditions and input validation have similar but still different purpose. While input validation ensures that your system is protected from the outside world, contract preconditions guard your code inside of your system and help you quickly locate and fix bugs in it.
\\nMost of the development principles are applicable to any software you might develop. Nevertheless, there are some differences between building a reusable library and an enterprise application. Those differences often become sticking points as we try to apply experience gained in one type of project to projects of the other type.
\\nThe differences between shared library and enterprise development grow from differences in requirements and lifetime support cycle.
\\nA typical shared library, by definition, has a lot of dependents that you can’t control. That means libraries face much stronger backward compatibility requirements than typical enterprise development does.
\\nWith in-house software development, you can change, add, or even remove public interfaces without worrying about how it will affect the clients. All you need to do is change the clients along with the services. And when I write \\"interface\\", I mean interface in a broad sense: every public method in every public class is a part of the service’s interface.
\\nWith library development, you need to support your clients in such a way that they can use the latest version of your library without (or with minimal) changes in their code. Failing to do so may lead to a decrease in your customers\' loyalty and, thus, to a shrinking user base.
\\nAnother important difference is deployment frequency. You can’t ship a new version of your library every week. Even if you do, your clients won’t update it so frequently.
\\nThat makes some of the practices (for example, continuous delivery) used in enterprise world completely useless for a library development.
\\nUsers of your library typically can’t change the source code. Even if it’s open source, the effort required to change something in it is much larger.
\\nNot only do you have to download, change, and rebuild it, but you also have to maintain the change over the library’s entire lifespan, or at least until you decide to stop using it.
\\nPull requests are not always accepted, so in order to keep the change, the client will need to reapply it every time a new version comes out.
\\nThe points described lead to some significant differences in the development process.
\\nWhen you develop a library, you need to adhere to the following rule: once published, interfaces cannot be changed. That is, to support older versions of your clients, you should keep the previous implementation even if you have a better one. This rule is also known as Open-Closed principle as it was described by Bertrand Meyer in his Object-Oriented Software Construction book.
\\nTo introduce a change in a published interface, you should create a new interface that resides side by side with the old one. The obsolete methods can be marked with an attribute so that clients get a warning telling them they should switch to the newer version:
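\\nIn C#, the standard Obsolete attribute produces exactly this kind of compiler warning. A sketch with hypothetical interface and type names:
\\n// Hypothetical library interface and types, shown only to illustrate the technique\\npublic interface IOrderProcessor\\n{\\n [Obsolete(\\"Use Process(Order order, ProcessingOptions options) instead.\\")]\\n void Process(Order order);\\n \\n // The newer overload lives side by side with the obsolete one\\n void Process(Order order, ProcessingOptions options);\\n}\\n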
\\nAs you can’t force your clients to update their code other than by breaking backward compatibility, the obsolete interfaces may remain in your library for a long time, maybe forever.
\\nProbably the most valuable principle in software development is the one called You Are Not Gonna Need It. Its basic idea is that you shouldn’t waste your time on features that you are not sure will be required and focus on the functionality your users need right here and right now.
\\nThis principle lies on an empirical observation that in most cases, you can’t guess what the requirements should be and you will have to rewrite your feature anyway when the requirements become clear.
\\nWhile this approach is an advisable way to develop an in-house software, you shouldn’t adhere to it with library development. With library development, you should always think at least a little bit ahead of current needs because users of your library won’t be able to update it as frequently as you might want them to.
\\nTo illustrate this, I’ll give a simple example. Let’s assume there is no DateTime class in the .NET BCL, so you are building a utility class to help you work with dates. In your code, you need to find out what the date will be in some number of days.
\\nYou write something like the following:
\\npublic class DateTime\\n{\\n public DateTime AddDays(double days) { /* Implementation */ }\\n}\\n
If you develop enterprise software and follow the YAGNI principle, you just stop where you are and don’t implement anything else until you really need to. But with library development, you should think about whether there are other similar functions your users might find useful. Of course, you can’t just implement everything your users might ever want. You still need to keep a balance, but the balance here is shifted toward the \\"think ahead\\" part of the spectrum.
\\nIn the example above, you can’t just ship the library with this method alone. There definitely are other methods users will need if they use this kind of functionality:
\\npublic class DateTime\\n{\\n public DateTime AddDays(double days) { /* Implementation */ }\\n public DateTime AddMonths(int months) { /* Implementation */ }\\n public DateTime AddYears(int years) { /* Implementation */ }\\n public DateTime AddHours(double hours) { /* Implementation */ }\\n public DateTime AddMinutes(double minutes) { /* Implementation */ }\\n public DateTime AddSeconds(double seconds) { /* Implementation */ }\\n}\\n
Another example is the use of properties. On the surface, there is no difference between this class:
\\npublic class MyClass\\n{\\n public string Name { get; set; }\\n}\\n
and this class:
\\npublic class MyClass\\n{\\n public string Name;\\n}\\n
With enterprise development, if you ever need to encapsulate the Name field (so that assignment will, for example, notify subscribers about the change), you can just turn the field into a property:
\\npublic class MyClass\\n{\\n private string _name;\\n public string Name\\n {\\n get { return _name; }\\n set\\n {\\n _name = value;\\n Notify();\\n }\\n }\\n}\\n
Everything will work perfectly because the clients of this code will be recompiled together with the class itself.
\\nOn the other hand, if you develop a library, such an approach will not work. Turning a field into a property breaks binary backward compatibility: clients depending on this code will have to be recompiled in order to use the new version of the library, and that is not always an option. To accommodate possible future changes, a library should use public properties instead of public fields from the very beginning.
\\nI hope you can see now that you can’t and shouldn’t equally apply YAGNI to both enterprise and library development.
\\nDefensive programming is a programming style aimed at anticipating any potential error in the code. In practice, it mostly means validating input and output parameters:
\\npublic void ProcessReport(string reportId, string userId)\\n{\\n if (reportId == null)\\n throw new ArgumentException(\\"reportId\\");\\n if (userId == null)\\n throw new ArgumentException(\\"userId\\");\\n \\n User user = _userRepository.GetById(userId);\\n Report report = _reportRepository.GetById(reportId);\\n \\n if (user == null)\\n throw new ArgumentException(\\"User is not found\\");\\n if (report == null)\\n throw new ArgumentException(\\"Report is not found\\");\\n \\n report.Process(user);\\n}\\n
It’s a good practice to use such an approach when you develop a library. Your clients won’t be able to step through your code (or at least it would be hard for them to do so) and debug it in case something goes wrong. That’s why it’s a good idea to provide them with as complete a description of the error as possible.
\\nOn the other hand, defensive programming brings a lot of complexity, because you have to write much more code to maintain the validations. With in-house development, you can keep the input contract validation and remove the other checks altogether:
\\npublic void ProcessReport(string reportId, string userId)\\n{\\n if (reportId == null)\\n throw new ArgumentException(\\"reportId\\");\\n if (userId == null)\\n throw new ArgumentException(\\"userId\\");\\n \\n User user = _userRepository.GetById(userId);\\n Report report = _reportRepository.GetById(reportId);\\n \\n report.Process(user);\\n}\\n
There’s little difference between getting an ArgumentException with a nice description and getting a plain NullReferenceException. In both cases, the exception is thrown at the same line of code, and in both cases you have a decent clue of what is going on.
\\nWhen you develop enterprise software, there’s no need to stick to defensive programming as it doesn’t pay off.
\\nYou should always keep in mind what type of software you develop. Principles that apply to in-house software development don’t necessarily fit library development.
\\nDon’t stick to YAGNI with a library, as you will need to think at least a little ahead; do practice defensive programming, as you need to provide as many details as possible when something goes wrong.
\\nThe opposite is also true. Do stick to YAGNI with enterprise development, and don’t pay too much attention to the defensive style if you don’t have external clients.
\\nNowadays, the notion of composition over inheritance is quite widely accepted. It basically means that when designing software, you should prefer composition to inheritance in cases where either would work.
\\nBut what if several classes do have some common attributes? Do you need to extract a base class for them?
\\nIn OOP, inheritance stands for \\"is-a\\" relation. That is, a class A can be treated as a sub-class of a class B if A *is* a B in a way that makes sense for our particular domain.
\\nThat means that your decision of whether or not to create a subclass should be based on their semantics only. You could work on a bounded context in which two classes naturally relate to each other. But the same two classes can have a completely different meaning in another bounded context, and thus, cannot be related.
\\nWhat you definitely shouldn’t do is make a decision based purely on the members classes have. Let’s look at an example:
\\npublic class NamedObject : Entity\\n{\\n public string Name { get; set; }\\n}\\n \\npublic class Network : NamedObject\\n{\\n // Other properties\\n}\\n \\npublic class Node : NamedObject\\n{\\n // Other properties\\n}\\n
Here, NamedObject is introduced to store the Name property, as both the Network and Node classes have it. That is a classic example of utility inheritance - a concept I encourage you to avoid completely. The only reason to introduce a new class here is to \\"normalize\\" the classes\' field sets by pulling their common properties up to a base class.
\\nSuch inheritance is not \\"real\\" inheritance because it doesn’t carry any domain knowledge. What does it mean to be a NamedObject in the domain you are working on? Does your domain really have such a concept? When you talk to your domain experts, do you call Network and Node \\"named objects\\"? Do the domain experts use such a term?
\\nThat is also true for interfaces:
\\npublic interface INamedObject\\n{\\n string Name { get; set; }\\n}\\n \\npublic class Network : Entity, INamedObject\\n{\\n public string Name { get; set; }\\n // Other properties\\n}\\n \\npublic class Node : Entity, INamedObject\\n{\\n public string Name { get; set; }\\n // Other properties\\n}\\n
Although there’s no additional class here, there is still a misleading concept of an entity having a \\"NamedObject\\" role, which doesn’t make sense in terms of the domain.
\\nThat brings us to the following rule: if a class or interface is not a part of your ubiquitous language, you shouldn’t introduce it into your domain model. Although it might seem valuable in a utility sense, in most cases it is a sign of poor design.
\\nEvery time you see such code, you should take a break and think of your model. There almost always is a way to refactor your code to get rid of the inconsistency between the code and the domain.
\\nIn the example above, the reason the new class was added was code like this:
\\npublic string CreateMessage(Entity entity, string error)\\n{\\n string entityName;\\n if (entity is NamedObject)\\n {\\n entityName = ((NamedObject)entity).Name;\\n }\\n else\\n {\\n entityName = entity.Id.ToString();\\n }\\n \\n return \\"Error in processing an entity \\" + entityName + \\": \\" + error;\\n}\\n
Such an approach clearly breaks the Open-Closed Principle. We can refactor this code by introducing a new GetName() method in the Entity class:
\\npublic class Entity\\n{\\n public virtual string GetName()\\n {\\n return Id.ToString();\\n }\\n // Other members\\n}\\n
Or, we can reuse the ToString() method:
\\npublic class Entity\\n{\\n public override string ToString()\\n {\\n return Id.ToString();\\n }\\n // Other members\\n}\\n
The CreateMessage() method then turns into much simpler code:
\\npublic string CreateMessage(Entity entity, string error)\\n{\\n return \\"Error in processing an entity \\" + entity.GetName() + \\": \\" + error;\\n}\\n
A mismatch between the code and the domain is a design smell. Always follow your domain model closely; if you face such issue, you should either introduce the missing term to your domain experts and see if they accept it or refactor your code.
\\nI often see developers saying that in most cases, use of IEnumerable breaks LSP. Does it? Let’s find out.
\\nThis is a continuation of my article Read-Only Collections and LSP. In this post, I’d like to discuss the IEnumerable interface from a Liskov Substitution Principle (LSP) perspective.
\\nTo answer the question whether or not use of IEnumerable breaks LSP, we should step back and see what it means to break LSP.
\\nWe can say that LSP is violated if one of the following conditions is met:
\\nA subclass of a class (or, in our case, an implementation of an interface) doesn’t preserve its parent’s invariants
\\nA subclass weakens the parent’s postconditions
\\nA subclass strengthens the parent’s preconditions
\\nBefore we dive into implementations, let’s look at the interfaces themselves. Here’s the code for IEnumerable<T>, IEnumerator<T> and IEnumerator (the non-generic IEnumerable interface is essentially the same as IEnumerable<T>):
\\npublic interface IEnumerable<out T> : IEnumerable\\n{\\n IEnumerator<T> GetEnumerator();\\n}\\n \\npublic interface IEnumerator<out T> : IDisposable, IEnumerator\\n{\\n T Current { get; }\\n}\\n \\npublic interface IEnumerator\\n{\\n object Current { get; }\\n bool MoveNext();\\n void Reset();\\n}\\n
They are quite simple and don’t promise much. Nevertheless, different BCL classes implement them differently. Perhaps the most significant example of inconsistency in IEnumerable implementations is the List<T> class:
\\npublic class List<T>\\n{\\n public struct Enumerator : IEnumerator<T>\\n {\\n private List<T> list;\\n private int index;\\n private T current;\\n \\n public T Current\\n {\\n get { return this.current; }\\n }\\n \\n object IEnumerator.Current\\n {\\n get\\n {\\n if (this.index == 0 || this.index == this.list._size + 1)\\n throw new InvalidOperationException();\\n return (object)this.Current;\\n }\\n }\\n }\\n}\\n
\'Current\' property of type T does not require you to call MoveNext(), whereas \'Current\' property of type object does:
\\npublic void Test()\\n{\\n List<int>.Enumerator enumerator = new List<int>().GetEnumerator();\\n int current = enumerator.Current; // Returns 0\\n object current2 = ((IEnumerator)enumerator).Current; // Throws exception\\n}\\n
Reset() method is also implemented differently. While List<T>.Enumerator.Reset() conscientiously moves to the beginning of the list, iterators don’t implement it at all, so the following code fails:
\\npublic void Test()\\n{\\n Test2().Reset(); // Throws NotSupportedException\\n}\\n \\nprivate IEnumerator<int> Test2()\\n{\\n yield return 1;\\n}\\n
It turns out that the only thing we can be sure in is that IEnumerable<T>.GetEnumerator() method returns a non-null enumerator. Other methods guarantee nothing to us. A class implementing IEnumerable interface can be an empty set:
\\nprivate IEnumerable<int> Test2()\\n{\\n yield break;\\n}\\n
As well as an infinite sequence of elements:
\\nprivate IEnumerable<int> Test2()\\n{\\n Random random = new Random();\\n while (true)\\n {\\n yield return random.Next();\\n }\\n}\\n
And that is not a made-up example. BlockingCollection implements IEnumerator in such a way that calling thread is blocked on MoveNext() method until some other thread places an element into the collection:
\\npublic void Test()\\n{\\n BlockingCollection<int> collection = new BlockingCollection<int>();\\n IEnumerator<int> enumerator = collection.GetConsumingEnumerable().GetEnumerator();\\n bool moveNext = enumerator.MoveNext(); // The calling thread is blocked\\n}\\n
In other words, IEnumerable interface gives no promises about the underlying element set; it doesn’t even guarantee that the set is finite. All it tells us is that it can somehow step through the elements (enumerate them).
\\nSo, does use of IEnumerable interface break LSP? No. It is actually an invalid question because you can’t break LSP by using an interface, whatever this interface is.
\\nEvery interface has some essential preconditions, postconditions and invariants (although, they may not be specified explicitly). Client code using an interface can violate one of the preconditions, but it can’t override them and thus cannot violate LSP. Only classes that implement the interface can break this principle.
\\nThat brings us to the next question: do implementations of IEnumerable break LSP? Consider the following code:
\\npublic void Process(IEnumerable<Order> orders)\\n{\\n foreach (Order order in orders)\\n {\\n // Do something\\n }\\n}\\n
In the case when the orders\' underlying type is, say, List<Order>, everything is fine: the orders can easily be iterated. But what if orders is actually an endless generator that produces a new object on every MoveNext()?
\\ninternal class OrderCollection : IEnumerable<Order>\\n{\\n public IEnumerator<Order> GetEnumerator()\\n {\\n while (true)\\n {\\n yield return new Order();\\n }\\n }\\n \\n // Required by the non-generic IEnumerable interface\\n IEnumerator IEnumerable.GetEnumerator()\\n {\\n return GetEnumerator();\\n }\\n}\\n
The Process method will obviously never complete. But is that because the OrderCollection class breaks LSP? No. The OrderCollection class faithfully follows the IEnumerable contract: it provides an Order instance every time it is asked to.
\\nThe problem is that the Process method expects more than the IEnumerable interface promises. There’s no guarantee that the underlying orders class is a finite collection. As I mentioned earlier, \'orders\' can be an instance of BlockingCollection class, which makes trying to get all the elements out of it completely useless.
\\nTo avoid the problem, you can simply change the incoming parameter’s type to ICollection<T>. Unlike the IEnumerable interface, ICollection provides a Count property, which guarantees that the underlying collection is finite.
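\\nFor instance, the Process method from the example above could accept ICollection<Order> instead (a sketch of the signature change only):
\\npublic void Process(ICollection<Order> orders)\\n{\\n    // The Count property is now available, so the collection is known to be finite\\n    foreach (Order order in orders)\\n    {\\n        // Do something\\n    }\\n}\\n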
\\nUse of ICollection has some drawbacks of its own, though. ICollection allows adding and removing elements, which is often undesirable when you want to expose a read-only collection. Before .NET 4.5, IEnumerable was often used for that purpose.
\\nWhile that might seem like a good decision, it puts too many restrictions on the interface’s consumers.
\\npublic int GetTheTenthElement(IEnumerable<int> collection)\\n{\\n return collection.Skip(9).Take(1).SingleOrDefault();\\n}\\n
The code above shows one of the most common workarounds: using LINQ to get around IEnumerable’s limitations. Although this code is simple, it has an obvious drawback: it walks through the first ten elements one by one, whereas the same result could be achieved by simply accessing the element by index.
\\nThe solution is obvious - use IReadOnlyList instead:
\\npublic int GetTheTenthElement(IReadOnlyList<int> collection)\\n{\\n if (collection.Count < 10)\\n return 0;\\n return collection[9];\\n}\\n
There’s no reason to continue using IEnumerable interface in places where you expect the element set to be countable (and you do expect it in most cases). IReadOnlyCollection<T> and IReadOnlyList<T> interfaces introduced in .Net 4.5 make it a lot easier.
\\nWhat about implementations of IEnumerable that do break LSP? Let’s look at an example where IEnumerable’s underlying type is DbQuery<T>. We could get one in the following way:
\\nprivate IEnumerable<Order> FindByName(string name)\\n{\\n using (MyContext db = new MyContext())\\n {\\n return db.Orders.Where(x => x.Name == name);\\n }\\n}\\n
There’s an obvious problem with this code: the database call is being postponed until you start iterating through the results. But, as the database context is already closed, the call yields an exception:
\\npublic void Process(IEnumerable<Order> orders)\\n{\\n foreach (Order order in orders) // Exception: DB connection is closed\\n {\\n // Do something\\n }\\n}\\n
This implementation violates LSP because IEnumerable itself doesn’t have any preconditions that require you to keep a database connection open. You should be able to iterate through IEnumerable regardless of whether or not there’s is one. As we can see, DbQuery class strengthens IEnumerable’s preconditions and thus breaks LSP.
\\nI must say that it isn’t necessarily a sign of a bad design. Lazy evaluation is a common approach while dealing with database calls. It allows you to execute several calls in a single roundtrip and thus increase your overall system’s performance. Of course, that comes at a price which in this case is breaking one of the design principles.
\\nWhat we see here is a trade-off made by architects. They consciously decided to sacrifice some readability for performance benefits. And, of course, the problem can be easily avoided by forcing DbQuery to evaluate the database call:
\\nprivate IEnumerable<Order> FindByName(string name)\\n{\\n using (MyContext db = new MyContext())\\n {\\n return db.Orders\\n .Where(x => x.Name == name)\\n .ToList(); // Forces EF to evaluate the query and put the results into memory\\n }\\n}\\n
Use of the IEnumerable interface was and still is a common way to deal with collections. But be conscious of what it does and doesn’t guarantee: in most cases, IReadOnlyCollection and IReadOnlyList will fit much better.
\\nHave you ever thought about what traits make developers great? Which one is the most valuable to the companies they work for?
\\nWhile there might be quite a few of them, I believe there’s one that employers value the most. I also believe getting that characteristic can drastically increase your value as a software developer.
\\nThere’s one thing you should always remember when you are working for somebody. The main purpose of your job is to bring value to your employer.
\\nAlthough it seems simple, it’s something most developers tend to forget. Think about it. How often did you push for a new framework just because it was the new and shiny JavaScript MVC framework? How often did you pick a task not because it was the most important one but because you knew it would be fun to do?
\\nLet’s be clear: we have all done it. Picking a task I liked to work on was probably the most common thing I tended to do a few years ago. It’s also one of the most prevalent things I see other developers do.
\\nIt’s easy to forget about it, especially in a large company where your efforts are not very noticeable. But don’t let yourself go astray. The importance of providing value to your employer can hardly be overstated.
\\nWhy is it so important? The thing is that the more value you provide to your employer, the more valuable you are on the market. No one wants a senior developer that doesn’t solve problems. Or solves only problems that seem interesting. On the other hand, a less experienced developer with high motivation and clear intention to help the employer increase their income will be highly valuable.
\\nWhat you are doing at work is not designing or coding; it is delivering value to your employer. You should always keep that in mind. That mind shift is crucial for increasing your own value as a software developer. It also leads to some best practices I want to point out.
\\nEvery time you are about to start a task, step back and think about what value this particular task will bring. Maybe there are other ways to do it? Or maybe this particular task shouldn’t be done at all? To answer these questions, you have to have deep knowledge of your problem domain. Not only should you invest your time in gaining technical expertise, but you should also dive into the domain you are working in as deeply as you can.
\\nWithout it, you simply won’t be able to do your job fast enough. Business analysts can help you with that, but they can’t substitute for your own domain knowledge. Moreover, as business analysts don’t know what the actual implementation of their ideas looks like, they need your opinion to make the right decisions. But your opinion will only be qualified enough if you have solid domain knowledge. Every great developer I have worked with had this trait: they always tried to become experts in the domain they worked in.
\\nThis one is often hard to follow. Most developers became developers because they enjoy programming. It’s perfectly fine to want to work on fun and interesting features, but don’t let yourself mistake your own enjoyment for the client’s needs.
\\nTasks that are interesting and fun to do are not always tasks that deliver the most value to your employer. Therefore, spending your time on such tasks is like reading Hacker News: it is pleasant and comfortable, but it doesn’t increase your value as a software developer.
\\nI often see developers trying to convince the product owner that a particular feature would be great to implement only because that feature is easy, or fun, or exciting to implement. Be honest with yourself; always try to put yourself in the employer’s position.
\\nJob security is something you should avoid clinging to. It might seem controversial, but it follows naturally from the previous point.
\\nIf you perform the tasks that have the most value for your employer, you can find yourself in a situation where you are no longer needed. That is the best result you can achieve at work - provided, of course, you don’t get fired. It means that the problem you were working on is successfully solved or reduced to a level of complexity that can be handled by the ops guys.
\\nSuch cases increase your value enormously. Moreover, they give you feedback you can use in further projects to deliver even more value and thus become more valuable on the market.
\\nOn the other hand, job security often means lack of progress. If you’ve been working on the same project for years and this project doesn’t actually go anywhere in terms of providing value to your client, then your team has most likely done a bad job. You should avoid placing yourself in such a situation because it inevitably leads to stagnation. The job itself might seem a good place to work because you don’t have to put in a lot of effort to get your regular paycheck. But after several years you will eventually find yourself on the market, and it might turn out that other companies value your skills less than you expected.
\\nBuilding job security by sticking to a single employer at any cost doesn’t pay off in the long run.
\\nIncreasing your client’s income is a win-win strategy: the more value you deliver, the more valuable you become.
\\nMicroservices have gotten a lot of traction over the last year. It’s always interesting to read about other people’s success stories; they tend to inspire you to try this new trend out in your own project.
\\nHowever, there are several traps you can fall into if you follow this trend without a deep understanding of its fundamentals. Today, I’ll share some bad practices I saw one particular company use on its way to adopting a microservices architecture.
\\nOne of the most important attributes of microservices is product team responsibility. That is, the team that builds a microservice has to own it and take full responsibility for it.
\\nThe company I’m writing about had a separate architect instructing teams on how to implement their microservices. His daily job was designing the architecture for new features, deciding which 3rd-party libraries and products the teams should use, and so on. The teams then implemented those decisions in code.
\\nSuch an approach has a devastating effect. Developers don’t feel responsible for the product they develop. Moreover, as the architect tries to manage multiple teams, he doesn’t have enough time to dive into the development process. That breaks the feedback loop, because the architect doesn’t spend enough time with the product to analyse how his decisions affect the code. This approach even has a name, coined by Joel Spolsky: hit-and-run micromanagement.
\\nCan this situation be fixed? Sure. The architect should delegate the right to make decisions to the teams. Instead of trying to be a lead developer for each of them, the architect has to become an internal consultant: he can give advice on how functionality might be implemented, but he shouldn’t insist that his opinion is the one the team must follow. The team has to make the final decision on its own.
\\nSometimes the micromanagement approach is justified by claiming that the developers aren’t qualified enough to make such decisions. The problem with this justification is that an external architect isn’t able to make qualified decisions either. No matter how smart or technically advanced the external architect is, he can’t do it because he can’t see the whole picture. To make the right decisions, the architect has to dive deeply into the problem domain, start working with the code and talk to customers. And once he does all of that, he is no longer external.
\\nThe second practice I encountered at that company was having an external DBA. Similar to the external architect, the DBA took full control over what the database should look like in each of the products. A microservice team had to receive approval for every non-trivial database change it was going to make. Moreover, developers couldn’t even back up, restore or profile their database because they didn’t have the required permissions. For each of those, they had to ask the administrator.
\\nOf course, the DBA didn’t dive into the development either, so the resulting database structure was rarely optimal. Without day-to-day work with the product, an external DBA can’t make qualified decisions about what the database should look like, no matter how smart this person is.
\\nNeedless to say, developers should have full control over the database they use. There can still be a separate DBA, but his role should be restricted to that of a consultant. Even better, a microservice team should have a developer with DBA experience. But, again, this person should be part of the microservice team.
\\nThe third practice I consider bad was how shared libraries were used. The company had its own set of utility libraries which was distributed to the teams. The problem was that most of the developers didn’t have access to their source code. In order to find out how a library worked, the developers had to decompile it. Moreover, if a library contained a bug, the process of fixing it could take ages.
\\nOne of the core practices in software development is code reuse. But to make this process clear and straightforward, the shared code should be placed in repositories every team member has access to. Also, every developer should be able to send a pull request, which should then be accepted if, of course, it passes code review.
\\nAs shared libraries are used by every microservice team, no single team should have an exclusive right of controlling them. Teams should share responsibility for such libraries.
\\nAlso, shared code should contain utility logic only. Some companies tend to share domain logic, which completely defeats one of the main purposes of microservices: setting up clear boundaries between contexts.
\\nAnother practice the company had was preferring SDKs over plain APIs. While it’s common for vendors like Amazon to distribute client libraries for their APIs, this approach doesn’t make a lot of sense in a microservices environment.
\\nFirstly, if you really want to make good use of the benefits a microservices architecture introduces (like using different platforms for different products), you will have to produce SDK libraries for every platform you use. In practice, this leads to sticking to one major platform that every team has to use. And even then, some developers - for example, mobile developers - are left out because there’s no simple way to use the same SDK on their platform.
\\nLarge vendors ship their SDKs to gain a competitive advantage over their rivals. They want to lower the barrier to entry to get more developers using their products.
\\nYou don’t have to do that if you are developing in-house software. Moreover, you shouldn’t. SDKs introduce another level of complexity: the microservice team has to support not only the API to its product but also the code that consumes that API. It’s the usual backward compatibility problem multiplied by two.
\\nIn most cases, an API can be made simple enough to use without any SDK library. Leave your developers the freedom to choose how they want to consume it.
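\\nAs a rough illustration (the service URL and route here are made up), consuming such an API directly with HttpClient is usually trivial:
\\npublic async Task<string> GetCustomerAsync(int customerId)\\n{\\n    using (var httpClient = new HttpClient())\\n    {\\n        // A plain HTTP call to a hypothetical customers microservice;\\n        // no platform-specific SDK is required on the consumer side\\n        HttpResponseMessage response = await httpClient.GetAsync(\\n            \\"http://customers-service/api/customers/\\" + customerId);\\n        response.EnsureSuccessStatusCode();\\n        return await response.Content.ReadAsStringAsync();\\n    }\\n}\\n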
\\nHaving a dedicated DBA led to using the same database instance for all of the products in that company. The database very quickly became a mess; there were dozens of tables pertaining to different products. Some of them were already dead, but no one deleted them because everyone was afraid of corrupting someone else’s product.
\\nThis practice, of course, broke one of the main principles of microservices architecture: organization around business features. Every microservice should have its own database instance.
\\nThe company also tended to outsource some of the microservices to external companies. The reason was that the internal developers didn’t have the required experience with the technology the company was going to use.
\\nWhile it’s totally fine to have distributed teams, outsourcing the whole product is a bad idea, because after the project ends, the knowledge of that technology leaves with the external team.
\\nA better practice would be getting some members of that external team to work side by side with your developers. That way, your developers will quickly pick up the required experience.
\\nA microservices architecture can bring you a competitive advantage. But in order to get it, you should implement microservices very carefully. Every aspect of your development process should be weighed consciously; don’t fall into cargo-cult programming.
\\nI’ve already written about base entity class. Today, I’d like to continue with Value Object base class I use in my projects. Also, I’ll share some best practices regarding Value Objects implementation.
\\n\\"An object that represents a descriptive aspect of the domain with no conceptual identity is called a Value Object. Value Objects are instantiated to represent elements of the design that we care about only for what they are, not who or which they are.\\"
In other words, value objects don’t have their own identity. That is, we don’t care if a value object is the same physical object we had before or another one with the same set of properties. Value Objects are completely interchangeable.
\\nFor example, a dime coin would most likely be represented as a value object. It doesn’t matter which exact piece of metal you have; they are all just 10-cent coins. On the other hand, you may build a system whose responsibility is to track every coin produced by a mint. In that case, you do care which coin a person has, because each of them has a unique identity.
\\nConsequently, whether a class is a value object is dictated by your domain and use cases.
\\nLet’s specify the attributes that pertain to Value Objects.
\\nNo identity. A corollary of value objects\' identity-less nature is, obviously, not having an Id property. Unlike entities, value objects should be compared by value, not by identity field.
\\nImmutability. As any value object can be replaced by another value object with the same property set, it’s a good idea to make them immutable to simplify working with them, especially in multithread scenarios. Instead of changing an existing value object, just create a new one.
\\nLifetime shortening. It’s another corollary of the value objects\' identity-less nature. They can’t exist without a parent entity owning them. In other words, there always must be a composition relationship between a Value Object class and an Entity class. Without it, value objects don’t make any sense.
\\nLifetime shortening leads to another consequence: Value Objects should not have separate tables in database. This one is an attribute programmers find the most controversial.
\\nDevelopers, especially those with wide experience in relational databases, tend to store Value Objects in separate tables (or, in the case of NoSQL databases, in separate collections). The situation gets worse if more than one Entity class owns a Value Object.
\\nFor example, both the Company and User entities might have a property referring to an Address value object. The very first impulse could be to extract the fields concerning Address from the Company and User tables into a separate Address table and store references to it instead. I consider this approach a bad practice. Not only does it make the composition relationship more complex, as you have to maintain the consistency of two tables instead of one, but it also gives developers a false impression of the Address’s nature. Because the Address table must have an Id column, it’s easy to mistake it for an Entity, despite its sole purpose of being a Value Object.
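\\nInstead, the Address fields can be mapped inline into the owning entities’ tables. A minimal sketch of how this could look with Entity Framework 6’s complex types (the context and entity classes here are assumptions):
\\npublic class MyContext : DbContext\\n{\\n    public DbSet<Company> Companies { get; set; }\\n    public DbSet<User> Users { get; set; }\\n \\n    protected override void OnModelCreating(DbModelBuilder modelBuilder)\\n    {\\n        // Address gets no table and no Id of its own;\\n        // its columns are stored in the Companies and Users tables\\n        modelBuilder.ComplexType<Address>();\\n    }\\n}\\n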
\\nThere’s an implementation of a Value Object base class made by Jimmy Bogard that I often see developers copy. In short, it allows you to extract the equality logic into the base class so that you don’t have to implement it in each Value Object separately. Its Equals() and GetHashCode() methods use reflection to gather information about the fields of a type and perform the comparison or calculate the object’s hash code.
\\nAlthough this implementation might be a good solution for quick scaffolding, it suffers from a fundamental flaw. The equality logic should be a conscious decision: every Value Object has its own property set and comparison strategy. By extracting this logic into a base class, you effectively say that all Value Objects are just bags of data with the same behavior, which is not true. It’s common for a Value Object to have properties that don’t take part in the equality logic, and with the reflection-based approach it’s too easy to forget to handle such cases.
\\nWell, what should it look like then? Here’s the code I use in my projects:
\\npublic abstract class ValueObject<T>\\n where T : ValueObject<T>\\n{\\n public override bool Equals(object obj)\\n {\\n var valueObject = obj as T;\\n \\n if (ReferenceEquals(valueObject, null))\\n return false;\\n \\n return EqualsCore(valueObject);\\n }\\n \\n protected abstract bool EqualsCore(T other);\\n \\n public override int GetHashCode()\\n {\\n return GetHashCodeCore();\\n }\\n \\n protected abstract int GetHashCodeCore();\\n \\n public static bool operator ==(ValueObject<T> a, ValueObject<T> b)\\n {\\n if (ReferenceEquals(a, null) && ReferenceEquals(b, null))\\n return true;\\n \\n if (ReferenceEquals(a, null) || ReferenceEquals(b, null))\\n return false;\\n \\n return a.Equals(b);\\n }\\n \\n public static bool operator !=(ValueObject<T> a, ValueObject<T> b)\\n {\\n return !(a == b);\\n }\\n}\\n
Note that there’s no Id property because, as discussed earlier, Value Objects don’t have their own identity. Also, you might notice that the class doesn’t implement the IEquatable<> interface; EqualsCore() is introduced instead. With IEquatable, you would have to write null-checking code in both the Equals(object obj) and Equals(T obj) methods, and because the equality method is abstract, you would have to copy those null checks into every ValueObject subclass. By making EqualsCore() protected and abstract, you get rid of such checks: they are extracted into the Equals() method of the base class.
\\nLet’s take a look at how Address value object could be implemented:
\\npublic class Address : ValueObject<Address>\\n{\\n public virtual string Street { get; protected set; }\\n public virtual int ZipCode { get; protected set; }\\n public virtual string Comment { get; protected set; }\\n \\n public Address(string street, int zipCode, string comment)\\n {\\n Contracts.EnsureNotNull(street);\\n \\n Street = street;\\n ZipCode = zipCode;\\n Comment = comment;\\n }\\n \\n protected override bool EqualsCore(Address other)\\n {\\n return Street == other.Street && ZipCode == other.ZipCode;\\n }\\n \\n protected override int GetHashCodeCore()\\n {\\n return (ZipCode.GetHashCode() * 397) ^ Street.GetHashCode();\\n }\\n}\\n
As you can see, Comment property doesn’t participate in EqualsCore() and GetHashCodeCore() methods because it’s just a user-created note; it can’t affect equality of addresses.
\\nAlso, note that properties are made read-only to comply with immutability requirement.
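\\nBecause Address is immutable, \\"changing\\" an address means replacing it with a new instance. A small sketch (the Person entity here is hypothetical):
\\npublic class Person : Entity\\n{\\n    public virtual Address Address { get; protected set; }\\n \\n    public virtual void ChangeZipCode(int newZipCode)\\n    {\\n        // Instead of mutating the existing value object, create a new one\\n        Address = new Address(Address.Street, newZipCode, Address.Comment);\\n    }\\n}\\n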
\\nStrictly speaking, there’s no relation between Value Objects and .NET value types, because the former is a design concept and the latter is a technical implementation detail.
\\nHowever, there is a lot of similarity between these notions. Structs in .NET have value semantics: they are copied when passed around, and they are typically designed to be immutable. Also, they are cheaper than reference types in terms of system resources. Look at the DateTime struct from the BCL. It has clear Value Object semantics: it is immutable and doesn’t have any identity fields.
\\nIn theory, you could use .NET value types as Value Objects. However, I wouldn’t recommend it. First of all, structs don’t support inheritance, so you would have to implement the equality members in every struct separately, which leads to code duplication. Also, ORMs often don’t handle mapping to structs well. Therefore, I recommend you always use classes for Value Objects.
\\nValue Object is an important DDD concept. You should clearly show which of your domain classes is an Entity and which is a Value Object by inheriting them from Entity and ValueObject<> respectively.
\\nHow often do you see code like this in your domain model?
\\npublic void Ship(int orderId, int customerId, string address)\\n{\\n Shipment shipment = _existingShipments.Single(x => x.OrderId == orderId);\\n if (shipment.CustomerId == customerId)\\n {\\n // Do something\\n }\\n}\\n
Seems pretty good, doesn’t it? Well, it doesn’t. I’ve already pointed out that using Ids in domain entities is a bad practice, but I see developers - even sophisticated ones - write such code over and over again, so this topic definitely deserves a separate article.
\\nOne of the main DDD principles is separation of concerns. You should isolate your domain model from non-domain logic as fully as possible to avoid complexity overhead. That is especially true for domain entities because they are the heart of your application.
\\nIds are essentially persistence logic implementation details; they have no relation to your domain. For example, you could use composite primary key instead of single-column key, but the meaning of the code would remain the same:
\\n// Single-column key\\nif (shipment1.Id == shipment2.Id)\\n{\\n // The shipments match\\n}\\n \\n// Multi-column key\\nif (shipment1.Id1 == shipment2.Id1 && shipment1.Id2 == shipment2.Id2)\\n{\\n // The shipments match\\n}\\n
In addition to poor separation of concerns, the use of Ids breaks entities\' encapsulation (aka Law of Demeter, aka Tell Don’t Ask):
\\n// Seems nice\\nif (shipment1.Id == shipment2.Id)\\n{\\n}\\n \\n// Still not bad\\nif (shipment1.Customer.Id == shipment2.Customer.Id)\\n{\\n}\\n \\n// Well, maybe that\'s enough\\nif (shipment1.Customer.Address.Id == shipment2.Customer.Address.Id)\\n{\\n}\\n \\n// Stop it!\\nif (shipment1.Customer.Address.City == shipment2.Customer.Address.City\\n && shipment1.Customer.Address.Street == shipment2.Customer.Address.Street\\n && shipment1.Customer.Address.State == shipment2.Customer.Address.State\\n && shipment1.Customer.Address.Post == shipment2.Customer.Address.Post)\\n{\\n}\\n
The last statement has an obvious design smell: it violates the entities\' encapsulation. The first statement (and the other two) has essentially the same drawback; the only difference between them is the amount of code.
\\nYou don’t need Ids to operate on your domain objects. In most cases, all you need to do is define equality members in the base entity class so that you can rewrite your code like this:
\\nif (shipment1 == shipment2)\\n{\\n // The shipments match\\n}\\n
Or:
\\nif (shipment1.Equals(shipment2))\\n{\\n // The shipments match\\n}\\n
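\\nA minimal sketch of what those equality members might look like in a base Entity class (the Id type and the handling of transient, not-yet-persisted entities are assumptions):
\\npublic abstract class Entity\\n{\\n    public virtual long Id { get; protected set; }\\n \\n    public override bool Equals(object obj)\\n    {\\n        var other = obj as Entity;\\n \\n        if (ReferenceEquals(other, null))\\n            return false;\\n        if (ReferenceEquals(this, other))\\n            return true;\\n        if (GetType() != other.GetType())\\n            return false;\\n        // Transient entities (no Id assigned yet) are never considered equal\\n        if (Id == 0 || other.Id == 0)\\n            return false;\\n \\n        return Id == other.Id;\\n    }\\n \\n    public override int GetHashCode()\\n    {\\n        return (GetType().ToString() + Id).GetHashCode();\\n    }\\n \\n    public static bool operator ==(Entity a, Entity b)\\n    {\\n        if (ReferenceEquals(a, null) && ReferenceEquals(b, null))\\n            return true;\\n        if (ReferenceEquals(a, null) || ReferenceEquals(b, null))\\n            return false;\\n        return a.Equals(b);\\n    }\\n \\n    public static bool operator !=(Entity a, Entity b)\\n    {\\n        return !(a == b);\\n    }\\n}\\n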
Also, Ids in domain entities often indicate a hidden abstraction:
\\npublic void Deliver(\\n int fromAddressId,\\n int toAddressId,\\n int orderId,\\n int customerId)\\n{\\n if (_existingShipments.Any(x =>\\n x.FromAddressId == fromAddressId &&\\n x.ToAddressId == toAddressId &&\\n x.OrderId == orderId &&\\n x.CustomerId == customerId))\\n {\\n // Attach to existing shipment\\n }\\n else\\n {\\n // Create new one\\n }\\n}\\n
In this example, Ids can be encapsulated in a separate entity:
\\npublic void Deliver(Shipment shipment)\\n{\\n if (_existingShipments.Contains(shipment))\\n {\\n // Attach to existing shipment\\n }\\n else\\n {\\n // Create new one\\n }\\n}\\n
Such an approach allows you to keep complexity under control by hiding the implementation details inside a separate entity.
\\nI must add an important note, though. Everything stated above refers to domain entities only. You can - and should - use Ids in infrastructure and application services, because Ids are natural for object identification. Such services operate on a different level of abstraction: they need Ids to map domain entities to database tables, to identify a web page requested by a user, and so on; they don’t contain any domain logic.
\\nIds in domain entities are a design smell. In most cases, they indicate poor entity encapsulation. If you want proper separation of concerns, you should keep the number of Ids in your domain entities as low as possible.
\\nHeavy use of Ids is common in an anemic model. There’s nothing wrong with that if you have a simple CRUD-like application. But if your application is large and complex, you should definitely choose a rich model over an anemic one, and thus avoid Ids in your domain classes.
\\nI’d like to discuss some common pitfalls of async/await feature in C# and provide you with workarounds for them.
\\nThe internals of the async/await feature are very well described by Alex Davies in his book, so I will only briefly explain them here. Consider the following code example:
\\npublic async Task ReadFirstBytesAsync(string filePath1, string filePath2)\\n{\\n using (FileStream fs1 = new FileStream(filePath1, FileMode.Open))\\n using (FileStream fs2 = new FileStream(filePath2, FileMode.Open))\\n {\\n await fs1.ReadAsync(new byte[1], 0, 1); // 1\\n await fs2.ReadAsync(new byte[1], 0, 1); // 2\\n }\\n}\\n
This function reads the first byte from each of the two files passed in (I know, it’s quite a synthetic example). What happens at lines \\"1\\" and \\"2\\"? Do they execute simultaneously? No. What happens is that the function is actually split by the \\"await\\" keyword into three pieces: the part before line \\"1\\", the part between lines \\"1\\" and \\"2\\", and the part after line \\"2\\".
\\nThe function starts a new I/O thread at line \\"1\\", passes it the second part of itself (the part between lines \\"1\\" and \\"2\\") as a callback and returns control to the caller. After the I/O thread completes, it calls the callback, and the method continues executing. The method starts the second I/O thread at line \\"2\\", passes the third part of itself as a callback and returns again. After the second I/O thread completes, it calls the rest of the method.
\\nThe magic happens because the compiler rewrites the code of a method marked as async into a state machine, just like it does with iterators.
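\\nConceptually (ignoring the details of the generated state machine, error handling and context capturing), the method above behaves roughly like the following hand-written continuation chain; this is an illustration, not the actual compiler output:
\\npublic Task ReadFirstBytesAsync(string filePath1, string filePath2)\\n{\\n    FileStream fs1 = new FileStream(filePath1, FileMode.Open);\\n    FileStream fs2 = new FileStream(filePath2, FileMode.Open);\\n \\n    // Part 1 ends here; parts 2 and 3 are registered as continuations\\n    return fs1.ReadAsync(new byte[1], 0, 1)\\n        .ContinueWith(t1 => fs2.ReadAsync(new byte[1], 0, 1))\\n        .Unwrap()\\n        .ContinueWith(t2 =>\\n        {\\n            fs2.Dispose();\\n            fs1.Dispose();\\n        });\\n}\\n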
\\nThere are two major cases in which using async/await feature is preferred.
\\nFirst of all, it can be used in thick clients to deliver a better user experience. When a user presses a button that starts a heavy operation, it’s better to perform this operation asynchronously without locking the UI thread. Such logic required a lot of effort to implement before .NET 4.5 was released. Here is how it can look now:
\\nprivate async void btnRead_Click(object sender, EventArgs e)\\n{\\n btnRead.Enabled = false;\\n \\n using (FileStream fs = new FileStream(\\"File path\\", FileMode.Open))\\n using (StreamReader sr = new StreamReader(fs))\\n {\\n Content = await sr.ReadToEndAsync();\\n }\\n \\n btnRead.Enabled = true;\\n}\\n
Note that the Enabled flag is changed by the UI thread in both cases. This approach removes the need for ugly code like this:
\\nif (btnRead.InvokeRequired)\\n{\\n btnRead.Invoke((Action)(() => btnRead.Enabled = false));\\n}\\nelse\\n{\\n btnRead.Enabled = false;\\n}\\n
In other words, all the \\"light\\" code is executed by the called thread, whereas \\"heavy\\" code is delegated to a separate thread (I/O or CPU-bound). This approach allows us to significantly reduce the amount of work required to synchronize access to UI elements as they are always managed by the UI thread only.
\\nSecondly, the async/await feature can be used in web applications for better thread utilization. The ASP.NET MVC team has done a lot of work to make asynchronous controllers easy to implement. You can just write an action like the following, and ASP.NET will do the rest of the work.
\\npublic class HomeController : Controller\\n{\\n public async Task<string> Index()\\n {\\n using (FileStream fs = new FileStream(\\"File path\\", FileMode.Open))\\n using (StreamReader sr = new StreamReader(fs))\\n {\\n return await sr.ReadToEndAsync(); // 1\\n }\\n }\\n}\\n
In the example above, the worker thread executing the method starts a new I/O thread at line \\"1\\" and returns to the thread pool. After the I/O thread finishes its work, a new thread from the thread pool is picked up to continue the method execution. Hence, CPU-bound threads from the thread pool are utilized more economically.
\\nIf you are developing a third-party library, it is vital to configure await in such a way that the rest of the method can be executed by an arbitrary thread from the thread pool. First of all, third-party libraries (if they are not UI libraries) usually don’t work with UI controls, so there’s no need to bind the continuation to the UI thread. You can slightly increase performance by allowing the CLR to execute your code on any thread from the thread pool. Secondly, by using the default behavior (or explicitly writing ConfigureAwait(true)), you leave a hole for possible deadlocks, and client code won’t be able to change that behavior. Consider the following example:
\\nprivate async void button1_Click(object sender, EventArgs e)\\n{\\n int result = DoSomeWorkAsync().Result; // 1\\n}\\n \\nprivate async Task<int> DoSomeWorkAsync()\\n{\\n await Task.Delay(100).ConfigureAwait(true); // 2\\n return 1;\\n}\\n
A button click leads to a deadlock here. The UI thread starts a new I/O thread at \\"2\\" and falls to sleep at \\"1\\", waiting for I/O work to be completed. After the I/O thread is done, it dispatches the rest of the DoSomeWorkAsync method to the thread the method was called by. But that thread is waiting for the method completion. Deadlock.
\\nASP.NET will behave the same way because, although it doesn’t have a single UI thread, its synchronization context doesn’t allow the code of a single request to be executed by more than one thread simultaneously.
\\nOf course, you could use await keyword instead of calling the Result property to avoid the deadlock:
\\nprivate async void button1_Click(object sender, EventArgs e)\\n{\\n int result = await DoSomeWorkAsync();\\n}\\n \\nprivate async Task<int> DoSomeWorkAsync()\\n{\\n await Task.Delay(100).ConfigureAwait(true);\\n return 1;\\n}\\n
But there is still one case where you can’t avoid deadlocks. You can’t use async methods in ASP.NET child actions, because they are not supported. So you will have to access the Result property directly and will get a deadlock if the async method your controller calls didn’t configure Awater properly. For example, if you do something like the following and the SomeAction action calls an async method’s Result property and that method is not configured by ConfigureAwait(false) statement, you’ll get a deadlock.
\\n@Html.Action(\\"SomeAction\\", \\"SomeController\\")\\n
Clients of your library won’t be able to change its code (unless they decompile it), so always put ConfigureAwait(false) in your async methods.
\\nAnother common pitfall is wrapping I/O-bound work in Task.Run. Look at this example:
\\nprivate async void button1_Click(object sender, EventArgs e)\\n{\\n btnRead.Enabled = false;\\n string content = await ReadFileAsync();\\n btnRead.Enabled = true;\\n}\\n \\nprivate Task<string> ReadFileAsync()\\n{\\n return Task.Run(() => // 1\\n {\\n using (FileStream fs = new FileStream(\\"File path\\", FileMode.Open))\\n using (StreamReader sr = new StreamReader(fs))\\n {\\n return sr.ReadToEnd(); // 2\\n }\\n });\\n}\\n
Is this code asynchronous? Yes. Is it a correct way to write asynchronous code? No. The UI thread starts a new CPU-bound thread at \\"1\\" and returns. The new thread then starts a new I/O thread at \\"2\\" and falls to sleep waiting for its completion.
\\nSo, what happens here? Instead of creating just an I/O thread, we create both a CPU-bound thread at \\"1\\" and an I/O thread at \\"2\\". That’s a waste of threads. To fix it, we need to call the asynchronous version of the read method:
\\nprivate async Task<string> ReadFileAsync()\\n{\\n using (FileStream fs = new FileStream(\\"File path\\", FileMode.Open))\\n using (StreamReader sr = new StreamReader(fs))\\n {\\n // Await inside the using block so the reader isn\'t disposed before the read completes\\n return await sr.ReadToEndAsync();\\n }\\n}\\n
Here is another example:
\\npublic void SendRequests()\\n{\\n _urls.AsParallel().ForAll(url =>\\n {\\n var httpClient = new HttpClient();\\n httpClient.PostAsync(url, new StringContent(\\"Some data\\"));\\n });\\n}\\n
Looks like we are sending requests in parallel, right? Yes, we are, but there’s the same issue we had before: instead of creating just an I/O thread, we create both an I/O and a CPU-bound thread for every request. To fix the code, we can use the Task.WaitAll method:
\\npublic void SendRequests()\\n{\\n IEnumerable<Task > tasks = _urls.Select(url =>\\n {\\n var httpClient = new HttpClient();\\n return httpClient.PostAsync(url, new StringContent(\\"Some data\\"));\\n });\\n Task.WaitAll(tasks.ToArray());\\n}\\n
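\\nIf the caller itself can be asynchronous, a non-blocking variant with Task.WhenAll avoids tying up the calling thread as well (a sketch):
\\npublic async Task SendRequestsAsync()\\n{\\n    IEnumerable<Task> tasks = _urls.Select(url =>\\n    {\\n        var httpClient = new HttpClient();\\n        return httpClient.PostAsync(url, new StringContent(\\"Some data\\"));\\n    });\\n \\n    // Await all the I/O operations without blocking the calling thread\\n    await Task.WhenAll(tasks);\\n}\\n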
Well, it depends. Sometimes it’s impossible to do it, sometimes it brings too much of complexity. For example, NHibernate doesn’t implement asynchronous data fetching. EntityFramework, on the other hand, does, but there might not be a lot of sense in using it in some cases. You should always consider pros and cons of every design decision.
\\nAlso, thick clients (like WPF or WinForms) usually don’t have a lot of load, so there’s actually no difference in choosing one approach over another. But anyway, you should always know what is happening under the cover so you could make a conscious decision in every single case.
\\nMicrosoft released the async/await feature in .NET 4.5. It’s really great stuff, as it significantly simplifies one of the most painful areas - asynchronous programming. Before that, the Task Parallel Library (TPL) and Parallel LINQ (PLINQ) were released in .NET 4.0. They address parallel programming - another painful area in .NET.
\\nI often see programmers struggling with the question of when to use each of these features. Let’s step back and recall what it actually means to be asynchronous or parallel.
\\nSo, is there any difference between these two notions? Yes, there is. You can think of parallel programming as a subset of asynchronous programming: every parallel execution is asynchronous, but not every asynchronous execution is parallel.
\\nLet’s take an example.
\\nWhen someone calls you, you usually pick up your phone and answer immediately, leaving your current tasks. It is a synchronous communication.
\\nWhen someone sends you an email, you can postpone the answer, say, to evening, and continue working on your current tasks. It is an asynchronous communication.
\\nWhen someone calls you, you can answer while working on your current tasks at the same time. It’s a parallel communication.
\\nReturning to programming, the difference between synchronous and asynchronous execution is that in case of synchronous execution, the thread leaves its current task and starts working on a new task immediately. On the other hand, with asynchronous execution the thread continues working on its current task.
\\nThe difference between asynchronous and parallel execution is that with asynchronous execution you don’t always need another thread to execute a new task. The new task can be executed after the thread is done with its current task. Until that moment, this new task waits in a queue.
\\nTo use the asynchronous and parallel features of .NET properly, you should also understand the concept of I/O threads.
\\nNot everything in a program consumes CPU time. When a thread tries to read data from a file on disk or sends a TCP/IP packet through network, the only thing it does is delegate the actual work to a device - disk or network adapter - and wait for results.
\\nIt’s very expensive to spend threads\' time on waiting. Even though threads sleep and don’t consume CPU time while waiting for the results, it doesn’t really pay off because it’s a waste of system resources. Every thread holds about 2 MB of memory for stack variables, local storage and so on. Also, the more threads you have, the more time it takes to switch among them.
\\nIt’s much more efficient for threads to send a command to the device and ask it to ping them back after the work is done. Threads shouldn’t spend their time sleeping.
\\nThink of an analogy. Let’s say there’s a cook making dinner for a lot of people. He or she certainly has a gas stove, a toaster or a human assistant. The cook’s time costs more than the toaster’s, the stove’s or even the assistant’s, so it would be wasteful for the cook to sit and wait for a toast to be made, or for the assistant to return from the store. It’s more efficient to put the cake in the oven, cook other stuff meanwhile, and come back when the bell rings. A single cook can be very productive that way.
\\nAn I/O thread is an abstraction intended to hide work with devices behind a simple and familiar concept. The main point is that you don’t have to work with those devices in a different way: you can think of the pipeline inside them as if it were a usual CPU-consuming thread. At the same time, I/O threads are extremely cheap in comparison with CPU-bound threads because, in fact, they are merely requests to devices.
\\nSo, let’s summarize. When a program reads data from a file, we say it starts a new I/O thread, meaning that it actually sends a command to a hard drive. When the drive finishes reading, we say I/O thread is completed, meaning that the drive sends the data back to the program.
\\nLet’s look at the code:
\\npublic void ReadData(string filePath, int byteCount)\\n{\\n byte[] buffer = new byte[byteCount];\\n using (FileStream fs = new FileStream(filePath, FileMode.Open))\\n {\\n fs.Read(buffer, 0, byteCount); // 1\\n }\\n}\\n
I/O thread is started at the line marked as \\"1\\". The main thread falls to sleep and waits for the I/O thread to complete. After it’s done, it sends the data back to the main thread. Then the main thread wakes up and continues working.
\\nWhy would I/O threads mimic CPU-bound threads? Because they are really about the same thing. Both an I/O thread and a CPU-bound thread carry out some work; the only difference is that an I/O thread uses resources other than the CPU.
\\nLook at this code:
\\npublic void Compute()\\n{\\n Thread thread = new Thread(() => PerformComputation());\\n thread.Start(); // 1\\n thread.Join(); // 2\\n}\\n
CPU bound thread is started at the line \\"1\\". At the line \\"2\\" the calling thread falls to sleep, waiting for the inner thread to complete. After it’s done, the main thread continues working. It’s really similar to what we had in the previous code example, isn’t it?
The Task class introduced in .NET 4 lets us hide the difference between these two types of threads:
public async Task ReadDataAsync(string filePath, int byteCount)
{
    byte[] buffer = new byte[byteCount];

    // useAsync: true asks the OS for overlapped (truly asynchronous) file I/O
    using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read,
        FileShare.Read, bufferSize: 4096, useAsync: true))
    {
        // The device does the work; no thread is blocked while we wait
        await fs.ReadAsync(buffer, 0, byteCount);
    }
}

public Task ComputeAsync()
{
    // A CPU-bound task executed on a thread-pool thread
    return Task.Run(() => PerformComputation());
}
Now you can get a Task instance and track its state regardless of what type of work is actually being performed. In other words, the Task class abstracts both CPU-bound and I/O-bound work behind a single future-like construct.
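To make the point concrete, here is a minimal sketch (reusing the ReadDataAsync and ComputeAsync methods above) showing that both kinds of tasks can be awaited and composed in exactly the same way, regardless of whether a device or the CPU does the actual work:

public async Task ProcessAsync(string filePath, int byteCount)
{
    // One task is backed by an I/O request, the other by a thread-pool thread,
    // but the calling code treats them identically.
    Task ioTask = ReadDataAsync(filePath, byteCount);
    Task cpuTask = ComputeAsync();

    // No thread is blocked while we wait for both to complete.
    await Task.WhenAll(ioTask, cpuTask);
}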
\\nCLR via C#, Chapter 28: I/O-Bound Asynchronous Operations
I/O Completion Ports: an MSDN article describing I/O completion ports in detail
\\nLast week we compared Entity Framework and NHibernate from a DDD perspective. Today, I’d like to dive deeper into what Separation of Concerns (SoC) is and why it is so important. We’ll look at some code examples and features that break the boundaries between the domain and persistence logic.
There are several concerns we deal with in software development. In most applications, at least three of them are clearly defined: the UI, the business logic, and the database. The notion of SoC is closely related to the Single Responsibility Principle: you can think of SoC as SRP applied not to a single class, but to the whole application. In most cases, the two notions can be used interchangeably.
In the case of ORMs, SoC is all about separating domain and persistence logic. You can say that your code base has a good Separation of Concerns if your domain entities don't know how they are persisted and the database doesn't contain any business logic. Of course, it's not always possible to separate these concerns completely; sometimes consistency and performance issues make you break the boundaries. But you should always aim for as clean a separation as possible.
We can't just isolate the domain and persistence logic; we need something that glues them together. That is where an ORM comes into play. An ORM allows us to map entities to the appropriate database tables in such a way that neither the domain entities nor the database know about each other.
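As an illustration, here is a minimal sketch of what that glue can look like with EF Core (the Customer class and the column names are hypothetical): the entity knows nothing about tables, and all mapping knowledge is concentrated in a separate configuration class.

using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Metadata.Builders;

// The domain class contains no persistence concerns whatsoever.
public class Customer
{
    public int Id { get; private set; }
    public string Name { get; private set; }
}

// The mapping is kept on the persistence side of the boundary.
public class CustomerConfiguration : IEntityTypeConfiguration<Customer>
{
    public void Configure(EntityTypeBuilder<Customer> builder)
    {
        builder.ToTable("Customer");
        builder.HasKey(x => x.Id);
        builder.Property(x => x.Name).HasColumnName("Name").IsRequired();
    }
}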
There is a lot of information about how to separate an application's concerns. But why bother? Is it really so important?
When you keep different responsibilities together in a single class, you have to maintain their consistency simultaneously in every operation within this class. That quickly leads to a combinatorial explosion. Moreover, complexity grows much faster than most developers think: every additional responsibility increases a class's complexity by an order of magnitude.
To handle the complexity, we need to separate these responsibilities.
Separation of Concerns is not just a matter of good-looking code. SoC is vital to development speed. Moreover, it's vital to the success of your project.
A human can hold at most about nine objects in working memory. An application without properly separated concerns overwhelms developers very quickly because of the huge number of combinations in which the elements of those concerns can interact with each other.
Separating concerns into highly cohesive pieces lets you 'divide and conquer' the application you develop. It's much easier to handle the complexity of a small, isolated component that is loosely coupled to the application's other components.
Let's look at some examples of persistence logic leaking into domain logic.
\\npublic void DoWork(Customer customer, MyContext context)\\n{\\n if (context.Entry(customer).State == EntityState.Modified)\\n {\\n // Do something\\n }\\n}\\n
The current persistence state of an object (e.g. whether this object exists in the database or not) has no relationship to the domain logic. Domain entities should operate on data that pertains to the business logic only.
\\npublic void DoWork(Customer customer1, Customer customer2)\\n{\\n if (customer1.Id > 0)\\n {\\n // Do something\\n }\\n if (customer1.Id == customer2.Id)\\n {\\n // Do something\\n }\\n}\\n
Dealing with Ids is probably the most frequent kind of persistence logic infiltration. An Id is an implementation detail of how your entities are saved in the database. If you want to compare your domain objects, override the equality members in the base entity class and write customer1 == customer2 instead of customer1.Id == customer2.Id.
public class Customer\\n{\\n public int Number { get; set; }\\n public string Name { get; set; }\\n \\n // Not persisted in database: can store anything here\\n public string Message { get; set; }\\n}\\n
If you tend to write such code, you should stop and think about your model again. Such code indicates that you have included irrelevant elements in your entity. In most cases, you can refactor your model and get rid of them.
Setting the database up for cascade deletion is one example of such a leak. The database itself should not contain any logic about when to trigger data deletion; this logic is clearly a domain concern. Your C#/Java/etc. code should be the only place where such logic lives.
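As a minimal sketch (with hypothetical Order and OrderLine classes): the rule about removing an order's lines lives in the domain model as an explicit, testable operation, and, with the appropriate mapping, the ORM turns the collection change into DELETE statements at commit time, so no ON DELETE CASCADE rule is needed in the database.

using System.Collections.Generic;

public class OrderLine
{
    public string Product { get; set; }
}

public class Order
{
    private readonly List<OrderLine> _lines = new List<OrderLine>();
    public IReadOnlyList<OrderLine> Lines => _lines;

    // Deleting line items is a business decision, expressed in the domain model
    // rather than in a database trigger or cascade rule.
    public void RemoveAllLines()
    {
        _lines.Clear();
    }
}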
Creating stored procedures that mutate data in the database is another example. Don't let your domain logic leak into the database; keep the code with side effects in your domain model.
I have to point out two special cases, though. Firstly, in most cases it's okay to create read-only stored procedures. Putting code with side effects into the domain model and code with no side effects into stored procedures is perfectly aligned with CQRS principles.
\\nSecondly, there are cases when you can’t avoid putting some domain logic in SQL statements. If, for example, you want to delete a batch of objects that fit some condition, an SQL DELETE statement would be a much faster choice. In these cases, you are better off using plain SQL, but be sure to place it with other database-specific code (for example, in repositories).
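For instance, a batch clean-up might look like the sketch below (the OrderRepository class, table and column names are hypothetical; the point is that the raw SQL sits next to the rest of the data-access code, not in the domain model):

using System;
using System.Data.SqlClient;

public class OrderRepository
{
    private readonly string _connectionString;

    public OrderRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Deletes all orders older than the given date in a single set-based statement,
    // which is much faster than loading and deleting the entities one by one.
    public void DeleteOrdersOlderThan(DateTime threshold)
    {
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(
            "DELETE FROM dbo.[Order] WHERE CreatedOn < @Threshold", connection))
        {
            command.Parameters.AddWithValue("@Threshold", threshold);
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}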
Default values in database tables are another example of domain logic residing in the database. The values an entity has by default should be defined in its code; they shouldn't be left to the mercy of your database.
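A minimal sketch of the idea (hypothetical Customer class and Status value): the default comes from the constructor, not from a DEFAULT constraint on the table.

public class Customer
{
    public string Name { get; set; }
    public string Status { get; set; }

    public Customer()
    {
        // The default is part of the domain model and is visible to anyone reading the code.
        Status = "Regular";
    }
}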
Think about how hard it is to piece together such bits of domain logic when they are spread across the application. It's much better to keep them in a single place.
Most of the leaks come from thinking not in terms of the domain, but in terms of data. Many developers perceive the application they develop just like that. For them, entities are just storage for the data they transfer from the database to the UI, and the ORM is just a helper that saves them from copying this data from SQL queries to C# objects manually. Sometimes it is hard to make the mental shift, but if you do, you will open a brand new world of expressive domain models that allow you to build software much faster, especially on large projects.
Of course, it's not always possible to achieve the level of separation we want. But in most cases, nothing keeps us from building a clean and cohesive model. Most failed projects fail not because they couldn't fulfill some non-functional requirement; most of them are buried under a bulk of messy code that prevents developers from changing anything in it. Any change committed to such code leads to cascading breaks all over the application.
\\nThe disaster can be avoided only by breaking your code apart. Divide and conquer. Separate and implement.
I often see programmers saying that .NET read-only collections violate the Liskov Substitution Principle (LSP). Do they? The quick answer is no, they don't, because the IList interface has the IsReadOnly flag. The exception is the Array class, which has violated LSP since version 2.0 of .NET. But let's go through the whole story first.
Let's look at how the read-only collection interfaces evolved in .NET.
[Class diagram: evolution of the read-only collection interfaces in .NET]

As you can see in the diagram, the IList interface contains the IsReadOnly and IsFixedSize flags. The initial idea was to separate these two notions. A collection might be read-only, which means it can't be changed in any way. Or it might be of fixed size, which means it doesn't allow adding or removing items, but does allow changing existing ones. In other words, IsReadOnly collections are always IsFixedSize, but IsFixedSize collections are not always IsReadOnly.
So, if you want to create, say, a MyReadOnlyCollection class, you need to implement these two properties, IsReadOnly and IsFixedSize, so that both of them return true. The BCL didn't have read-only collections at the time of .NET 1.0, but the BCL architects laid the foundation for future implementations. The intention was that you'd be able to work with such collections polymorphically using code like this:
public void AddAndUpdate(IList list)\\n{\\n if (list.IsReadOnly)\\n {\\n // No action\\n return;\\n }\\n\\n if (list.IsFixedSize)\\n {\\n // Update only\\n list[0] = 1;\\n return;\\n }\\n\\n // Both add and update\\n list[0] = 1;\\n list.Add(1);\\n}\\n
Of course, this is not a great way to work with collections, but it allows you to avoid exceptions without checking the actual type. Thus, this design doesn't violate LSP. Of course, nobody performs these checks when working with IList (neither do I). That's why there are so many complaints that the collections violate LSP.
After generics were introduced in .NET 2.0, the BCL team got a chance to build a new version of the interface hierarchy. They did some work to make the collections more coherent. Besides pulling some members up to ICollection<T>, they decided to remove the IsFixedSize flag.
They did it because Array was the only class that needed it: the only class that forbids adding and removing items but allows modifying existing ones. The BCL team decided that the IsFixedSize flag brought too much complexity. The interesting thing is that they also changed the implementation of the IsReadOnly flag for the Array class so that it no longer reflects the actual state of affairs:
public void Test()\\n{\\n int[] array = { 1 };\\n bool isReadOnly1 = ((IList)array).IsReadOnly; // isReadOnly1 is false\\n bool isReadOnly2 = ((ICollection<int>)array).IsReadOnly; // isReadOnly2 is true\\n}\\n
ICollection<T>.IsReadOnly is true for Array, but the array can still be changed. That is where the LSP violation takes place. If we have a method that accepts IList<int>, we can't just write code like the following:
public void AddAndUpdate(IList<int> list)\\n{\\n if (list.IsReadOnly)\\n {\\n // No action\\n return;\\n }\\n\\n // Both add and update\\n list[0] = 1;\\n list.Add(1);\\n}\\n
If we pass a ReadOnlyCollection<int> object to this method, then, just as designed, no action occurs, because the collection is read-only. A List<int> object, on the other hand, will be changed with both an update and an addition. But if we pass an array, nothing will happen, because it returns true for ICollection<T>.IsReadOnly even though its items can be updated. And there's no way to check whether we can update items in a collection other than checking its type:
public void AddAndUpdate(IList<int> list)\\n{\\n if (list is int[])\\n {\\n // Update only\\n list[0] = 1;\\n return;\\n }\\n\\n if (list.IsReadOnly)\\n {\\n // No action\\n return;\\n }\\n\\n // Both add and update\\n list[0] = 1;\\n list.Add(1);\\n}\\n
Thus, arrays violate LSP. Note that they violate this principle only in the case of the generic interfaces.
Did Microsoft go wrong with this? Well, it was a trade-off. It was a conscious decision: such a design is simpler than the previous one, but it breaks LSP in one particular case.
Although the design became simpler, it still had a significant disadvantage: you had to check the IsReadOnly flag to find out whether a collection could be changed. That's not how programmers are used to working; actually, nobody did it. The property was used only in data-binding scenarios: data binding is one-way if IsReadOnly is true and two-way otherwise.
For other scenarios, everyone just used the IEnumerable<T> interface or the ReadOnlyCollection<T> class. That's why two new interfaces were added in .NET 4.5: IReadOnlyCollection<T> and IReadOnlyList<T>. These interfaces were added to an existing ecosystem, so they weren't allowed to introduce any breaking changes. That's why ReadOnlyCollection<T> implements the IList, IList<T> and IReadOnlyList<T> interfaces and not just IReadOnlyList<T>. Also, that's why IList<T> doesn't inherit from IReadOnlyList<T>. Such changes would break existing assemblies compiled with older versions of .NET; to make them work, you'd need to recompile them with the new version.
Although the actual state of affairs can't be changed because of backward compatibility, it is interesting to think about how the collection interfaces would look if they were designed today.
I suppose the class diagram would look like this:
[Class diagram: a hypothetical redesign of the .NET collection interfaces]

Here is what I did:
- Non-generic interfaces were removed, as they don't add any value to the whole picture.
- An IFixedList<T> interface was added so that Array doesn't have to implement IList<T> anymore.
- The ReadOnlyCollection<T> class was renamed to ReadOnlyList<T>, because that name seems more fitting. Also, it implements IReadOnlyList<T> only.
- No IsReadOnly and IsFixedSize flags. They can be added back for data-binding scenarios, but I removed them to show that they are no longer required for working with collections polymorphically.
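To give a feel for the redesign, here is a rough sketch of how those declarations could look in code. This is my own reading of the changes listed above, not actual BCL code, and the exact inheritance chain in the diagram may differ:

// Placed in a separate namespace to avoid clashing with the real BCL types.
namespace Redesigned
{
    public interface IReadOnlyCollection<T> : System.Collections.Generic.IEnumerable<T>
    {
        int Count { get; }
    }

    public interface IReadOnlyList<T> : IReadOnlyCollection<T>
    {
        T this[int index] { get; }
    }

    // Fixed-size collections (such as arrays) allow updating items,
    // but not adding or removing them.
    public interface IFixedList<T> : IReadOnlyList<T>
    {
        new T this[int index] { get; set; }
    }

    // Fully mutable collections. Array would implement IFixedList<T>, not IList<T>.
    public interface IList<T> : IFixedList<T>
    {
        void Add(T item);
        bool Remove(T item);
    }

    // ReadOnlyCollection<T> renamed to ReadOnlyList<T>; it implements IReadOnlyList<T> only.
    public sealed class ReadOnlyList<T> : IReadOnlyList<T>
    {
        private readonly System.Collections.Generic.List<T> _items;

        public ReadOnlyList(System.Collections.Generic.IEnumerable<T> items)
        {
            _items = new System.Collections.Generic.List<T>(items);
        }

        public int Count => _items.Count;
        public T this[int index] => _items[index];

        public System.Collections.Generic.IEnumerator<T> GetEnumerator() => _items.GetEnumerator();
        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() => GetEnumerator();
    }
}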
There’s an interesting code example in .NET:
\\n\\npublic static int Count<T>(this IEnumerable<T> source)\\n{\\n ICollection<T> collection1 = source as ICollection<T>;\\n if (collection1 != null)\\n return collection1.Count;\\n\\n ICollection collection2 = source as ICollection;\\n if (collection2 != null)\\n return collection2.Count;\\n\\n int count = 0;\\n using (IEnumerator<T> enumerator = source.GetEnumerator())\\n {\\n while (enumerator.MoveNext())\\n checked { ++count; }\\n }\\n\\n return count;\\n}\\n
It's the implementation of the Count extension method for LINQ-to-objects from the Enumerable class. The input object is tested for compatibility with the ICollection interfaces in order to get the item count without enumerating the whole sequence. Does this implementation violate LSP? Think for a minute before reading the answer.
No, it doesn't. Although this method checks for specific subtypes of IEnumerable<T>, they all have coherent implementations. In other words, the ICollection.Count and ICollection<T>.Count properties have the same postconditions as the statement that counts the items with the while loop.
The .NET BCL suffers from backward compatibility requirements just like most other popular libraries. It would be much easier if it could be rewritten from scratch with all the knowledge we have gained over the past years. Although that can't be done, we can mitigate the pain by using IReadOnlyCollection<T> and IReadOnlyList<T> in places where IEnumerable<T> is not enough.
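For example, a method that only reads data can state that intent in its signature instead of accepting IList<T> and checking flags; a quick sketch:

using System.Collections.Generic;

public static class CollectionExtensionsDemo
{
    public static int SumOfFirstAndLast(IReadOnlyList<int> numbers)
    {
        // The signature promises the collection won't be modified, so arrays,
        // List<int> and ReadOnlyCollection<int> can all be passed here safely.
        if (numbers.Count == 0)
            return 0;

        return numbers[0] + numbers[numbers.Count - 1];
    }
}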
Read also: IEnumerable interface in .NET and LSP.
Mark Seemann brings up a very interesting subject in his post: how do you adhere to the Command Query Separation principle when you have to save a brand-new object to the database and also need the generated id back? Sure, you can use GUIDs for identifiers (which have some drawbacks, as I'll show later on), but what if you really need integers?
I've been asked the same question several times, so in this post I'll share the solution I use for this problem.
It is a common approach to have database-generated identifiers. It significantly simplifies the code, as you don't have to bother with creating a unique id for every entity you save. To persist an entity, many programmers use code like this:
\\n\\npublic interface IRepository<T>\\n{\\n int Create(T item);\\n}\\n
As Mark fairly shows, this design violates the CQS principle because the Create method is obviously meant to be a command but returns a value. He also proposes a solution:
\\n\\npublic interface IRepository<T>\\n{\\n void Create(Guid id, T item);\\n int GetHumanReadableId(Guid id);\\n}\\n
While this solution clearly solves the problem of violating CQS, it does so at a high cost. First of all, it adds complexity by introducing two methods instead of one. Secondly, it might suffer a performance hit.
Although the performance concerns may not be an issue in most cases, it is important to understand them so that you can avoid such issues when performance is vital.
So, what are they? Besides the fact that this design results in two calls instead of one, GUIDs hurt insert performance. Relational databases use a B+ tree structure to store data; if the incoming data is not ordered, a lot of I/O work is required to rearrange the leaves. GUIDs are random, so they may lead to heavy performance penalties, especially with large tables.
We could use sequential GUIDs instead, and that would solve the problem with leaf rearranging, but we would still have two more problems. The minor one is that GUIDs take 16 bytes instead of the 4-8 bytes used by INTs and BIGINTs, and those additional bytes appear in every index of the table, not only in the clustered one. The major problem is that if we need integer Ids, we still have two calls instead of one.
Database-generated integers may hurt insert performance as well, but for another reason. When a database generates a new auto-incremented value for a primary key, it takes a lock to avoid race conditions. This causes a performance issue that shows up in multithreaded or bulk insert scenarios.
A better solution is to preload a batch of Ids and assign them to new objects on the application side. Here is how it can be done.
\\n\\nWe can use a separate table to track the batches of Ids we already issued for every entity type:
\\n\\nCREATE TABLE dbo.Ids\\n(\\n EntityId int PRIMARY KEY,\\n BatchNumber int NOT NULL\\n)
The column names are pretty self-explanatory. The batch number is basically the index of a block of Ids that a client can allocate as it wishes. For example, if you have a batch with number 14 and the batch size is 100, then you can distribute Ids 1400, 1401, …, 1499 among new objects. Another client at the same time might reserve batch number 15 and assign keys from 1500 to 1599. As every batch is reserved for a single client, the Ids they generate never collide. Note that you can use any batch size, but it's easier to work with a round number like 100.
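The arithmetic behind a batch is trivial; a quick sketch of it (matching the example above, where batch 14 with size 100 covers ids 1400 through 1499):

public static class BatchMath
{
    // Returns the first and last id covered by the given batch.
    public static (long First, long Last) IdRange(long batchNumber, int batchSize)
    {
        long first = batchNumber * batchSize;
        return (first, first + batchSize - 1);
    }
}

// BatchMath.IdRange(14, 100) => (1400, 1499)
// BatchMath.IdRange(15, 100) => (1500, 1599)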
There's a well-suited analogy for this approach. You can think of id batches as IP addresses: ICANN gives you a bunch of IPs, and you can distribute as many of them as you want within the limits of the range. The batch number acts just like a CIDR range here.
Another benefit of such a solution is that you don't need to touch the database to get your Ids back, and that is a crucial point. First of all, it increases the performance of bulk and related inserts. For example, you might need to insert a parent and a child, in which case the parent's Id is required to create the child object. Secondly, it fits the Unit of Work concept: you contact the database only once, when you decide to commit all the work done in the unit. With database-generated Ids, you might be forced to do an intermediate insert if, say, you need to know the object's id right after you create it. Also, if you choose to roll back the creation, you'll have to delete the inserted object instead of simply not inserting it.
But how exactly should you issue a new batch number? Simply by incrementing the number in the Ids table with an SQL query. First, let's look at how you should not do it. Here's code from a project I worked on:
\\n\\nCREATE PROCEDURE dbo.sp_GetNextBatchNumber\\n @EntityId int\\nAS\\nBEGIN\\n IF EXISTS (SELECT * FROM dbo.Ids WHERE EntityId = @EntityId)\\n BEGIN\\n BEGIN TRAN\\n\\n SELECT BatchNumber\\n FROM dbo.Ids\\n WHERE EntityId = @EntityId\\n\\n UPDATE dbo.Ids\\n SET BatchNumber = BatchNumber + 1\\n WHERE EntityId = @EntityId\\n\\n COMMIT TRAN\\n END\\n ELSE\\n BEGIN\\n SELECT 0\\n\\n INSERT dbo.Ids (EntityId, BatchNumber)\\n VALUES (@EntityId, 1)\\n END\\nEND
There are two problems with this code. First, it will deadlock in a heavily loaded system. Look at the first part of the if statement. When a transaction selects the current batch number, it obtains a shared lock. After that, it tries to upgrade the lock to exclusive in order to update the record. If another transaction selects the batch number for the same entity Id after the first transaction finishes selecting but before it starts the update, a deadlock occurs: both transactions hold a shared lock, and both try to acquire an exclusive lock while waiting for each other to release the shared one.
The second problem is that there will be primary key violation exceptions in a heavily loaded system. Look at the second part of the if statement. If two transactions try to insert a record with the same entity id simultaneously, they will collide. The only way to isolate the two inserts is to use the serializable isolation level, but that leads to heavy performance degradation: you'd have to wrap the whole stored procedure in a serializable transaction, causing all calls to the procedure to execute sequentially.
Here is a corrected version of the stored procedure:
CREATE PROCEDURE dbo.sp_GetNextBatchNumber
    @EntityId int
AS
BEGIN
    BEGIN TRAN

    UPDATE dbo.Ids
    SET BatchNumber = BatchNumber + 1
    WHERE EntityId = @EntityId

    SELECT BatchNumber - 1
    FROM dbo.Ids
    WHERE EntityId = @EntityId

    COMMIT TRAN
END
Note that the update is performed first so that an exclusive lock is acquired right away. The main point is to acquire all the required locks at once so that no other transaction can wedge itself in. Also note that the insert statement was removed: it is much simpler to insert rows for new entity types manually. This way we don't have to isolate the code at the serializable level, and the procedure performs much better. With all that said, we can use a design that is still simple and also fits the CQS principle:
\\n\\npublic interface IRepository<T>\\n{\\n void Create(T item);\\n}\\n
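Here is a minimal sketch of what the application-side part might look like: a hypothetical IdGenerator class that consumes batches issued by the stored procedure above. The repository's Create method can ask it for the next id and assign it to the entity before handing the object over to the ORM, so nothing ever needs to come back from the database.

using System;
using System.Data;
using System.Data.SqlClient;

// A hypothetical application-side generator that hands out ids from batches
// reserved via dbo.sp_GetNextBatchNumber. Use one instance per entity type.
public class IdGenerator
{
    private readonly string _connectionString;
    private readonly int _entityId;
    private readonly int _batchSize;
    private readonly object _lock = new object();

    private long _nextId;
    private long _lastIdInBatch = -1;

    public IdGenerator(string connectionString, int entityId, int batchSize = 100)
    {
        _connectionString = connectionString;
        _entityId = entityId;
        _batchSize = batchSize;
    }

    public long NextId()
    {
        lock (_lock)
        {
            // Reserve a new batch once the current one is exhausted.
            if (_nextId > _lastIdInBatch)
            {
                long batchNumber = ReserveBatch();
                _nextId = batchNumber * _batchSize;
                _lastIdInBatch = _nextId + _batchSize - 1;
            }

            return _nextId++;
        }
    }

    private long ReserveBatch()
    {
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand("dbo.sp_GetNextBatchNumber", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@EntityId", _entityId);
            connection.Open();
            return Convert.ToInt64(command.ExecuteScalar());
        }
    }
}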
What about drawbacks? Every design must have them, right? Yes, there is one. As you can see the Ids in batches are operated in memory, so if the application is rebooted all unused Ids will be lost, leading to holes in id sequences. The id sequences are still increasing, so there won’t be any performance drawbacks, but these holes might seem annoying, especially if your batch size is large.
That is why you should choose the batch size carefully. If it's too small (1-10), you might not gain much performance benefit, as you'll need to ask the database for a new batch every 1-10 new objects. If it's too large (1000 and more), the holes become particularly annoying; this might also exhaust the sequence if you use 4-byte integers for Ids. Consider the typical usage scenarios for your application and decide what size fits you. I prefer to start with a size of 100 and then tune it depending on the use cases. Of course, you may assign different batch sizes to different entity types.
The last thing I'd like to mention is that all this functionality is already implemented in NHibernate. It's called the Hi/Lo id generation algorithm. Here is an example of a fluent mapping you can use:
\\n\\npublic class YourEntityMap : ClassMap<YourEntity>\\n{\\n public YourEntityMap()\\n {\\n Table(\\"[dbo].[YourEntity]\\");\\n Id(x => x.Id).GeneratedBy.HiLo(\\n \\"[dbo].[Ids]\\",\\n \\"BatchNumber\\",\\n \\"100\\",\\n \\"EntityId = 1\\");\\n }\\n}\\n
Do I always use this approach? Yes, if I use NHibernate. If I use another ORM, or no ORM at all, then it depends on the project's size and complexity. For a smaller project, it's okay to break CQS and use database-generated Ids. For larger projects, it might be better to switch to Hi/Lo.
Benefits of Hi/Lo:
- The Create method remains a command with no return value, so the design adheres to CQS.
- Ids are assigned in memory, which means fewer database round trips, faster bulk and parent-child inserts, and a natural fit for the Unit of Work pattern.
- Keys stay small, ordered integers, avoiding the index bloat and page rearranging that random GUIDs cause.
Drawbacks of Hi/Lo:
- Unused Ids from a batch are lost when the application restarts, leaving holes in the id sequences.
- You need an extra table and stored procedure (or the ORM's built-in support), and you have to choose the batch size carefully.
\\n\\nIf you follow DDD principles, you eventually end up creating a base class for all the domain entities. It’s a good idea as it allows you to gather common logic in one place. When you decide to do that, you inevitably face the question of what exactly should be included in that base entity and how it should be presented.
\\n\\nI often see developers using interfaces as a base entity. The code might look like this:
\\n\\npublic interface IEntity\\n{\\n int Id { get; }\\n}\\n
While this approach guarantees that all domain entities have some minimum functionality (Id property in this example), in most cases having an interface as a base entity is a bad idea.
\\n\\nFirst of all, interfaces don’t allow you to keep any logic in them, and you’ll need to implement the same logic in all the inheriting classes, which leads to code duplication and violation of the DRY principle. Even with the simple example above, you’ll have to introduce the Id property in every single entity.
Secondly, using an interface doesn't express the appropriate relationship between domain entities. When a class implements an interface, it makes a promise about some functionality, but that's not enough. Two classes implementing the same interface say nothing about how they relate to each other; they may belong to entirely unconnected hierarchies. In other words, the IEntity interface introduces a "can do" relation (according to Bertrand Meyer's classification), whereas domain entities should be connected to the base entity by an "is a" relation. Every domain class not only has an Id property; it is an entity. It is important to remove possible misunderstandings, and using an interface instead of a base class can lead to one.
Okay, but what logic do we need in the base domain class?
Obviously, it should have an Id field, which is mapped to a table's primary key. All tables in the database must have ids of the same type so that we can factor the Id property out to the base class. Here is how you should not do it (at least not from the very beginning):
public class Entity<T>\\n{\\n public T Id { get; protected set; }\\n}\\n
The motivation for such code is pretty clear: you get a base class that can be reused across multiple projects. For instance, if there is a web application with GUID Id columns in the database and a desktop app with integer Ids, it might seem like a good idea to have a single class for both of them.
But this approach introduces accidental complexity because of premature generalization. There is no need for a single base entity class shared across multiple projects or bounded contexts. Each domain has its unique path, so let it grow independently. Just copy and paste the base entity class into the new project and specify the exact type that will be used for the Id property. Only when the need for entities with different Id types arises within a single project or bounded context should you introduce a type parameter in the Entity base class.
So, what should the base domain class look like? Here is the code I use in production; let's step through it.
\\n\\npublic abstract class Entity\\n{\\n public virtual long Id { get; protected set; }\\n\\n protected Entity()\\n {\\n }\\n\\n protected Entity(long id)\\n {\\n Id = id;\\n }\\n\\n public override bool Equals(object obj)\\n {\\n if (obj is not Entity other)\\n return false;\\n\\n if (ReferenceEquals(this, other))\\n return true;\\n\\n if (GetUnproxiedType(this) != GetUnproxiedType(other))\\n return false;\\n\\n if (Id.Equals(default) || other.Id.Equals(default))\\n return false;\\n\\n return Id.Equals(other.Id);\\n }\\n\\n public static bool operator ==(Entity a, Entity b)\\n {\\n if (a is null && b is null)\\n return true;\\n\\n if (a is null || b is null)\\n return false;\\n\\n return a.Equals(b);\\n }\\n\\n public static bool operator !=(Entity a, Entity b)\\n {\\n return !(a == b);\\n }\\n\\n public override int GetHashCode()\\n {\\n return (GetUnproxiedType(this).ToString() + Id).GetHashCode();\\n }\\n\\n internal static Type GetUnproxiedType(object obj)\\n {\\n const string EFCoreProxyPrefix = \\"Castle.Proxies.\\";\\n const string NHibernateProxyPostfix = \\"Proxy\\";\\n\\n Type type = obj.GetType();\\n string typeString = type.ToString();\\n\\n if (typeString.Contains(EFCoreProxyPrefix) || typeString.EndsWith(NHibernateProxyPostfix))\\n return type.BaseType;\\n\\n return type;\\n }\\n}\\n
By default, the Id property is of the long type, which rules out exhausting the sequence (the long type can hold numbers up to 9,223,372,036,854,775,807). Also, all members are made virtual in case you are using NHibernate, which requires virtual members to create runtime proxies. The setter is protected rather than private because of NHibernate as well.
The most interesting part is the equality members. There are three types of equality in enterprise software:
Reference equality means that two references refer to the same object in memory.
Identifier equality means that two objects in memory refer to the same row in the database.
Structural equality means that two objects are deemed the same when they have the same structure. This applies when an object doesn't have its own identity (such objects are called value objects).
The Entity class covers the first two situations: when two objects are equal either by reference or by identifier. Structural equality pertains to Value Objects.
Notice the GetUnproxiedType method. It is there to address the issue of ORMs returning the type of a runtime proxy when you call GetType() on an object. The method returns the underlying, real type of the entity and handles both NHibernate and EF Core.
The second if statement in Equals checks reference equality, and the last one checks identifier equality. The Id.Equals(default) line checks for transient entities. Such entities are not yet saved to the database and have their Id set to zero by default. We cannot check identifier equality for such entities, and hence we treat two of them as different even if both have the Id property set to zero.
The GetHashCode method must be overridden together with the Equals method: objects that are equal to each other must return the same hash code. Hash-based collections rely on this contract. For example, when you call Contains() on a HashSet<T>, it first uses GetHashCode() to find candidate objects; if the hash codes match, it also calls Equals() to make sure the objects are indeed equal to each other. Only if both checks pass are the two objects considered equal.
Entity obj = GetObjectSomehow();
HashSet<Entity> objects = GetObjectsSomehow();
objects.Contains(obj); // uses GetHashCode first, then Equals
It is a good idea to declare the == and != operators as well, because by default the == operator checks reference equality only.
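As a quick illustration (with a hypothetical Customer class deriving from Entity), two references that point to the same database row compare as equal, while transient objects do not:

public class Customer : Entity
{
    public virtual string Name { get; set; }

    public Customer()
    {
    }

    public Customer(long id)
        : base(id)
    {
    }
}

public static class EqualityDemo
{
    public static void Run()
    {
        var persisted1 = new Customer(42) { Name = "Alice" };
        var persisted2 = new Customer(42) { Name = "Alice (another instance)" };
        var transient1 = new Customer();
        var transient2 = new Customer();

        bool sameRow = persisted1 == persisted2;         // true: identifier equality
        bool transientsEqual = transient1 == transient2; // false: both Ids are still the default
    }
}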
I use this code in most of my projects, sometimes adding functionality if it appears in all of the project's domain classes. Such functionality might be a version property (byte[] or int) or anything else that is genuinely common. But don't get carried away: add functionality to the base domain class only if it belongs to all domain classes. Otherwise, you may end up building a God Object instead of a thin base entity.
The Entity base class no longer contains the GetRealType method and doesn't rely on NHibernate anymore; thanks to Anders Baumann for suggesting the workaround. I've since updated the base class to be compatible with both NHibernate and EF Core. You can also use this library to reference the latest version of the Entity base class.
This is the first post on the blog I'm starting today. The idea of starting a blog had been with me for quite a while, so I decided to give it a try.
I've been in the software development industry for 10+ years. Most of this time I worked on various enterprise projects as a developer, lead developer, or software architect. I was often asked about the architectural decisions I made, and I noticed that most of my answers came down to the same set of rules and principles I had discovered along my career path. I'd like to share this experience with others and also to create a sort of wiki I can refer to when I need to describe one of the topics covered here. I'll try to lay out my knowledge in a structured and compact way, although sometimes it might be hard to boil all the thoughts in my head down to one or two useful points. But, as I've already said, I will give it a try.
So, what is enterprise development, or enterprise software craftsmanship, about? Sometimes people say that enterprise development is when a programmer is hired to develop applications for a corporation whose primary business is not software development. Although that is often true, we can't restrict the definition of enterprise development to such cases. A widely known counterexample is Microsoft: this corporation builds enterprise solutions, and its primary business is software development.
Well, what is it then? It is software developed to automate or simplify a company's processes.
If we take a close look at various applications (not only enterprise ones), we can notice that all of them have a set of attributes consisting of functional and non-functional requirements and some other characteristics. Here is a rough list of such attributes:
\\nConsistency
\\nAvailability
\\nComplexity
\\nScalability
\\nCustomer type
\\nAmount of data it operates
\\nPerformance
\\nUsability
\\nProject duration
I won't dive into this list right now; I just want to highlight the attributes usually present in enterprise software.
High consistency requirement. You certainly want your customers to see what they have done without delays. Submitted orders must appear on the profile page; changes in a product's price must be shown to all users instantly.
High availability requirement. The services must be available to all customers 99% of the time, or customers should at least be able to carry on with the most crucial work even if one of the servers is down.
Enterprise software operates on a moderate amount of data. This data usually sits on one or a few servers, so you don't have to partition it. Also, you are perfectly fine with one of the non-open-source relational databases like Oracle or Microsoft SQL Server.
High business logic complexity. To understand and implement this logic, developers must work side by side with domain experts.
\\nEnterprise software is developed for businesses, not for individual users, although such users might find some of it useful as well.
Scalability is usually not a strong requirement. Enterprise applications generally don't have millions of users.
Performance and usability are not strong requirements either. As the customers of enterprise applications are not individual users, you don't have to keep your software smooth and polished, because the users simply don't have any other choice. That's an ugly truth about enterprise solutions: if one is developed for a single customer (i.e., it is not distributed software like, for example, Oracle DB), it usually doesn't have an easy-to-use interface, and its performance leaves much to be desired as well. Customers prefer not to spend a lot of money on performance and usability, focusing primarily on functional requirements.
A typical enterprise project is usually one or more years long. It tends to have an active development phase and a support phase. During the active development phase, new features are constantly added to the project. During the support phase, only bug fixes and minor changes are released.
If you compare this set of attributes with those of a Facebook-like application, you can see the main difference: enterprise software usually doesn't have an enormous amount of data (or big data, as it is called today), and, unlike Facebook, it has really complex business logic. This leads to a common set of practices developed specifically for handling complexity, most of which I will try to cover in this blog.
My primary programming language is C#, but I hope at least some of the articles will be useful for non-Microsoft-stack developers as well, since the experience I'll share here is pretty language-agnostic.
One more thing I'd like to write about is the fun, or satisfaction, developers get from their jobs. Enterprise development is often reckoned a boring kind of software development. Who would like to write code for yet another version of a licensing subsystem? Isn't it much more interesting to design a web page with a cool new JavaScript MVC framework instead? Well, I don't think so. While I do agree that you get less visible feedback with enterprise development than with web development, you get a totally different kind of challenge.
With enterprise software craftsmanship, you not only have to fulfill all the requirements you are given (which is not an easy task in itself), but you also have to do it in a way that lets you add new features with as little effort as possible. When you get a working solution with a simple and neat architecture, that's when you start loving your job. You feel how your design knowledge takes the shape of a sense of beauty, and you start sensing problems in code the moment you look at it. You will still have to remember patterns and principles, at least to be able to explain your standpoint to others, but internally you will operate with them much faster, just as Go players see the whole board and don't try to predict every one of the opponent's moves one by one.
\\n