Core Data Made Easy: Some Code + Practices for Beginners and Experts

Core Data is Apple’s answer to “Wow! It’s difficult to store objects in an SQL database!”. It has been extended over time to do a lot more than just that – but that’s the core.

If you know what you’re doing – and you avoid the pit-traps along the way – it can be very good indeed. But we frequently see code written by iOS professionals that betrays a misunderstanding of what Apple intended, and misses out on some of the best features of Core Data.

Over time, we’ve coalesced some of our practices into re-useable code and techniques. All the code in this article is already up on GitHub, and I’ll be maintaining it periodically with improvements from our own projects.

(this post will be followed by a few others, or else extended over time)

Alternatives

NB … if you’re not interested in alternative approaches, skip ahead to the next section.

…re-write / extend Core Data?

Before we start, I should point out that if you want “a better Core Data” there are several options out there. I’ve tried a few, but none of them have ever made it into our launched apps. This is *even though* I’ve been quietly impressed by some of them.

Why not? Well … here are the issues that have affected us (and may affect you too):

  1. Core Data – in fact, any Persistence Layer – is massively complex, it deliberately breaks the Objective-C/Cocoa platform, and it’s extremely dangerous to mess with. Some of the tech we’ve seen tries too hard to “fix” CoreData, and in the process has unexpected side-effects.
    • …when you’re dealing with USER DATA, that’s a terrifying prospect
    • Examples include: the Objective-C standard method “description” *corrupts your data* if you alter it in any way
    • Examples include: the Objective-C standard method “copy” *is hardcoded* to be unusable on CD objects – you get a nice error message at runtime from Apple if you try to use it, and your app crashes
    • …and other similar terrors. Unless you are 120% confident of your “CoreData++” library, don’t risk it.
  2. Core Data is a single standard that Apple beats developers over the head with. There’s only one version, it’s very difficult to customize.
    • So: if you stick to standard CoreData method calls and behaviour, most developers can immediately understand your code without having to learn your proprietary setup
    • …which saves a lot of time and money on projects with multiple teams / companies involved
  3. For some projects / clients / partners, the “license” on code is a serious issue, and code that you or I would happily use gets “forbidden” by the legal teams.
    • For some clients, anything we use has to be written by us, or have a massively open license (no GPL, no LGPL – not even MIT/BSD, in some cases)
    • For a very narrow set of clients, anything we use has to be reviewed *by us* (or by the client) line by line, and we have to warranty it. i.e. the code needs to be extremely short and simple!
  4. Any code we write on top of this – can it be re-used in a “plain” CoreData project?
    • Often, you want to copy/paste snippets of code from one old project to a new one (assuming of course you – or same client – own the original code).
    • …if you’ve extended CoreData with new classes and methods, you might find this is very hard to do. (this has happened before on some iOS projects, with other libs we’ve used)

So, instead, we focus on making as few changes to CoreData as possible – and generally just sticking to Apple’s own concepts.

Making life easy, with minimal changes

We’re going to do three broad things:

  1. Clean up Apple’s code, and enable some Apple-provided features you probably didn’t know existed
  2. Very slightly extend Apple’s code (mostly: we’re going to make a start at adding some Blocks, since Apple still hasn’t!)
  3. Offer some good practices in how you write YOUR code

1. Replace “100 lines of code” with a simple, easy, encapsulated class

One new class: CoreDataStack.m

One of the best parts of Core Data is that the high-level architecture is neatly split into 4 distinct layers. Collectively, Apple terms these the Core Data “stack” (although that term never appears in code – only in docs). Unfortunately, since all those layers are needed before you can do anything, it requires 100 lines of code just to read a single item from Core Data.

Apple’s image + text showing the layers (may require login to Apple Dev network)

  1. Layer 1: file (on disk, or flash memory – Apple has NOT YET IMPLEMENTED a network version, or a remote DB one, although devs have been asking for this for many years. Unlikely to happen anytime soon)
  2. Layer 2: Persistent Object Store (the DB part)
  3. Layer 3: Persistent Store Coordinator (NSPersistentStoreCoordinator: high-level management stuff)
  4. Layer 4: NSManagedObjectContext (this is what 99% of your app code talks to)

Apple has largely ignored this problem; their “solution” is that when you create a new project in Xcode, they dump 100 lines (well, not quite 100, but close) of crud into your AppDelegate class.

This is the wrong place to put that code – by Apple’s own rules, it should not go there. If your application is multi-threaded (90% of apps are) then it also MUST NOT go there (that’s almost guaranteed to cause data-loss bugs in the long run).

So, the first thing we did was to create a class – CoreDataStack – that encapsulates Apple’s 100 lines of boilerplate code:

GitHub link to CoreDataStack.m

You can drag/drop this into your projects (take the .h file too, of course) – it has NO DEPENDENCIES, except for CoreData.framework (which is needed in all Core Data projects, by definition)

Using CoreDataStack

Instead of Apple’s large number of method calls, and large number of properties, in real-world projects you need just one property and just one method.

(NB: this assumes you’ve already used Apple’s GUI Editor to create a CoreDataModel file with one or more Entities and Attributes)

To start using Core Data:

/** one line complete init */
CoreDataStack* cdStack = [CoreDataStack coreDataStackWithModelName:@"MyModelName"];

/** one property you need for all Core Data method calls */
cdStack.managedObjectContext;

…where MyModelName is the filename of the Model file (the thing you click on to get the GUI interface for editing Entities and Attributes)

That’s it! Really! Everything else can be inferred (so we do it automatically, in code).

For compatibility, all the other items you *might, theoretically* want to access (e.g. the NSPersistentStoreCoordinator, etc) are presented as properties on CoreDataStack.

Why is this a separate class, and yet not a Singleton?

This is actually very important: you DO NOT WANT CoreDataStack to be a Singleton! (although many people try to write their apps that way – c.f. Apple’s default template and its abuse of AppDelegate.m!)

Why?

Because Core Data was designed to have *multiple simultaneous instances* of NSManagedObjectContext, NSPersistentStoreCoordinator, etc. If you limit yourself to one copy of each – as per Apple’s template – you disable many of the features of CoreData.

Worse, some of those features are *required* if you’re going to use CoreData correctly on a real-world project.

The nearest you can get to making this a singleton is … have one instance per xcdatamodel file (an earlier version of the code would even cache this for you, although we have since removed that because the performance impact was invisible – not worth the complexity).

More on this (multiple CoreDataStack instances) in the next section…

Tips and tricks: Improving your own code

Use multiple CoreDataStack instances

Here’s a little secret: Core Data was designed for you to access MULTIPLE “models” at once, within a single app … but Apple’s default template for iPhone projects makes that impossible (unless you replace it).

Also, remember the most important rule of Core Data: Apple’s source code is NOT THREAD SAFE and WILL CORRUPT YOUR DATA at random intervals if your app is multithreaded in the “wrong” way (see the PS at the end of this post for more info).

…but here’s another, bigger, secret: if you’re clever, using multiple models at once, you can *avoid* the need for thread-safe code – both your code and Apple’s code. This is the only excuse for Apple’s code being unsafe: you can avoid the need for safety.

So, with CoreDataStack, to use a second model in your app, all you do is:

/** one line complete init */
CoreDataStack* cdOtherStack = [CoreDataStack coreDataStackWithModelName:@"MyOtherModelName"];

/** one property you need for all Core Data method calls */
cdOtherStack.managedObjectContext;

So long as you only have one thread reading and writing to each CoreDataStack instance, you will avoid all the bugs caused by Apple’s unsafe code. In most multi-threaded apps, it’s easy to split your data into multiple separate models, and have each model locked to a particular thread.

Multiple models … what?

Say you’re writing an Email client, that has the following classes for Core Data:

  • Email.m : Subject, To, From, Body, isReadYet, DateReceived
  • Person.m : EmailAddress, Name

Your app does the following:

  • Shows a list of emails, that you can click on to read
  • Shows a list of people – tap a person’s name to send them an email, or tap an “Add” button to add a new person
  • Downloads emails in the background automatically
  • Syncs your contacts to an addressbook server in the background

You now have a multi-threading problem – two sets of background threads accessing Core Data. This *will corrupt your data*. Apple has never tried to make CoreData thread-safe.

Now you have two choices:

  • OPTION 1: learn how to write the bizarre code that lets CoreData function with multiple threads. Pray that none of your colleagues dares to edit any of your code – because if they do, there’s a high chance they’ll break it without realizing
  • OPTION 2: Instead of having ONE model, have TWO models. One model contains just “Email” and the other contains just “Person”. Each background thread is associated 1:1 with a separate CoreDataStack instance – and suddenly everything is Thread-Safe. Nothing to worry about!

NB: this works because of CoreData’s fundamental design: Apple’s code is *not* thread-safe for a single NSManagedObjectContext, but it *is* thread-safe for multiple separate NSManagedObjectContexts in memory at once

Experts only: Single CoreData model, multiple threads

Further, if you know what you’re doing with multi-threading, there are many situations where you need to have two copies in memory of the SAME object-model. You’ll be manually cross-synching (there lie dragons).

But to be thread-safe, you have to guarantee that the entire stack – NSPersistentStoreCoordinator etc – is separate for each thread. The hard way to do this is to manually manage it. The easy way is to init a separate CoreDataStack instance per thread – and this will automatically be thread-safe, because each CoreDataStack is coded to NOT share any data or references between instances.

Good practices – applies to all CoreData projects

Don’t hard-code your class-names

Apple’s example source code for using CoreData is technically correct, but practically poor. It encourages a bad habit that – for most projects – is unnecessary and the cause of many bugs over time.

When you need to reference a CD object, Core Data is written so that you don’t need to have access to the Class of that object. In theory, you can load a Class (from CoreData’s database, saved by a different app) that doesn’t exist in your project. We’re getting into some weird and freaky stuff here.

Just in case – even though 90% of coders will never use that feature – Apple tells you to use NSString to instantiate your CoreData objects. This is bad for most projects:

  1. Apple’s refactor tools in Xcode are 10 years behind everyone else’s – they don’t support CoreData, and they ignore those strings – if you refactor a CoreData class, Xcode will break your project
  2. It’s very very easy to make a typo when writing that string. Xcode 3 would auto-complete the name for you, but Xcode 4 removed this feature

So, each time you need to create a new object in the CoreData database/store, instead of this:

Email* newEmail = [NSEntityDescription insertNewObjectForEntityForName:@"Email" inManagedObjectContext:cdStack.managedObjectContext];

do this:

Email* newEmail = [NSEntityDescription insertNewObjectForEntityForName:NSStringFromClass([Email class]) inManagedObjectContext:cdStack.managedObjectContext];

…using Xcode’s autocomplete, this is the same or fewer keystrokes, even though it results in more text. More importantly, the compiler will now double-check that class for you, and refuse to build if you typo it. You’re also safe(r) with refactoring.

Similarly, when you do a Fetch with CoreData, instead of passing the string-name of the class, use the same NSStringFromClass call as above, for the same benefits.
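
A minimal sketch of that fetch, assuming the standard NSFetchRequest API. The helper name FetchAllOfClass is hypothetical (it is not part of CoreDataStack), and it only works if each entity in your model has the same name as its class, as Email does above.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

/**
 Hypothetical helper (NOT part of CoreDataStack): fetches every object of the
 entity whose name matches the name of the given class.

 Returns nil if the fetch fails, with the reason in *outError.
 */
static NSArray* FetchAllOfClass(Class entityClass, NSManagedObjectContext* context, NSError** outError)
{
    // The compiler can check [Email class]; it cannot check the string @"Email"
    NSString* entityName = NSStringFromClass(entityClass);

    NSFetchRequest* request = [NSFetchRequest fetchRequestWithEntityName:entityName];
    return [context executeFetchRequest:request error:outError];
}

// Possible call site, using the Email class and cdStack from earlier:
//
//   NSError* error = nil;
//   NSArray* allEmails = FetchAllOfClass([Email class], cdStack.managedObjectContext, &error);
```

If the Email class is later renamed or deleted, the call site stops compiling, instead of failing at runtime the way a mistyped string would.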

Upgrading the CoreDataStack

Save Errors SHOULD NEVER BE IGNORED

Apple doesn’t like Exception Handlers and Assertions. Personally, I think 30 years of computer industry have proved them wrong there, but I’m willing to accept it.

Except for when a SAVE of CoreData fails; in this particular case, it is totally unacceptable for your app to silently ignore the error. Sadly, Apple’s default setup encourages you to do this.

How many times have you seen this in an app:

[self.managedObjectContext save:nil];

?

Sadly, I see it all the time, because the alternative requires a minimum of 5 lines of code, including a double-pointer (which a lot of junior programmers and/or people who’ve never used C/C++ seem to feel uncomfortable with).
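
For reference, the longer form looks roughly like this. It is a sketch that uses only Apple’s documented save: method; the helper name SaveAndLogErrors is made up for this example.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

/**
 Hypothetical helper: saves the context, and logs the reason if the save fails.

 Returns YES only if the save really happened.
 */
static BOOL SaveAndLogErrors(NSManagedObjectContext* context)
{
    NSError* error = nil; // Core Data fills this in only when the save fails

    // &error is the "double-pointer" (NSError**)
    BOOL didSave = [context save:&error];

    // Check the RETURN VALUE, not the error: a message sent to a nil context
    // returns NO but leaves error as nil
    if( ! didSave )
    {
        NSLog(@"Core Data save FAILED: %@", error);
    }

    return didSave;
}
```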

As a bonus, Apple’s code has some nasty behaviour with that save method. They have documented this (so I guess it’s a “feature” instead of a bug – but see what you think):

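If managedObjectContext is nil, then “a save that fails” will ALWAYS seem to return “there was no error” (in Objective-C, a message sent to nil silently does nothing, so the NSError you passed in is never filled in).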

This is one of the most painful things I’ve had to debug on CoreData projects. Many times.

So, we’re going to fix both of those. CoreDataStack.m has an optional method:

-(void) saveOrFail:(void(^)(NSError* errorOrNil)) blockFailedToSave
  1. Because it’s a block, it’s *just as easy to log the error* as it is to ignore it (instead of requiring you to write error-creating + checking code). Further, because it’s a block, it’s easy to nest attempted saves, logs, and failures.
  2. If your NSManagedObjectContext is nil … instead of doing what Apple does (silent failure; pretends that the save has succeeded) … we explicitly FAIL (c.f. the source code for the saveOrFail method).
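
To make the idea concrete, here is a rough sketch of the same behaviour written as a free-standing function. This is NOT the CoreDataStack source: the function name SaveOrFail is hypothetical, and what happens on a nil context (here, the failure block is called with a nil error) is an assumption, so check the real saveOrFail: method on GitHub for the actual behaviour.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

/**
 Hypothetical sketch (NOT the real CoreDataStack method): tries to save, and
 calls the block if the save did not happen for any reason.
 */
static void SaveOrFail(NSManagedObjectContext* context, void (^blockFailedToSave)(NSError* errorOrNil))
{
    if( context == nil )
    {
        // ASSUMPTION: treat a missing context as a failure, rather than
        // silently doing nothing. There is no NSError to report here.
        if( blockFailedToSave != nil )
            blockFailedToSave(nil);
        return;
    }

    NSError* error = nil;
    if( ! [context save:&error] )
    {
        // A genuine save failure: pass Core Data's own error to the caller
        if( blockFailedToSave != nil )
            blockFailedToSave(error);
    }
}
```

Calling the real method follows the same shape: [cdStack saveOrFail:^(NSError* errorOrNil){ NSLog(@"Save failed: %@", errorOrNil); }];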

PS

CoreData corrupts data if used multi-threaded?


Until 2011, this wasn’t even documented, but … not only is CoreData “not thread-safe”, but if you merely allocate/init a CoreData context from a different thread you will cause it to quietly self-destruct later on – after it’s taken your precious data. It’s easy to understand once you know – but this is not an obvious side-effect, and I’ve seen it catch a few people.

e.g. one of the patterns coders previously used to handle multi-threading – because Apple hadn’t documented this requirement – involved Thread A creating the context, adding notification listeners, and then passing the context to the thread that would use it.

This works most of the time – I know, because I used this pattern back in 2009 – and I’d learnt it from someone else who’d been using it for a while themselves. But it also fails some of the time, unpredictably, with strange crashes deep in Apple’s code. Now we know better, of course.

My take-home: multi-threaded CoreData is to be avoided unless you *really* know what you’re doing with CoreData. Given that Apple has been very slow (years slow!) to document this aspect of CoreData, I don’t recommend it, unless you’re willing to spend a lot of time learning the “community wisdom” on the topic.

15 thoughts on “Core Data Made Easy: Some Code + Practices for Beginners and Experts”

  1. Hey Adam,

    First of all, I would like to say thanks for the blog post; I have learned a lot about Core Data just from this post alone. My question: I have about 5 entities. Is it best to put each of them in a separate stack, or what? I am not sure how to break up my data appropriately.

    Once again, thank you for the blog post. I find this way more effective and efficient, and I will definitely be using it in all my apps from now on.

    Craig

  2. Look for “clumps” of entities, and put them in a stack together.

    e.g. a simple app will only have one clump, only one stack.

    e.g. a Social Photos app with two tabs, one tab for “uploading and browsing photos”, and another tab for “chatting to people live online” … probably has two clumps, with little (or no!) overlap between entity relationships.

    NB: lots of apps have sets of Entities that are in NO WAY related to each other. Those are BEGGING to be put into separate stacks. Many of them end up artificially connecting their Entities so that you get one mega-graph, because they don’t realise you can have separate stacks running in parallel.

    Another example is — maybe your app uses Core data to display info to the user on every screen, but it also does background-download of low-level data. It would be VERY VERY convenient to split into one stack for “everything the user sees / interacts with”, and a separate stack for “everything that is downloaded from the server / internet”.

    In that case, there’s going to be some point where you have to rationalise the data. But … if e.g. you have an Entity “DownloadedArticle” that contains raw RSS data (say it’s an RSS reader), and you have an Entity “Article” that contains the formatting instructions and colours and stuff … it’s much easier to write code that “reads” from stack 1 and “writes” to stack 2 … than it is to write 100% safe, generic, CoreData code.

    NB: all the rest of the code in your app wouldn’t have to worry about multi-threading, only that bit of code that handled the meeting point. In a 1-stack app, everything has to worry about multithreading :(.

  3. Very good article. And I have always thought that CoreData could be improved. Do you have any sample code using your techniques? Ideally using xcode 4.5.1?

  4. All the sample code is included in the blog post.

    There is literally nothing more to type in.

    CoreDataStack does 99% of the code for you – the 200 lines or so that Apple adds to a “core data enabled” new project, become 2 lines.

  5. Hi,

    This is a great tutorial. There is one thing I don’t understand in the header:

    Returns a SHARED stack – multiple classes fetching the same filename will get the same stack object back (this is what you want in 99.9% of cases)

    What do you mean by “SHARED stack”? Is this a shared CoreDataStack or a shared “Core Data Stack”? And I can’t see how this is different to:

    If you need separate stacks, then init the first one using the name, and init all subsequent ones like this:

    firstStack = [CoreDataStack coreDataStackWithDatabaseFilename: @"MyModel"];
    secondStack = [CoreDataStack coreDataStackWithDatabaseURL: firstStack.databaseURL]; // uses the same config data, but is NOT shared

    Cheers,
    Brett

  6. It was a convenience method, now removed I think because it was confusing and not that valuable.

    It was to get the benefits of a singleton, even though you need N singletons (one for each model that exists in your project).

  7. Where would I instantiate the CoreDataStack* cdOtherStack = [CoreDataStack coreDataStackWithModelName:@"MyOtherModelName"];
    and share this model with my view controllers?

  8. Thank you for all this information on Core Data. I have been intensively learning iPhone programming for a month now (from a C/C++ background), planning not to make any mistakes others have made before me 😉 – I am in the early stages of preparing my brains for a big project with multiple data sources.

    So far I have no intention to clear my database, but it might be necessary one day. So I just now tried to subclass NSFetchedResultsController:

    I get errors from Xcode. The first issue is: “ARC Semantic Issue: No visible @interface for ‘NSFetchedResultsController’ declares the selector ‘initWithCoder:'”

    Well, I checked in the NSFetchedResultsController class. The error text is quite obviously true. However, your blog entry is not old and I don’t think NSFetchedResultsController changed quite so fast or quite so radically – and: ‘ARC Semantic Issue’? I am aware you are not using ARC from your CoreDataStack sources (‘autorelease’).
    What is Apple really trying to tell me? Did I miss something? Do I need to extend the NSFetchedResultsController class with a category that I have overlooked?

    The second issue is the constant kNotificationDestroyAllNSFetchedResultsControllers. I can make it known to my new class by importing CoreDataStack.h, but after I do that (either in the .h or in the .m file) __fetchedResultsController is not recognized any more. I get a semantic error for unknown type and I am prompted to replace it with the class name NSFetchedResultsController. Why??! Baffles me! 🙁

    The relevant source as a reminder:
    -(id)initWithCoder:(NSCoder *)aDecoder
    {
        self = [super initWithCoder:aDecoder];
        if (self) {
            // Custom initialization

            [[NSNotificationCenter defaultCenter] addObserverForName:kNotificationDestroyAllNSFetchedResultsControllers object:nil queue:nil usingBlock:^(NSNotification *note) {
                NSLog(@"[%@] must destroy my nsfetchedresultscontroller", [self class]);
                [__fetchedResultsController release];
                __fetchedResultsController = nil;
            }];
        }
        return self;
    }

  9. @Sebastian – sorry, I only just noticed your comment (I’ve been very busy the past 2 months).

    The original design was that you could safely instantiate it in EACH METHOD that does any CoreData processing. i.e. no need to pass around references, sparing you a load of @property’s that you DO NOT NEED, etc.

    …but that anyone who WANTED to use the old (bad) Apple coding style, of passing a reference everywhere … could equally safely:

    1. instantiate in the AppDelegate (although AppDelegate is technically the wrong place, it’s an “easy” place to do it)
    2. pass that reference to every viewcontroller, *exactly as per Apple’s sample code*

    There is a tiny performance impact to repeatedly re-instantiating on demand. First version of this code I had a dynamic cache so that there was zero difference in the two techniques – but I found the performance loss/gain *too small to measure*, so I removed the code.

    I have since seen a couple of projects where people were re-instantiating thousands of times per second (badly written methods), so I might re-add the cache at some point. But if you write your code sensibly – e.g. instantiate no more than once per UIViewController – it works fine and fast right now.

  10. @Angelika – re: “ARC Semantic Issue: No visible @interface for…”

    – I’m afraid I have no idea; I never use ARC. I found it has literally *no* benefit to existing ObjC projects, and it adds a few new bugs :(. I can’t advise you on the importance / unimportance of that warning, sorry

    re: “__fetchedResultsController is not recognized any more”

    – As far as I can tell, this is a bug in YOUR project. There is nothing in the CoreDataStack header or class that would cause this (I just double checked the current source version).
    – PROBABLY: Xcode’s compiler is crashing on a PREVIOUS error/warning in your project, and the net effect is that it cannot fully compile one of your classes. So you see this error. PLEASE NOTE: APPLE’S COMPILER IS BROKEN BY DEFAULT! Some of the compiler “warnings” are actually compiler “errors” – a few of them are things that PREVENT the compiler from compiling your source (although it pretends it can compile, it actually cannot). So … carefully check EVERY warning (as well as every error).

  11. Thanks a lot!! I love your approach to Apple: use it but trust your own brains most.
    – I’ll ponder about ditching ARC. I had presumed that for a new project, it’d do no harm to try it.
    – And I will leave the clearing of the database technique until later when I am wiser. There is no other error or warning, only that one and I can switch it off and on by importing or removing the CDS class. Really weird. Makes me want to explore the why and how of it. So much to learn and so little time to one life! 😀

  12. Having read so many blogs on CD, this is the most refreshing. One very newb question though: can I implement it in a Cocoa project or is it only for iOS? i.e. would changes be necessary in the code to adapt it to a Cocoa project?
    Many thanks again!

  13. I think it works in OS X similarly.

    However, for iOS 6, Apple finally replaced some of this (quietly) with a system that works and is NOT absurdly complex and error prone :). Specifically instead of fixing the threading model they added two new threading models, which are (finally!) thread safe.

    (It existed in iOS 5 but was badly broken in key ways – including some of the headline features. Google “nested coredata” for some of the more painful examples)

    So … I’d check how much of that is live in OS X right now. With the caveat that Apple hasn’t been doing as good a job of error-checking before release as we’d hope.

    We’ve been using the new Core Data in parallel with the old, and – apart from the iOS 5 / iOS 6.0.0 bugs – it’s worked pretty well. There are STILL some nasty bugs / design flaws that Apple hasn’t documented (e.g. their documentation on GCD / dispatch_queue’s is so misleading it’s almost wrong, while being technically correct. It’s also missing quite a lot of critical info (it essentially says “don’t worry, everything is magic and unicorns” which is obviously not true)). However, it’s a lot easier to use than the iOS 4 / main Core Data approach.

    We’re still maintaining the GitHub “simple CoreData stack” project, but it’s not as essential now as it used to be…
