Rejected

Size diet for storage space

I’ve been using Arc for almost 2 years, and its rate of growth in space usage seems rather high. While not critical yet, it’s already up to 8 GB of iCloud storage and 1.5 GB of phone space. Meanwhile, the GPX export for that entire time is just 180 MB, including the various duplicates it sometimes makes.

3 votes

Tagged as Suggestion

Suggested 16 August 2019 by user Jameson

Moved into Rejected 17 August 2019

  • 16 August 2019 Jameson suggested this task

  • 17 August 2019 Matt Greenfield moved this task into Rejected

  • avatar

    Hi!

    Unfortunately the GPX export is not a good representation of how much data Arc has, because the GPX format doesn’t support the majority of details that Arc records.

    I recommend doing some JSON exports, to get a better idea for how much detail there is in Arc’s data.

    For the iCloud backup size, that appears to be a bug with iCloud itself, in that iCloud doesn’t properly clean up after itself when data is deleted from the iCloud CloudKit service. Often the deleted data continues to use up space. So Arc’s iCloud backups (which back up to the CloudKit database service, in your private iCloud container) tend to get larger over time due to this iCloud bug.

    To get around the iCloud bug the only way that I’m aware of currently is to contact Apple Support and get them to do some refresh thing at their end, to flush out deleted data from CloudKit.

    17 August 2019
  • avatar

    That makes sense, but GPX is also a fairly inefficient text-based format, so I was hoping those effects might partially cancel out. The original numbers work out to about 2 MB/day on average. I did a JSON export (5 MB for yesterday) and see what you mean about the many additional fields, which wouldn’t make sense in GPX. Since JSON is also a fairly verbose text format, I compressed it with gzip and xz to get a quick-and-dirty sense of the minimum possible binary size, and it came out to 600 KB. So I guess it makes sense that there isn’t a whole lot that can be done (at best, maybe 2-3x).

    Fortunately, I appear to have at least a few years of runway before it becomes a problem for me.

    However, would it make sense to let old data remain only in iCloud (or even JSON export form) at some point? Maybe just keeping the summary statistics for comparisons and activity averages? Anyhow, it sounds like you’ve already put significant thought and effort into this. Thank you! I just don’t want to suddenly find I can’t load Arc anymore someday!

    19 August 2019
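
The estimate in the comment above can be reproduced roughly as follows. This is a minimal Python sketch, not Arc's export code: the sample records and their field names (`timestamp`, `latitude`, `longitude`, `activityType`) are invented stand-ins for a real JSON export, so the printed sizes will differ from the 5 MB and 600 KB quoted. The daily growth figure assumes the "about 2 MB/day" refers to the 1.5 GB of phone space spread over roughly two years.

```python
import gzip
import json
import lzma


def compressed_sizes(payload: bytes) -> dict:
    """Return the raw, gzip, and xz sizes (in bytes) of a payload."""
    return {
        "raw": len(payload),
        "gzip": len(gzip.compress(payload)),
        "xz": len(lzma.compress(payload)),
    }


# Hypothetical stand-in for one day's export; the field names are assumptions.
samples = [
    {
        "timestamp": 1566172800 + 6 * i,
        "latitude": 13.75 + i * 1e-5,
        "longitude": 100.5 + i * 1e-5,
        "activityType": "walking",
    }
    for i in range(5000)
]
payload = json.dumps({"samples": samples}).encode("utf-8")

sizes = compressed_sizes(payload)
for name, size in sizes.items():
    print(f"{name}: {size / 1024:.0f} KB")

# Average on-device growth: 1.5 GB over roughly 2 years.
mb_per_day = 1.5 * 1024 / (2 * 365)
print(f"{mb_per_day:.1f} MB/day")
```

To try it on a real export, read the exported file's bytes in place of `payload`.
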
  • avatar

    The data on your phone itself is stored in an SQLite database, so it doesn’t have the overhead of JSON - it’s already stored in an efficient format.

    The iCloud backups unfortunately aren’t stored quite so optimally. They use iCloud’s CloudKit database service, which appears to be inefficient in its data storage, and consumes more space than it should. But there’s nothing we can do about that - CloudKit is entirely under Apple’s control.

    So one of my goals is to move away from iCloud CloudKit for the backups. Although it is a good service, it is inefficient with its storage, and also appears not to clean up after itself (ie it has bugs), so it continues to consume space for data that has been deleted from the database.

    So for perhaps Arc 3.1, or a following release, I’m aiming to have the option of storing the backups on a different cloud service, and in a different format. (The format will most likely be gzipped JSON, in the same JSON schema as the current timeline export files).

    “However, would it make sense to let old data remain only in iCloud (or even JSON export form) at some point?”

    The storage consumption on the phone itself isn’t a major concern yet - it is not large enough to cause problems. And offloading storage to the cloud would create performance problems when viewing older data, and increase the battery and mobile data consumption costs, so it’s something I’d rather avoid for as long as possible.

    I think at the rate that the SQLite databases are growing, we should be fine for the on-device storage. Newer phones come with more storage space, and Arc’s SQLite database isn’t going to start growing faster - it will continue growing at the same slow and steady pace.

    The bulk of the data in the SQLite database is the timeline items’ recorded samples, and Arc / LocoKit are already recording at a high enough sampling rate that I don’t think there’d be any point in ever increasing it.

    LocomotionSamples are recorded at a frequency of between every 2 seconds and every 30 seconds while in recording mode, with the most common frequency being every 6 seconds. That’s already considerably higher than any other app of this kind.

    The highest sampling rate of every 2 seconds is a requirement of HealthKit (Apple Health), for the storing of Workout Routes, so that high frequency is only used when the currently detected activity type is a type that can be saved to HealthKit as a workout (ie walking, running, cycling, skiing, snowboarding).

    So to summarise: I think that for now the on-device storage situation is fine, but the cloud storage situation is not great. And to solve the latter problem I’ll be migrating our backups away from iCloud CloudKit (or at least offering the option of storing them somewhere else).

    19 August 2019
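
To put the sampling figures from the comment above into numbers, here is a rough Python sketch. The function names and the `WORKOUT_TYPES` set are made up for illustration and are not LocoKit's API. Only the intervals (2, 6, and 30 seconds) and the list of workout-capable activity types come from the comment.

```python
# Activity types the comment lists as saveable to HealthKit as workouts.
WORKOUT_TYPES = {"walking", "running", "cycling", "skiing", "snowboarding"}


def may_use_fastest_rate(activity_type: str) -> bool:
    """Hypothetical check: the 2 second rate applies only to workout-capable types."""
    return activity_type in WORKOUT_TYPES


def samples_per_hour(interval_seconds: int) -> int:
    """Number of samples recorded in one hour at a fixed interval."""
    return 3600 // interval_seconds


for interval in (2, 6, 30):
    print(f"every {interval} s: {samples_per_hour(interval)} samples/hour")

print(may_use_fastest_rate("cycling"))
print(may_use_fastest_rate("stationary"))
```

At the most common 6 second interval this gives 600 samples per hour of recording, against 1800 at the 2 second workout rate and 120 at the 30 second floor.
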
  • avatar

    “I think that for now the on-device storage situation is fine, but the cloud storage situation is not great”

    Just for a brief counter-view, for me it feels like the reverse at the current rates of growth of each. The iCloud prices are pretty reasonable (and I’d be paying anyway due to photos) and it’s easy to upgrade. On device, since it has already reached multiple GB without hitting any obvious size-related issues, I hope that’s likely to continue to be true. But if I hit some failure due to some eventual on-device size limit, I’d be sad. That’s only hypothetical right now, though, so not an immediately pressing issue.

    20 August 2019
  • avatar

    In Arc 3.0 I split the database up into several separate databases. That allows for better performance, and also provides more safety, for example in the event of a database becoming corrupt.

    I think it’s inevitable that Arc is going to have a large database - it just plain records a lot of data. All of that data is visible and used in the app, so none of it is discardable. I think it’s just a symptom of the times - we’re doing more with the technologies available to us, and with that comes larger amounts of data.

    If it becomes necessary to break the databases up further in future, I’ll look into various database partitioning or sharding strategies. But for now the databases are optimised fairly well. We’ll see how the future goes!

    21 August 2019
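
As a side note on tracking how large an SQLite database is, the size can be read from the database itself with the `page_count` and `page_size` pragmas. The Python sketch below runs against a throwaway file with an invented `sample` table. It does not reflect Arc's actual schema or file layout, which the thread does not describe.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def database_bytes(conn: sqlite3.Connection) -> int:
    """Size of the main database in bytes, as reported by SQLite."""
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return page_count * page_size


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "samples.sqlite")
    with closing(sqlite3.connect(path)) as conn:
        # Hypothetical table; the column names are assumptions.
        conn.execute(
            "CREATE TABLE sample (timestamp REAL, latitude REAL, longitude REAL)"
        )
        conn.executemany(
            "INSERT INTO sample VALUES (?, ?, ?)",
            [(i * 6.0, 13.75, 100.5) for i in range(10_000)],
        )
        conn.commit()
        reported = database_bytes(conn)
    on_disk = os.path.getsize(path)

print(f"reported: {reported} bytes, on disk: {on_disk} bytes")
```
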
  • avatar

    I agree that this is a major problem. On my phone Arc uses 864 MB, but on my iCloud account the backup consumes 4.4 GB.

    I’ve deleted and recreated the iCloud backup to test whether the bug you mention above had any impact, but it only reduced the backup size by 0.4 GB, from 4.7 GB.

    Since you’re working on a custom backup format, this should not be marked as rejected, right?

    22 August 2019
  • avatar

    Niklas, there are other feature requests for cloud storage alternatives. If you are keen to use something other than iCloud, I recommend voting on one of those feature requests 😉

    22 August 2019
  • avatar

    As to the CloudKit bug that I mentioned, it does not happen to everyone, and does not happen all the time.

    For many users, deleting the backups shows Arc as using no space on iCloud, but their iCloud’s storage consumption doesn’t decrease, and the user needs to contact Apple Support in order to get their iCloud storage flushed. But other users can delete their Arc backups and immediately see all of the space reclaimed.

    The growth can also happen over a period of time, with some data being deleted correctly and some of it continuing to take up space. It can be months before any discrepancy becomes obvious.

    22 August 2019