
I have a Docker image that is about 900MB. It has a directory /data that holds all the data I want, plus a /data/.git directory that I would like to exclude.

I would like to create a data container (using the busybox image, for example) with only the data under /data, excluding the /data/.git directory.

How do I do this in the most efficient way?

/data without /data/.git is only about 130MB.

So basically I would have a data container of roughly 130MB to 200MB, compared to the original Docker image of 900MB.

Thanks

uberrebu

1 Answer


I'd migrate to a named volume, and create it with the following:

docker run -it --rm -v data-vol:/target your_image \
  /bin/bash -c "cp -av /data/* /target"

The cp -av /data/* /target command skips all dot files, because the shell glob * does not match names that begin with a dot. That excludes ".git", but if you have other dot files you want to keep, add them explicitly, e.g. cp -av /data/* /data/.app-files /target
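The glob behaviour can be checked locally before touching any volume. This is only a sketch: the temporary directories stand in for /data and /target, and the file names (a.txt, .app-files/conf) are made up for illustration.

```shell
# Temporary stand-ins for /data and /target (hypothetical, created fresh).
src=$(mktemp -d)
dst=$(mktemp -d)

# Fake contents: one regular file plus two dot directories.
mkdir "$src/.git" "$src/.app-files"
touch "$src/a.txt" "$src/.git/HEAD" "$src/.app-files/conf"

# "$src"/* expands to a.txt only, so .git is skipped;
# .app-files is copied because it is named explicitly.
cp -a "$src"/* "$src/.app-files" "$dst"

# -A lists dot entries too, so a stray .git would show up here.
ls -A "$dst"
```

The listing should contain a.txt and .app-files but no .git. This assumes the shell's default globbing; with bash's dotglob option enabled, * would match dot files as well.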

If you're willing to first create it with a full copy of /data and then delete the .git folder later, that's even easier:

docker run -it --rm -v data-vol:/data your_image rm -rf /data/.git

As long as "data-vol" has not been initialized, Docker creates it with the full /data contents from the image. The first command you run against it then removes the ".git" folder.
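The copy-everything-then-prune idea can also be tried without Docker. The sketch below is a local analogue, not what Docker does internally: the temporary directories stand in for the image's /data and the volume, and the file names are invented for the example.

```shell
# Temporary stand-ins for the image's /data and the volume (hypothetical).
src=$(mktemp -d)
dst=$(mktemp -d)

mkdir "$src/.git" "$src/.app-files"
touch "$src/a.txt" "$src/.git/HEAD" "$src/.app-files/conf"

# "$src"/. copies the directory's contents, dot entries included,
# which mirrors the full copy of /data in the step above.
cp -a "$src"/. "$dst"

# Prune the one directory that is not wanted.
rm -rf "$dst/.git"

ls -A "$dst"
```

Compared with the glob approach, nothing has to be listed by hand here: every dot file survives except the one removed explicitly. The cost is that .git is copied first and deleted afterwards.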

Once you have the "data-vol" volume, you can reuse it in any other container with:

docker run -v data-vol:/data another_image
BMitch
  • here is what I got: `docker run -it --rm -v ~/data:/data my-first-image /bin/bash -c "cp -av /data/* /data"` fails with `cp: cannot stat '/data/*': No such file or directory`, and the first image does have all files under `/data` – uberrebu Oct 13 '16 at 16:00
  • actually it works now. I was using /data for both source and destination; choosing a different name for the destination fixed it. Now the question is: how do I start a data container with the files I copied over, using something like the busybox image for example? – uberrebu Oct 13 '16 at 16:11
  • There's no need for a data container; those are effectively deprecated. Just run any image with your named volume mounted at any location. – BMitch Oct 13 '16 at 16:24
  • ok i have a question here, what is the difference between data volumes and volume mounts on host? – uberrebu Oct 13 '16 at 17:25
  • Named volumes are a replacement for data containers, they let you manage data separate from containers (`docker volume ...`). Volume mounts to the host are a direct map to the host filesystem. – BMitch Oct 13 '16 at 18:17
  • ok I got confused there then, so what does `data-vol` stand for in `-v data-vol:/target`? I was thinking it is a directory on the host. Am I correct or wrong? – uberrebu Oct 13 '16 at 19:48
  • `data-vol` is the name of the named volume. After running, it will be visible with `docker volume ls`. Host volumes need a full path (`/path/to/host/vol`). – BMitch Oct 13 '16 at 19:52
  • ok here is the only thing though: how do data volumes play out in CI (continuous integration), where the runner builds the images and everything? How will that data volume be visible to the running containers? – uberrebu Oct 13 '16 at 20:38