Commit Graph

161 Commits

Author SHA1 Message Date
brian m. carlson
ab90ecae99 convert: permit passing additional metadata to filter processes
There are a variety of situations where a filter process can make use of
some additional metadata.  For example, some people find the ident
filter too limiting and would like to include the commit or the branch
in their smudged files.  This information isn't available during
checkout as HEAD hasn't been updated at that point, and it wouldn't be
available in archives either.

Let's add a way to pass this metadata down to the filter.  We pass the
blob we're operating on, the treeish (preferring the commit over the
tree if one exists), and the ref we're operating on.  Note that we won't
pass this information in all cases, such as when renormalizing or when
we're performing diffs, since it doesn't make sense in those cases.

The data we currently get from the filter process looks like the
following:

  command=smudge
  pathname=git.c
  0000

With this change, we'll get data more like this:

  command=smudge
  pathname=git.c
  refname=refs/tags/v2.25.1
  treeish=c522f061d551c9bb8684a7c3859b2ece4499b56b
  blob=7be7ad34bd053884ec48923706e70c81719a8660
  0000

There are a couple things to note about this approach.  For operations
like checkout, treeish will always be a commit, since we cannot check
out individual trees, but for other operations, like archive, we can end
up operating on only a particular tree, so we'll provide only a tree as
the treeish.  Similar comments apply for refname, since there are a
variety of cases in which we won't have a ref.

This commit wires up the code to print this information, but doesn't
pass any of it at this point.  In a future commit, we'll have various
code paths pass the actual useful data down.

Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-16 11:37:02 -07:00
Johannes Schindelin
d4c0a3ac78 fill_stat_cache_info(): prepare for an fsmonitor fix
We will need to pass down the `struct index_state` to
`mark_fsmonitor_valid()` for an upcoming bug fix, and this here function
calls that there function, so we need to extend the signature of
`fill_stat_cache_info()` first.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-28 12:43:42 -07:00
Junio C Hamano
7d0c1f4556 Merge branch 'tg/checkout-no-overlay'
"git checkout --no-overlay" can be used to trigger a new mode of
checking out paths out of the tree-ish, that allows paths that
match the pathspec that are in the current index and working tree
and are not in the tree-ish.

* tg/checkout-no-overlay:
  revert "checkout: introduce checkout.overlayMode config"
  checkout: introduce checkout.overlayMode config
  checkout: introduce --{,no-}overlay option
  checkout: factor out mark_cache_entry_for_checkout function
  checkout: clarify comment
  read-cache: add invalidate parameter to remove_marked_cache_entries
  entry: support CE_WT_REMOVE flag in checkout_entry
  entry: factor out unlink_entry function
  move worktree tests to t24*
2019-03-07 09:59:51 +09:00
Junio C Hamano
4084df42c2 Merge branch 'nd/checkout-noisy'
"git checkout [<tree-ish>] path..." learned to report the number of
paths that have been checked out of the index or the tree-ish,
which gives it the same degree of noisy-ness as the case in which
the command checks out a branch.

* nd/checkout-noisy:
  t0027: squelch checkout path run outside test_expect_* block
  checkout: print something when checking out paths
2019-01-14 15:29:29 -08:00
Thomas Gummerer
536ec1839d entry: support CE_WT_REMOVE flag in checkout_entry
'checkout_entry()' currently only supports creating new entries in the
working tree, but not deleting them.  Add the ability to remove
entries at the same time if the entry is marked with the CE_WT_REMOVE
flag.

Currently this doesn't have any effect, as the CE_WT_REMOVE flag is
only used in unpack-tree, however we will make use of this in a
subsequent step in the series.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02 15:28:05 -08:00
Thomas Gummerer
b702dd12d5 entry: factor out unlink_entry function
Factor out the 'unlink_entry()' function from unpack-trees.c to
entry.c.  It will be used in other places as well in subsequent
steps.

As it's no longer a static function, also move the documentation to
the header file to make it more discoverable.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02 15:28:05 -08:00
Junio C Hamano
9da9fff14d Merge branch 'nd/clone-case-smashing-warning'
Recently added check for case smashing filesystems did not
correctly utilize the cached stat information, leading to false
breakage detected by our test suite, which has been corrected.

* nd/clone-case-smashing-warning:
  clone: fix colliding file detection on APFS
2018-11-21 20:39:03 +09:00
Nguyễn Thái Ngọc Duy
e66ceca94b clone: fix colliding file detection on APFS
Commit b878579ae7 (clone: report duplicate entries on case-insensitive
filesystems - 2018-08-17) adds a warning to user when cloning a repo
with case-sensitive file names on a case-insensitive file system. The
"find duplicate file" check was doing by comparing inode number (and
only fall back to fspathcmp() when inode is known to be unreliable
because fspathcmp() can't cover all case folding cases).

The inode check is very simple, and wrong. It compares between a
32-bit number (sd_ino) and potentially a 64-bit number (st_ino). When
an inode is larger than 2^32 (which seems to be the case for APFS), it
will be truncated and stored in sd_ino, but comparing with itself will
fail.

As a result, instead of showing a pair of files that have the same
name, we show just one file (marked before the beginning of the
loop). We fail to find the original one.

The fix could be just a simple type cast (*)

    dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino

but this is no longer a reliable test, there are 4G possible inodes
that can match sd_ino because we only match the lower 32 bits instead
of full 64 bits.

There are two options to go. Either we ignore inode and go with
fspathcmp() on Apple platform. This means we can't do accurate inode
check on HFS anymore, or even on APFS when inode numbers are still
below 2^32.

Or we just to to reduce the odds of matching a wrong file by checking
more attributes, counting mostly on st_size because st_xtime is likely
the same. This patch goes with this direction, hoping that false
positive chances are too small to be seen in practice.

While at there, enable the test on Cygwin (verified working by Ramsay
Jones)

(*) this is also already done inside match_stat_data()

Reported-by: Carlo Arenas <carenas@gmail.com>
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-21 14:05:13 +09:00
Nguyễn Thái Ngọc Duy
0f086e6dca checkout: print something when checking out paths
One of the problems with "git checkout" is that it does so many
different things and could confuse people specially when we fail to
handle ambiguation correctly.

One way to help with that is tell the user what sort of operation is
actually carried out. When switching branches, we always print
something unless --quiet, either

 - "HEAD is now at ..."
 - "Reset branch ..."
 - "Already on ..."
 - "Switched to and reset ..."
 - "Switched to a new branch ..."
 - "Switched to branch ..."

Checking out paths however is silent. Print something so that if we
got the user intention wrong, they won't waste too much time to find
that out. For the remaining cases of checkout we now print either

 - "Checked out ... paths out of the index"
 - "Checked out ... paths out of <abbrev hash>"

Since the purpose of printing this is to help disambiguate. Only do it
when "--" is missing.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-14 15:10:35 +09:00
Junio C Hamano
c2407322b6 Merge branch 'nd/clone-case-smashing-warning'
Running "git clone" against a project that contain two files with
pathnames that differ only in cases on a case insensitive
filesystem would result in one of the files lost because the
underlying filesystem is incapable of holding both at the same
time.  An attempt is made to detect such a case and warn.

* nd/clone-case-smashing-warning:
  clone: report duplicate entries on case-insensitive filesystems
2018-09-17 13:53:47 -07:00
Duy Nguyen
b878579ae7 clone: report duplicate entries on case-insensitive filesystems
Paths that only differ in case work fine in a case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what exactly is "dirty".

This patch helps the situation a bit by pointing out the problem at
clone time. Even though this patch talks about case sensitivity, the
patch makes no assumption about folding rules by the filesystem. It
simply observes that if an entry has been already checked out at clone
time when we're about to write a new path, some folding rules are
behind this.

In the case that we can't rely on filesystem (via inode number) to do
this check, fall back to fspathcmp() which is not perfect but should
not give false positives.

This patch is tested with vim-colorschemes and Sublime-Gitignore
repositories on a JFS partition with case insensitive support on
Linux.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-17 12:10:37 -07:00
Nguyễn Thái Ngọc Duy
74cfc0ee1d entry.c: use the right index instead of the_index
checkout-index.c needs update because if checkout->istate is NULL,
ie_match_stat() will crash. Previously this is ie_match_stat(&the_index, ..)
so it will not crash, but it is not technically correct either.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 14:14:43 -07:00
Nguyễn Thái Ngọc Duy
7f944e264e convert.c: remove an implicit dependency on the_index
Make the convert API take an index_state instead of assuming the_index
in convert.c. All external call sites are converted blindly to keep
the patch simple and retain current behavior. Individual call sites
may receive further updates to use the right index instead of
the_index.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 14:14:42 -07:00
Stefan Beller
cbd53a2193 object-store: move object access functions to object-store.h
This should make these functions easier to find and cache.h less
overwhelming to read.

In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file

As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise.  It would be better to #include wherever
identifiers from the header are used.  That can happen later
when we have better tooling for it.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:42:03 +09:00
brian m. carlson
1a750441a7 convert: convert to struct object_id
Convert convert.c to struct object_id.  Add a use of the_hash_algo to
replace hard-coded constants and change a strbuf_add to a strbuf_addstr
to avoid another hard-coded constant.

Note that a strict conversion using the hexsz constant would cause
problems in the future if the internal and user-visible hash algorithms
differed, as anticipated by the hash function transition plan.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:50 -07:00
brian m. carlson
b4f5aca40e sha1_file: convert read_sha1_file to struct object_id
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file.  Do the same for read_sha1_file_extended.

Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id.  Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:50 -07:00
Brandon Williams
d8f71807c1 entry: rename 'new' variables
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-22 10:08:05 -08:00
Junio C Hamano
e05336bdda Merge branch 'bp/fsmonitor'
We learned to talk to watchman to speed up "git status" and other
operations that need to see which paths have been modified.

* bp/fsmonitor:
  fsmonitor: preserve utf8 filenames in fsmonitor-watchman log
  fsmonitor: read entirety of watchman output
  fsmonitor: MINGW support for watchman integration
  fsmonitor: add a performance test
  fsmonitor: add a sample integration script for Watchman
  fsmonitor: add test cases for fsmonitor extension
  split-index: disable the fsmonitor extension when running the split index test
  fsmonitor: add a test tool to dump the index extension
  update-index: add fsmonitor support to update-index
  ls-files: Add support in ls-files to display the fsmonitor valid bit
  fsmonitor: add documentation for the fsmonitor extension.
  fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  update-index: add a new --force-write-index option
  preload-index: add override to enable testing preload-index
  bswap: add 64 bit endianness helper get_be64
2017-11-21 14:07:50 +09:00
Junio C Hamano
6909bf6bd9 Merge branch 'ls/filter-process-delayed'
Bugfixes to an already graduated series.

* ls/filter-process-delayed:
  write_entry: untangle symlink and regular-file cases
  write_entry: avoid reading blobs in CE_RETRY case
  write_entry: fix leak when retrying delayed filter
  entry.c: check if file exists after checkout
  entry.c: update cache entry only for existing files
2017-10-11 14:52:24 +09:00
Jeff King
7cbbf9d6a2 write_entry: untangle symlink and regular-file cases
The write_entry() function switches on the mode of the entry
we're going to write out. The cases for S_IFLNK and S_IFREG
are lumped together. In earlier versions of the code, this
made some sense. They have a shared preamble (which reads
the blob content), a short type-specific body, and a shared
conclusion (which writes out the file contents; always for
S_IFREG and only sometimes for S_IFLNK).

But over time this has grown to make less sense. The preamble
now has conditional bits for each type, and the S_IFREG body
has grown a lot more complicated. It's hard to follow the
logic of which code is running for which mode.

Let's give each mode its own case arm. We will still share
the conclusion code, which means we now jump to it with a
goto. Ideally we'd pull that shared code into its own
function, but it touches so much internal state in the
write_entry() function that the end result is actually
harder to follow than the goto.

While we're here, we'll touch up a few bits of whitespace to
make the beginning and endings of the cases easier to read.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-10 09:03:07 +09:00
Jeff King
c602d3a989 write_entry: avoid reading blobs in CE_RETRY case
When retrying a delayed filter-process request, we don't
need to send the blob to the filter a second time. However,
we read it unconditionally into a buffer, only to later
throw away that buffer. We can make this more efficient by
skipping the read in the first place when it isn't
necessary.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-10 08:59:57 +09:00
Jeff King
b2401586fc write_entry: fix leak when retrying delayed filter
When write_entry() retries a delayed filter request, we
don't need to send the blob content to the filter again, and
set the pointer to NULL. But doing so means we leak the
contents we read earlier from read_blob_entry(). Let's make
sure to free it before dropping the pointer.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-10 08:59:02 +09:00
Lars Schneider
11179eb311 entry.c: check if file exists after checkout
If we are checking out a file and somebody else racily deletes our file,
then we would write garbage to the cache entry. Fix that by checking
the result of the lstat() call on that file. Print an error to the user
if the file does not exist.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-06 14:59:16 +09:00
Lars Schneider
03b95333db entry.c: update cache entry only for existing files
In 2841e8f ("convert: add "status=delayed" to filter process protocol",
2017-06-30) we taught the filter process protocol to delay responses.

That means an external filter might answer in the first write_entry()
call on a file that requires filtering  "I got your request, but I
can't answer right now. Ask again later!". As Git got no answer, we do
not write anything to the filesystem. Consequently, the lstat() call in
the finish block of the function writes garbage to the cache entry.
The garbage is eventually overwritten when the filter answers with
the final file content in a subsequent write_entry() call.

Fix the brief time window of garbage in the cache entry by adding a
special finish block that does nothing for delayed responses. The cache
entry is written properly in a subsequent write_entry() call where
the filter responds with the final file content.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-05 20:01:07 +09:00
Ben Peart
883e248b8a fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
When the index is read from disk, the fsmonitor index extension is used
to flag the last known potentially dirty index entries. The registered
core.fsmonitor command is called with the time the index was last
updated and returns the list of files changed since that time. This list
is used to flag any additional dirty cache entries and untracked cache
directories.

We can then use this valid state to speed up preload_index(),
ie_match_stat(), and refresh_cache_ent() as they do not need to lstat()
files to detect potential changes for those entries marked
CE_FSMONITOR_VALID.

In addition, if the untracked cache is turned on valid_cached_dir() can
skip checking directories for new or changed files as fsmonitor will
invalidate the cache only for those directories that have been
identified as having potential changes.

To keep the CE_FSMONITOR_VALID state accurate during git operations;
when git updates a cache entry to match the current state on disk,
it will now set the CE_FSMONITOR_VALID bit.

Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit
is cleared and the corresponding untracked cache directory is marked
invalid.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-01 17:23:01 +09:00
Junio C Hamano
c50424a6f0 Merge branch 'jk/write-in-full-fix'
Many codepaths did not diagnose write failures correctly when disks
go full, due to their misuse of write_in_full() helper function,
which have been corrected.

* jk/write-in-full-fix:
  read_pack_header: handle signed/unsigned comparison in read result
  config: flip return value of store_write_*()
  notes-merge: use ssize_t for write_in_full() return value
  pkt-line: check write_in_full() errors against "< 0"
  convert less-trivial versions of "write_in_full() != len"
  avoid "write_in_full(fd, buf, len) != len" pattern
  get-tar-commit-id: check write_in_full() return against 0
  config: avoid "write_in_full(fd, buf, len) < len" pattern
2017-09-25 15:24:06 +09:00
Jeff King
564bde9ae6 convert less-trivial versions of "write_in_full() != len"
The prior commit converted many sites to check the return
value of write_in_full() for negativity, rather than a
mismatch with the input length. This patch covers similar
cases, but where the return value is stored in an
intermediate variable. These should get the same treatment,
but they need to be reviewed more carefully since it would
be a bug if the return value is stored in an unsigned type
(which indeed, it is in one of the cases).

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-14 15:17:59 +09:00
Junio C Hamano
7fbbd3ec0f Merge branch 'ls/convert-filter-progress'
The codepath to call external process filter for smudge/clean
operation learned to show the progress meter.

* ls/convert-filter-progress:
  convert: display progress for filtered objects that have been delayed
2017-09-10 17:08:22 +09:00
Lars Schneider
52f1d62eb4 convert: display progress for filtered objects that have been delayed
In 2841e8f ("convert: add "status=delayed" to filter process protocol",
2017-06-30) we taught the filter process protocol to delayed responses.
These responses are processed after the "Checking out files" phase.
If the processing takes noticeable time, then the user might think Git
is stuck.

Display the progress of the delayed responses to let the user know that
Git is still processing objects. This works very well for objects that
can be filtered quickly. If filtering of an individual object takes
noticeable time, then the user might still think that Git is stuck.
However, in that case the user would at least know what Git is doing.

It would be technical more correct to display "Checking out files whose
content filtering has been delayed". For brevity we only print
"Filtering content".

The finish_delayed_checkout() call was moved below the stop_progress()
call in unpack-trees.c to ensure that the "Checking out files" progress
is properly stopped before the "Filtering content" progress starts in
finish_delayed_checkout().

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Suggested-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-24 12:41:20 -07:00
Lars Schneider
2841e8f81c convert: add "status=delayed" to filter process protocol
Some `clean` / `smudge` filters may require a significant amount of
time to process a single blob (e.g. the Git LFS smudge filter might
perform network requests). During this process the Git checkout
operation is blocked and Git needs to wait until the filter is done to
continue with the checkout.

Teach the filter process protocol, introduced in edcc8581 ("convert: add
filter.<driver>.process option", 2016-10-16), to accept the status
"delayed" as response to a filter request. Upon this response Git
continues with the checkout operation. After the checkout operation Git
calls "finish_delayed_checkout" which queries the filter for remaining
blobs. If the filter is still working on the completion, then the filter
is expected to block. If the filter has completed all remaining blobs
then an empty response is expected.

Git has a multiple code paths that checkout a blob. Support delayed
checkouts only in `clone` (in unpack-trees.c) and `checkout` operations
for now. The optimization is most effective in these code paths as all
files of the tree are processed.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:50:41 -07:00
Stefan Beller
cd279e2e1b entry.c: submodule recursing: respect force flag correctly
In case of a non-forced worktree update, the submodule movement is tested
in a dry run first, such that it doesn't matter if the actual update is
done via the force flag. However for correctness, we want to give the
flag as specified by the user. All callers have been inspected and updated
if needed.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-18 21:18:29 -07:00
Stefan Beller
6d14eac3ec entry.c: create submodules when interesting
When a submodule is introduced with a new revision
we need to create the submodule in the worktree as well.
As 'submodule_move_head' handles edge cases, all we have
to do is call it from within the function that creates
new files in the working tree for workingtree operations.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16 14:07:16 -07:00
brian m. carlson
7eda0e4fbb streaming: make stream_blob_to_fd take struct object_id
Since all of its callers have been updated, modify stream_blob_to_fd to
take a struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson
99d1a9861a cache: convert struct cache_entry to use struct object_id
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:

@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
Nguyễn Thái Ngọc Duy
e1ebb3c25b entry.c: use error_errno()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-09 12:29:08 -07:00
Jeff King
3733e69464 use xmallocz to avoid size arithmetic
We frequently allocate strings as xmalloc(len + 1), where
the extra 1 is for the NUL terminator. This can be done more
simply with xmallocz, which also checks for integer
overflow.

There's no case where switching xmalloc(n+1) to xmallocz(n)
is wrong; the result is the same length, and malloc made no
guarantees about what was in the buffer anyway. But in some
cases, we can stop manually placing NUL at the end of the
allocated buffer. But that's only safe if it's clear that
the contents will always fill the buffer.

In each case where this patch does so, I manually examined
the control flow, and I tried to err on the side of caution.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
Jeff King
330c8e2670 entry.c: convert strcpy to xsnprintf
This particular conversion is non-obvious, because nobody
has passed our function the length of the destination
buffer. However, the interface to checkout_entry specifies
that the buffer must be at least TEMPORARY_FILENAME_LENGTH
bytes long, so we can check that (meaning the existing code
was not buggy, but merely worrisome to somebody reading it).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Nguyễn Thái Ngọc Duy
078a58e825 read-cache: mark updated entries for split index
The large part of this patch just follows CE_ENTRY_CHANGED
marks. replace_index_entry() is updated to update
split_index->base->cache[] as well so base->cache[] does not reference
to a freed entry.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:40 -07:00
Nguyễn Thái Ngọc Duy
d4a2024aef entry.c: update cache_changed if refresh_cache is set in checkout_entry()
Other fill_stat_cache_info() is on new entries, which should set
CE_ENTRY_ADDED in cache_changed, so we're safe.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Junio C Hamano
ec8cd4fc11 Merge branch 'mh/remove-subtree-long-pathname-fix'
* mh/remove-subtree-long-pathname-fix:
  entry.c: fix possible buffer overflow in remove_subtree()
  checkout_entry(): use the strbuf throughout the function
2014-03-25 11:07:09 -07:00
Michael Haggerty
2f29e0c6fa entry.c: fix possible buffer overflow in remove_subtree()
remove_subtree() manipulated path in a fixed-size buffer even though
the length of the input, let alone the length of entries within the
directory, were not known in advance.  Change the function to take a
strbuf argument and use that object as its scratch space.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-13 10:57:48 -07:00
Michael Haggerty
f63272a35e checkout_entry(): use the strbuf throughout the function
There is no need to break out the "buf" and "len" members into
separate temporary variables.  Rename path_buf to path and use
path.buf and path.len directly.  This makes it easier to reason about
the data flow in the function.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-13 10:56:50 -07:00
Junio C Hamano
af2a651d2e checkout_entry(): clarify the use of topath[] parameter
The said function has this signature:

	extern int checkout_entry(struct cache_entry *ce,
				  const struct checkout *state,
				  char *topath);

At first glance, it might appear that the caller of checkout_entry()
can specify to which path the contents are written out by the last
parameter, and it is tempting to add "const" in front of its type.

In reality, however, topath[] is to point at a buffer to store the
temporary path generated by the callchain originating from this
function, and the temporary path is always short, much shorter than
the buffer prepared by its only caller in builtin/checkout-index.c.

Document the code a bit to clarify so that future callers know how
to use the function better.

Noticed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24 14:59:39 -07:00
Nguyễn Thái Ngọc Duy
fd356f6aa8 entry.c: convert checkout_entry to use strbuf
The old code does not do boundary check so any paths longer than
PATH_MAX can cause buffer overflow. Replace it with strbuf to handle
paths of arbitrary length.

The OS may reject if the path is too long though. But in that case we
report the cause (e.g. name too long) and usually move on to checking
out the next entry.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24 14:58:37 -07:00
Junio C Hamano
d3aeb31dc4 Merge branch 'nd/const-struct-cache-entry'
* nd/const-struct-cache-entry:
  Convert "struct cache_entry *" to "const ..." wherever possible
2013-07-22 11:24:01 -07:00
Thomas Rast
42063f95a0 apply, entry: speak of submodules instead of subprojects
There are only four (with some generous rounding) instances in the
current source code where we speak of "subproject" instead of
"submodule".  They are as follows:

* one error message in git-apply and two in entry.c

* the patch format for submodule changes

The latter was introduced in 0478675 (Expose subprojects as special
files to "git diff" machinery, 2007-04-15), apparently before the
terminology was settled.  We can of course not change the patch
format.

Let's at least change the error messages to consistently call them
"submodule".

Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-18 10:56:06 -07:00
Nguyễn Thái Ngọc Duy
9c5e6c802c Convert "struct cache_entry *" to "const ..." wherever possible
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is

 - diff-lib.c: setting CE_UPTODATE

 - name-hash.c: setting CE_HASHED

 - preload-index.c, read-cache.c, unpack-trees.c and
   builtin/update-index: obvious

 - entry.c: write_entry() may refresh the checked out entry via
   fill_stat_cache_info(). This causes "non-const struct cache_entry
   *" in builtin/apply.c, builtin/checkout-index.c and
   builtin/checkout.c

 - builtin/ls-files.c: --with-tree changes stagemask and may set
   CE_UPDATE

Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.

So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.

The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:

    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)

     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;

will help quickly identify them without bogus warnings.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-09 09:12:48 -07:00
Junio C Hamano
b9c78e9723 Merge branch 'jk/check-corrupt-objects-carefully'
Have the streaming interface and other codepaths more carefully
examine for corrupt objects.

* jk/check-corrupt-objects-carefully:
  clone: leave repo in place after checkout errors
  clone: run check_everything_connected
  clone: die on errors from unpack_trees
  add tests for cloning corrupted repositories
  streaming_write_entry: propagate streaming errors
  add test for streaming corrupt blobs
  avoid infinite loop in read_istream_loose
  read_istream_filtered: propagate read error from upstream
  check_sha1_signature: check return value from read_istream
  stream_blob_to_fd: detect errors reading from stream
2013-04-03 09:34:29 -07:00
Junio C Hamano
39c5835dd6 Merge branch 'jk/checkout-attribute-lookup'
Codepath to stream blob object contents directly from the object
store to filesystem did not use the correct path to find conversion
filters when writing to temporary files.

* jk/checkout-attribute-lookup:
  t2003: work around path mangling issue on Windows
  entry: fix filter lookup
  t2003: modernize style
2013-03-28 14:37:46 -07:00
Jeff King
d9c31e14d0 streaming_write_entry: propagate streaming errors
When we are streaming an index blob to disk, we store the
error from stream_blob_to_fd in the "result" variable, and
then immediately overwrite that with the return value of
"close". That means we catch errors on close (e.g., problems
committing the file to disk), but miss anything which
happened before then.

We can fix this by using bitwise-OR to accumulate errors in
our result variable.

While we're here, we can also simplify the error handling
with an early return, which makes it easier to see under
which circumstances we need to clean up.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 13:47:09 -07:00
John Keeping
7297a44012 entry: fix filter lookup
When looking up the stream filter, write_entry() should be passing the
path of the file in the repository, not the path to which the content is
going to be written.  This allows the file to be correctly looked up
against the .gitattributes files in the working tree.

This change makes the streaming case match the non-streaming case which
passes ce->name to convert_to_working_tree later in the same function.

The two tests added here test the different paths through write_entry
since the CRLF filter is a streaming filter but the user-defined smudge
filter is not streamed.

Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-14 14:49:48 -07:00
Junio C Hamano
47a02ff2ca streaming: make streaming-write-entry to be more reusable
The static function in entry.c takes a cache entry and streams its blob
contents to a file in the working tree.  Refactor the logic to a new API
function stream_blob_to_fd() that takes an object name and an open file
descriptor, so that it can be reused by other callers.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-07 09:07:37 -08:00
Junio C Hamano
b6691092d7 Add streaming filter API
This introduces an API to plug custom filters to an input stream.

The caller gets get_stream_filter("path") to obtain an appropriate
filter for the path, and then uses it when opening an input stream
via open_istream().  After that, the caller can read from the stream
with read_istream(), and close it with close_istream(), just like an
unfiltered stream.

This only adds a "null" filter that is a pass-thru filter, but later
changes can add LF-to-CRLF and other filters, and the callers of the
streaming API do not have to change.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26 16:47:15 -07:00
Junio C Hamano
de6182db67 streaming_write_entry(): support files with holes
One typical use of a large binary file is to hold a sparse on-disk hash
table with a lot of holes. Help preserving the holes with lseek().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 23:16:53 -07:00
Junio C Hamano
dd8e912190 streaming_write_entry(): use streaming API in write_entry()
When the output to a path does not have to be converted, we can read from
the object database from the streaming API and write to the file in the
working tree, without having to hold everything in the memory.

The ident, auto- and safe- crlf conversions inherently require you to read
the whole thing before deciding what to do, so while it is technically
possible to support them by using a buffer of an unbound size or rewinding
and reading the stream twice, it is less practical than the traditional
"read the whole thing in core and convert" approach.

Adding streaming filters for the other conversions on top of this should
be doable by tweaking the can_bypass_conversion() function (it should be
renamed to can_filter_stream() when it happens). Then the streaming API
can be extended to wrap the git_istream streaming_write_entry() opens on
the underlying object in another git_istream that reads from it, filters
what is read, and let the streaming_write_entry() read the filtered
result. But that is outside the scope of this series.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 18:46:58 -07:00
Junio C Hamano
fd5db55d8b write_entry(): separate two helper functions out
In the write-out codepath, a block of code determines what file in the
working tree to write to, and opens an output file descriptor to it.

After writing the contents out to the file, another block of code runs
fstat() on the file descriptor when appropriate.

Separate these blocks out to open_output_fd() and fstat_output()
helper functions.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 18:38:54 -07:00
Nguyễn Thái Ngọc Duy
d43e90732b entry.c: remove "checkout-index" from error messages
Back then when entry.c was part of checkout-index (or checkout-cache
at that time [1]). It makes sense to print the command name in error
messages. Nowadays entry.c is in libgit and can be used by any
commands, printing "git checkout-index: blah" does no more than
confusion. The error messages without it still give enough information.

[1] 12dccc1 (Make fiel checkout function available to the git library - 2005-06-05)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-29 14:03:07 -08:00
Junio C Hamano
56eb8b43eb Merge branch 'jc/symbol-static'
* jc/symbol-static:
  date.c: mark file-local function static
  Replace parse_blob() with an explanatory comment
  symlinks.c: remove unused functions
  object.c: remove unused functions
  strbuf.c: remove unused function
  sha1_file.c: remove unused function
  mailmap.c: remove unused function
  utf8.c: mark file-local function static
  submodule.c: mark file-local function static
  quote.c: mark file-local function static
  remote-curl.c: mark file-local function static
  read-cache.c: mark file-local functions static
  parse-options.c: mark file-local function static
  entry.c: mark file-local function static
  http.c: mark file-local functions static
  pretty.c: mark file-local function static
  builtin-rev-list.c: mark file-local function static
  bisect.c: mark file-local function static
2010-01-20 14:37:25 -08:00
Junio C Hamano
73d66323ac Merge branch 'nd/sparse'
* nd/sparse: (25 commits)
  t7002: test for not using external grep on skip-worktree paths
  t7002: set test prerequisite "external-grep" if supported
  grep: do not do external grep on skip-worktree entries
  commit: correctly respect skip-worktree bit
  ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID
  tests: rename duplicate t1009
  sparse checkout: inhibit empty worktree
  Add tests for sparse checkout
  read-tree: add --no-sparse-checkout to disable sparse checkout support
  unpack-trees(): ignore worktree check outside checkout area
  unpack_trees(): apply $GIT_DIR/info/sparse-checkout to the final index
  unpack-trees(): "enable" sparse checkout and load $GIT_DIR/info/sparse-checkout
  unpack-trees.c: generalize verify_* functions
  unpack-trees(): add CE_WT_REMOVE to remove on worktree alone
  Introduce "sparse checkout"
  dir.c: export excluded_1() and add_excludes_from_file_1()
  excluded_1(): support exclude files in index
  unpack-trees(): carry skip-worktree bit over in merged_entry()
  Read .gitignore from index if it is skip-worktree
  Avoid writing to buffer in add_excludes_from_file_1()
  ...

Conflicts:
	.gitignore
	Documentation/config.txt
	Documentation/git-update-index.txt
	Makefile
	entry.c
	t/t7002-grep.sh
2010-01-13 11:58:34 -08:00
Junio C Hamano
61b97df7d9 entry.c: mark file-local function static
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-12 01:06:08 -08:00
Nguyễn Thái Ngọc Duy
56cac48c35 ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID
Previously CE_MATCH_IGNORE_VALID flag is used by both valid and
skip-worktree bits. While the two bits have similar behaviour, sharing
this flag means "git update-index --really-refresh" will ignore
skip-worktree while it should not. Instead another flag is
introduced to ignore skip-worktree bit, CE_MATCH_IGNORE_VALID only
applies to valid bit.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-14 14:03:58 -08:00
Junio C Hamano
da02ca508b check_path(): allow symlinked directories to checkout-index --prefix
Merlyn noticed that Documentation/install-doc-quick.sh no longer correctly
removes old installed documents when the target directory has a leading
path that is a symlink.  It turns out that "checkout-index --prefix" was
broken by recent b6986d8 (git-checkout: be careful about untracked
symlinks, 2009-07-29).

I suspect has_symlink_leading_path() could learn the third parameter
(prefix that is allowed to be symlinked directories) to allow us to retire
a similar function has_dirs_only_path().

Another avenue of fixing this I considered was to get rid of base_dir and
base_dir_len from "struct checkout", and instead make "git checkout-index"
when run with --prefix mkdir the leading path and chdir in there.  It
might be the best longer term solution to this issue, as the base_dir
feature is used only by that rather obscure codepath as far as I know.

But at least this patch should fix this breakage.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-18 03:32:45 -07:00
Linus Torvalds
b6986d8a75 git-checkout: be careful about untracked symlinks
This fixes the case where an untracked symlink that points at a directory
with tracked paths confuses the checkout logic, demostrated in t6035.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-29 20:24:28 -07:00
Thomas Rast
0721c314a5 Use die_errno() instead of die() when checking syscalls
Lots of die() calls did not actually report the kind of error, which
can leave the user confused as to the real problem.  Use die_errno()
where we check a system/library call that sets errno on failure, or
one of the following that wrap such calls:

  Function              Passes on error from
  --------              --------------------
  odb_pack_keep         open
  read_ancestry         fopen
  read_in_full          xread
  strbuf_read           xread
  strbuf_read_file      open or strbuf_read_file
  strbuf_readlink       readlink
  write_in_full         xwrite

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-27 11:14:53 -07:00
Thomas Rast
d824cbba02 Convert existing die(..., strerror(errno)) to die_errno()
Change calls to die(..., strerror(errno)) to use the new die_errno().

In the process, also make slight style adjustments: at least state
_something_ about the function that failed (instead of just printing
the pathname), and put paths in single quotes.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-27 11:14:53 -07:00
Alex Riesen
691f1a28bf replace direct calls to unlink(2) with unlink_or_warn
This helps to notice when something's going wrong, especially on
systems which lock open files.

I used the following criteria when selecting the code for replacement:
- it was already printing a warning for the unlink failures
- it is in a function which already printing something or is
  called from such a function
- it is in a static function, returning void and the function is only
  called from a builtin main function (cmd_)
- it is in a function which handles emergency exit (signal handlers)
- it is in a function which is obvously cleaning up the lockfiles

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-29 18:37:41 -07:00
Johannes Sixt
34779c535c Windows: Skip fstat/lstat optimization in write_entry()
Commit e4c72923 (write_entry(): use fstat() instead of lstat() when file
is open, 2009-02-09) introduced an optimization of write_entry().
Unfortunately, we cannot take advantage of this optimization on Windows
because there is no guarantee that the time stamps are updated before the
file is closed:

  "The only guarantee about a file timestamp is that the file time is
   correctly reflected when the handle that makes the change is closed."

(http://msdn.microsoft.com/en-us/library/ms724290(VS.85).aspx)

The failure of this optimization on Windows can be observed most easily by
running a 'git checkout' that has to update several large files. In this
case, 'git checkout' will report modified files, but infact only the
timestamps were incorrectly recorded in the index, as can be verified by a
subsequent 'git diff', which shows no change.

Dmitry Potapov reports the same fix needs on Cygwin; this commit contains
his updates for that.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-20 12:14:02 -07:00
Kjetil Barvik
e4c7292353 write_entry(): use fstat() instead of lstat() when file is open
Currently inside write_entry() we do an lstat(path, &st) call on a
file which have just been opened inside the exact same function.  It
should be better to call fstat(fd, &st) on the file while it is open,
and it should be at least as fast as the lstat() method.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-09 20:59:26 -08:00
Kjetil Barvik
4857c761e3 write_entry(): cleanup of some duplicated code
The switch-cases for S_IFREG and S_IFLNK was so similar that it will
be better to do some cleanup and use the common parts of it.

And the entry.c file should now be clean for 'gcc -Wextra' warnings.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-09 20:59:26 -08:00
Kjetil Barvik
81a9aa60a1 create_directories(): remove some memcpy() and strchr() calls
Remove the call to memcpy() and strchr() for each path component
tested, and instead add each path component as we go forward inside
the while-loop.

Impact: small optimisation

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-09 20:59:26 -08:00
Kjetil Barvik
571998921d lstat_cache(): swap func(length, string) into func(string, length)
Swap function argument pair (length, string) into (string, length) to
conform with the commonly used order inside the GIT source code.

Also, add a note about this fact into the coding guidelines.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-09 20:59:26 -08:00
Junio C Hamano
0990e7aaaa Merge branch 'kb/lstat-cache'
* kb/lstat-cache:
  lstat_cache(): introduce clear_lstat_cache() function
  lstat_cache(): introduce invalidate_lstat_cache() function
  lstat_cache(): introduce has_dirs_only_path() function
  lstat_cache(): introduce has_symlink_or_noent_leading_path() function
  lstat_cache(): more cache effective symlink/directory detection
2009-01-25 17:13:34 -08:00
Kjetil Barvik
bad4a54fa6 lstat_cache(): introduce has_dirs_only_path() function
The create_directories() function in entry.c currently calls stat()
or lstat() for each path component of the pathname 'path' each and every
time.  For the 'git checkout' command, this function is called on each
file for which we must do an update (ce->ce_flags & CE_UPDATE), so we get
lots and lots of calls.

To fix this, we make a new wrapper to the lstat_cache() function, and
call the wrapper function instead of the calls to the stat() or the
lstat() functions.  Since the paths given to the create_directories()
function, is sorted alphabetically, the new wrapper would be very
cache effective in this situation.

To support it we must update the lstat_cache() function to be able to
say that "please test the complete length of 'name'", and also to give
it the length of a prefix, where the cache should use the stat()
function instead of the lstat() function to test each path component.

Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable
comments to this patch!

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 13:54:54 -08:00
Alexander Potashev
8ca12c0d62 add is_dot_or_dotdot inline function
A new inline function is_dot_or_dotdot is used to check if the
directory name is either "." or "..". It returns a non-zero value if
the given string is "." or "..". It's applicable to a lot of Git
source code.

Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-11 13:21:57 -08:00
Junio C Hamano
7e44c93558 'git foo' program identifies itself without dash in die() messages
This is a mechanical conversion of all '*.c' files with:

	s/((?:die|error|warning)\("git)-(\S+:)/$1 $2/;

The result was manually inspected and no false positive was found.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-31 09:39:19 -07:00
Linus Torvalds
971f229c50 Fix possible Solaris problem in 'checkout_entry()'
Currently when checking out an entry "path", we try to unlink(2) it first
(because there could be stale file), and if there is a directory there,
try to deal with it (typically we run recursive rmdir).  We ignore the
error return from this unlink because there may not even be any file
there.

However if you are root on Solaris, you can unlink(2) a directory
successfully and corrupt your filesystem.

This moves the code around and check the directory first, and then
unlink(2).  Also we check the error code from it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-18 22:18:57 -07:00
Linus Torvalds
7a51ed66f6 Make on-disk index representation separate from in-core one
This converts the index explicitly on read and write to its on-disk
format, allowing the in-core format to contain more flags, and be
simpler.

In particular, the in-core format is now host-endian (as opposed to the
on-disk one that is network endian in order to be able to be shared
across machines) and as a result we can dispense with all the
htonl/ntohl on accesses to the cache_entry fields.

This will make it easier to make use of various temporary flags that do
not exist in the on-disk format.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-21 12:44:31 -08:00
Junio C Hamano
c78a24986d Merge branch 'jc/maint-add-sync-stat'
* jc/maint-add-sync-stat:
  t2200: test more cases of "add -u"
  git-add: make the entry stat-clean after re-adding the same contents
  ce_match_stat, run_diff_files: use symbolic constants for readability

Conflicts:

	builtin-add.c
2007-11-14 14:15:40 -08:00
Junio C Hamano
4bd5b7dacc ce_match_stat, run_diff_files: use symbolic constants for readability
ce_match_stat() can be told:

 (1) to ignore CE_VALID bit (used under "assume unchanged" mode)
     and perform the stat comparison anyway;

 (2) not to perform the contents comparison for racily clean
     entries and report mismatch of cached stat information;

using its "option" parameter.  Give them symbolic constants.

Similarly, run_diff_files() can be told not to report anything
on removed paths.  Also give it a symbolic constant for that.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-10 00:24:51 -08:00
René Scharfe
c32f749fec Correct some sizeof(size_t) != sizeof(unsigned long) typing errors
Fix size_t vs. unsigned long pointer mismatch warnings introduced
with the addition of strbuf_detach().

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-22 00:00:40 -04:00
Pierre Habouzit
b315c5c081 strbuf change: be sure ->buf is never ever NULL.
For that purpose, the ->buf is always initialized with a char * buf living
in the strbuf module. It is made a char * so that we can sloppily accept
things that perform: sb->buf[0] = '\0', and because you can't pass "" as an
initializer for ->buf without making gcc unhappy for very good reasons.

strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf
anymore.

as a consequence strbuf_detach is _mandatory_ to detach a buffer, copying
->buf isn't an option anymore, if ->buf is going to escape from the scope,
and eventually be free'd.

API changes:
  * strbuf_setlen now always works, so just make strbuf_reset a convenience
    macro.
  * strbuf_detatch takes a size_t* optional argument (meaning it can be
    NULL) to copy the buffer's len, as it was needed for this refactor to
    make the code more readable, and working like the callers.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-29 02:13:33 -07:00
Pierre Habouzit
5ecd293d14 Rewrite convert_to_{git,working_tree} to use strbuf's.
* Now, those functions take an "out" strbuf argument, where they store their
  result if any. In that case, it also returns 1, else it returns 0.
* those functions support "in place" editing, in the sense that it's OK to
  call them this way:
    convert_to_git(path, sb->buf, sb->len, sb);
  When doable, conversions are done in place for real, else the strbuf
  content is just replaced with the new one, transparentely for the caller.

If you want to create a new filter working this way, being the accumulation
of filter1, filter2, ... filtern, then your meta_filter would be:

    int meta_filter(..., const char *src, size_t len, struct strbuf *sb)
    {
        int ret = 0;
        ret |= filter1(...., src, len, sb);
        if (ret) {
            src = sb->buf;
            len = sb->len;
        }
        ret |= filter2(...., src, len, sb);
        if (ret) {
            src = sb->buf;
            len = sb->len;
        }
        ....
        return ret | filtern(..., src, len, sb);
    }

That's why subfilters the convert_to_* functions called were also rewritten
to work this way.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-16 17:30:03 -07:00
Junio C Hamano
1a9d7e9b48 attr.c: read .gitattributes from index as well.
This makes .gitattributes files to be read from the index when
they are not checked out to the work tree.  This is in line with
the way we always allowed low-level tools to operate in sparsely
checked out work tree in a reasonable way.

It swaps the order of new file creation and converting the blob
to work tree representation; otherwise when we are in the middle
of checking out .gitattributes we would notice an empty but
unwritten .gitattributes file in the work tree and will ignore
the copy in the index.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-14 23:19:10 -07:00
Junio C Hamano
c1c10a3f27 Merge branch 'maint'
* maint:
  Force listingblocks to be monospaced in manpages
  Do not expect unlink(2) to fail on a directory.
2007-07-18 17:00:36 -07:00
Junio C Hamano
fa2e71c9e7 Do not expect unlink(2) to fail on a directory.
When "git checkout-index" checks out path A/B/C, it makes sure A
and A/B are truly directories; if there is a regular file or
symlink at A, we prefer to remove it.

We used to do this by catching an error return from mkdir(2),
and on EEXIST did unlink(2), and when it succeeded, tried
another mkdir(2).

Thomas Glanzmann found out the above does not work on Solaris
for a root user, as unlink(2) was so old fashioned there that it
allowed to unlink a directory.

As pointed out, this still doesn't guarantee that git won't call
"unlink()" on a directory (race conditions etc), but that's
fundamentally true (there is no "funlink()" like there is
"fstat()"), and besides, that is in no way git-specific (ie it's
true of any application that gets run as root).

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-18 00:53:09 -07:00
Junio C Hamano
a6080a0a44 War on whitespace
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept in to our source files over time.  There are a few files that need
to have trailing whitespaces (most notably, test vectors).  The results
still passes the test, and build result in Documentation/ area is unchanged.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-07 00:04:01 -07:00
Martin Waitz
302b9282c9 rename dirlink to gitlink.
Unify naming of plumbing dirlink/gitlink concept:

git ls-files -z '*.[ch]' |
xargs -0 perl -pi -e 's/dirlink/gitlink/g;' -e 's/DIRLNK/GITLINK/g;'

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-21 23:34:54 -07:00
Luiz Fernando N. Capitulino
efbc583126 entry.c: Use const qualifier for 'struct checkout' parameters
Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-25 13:15:59 -07:00
Luiz Fernando N. Capitulino
cc2903fc70 remove_subtree(): Use strerror() when possible
Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-25 13:15:27 -07:00
Junio C Hamano
a2d7c6c620 Merge branch 'jc/attr'
* 'jc/attr': (28 commits)
  lockfile: record the primary process.
  convert.c: restructure the attribute checking part.
  Fix bogus linked-list management for user defined merge drivers.
  Simplify calling of CR/LF conversion routines
  Document gitattributes(5)
  Update 'crlf' attribute semantics.
  Documentation: support manual section (5) - file formats.
  Simplify code to find recursive merge driver.
  Counto-fix in merge-recursive
  Fix funny types used in attribute value representation
  Allow low-level driver to specify different behaviour during internal merge.
  Custom low-level merge driver: change the configuration scheme.
  Allow the default low-level merge driver to be configured.
  Custom low-level merge driver support.
  Add a demonstration/test of customized merge.
  Allow specifying specialized merge-backend per path.
  merge-recursive: separate out xdl_merge() interface.
  Allow more than true/false to attributes.
  Document git-check-attr
  Change attribute negation marker from '!' to '-'.
  ...
2007-04-21 17:38:00 -07:00
Alex Riesen
ac78e54804 Simplify calling of CR/LF conversion routines
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-20 23:24:34 -07:00
Linus Torvalds
f0807e62b4 Teach "git-read-tree -u" to check out submodules as a directory
This actually allows us to check out a supermodule after cloning, although
the submodules themselves will obviously not be checked out, and will just
be empty directories.

Checking out the submodules will be up to higher levels - we may not even
want to!

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-14 03:14:16 -07:00
Johannes Sixt
78a8d641c1 Add core.symlinks to mark filesystems that do not support symbolic links.
Some file systems that can host git repositories and their working copies
do not support symbolic links. But then if the repository contains a symbolic
link, it is impossible to check out the working copy.

This patch enables partial support of symbolic links so that it is possible
to check out a working copy on such a file system.  A new flag
core.symlinks (which is true by default) can be set to false to indicate
that the filesystem does not support symbolic links. In this case, symbolic
links that exist in the trees are checked out as small plain files, and
checking in modifications of these files preserve the symlink property in
the database (as long as an entry exists in the index).

Of course, this does not magically make symbolic links work on such defective
file systems; hence, this solution does not help if the working copy relies
on that an entry is a real symbolic link.

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-02 16:58:05 -08:00
Nicolas Pitre
21666f1aae convert object type handling from a string to a number
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value.  One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.

This is an initial step for the removal of the version using a char array
found in object reading code paths.  The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-27 01:34:21 -08:00
Linus Torvalds
6c510bee20 Lazy man's auto-CRLF
It currently does NOT know about file attributes, so it does its
conversion purely based on content. Maybe that is more in the "git
philosophy" anyway, since content is king, but I think we should try to do
the file attributes to turn it off on demand.

Anyway, BY DEFAULT it is off regardless, because it requires a

	[core]
		AutoCRLF = true

in your config file to be enabled. We could make that the default for
Windows, of course, the same way we do some other things (filemode etc).

But you can actually enable it on UNIX, and it will cause:

 - "git update-index" will write blobs without CRLF
 - "git diff" will diff working tree files without CRLF
 - "git checkout" will write files to the working tree _with_ CRLF

and things work fine.

Funnily, it actually shows an odd file in git itself:

	git clone -n git test-crlf
	cd test-crlf
	git config core.autocrlf true
	git checkout
	git diff

shows a diff for "Documentation/docbook-xsl.css". Why? Because we have
actually checked in that file *with* CRLF! So when "core.autocrlf" is
true, we'll always generate a *different* hash for it in the index,
because the index hash will be for the content _without_ CRLF.

Is this complete? I dunno. It seems to work for me. It doesn't use the
filename at all right now, and that's probably a deficiency (we could
certainly make the "is_binary()" heuristics also take standard filename
heuristics into account).

I don't pass in the filename at all for the "index_fd()" case
(git-update-index), so that would need to be passed around, but this
actually works fine.

NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours
truly. I will not guarantee that they work at all reasonable. Caveat
emptor. But it _is_ simple, and it _is_ safe, since it's all off by
default.

The patch is pretty simple - the biggest part is the new "convert.c" file,
but even that is really just basic stuff that anybody can write in
"Teaching C 101" as a final project for their first class in programming.
Not to say that it's bug-free, of course - but at least we're not talking
about rocket surgery here.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-14 11:19:22 -08:00
Linus Torvalds
bd3a5b5ee5 Mark places that need blob munging later for CRLF conversion.
Here's a patch that I think we can merge right now. There may be
other places that need this, but this at least points out the
three places that read/write working tree files for git
update-index, checkout and diff respectively. That should cover
a lot of it [jc: git-apply uses an entirely different codepath
both for reading and writing].

Some day we can actually implement it. In the meantime, this
points out a place for people to start. We *can* even start with
a really simple "we do CRLF conversion automatically, regardless
of filename" kind of approach, that just look at the data (all
three cases have the _full_ file data already in memory) and
says "ok, this is text, so let's convert to/from DOS format
directly".

THAT somebody can write in ten minutes, and it would already
make git much nicer on a DOS/Windows platform, I suspect.

And it would be totally zero-cost if you just make it a config
option (but please make it dynamic with the _default_ just being
0/1 depending on whether it's UNIX/Windows, just so that UNIX
people can _test_ it easily).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-13 10:12:37 -08:00
Andy Whitcroft
93822c2239 short i/o: fix calls to write to use xwrite or write_in_full
We have a number of badly checked write() calls.  Often we are
expecting write() to write exactly the size we requested or fail,
this fails to handle interrupts or short writes.  Switch to using
the new write_in_full().  Otherwise we at a minimum need to check
for EINTR and EAGAIN, where this is appropriate use xwrite().

Note, the changes to config handling are much larger and handled
in the next patch in the sequence.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-08 15:44:47 -08:00
Junio C Hamano
85023577a8 simplify inclusion of system header files.
This is a mechanical clean-up of the way *.c files include
system header files.

 (1) sources under compat/, platform sha-1 implementations, and
     xdelta code are exempt from the following rules;

 (2) the first #include must be "git-compat-util.h" or one of
     our own header file that includes it first (e.g. config.h,
     builtin.h, pkt-line.h);

 (3) system headers that are included in "git-compat-util.h"
     need not be included in individual C source files.

 (4) "git-compat-util.h" does not have to include subsystem
     specific header files (e.g. expat.h).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20 09:51:35 -08:00
Jonas Fonseca
095c424d08 Use PATH_MAX instead of MAXPATHLEN
According to sys/paramh.h it's a "BSD name" for values defined in
<limits.h>. Besides PATH_MAX seems to be more commonly used.

Signed-off-by: Jonas Fonseca <fonseca@diku.dk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-26 17:52:58 -07:00
Peter Eriksen
8e44025925 Use blob_, commit_, tag_, and tree_type throughout.
This replaces occurences of "blob", "commit", "tag", and "tree",
where they're really used as type specifiers, which we already
have defined global constants for.

Signed-off-by: Peter Eriksen <s022018@student.dtu.dk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-04 00:11:19 -07:00